This page contains links to source code, videos, and other materials from the edX series titled Statistics and R for the Life Sciences, developed by Rafael Irizarry and Mike Love of the Harvard School of Public Health.
Code from the class is available on GitHub: https://github.com/genomicsclass/labs
The series included eight courses. Links to videos appear below. To view quizzes and video transcripts, visit the course web pages on edX.
Note: As of October 2015, you can view archived course materials if you register for an account. You may also have to enroll in an upcoming offering of the class.
HarvardX: PH525.1x Statistics and R for the Life Sciences
Week 1: R programming skills
- Getting started with R – https://youtu.be/lEmqVSGdifQ
- RStudio – https://youtu.be/mFXRIav9vkQ
- RStudio for organization – https://youtu.be/812ruYN4PZQ
Week 2: Probability Distributions I
- Introduction to random variables – https://youtu.be/HULWmM_Ao1U
- Introduction to random variables II – https://youtu.be/wswminMpiwk
- Introduction to null distributions – https://youtu.be/_IpRw1oHXRw (Formatted Markdown page.)
Week 2: Probability Distributions II
- Probability distributions – https://youtu.be/govBS0uJ9GA
- The normal distribution – https://youtu.be/fwaxgik7aj4
Week 3: Inference I
- Introduction to statistical inference – https://youtu.be/Os5ph7S06_A
- Populations, parameters, and sample estimates – https://youtu.be/99WNX608k0Y
- Central limit theorem – https://youtu.be/aYA8ZG-ltqQ
- CLT in practice – https://youtu.be/10LRBgv3A64
- CLT in practice II – https://youtu.be/GgpU_nGkrQc
- T tests in practice – https://youtu.be/7eT_Cq4cm8s
- T tests in practice II – https://youtu.be/s3jZ1z08Geg
Week 3: Inference II
- Confidence intervals – https://youtu.be/pIxF30BuX3w
- Confidence intervals II – https://youtu.be/9qKyDHJWrgg
- Confidence intervals III – https://youtu.be/vuYXrpX5Uo4
- Power calculations – https://youtu.be/17HisXgOmeM
- Association tests – https://youtu.be/h8Rn7rr9M34
- Association tests in practice – https://youtu.be/sutvu8pSVuI
Week 3: Inference III
- Monte Carlo simulation – https://youtu.be/qP9w87RGHAY
- Permutations – https://youtu.be/7kx5kmHi7-8
Week 4: Exploratory Data Analysis I
- Histogram – https://youtu.be/UaXYRf6qtEg
- QQ plot – https://youtu.be/5F62EwMF26c
- Boxplot – https://youtu.be/Hh-Pd23OmVo
Week 4: Exploratory Data Analysis II
- Scatterplot – https://youtu.be/dmJzInKpuRE
- Symmetry of log ratios – https://youtu.be/kxW4bCrYvco
- Plots to avoid – https://youtu.be/p-dYnSbBTa8
- Avoid pseudo 3D – https://youtu.be/15dRwC-gP0Q
Week 4: Introducing dplyr
- Introducing dplyr I – https://youtu.be/drDmIFtnoNE
- Introducing dplyr II – https://youtu.be/UJU7H6SWiwo
Week 4: Robust summaries
- Median, MAD, and Spearman correlation – https://youtu.be/vLDxz51pLZQ
- Mann-Whitney-Wilcoxon test – https://youtu.be/3WKOnz6L1Fc
HarvardX: PH525.2x Matrix Algebra and Linear Models
Week 1: Introduction
- Introduction – https://youtu.be/tPlHbAHVqFQ
- Matrix notations – https://youtu.be/EaYkxUwEB-Q
- Matrix operations – https://youtu.be/-5uvdduYNJM
- Linear models as matrix multiplication I – https://youtu.be/gP7mgpli5t4
- Examples – https://youtu.be/FAP7fYbZF0Y
- Linear models as matrix multiplication II – https://youtu.be/pw7I70rlkdM – Explains two-group comparison as a linear model.
Week 1: Matrix Algebra
- Matrix algebra in practice I (in R) – https://youtu.be/LniqeWOfTQo (Code on GitHub.)
- Matrix algebra in practice II (in R) – https://youtu.be/eRXzsXh78rE
Week 2: Testing Linear Models
- Standard errors of regression coefficients – https://youtu.be/9rm-y_iYbnw
- Fitting linear models and testing – https://youtu.be/TSOzvcAgV70
- Expressing experimental designs in R – https://youtu.be/KpSS2e4Y24w
- Linear models in practice I – https://youtu.be/xCdSyc3K3Ew
- Linear models in practice II – https://youtu.be/Opa8i0QxKCo
Week 2: Complex designs
- Interactions and contrasts in linear models I – https://youtu.be/Wa1QkyF4peU
- Interactions and contrasts in linear models II – https://youtu.be/ZU5jb86vXag
- Interactions and contrasts III – https://youtu.be/wSJ3yuPiAbg
- Interactions and contrasts IV – https://youtu.be/bBmhUyOmeZc
- Interactions and contrasts V – https://youtu.be/pTPxxU6Zslc
Week 2: Calculation of linear models
- Collinearity – https://youtu.be/dyzbzbUHZHY
- QR factorization – https://youtu.be/yL3lrirzNnQ
HarvardX: PH525.4x Introduction to Bioconductor
Week 1: Getting Started
- The Bioconductor portal: installation, documentation, and help – https://youtu.be/XZGNMw68-rQ
Week 1: A review of what we measure and why
- Introduction – https://youtu.be/-0OLQqfxXQI
- Overview of What We Measure and Why – https://youtu.be/H06H22RMux8
- Molecular Basis for Phenotypic Variation – https://youtu.be/dFtvfzgYfq0
- DNA: chromosomes, replication, SNPs and other variants – https://youtu.be/OPFWrC_KEGg
- Gene Expression – https://youtu.be/X_nKUGRKhlk
Week 1: A first look at Bioconductor
- Motivation – https://youtu.be/Os8cDC-4CiY
- Annotation, Assays, Algorithms, and Architecture – https://youtu.be/ZNvCYa778WA
- Annotating phenotypes and molecular functions – https://youtu.be/E7UIsf8WPTY
- The ExpressionSet container – https://youtu.be/0ilR2Q6eLBk
Week 2: Computing on genomic regions
- Motivation and Introduction – https://youtu.be/XhEk_5Uz5OI
- Introduction to GenomicRanges – https://youtu.be/27TkhfqDydE
- Interval ranges: IRanges – https://youtu.be/UuS-IRGW-fE
- Genomic ranges: GRanges – https://youtu.be/CotroQf5hvg
- Operating on GRanges – https://youtu.be/I-HCkOg39MI
- Finding Overlaps – https://youtu.be/t2rHIkq2ydg
- Genes As GRanges – https://youtu.be/z6bg-snkUNs
- Finding the Nearest Gene – https://youtu.be/KflE8sObeMw
- Annotating Genes – https://youtu.be/OEyoDzED8Ck
- Getting the Sequence of Regions – https://youtu.be/twDnpOUQARg
Week 2: Advanced annotation
- The Human Genome in R – https://youtu.be/VtGdQBq1ZDw
- LiftOver: Converting Across Versions of the Human Genome – https://youtu.be/chzxzzlf3Vg
- Exons, Introns, and Transcripts – https://youtu.be/bgLVNfeeETM
- Transcript Annotations and Gene Models – https://youtu.be/RIO9nAtkuc0
- Importing and Exporting BED files – https://youtu.be/UbbPkK_z96s
- AnnotationHub – https://youtu.be/MbNeRtlRKOk
- Genome Wide Annotation Packages – https://youtu.be/inINBG4nRTY
- Gene Ontology Tables – https://youtu.be/PvCArWGvnOg
- Kyoto Encyclopedia of Genes and Genomes (KEGG) – https://youtu.be/tJPJpbkGjaI
- More on the Homo.sapiens package – https://youtu.be/iQMMTrVhsgE
Week 3: Introduction to microarray technologies
- Microarray Technology 1: How Hybridization Works – https://youtu.be/vj3vgkf5rTE
- Microarray Technology 2: How Microarrays Work – https://youtu.be/pdr6aVFciiM
- Microarray Technologies 3: Applications of Microarrays in Genomics – https://youtu.be/AXoPsY6kyZM
Week 3: Introduction to next generation sequencing technology
- Next Generation Sequencing Technology 1: Brief Introduction to the Mechanics of NGS – https://youtu.be/jQuShWX0ERU
- Next Generation Sequencing Technology 2: Applications of NGS in Genomics – https://youtu.be/89KFBHER5cM
Week 3: Importing and organizing high throughput data
- Bioconductor Infrastructure: ExpressionSet and SummarizedExperiment – https://youtu.be/bokZn6qd1kk
Week 3: Importing microarray data
- Lab: Reading Microarray Raw Data – https://youtu.be/f7QuRhKD_t8 – Covers Affymetrix and Agilent arrays
Week 3: Importing next generation sequencing data
- Creating a count table from a BAM file – https://youtu.be/d9RgQ5xbi94 – In the Loraine Lab, we use the command-line version of featureCounts
- Mapping Algorithms and Software – https://youtu.be/n7gbw4DjE9o
Week 3: Transformations and exploratory data analysis
- Examples from Genomics – https://youtu.be/IfO9OdNyEvE
- Detecting Quality Problems with EDA – https://youtu.be/lggp8EzR5nY
- Log Transformation – https://youtu.be/G56StIbkMlA
- EDA for Next Generation Sequencing – https://youtu.be/2AuDMC6APvM
Week 3: Modeling microarray data
- Introduction to Microarray Background Noise – https://youtu.be/5r0PWL1aakk
- Probe Level Model – https://youtu.be/ToFaVtyGoP4
- Different Approaches to Background Adjustment – https://youtu.be/9BCIgRr1KJc
Week 3: Normalization
- The Need for Normalization – https://youtu.be/DnLZOaERolg
- Motivating Normalization – https://youtu.be/HmokevW15QI
- Local Regression: loess – https://youtu.be/l81ooxzbz6M
- Loess Normalization Applied to Data – https://youtu.be/_r0cmw2VjBU
- Quantile Normalization – https://youtu.be/_jd4q17tRAI
- Quantile Normalization Applied to Data – https://youtu.be/UVjgMTG1nIc
- Variance Stabilizing Normalization – https://youtu.be/ExQkVM4WrYs
- When Not to Normalize – https://youtu.be/91xrH7cWw8Q
- Subset Quantile Normalization – https://youtu.be/YdxszCUhalY
Week 3: Visualizing and normalizing NGS data
- Visualizing NGS data part 1 – https://youtu.be/wxeAJ26PeHg
- Visualizing NGS data part 2 – https://youtu.be/DPPXZXagLEw
- Visualizing NGS data part 3 – https://youtu.be/hAyUp5hruug
- Normalization for RNA-seq – https://youtu.be/HnMJHzDwK4U
Week 4: Inference
- Biological versus Technical Variability – https://youtu.be/7GQprJ-F73Y
- t-tests in Genomics – https://youtu.be/bkP7YZqYvmM
- Moderated t-test with the limma package – https://youtu.be/5iNUucEn68w
- Gene Sets – https://youtu.be/p3JzmaeqiCQ
- Summary Statistics for Gene Sets – https://youtu.be/Ra3JeXSRda0
- Hypothesis Testing for Gene Sets – https://youtu.be/pZbicSgjLqw
- Permutations for Gene Set Inference – https://youtu.be/_a2fV_kXcBk
- Gene Set Testing in R Part I – https://youtu.be/bfp99aDhjqU
- Gene Set Testing in R Part II – https://youtu.be/IW_YBOe-E0A
Week 4: Architecture: fostering integrative analysis of genome-scale data
- Visualizing genomic features: ggbio’s autoplot – https://youtu.be/5yC-KU3HH4s
- R package building – https://youtu.be/SALAd-Lr55g
- Installing and checking a new package – https://youtu.be/Nsjf9XomEFk
- Creating and installing an integrative annotation package – https://youtu.be/zSHim9YzWKg
- External data: Relational Data Base Management Systems (RDBMS) – https://youtu.be/Q8_dGb7BKTM
- Tabix for random access to genomic flat files – https://youtu.be/OfEO3rwzM2Y
- Combining Expression and ChIP-chip data in yeast – https://youtu.be/cY_LxK_1Dms
- GRanges for an annotated GWAS catalog – https://youtu.be/tNInpIdMIF8
- GWAS hits in ESRRA binding peaks – https://youtu.be/TuMiU4dtULU
- Querying GEOmetadb – https://youtu.be/NSIMr5FIjW0
Week 4: Parallel computing and software engineering in Bioconductor
- A view of sequential and concurrent iteration – https://youtu.be/X5ZP3XXhLxk
- Concurrent counting of RNA-seq reads – https://youtu.be/OA4vyEF9pyQ
- Bioconductor’s AMI with StarCluster – https://youtu.be/cZZBP5vwW64
- BatchJobs for distributed counting – https://youtu.be/ReOkMpYiSzw
- Software quality control and continuous integration – https://youtu.be/h7MVHtjGSbk
HarvardX: PH525.7x Case Study: ChIP-seq data analysis
Week 1: Introduction to ChIP-Seq
- Introduction to transcription regulation – https://youtu.be/yNtahPOVdEE
- ChIP-seq technique – https://youtu.be/tn_SZElBDOA
- ChIP-Seq peak calling – https://youtu.be/933kKxGdD90
- ChIP-seq quality control 1 – https://youtu.be/nDrsyvntCw0
- Hands-on tutorial on running macs2 – https://youtu.be/r9zxzskkJ7Y
Week 2: Advanced ChIP-Seq analysis
- ChIP-Seq quality control 2 – https://youtu.be/auUj6i-5sO4
- ChIP-Seq target genes – https://youtu.be/MfIrL5hPVqU
- ChIP-Seq example – https://youtu.be/_yAnodofGc0
- Cistrome – https://youtu.be/4dVej4drfOA
- Cistrome analysis pipeline hands on – https://youtu.be/2-2XpNiz7VE
- BETA software suite – https://youtu.be/CRsflDp0XBQ