Statistics and R for the Life Sciences

This page contains links to source code, videos, and other materials from the edX series titled Statistics and R for the Life Sciences, developed by Rafael Irizarry and Mike Love of the Harvard School of Public Health.

Code from the class is available on GitHub: https://github.com/genomicsclass/labs

The series included eight courses. Links to videos appear below. To view quizzes and video transcripts, visit the course web pages on edX.

Note: As of October 2015, you can view archived course materials if you register for an account. You may also have to enroll in an upcoming offering of the class.


HarvardX: PH525.1x Statistics and R for the Life Sciences

Week 1: R programming skills

  1. Getting started with R – https://youtu.be/lEmqVSGdifQ
  2. RStudio – https://youtu.be/mFXRIav9vkQ
  3. RStudio for organization – https://youtu.be/812ruYN4PZQ

Week 2: Probability Distributions I

  1. Introduction to random variables – https://youtu.be/HULWmM_Ao1U
  2. Introduction to random variables II – https://youtu.be/wswminMpiwk
  3. Introduction to null distributions – https://youtu.be/_IpRw1oHXRw (Formatted Markdown page.)

Week 2: Probability Distributions II

  1. Probability distributions – https://youtu.be/govBS0uJ9GA
  2. The normal distribution – https://youtu.be/fwaxgik7aj4

Week 3: Inference I

  1. Introduction to statistical inference – https://youtu.be/Os5ph7S06_A
  2. Populations, parameters, and sample estimates – https://youtu.be/99WNX608k0Y
  3. Central limit theorem – https://youtu.be/aYA8ZG-ltqQ
  4. CLT in practice – https://youtu.be/10LRBgv3A64
  5. CLT in practice II – https://youtu.be/GgpU_nGkrQc
  6. T tests in practice – https://youtu.be/7eT_Cq4cm8s
  7. T tests in practice II – https://youtu.be/s3jZ1z08Geg

Week 3: Inference II

  1. Confidence intervals – https://youtu.be/pIxF30BuX3w
  2. Confidence intervals II – https://youtu.be/9qKyDHJWrgg
  3. Confidence intervals III – https://youtu.be/vuYXrpX5Uo4
  4. Power calculations – https://youtu.be/17HisXgOmeM
  5. Association tests – https://youtu.be/h8Rn7rr9M34
  6. Association tests in practice – https://youtu.be/sutvu8pSVuI

Week 3: Inference III

  1. Monte Carlo simulation – https://youtu.be/qP9w87RGHAY
  2. Permutations – https://youtu.be/7kx5kmHi7-8

Week 4: Exploratory Data Analysis I

  1. Histogram – https://youtu.be/UaXYRf6qtEg
  2. QQ plot – https://youtu.be/5F62EwMF26c
  3. Boxplot – https://youtu.be/Hh-Pd23OmVo

Week 4: Exploratory Data Analysis II

  1. Scatterplot – https://youtu.be/dmJzInKpuRE
  2. Symmetry of log ratios – https://youtu.be/kxW4bCrYvco
  3. Plots to avoid – https://youtu.be/p-dYnSbBTa8
  4. Avoid pseudo 3D – https://youtu.be/15dRwC-gP0Q

Week 4: Introducing dplyr

  1. Introducing dplyr I – https://youtu.be/drDmIFtnoNE
  2. Introducing dplyr II – https://youtu.be/UJU7H6SWiwo

Week 4: Robust summaries

  1. Median, MAD, and Spearman correlation – https://youtu.be/vLDxz51pLZQ
  2. Mann-Whitney-Wilcoxon test – https://youtu.be/3WKOnz6L1Fc

HarvardX: PH525.2x Matrix Algebra and Linear Models

Week 1: Introduction

  1. Introduction – https://youtu.be/tPlHbAHVqFQ
  2. Matrix notations – https://youtu.be/EaYkxUwEB-Q
  3. Matrix operations – https://youtu.be/-5uvdduYNJM
  4. Linear models as matrix multiplication I – https://youtu.be/gP7mgpli5t4
  5. Examples – https://youtu.be/FAP7fYbZF0Y
  6. Linear models as matrix multiplication II – https://youtu.be/pw7I70rlkdM – Explains two-group comparison as a linear model.

Week 1: Matrix Algebra

  1. Matrix algebra in practice I (in R) – https://youtu.be/LniqeWOfTQo (Code on GitHub.)
  2. Matrix algebra in practice II (in R) – https://youtu.be/eRXzsXh78rE

Week 2: Testing Linear Models

  1. Standard errors of regression coefficients – https://youtu.be/9rm-y_iYbnw
  2. Fitting linear models and testing – https://youtu.be/TSOzvcAgV70
  3. Expressing experimental designs in R – https://youtu.be/KpSS2e4Y24w
  4. Linear models in practice I – https://youtu.be/xCdSyc3K3Ew
  5. Linear models in practice II – https://youtu.be/Opa8i0QxKCo

Week 2: Complex designs

  1. Interactions and contrasts in linear models I – https://youtu.be/Wa1QkyF4peU
  2. Interactions and contrasts in linear models II – https://youtu.be/ZU5jb86vXag
  3. Interactions and contrasts III – https://youtu.be/wSJ3yuPiAbg
  4. Interactions and contrasts IV – https://youtu.be/bBmhUyOmeZc
  5. Interactions and contrasts V – https://youtu.be/pTPxxU6Zslc

Week 2: Calculation of linear models

  1. Collinearity – https://youtu.be/dyzbzbUHZHY
  2. QR factorization – https://youtu.be/yL3lrirzNnQ

HarvardX: PH525.4x Introduction to Bioconductor

Week 1: Getting Started

  1. The Bioconductor portal: installation, documentation, and help – https://youtu.be/XZGNMw68-rQ

Week 1: A review of what we measure and why

  1. Introduction – https://youtu.be/-0OLQqfxXQI
  2. Overview of What We Measure and Why – https://youtu.be/H06H22RMux8
  3. Molecular Basis for Phenotypic Variation – https://youtu.be/dFtvfzgYfq0
  4. DNA: chromosomes, replication, SNPs and other variants – https://youtu.be/OPFWrC_KEGg
  5. Gene Expression – https://youtu.be/X_nKUGRKhlk

Week 1: A first look at Bioconductor

  1. Motivation – https://youtu.be/Os8cDC-4CiY
  2. Annotation, Assays, Algorithms, and Architecture – https://youtu.be/ZNvCYa778WA
  3. Annotating phenotypes and molecular functions – https://youtu.be/E7UIsf8WPTY
  4. The ExpressionSet container – https://youtu.be/0ilR2Q6eLBk

Week 2: Computing on genomic regions

  1. Motivation and Introduction – https://youtu.be/XhEk_5Uz5OI
  2. Introduction to GenomicRanges – https://youtu.be/27TkhfqDydE
  3. Interval ranges: IRanges – https://youtu.be/UuS-IRGW-fE
  4. Genomic ranges: GRanges – https://youtu.be/CotroQf5hvg
  5. Operating on GRanges – https://youtu.be/I-HCkOg39MI
  6. Finding Overlaps – https://youtu.be/t2rHIkq2ydg
  7. Genes As GRanges – https://youtu.be/z6bg-snkUNs
  8. Finding the Nearest Gene – https://youtu.be/KflE8sObeMw
  9. Annotating Genes – https://youtu.be/OEyoDzED8Ck
  10. Getting the Sequence of Regions – https://youtu.be/twDnpOUQARg

Week 2: Advanced annotation

  1. The Human Genome in R – https://youtu.be/VtGdQBq1ZDw
  2. LiftOver: Converting Across Versions of the Human Genome – https://youtu.be/chzxzzlf3Vg
  3. Exons, Introns, and Transcripts – https://youtu.be/bgLVNfeeETM
  4. Transcript Annotations and Gene Models – https://youtu.be/RIO9nAtkuc0
  5. Importing and Exporting BED files – https://youtu.be/UbbPkK_z96s
  6. AnnotationHub – https://youtu.be/MbNeRtlRKOk
  7. Genome Wide Annotation Packages – https://youtu.be/inINBG4nRTY
  8. Gene Ontology Tables – https://youtu.be/PvCArWGvnOg
  9. Kyoto Encyclopedia of Genes and Genomes (KEGG) – https://youtu.be/tJPJpbkGjaI
  10. More on the Homo.sapiens package – https://youtu.be/iQMMTrVhsgE

Week 3: Introduction to microarray technologies

  1. Microarray Technology 1: How Hybridization Works – https://youtu.be/vj3vgkf5rTE
  2. Microarray Technology 2: How Microarrays Work – https://youtu.be/pdr6aVFciiM
  3. Microarray Technologies 3: Applications of Microarrays in Genomics – https://youtu.be/AXoPsY6kyZM

Week 3: Introduction to next generation sequencing technology

  1. Next Generation Sequencing Technology 1: Brief Introduction to the Mechanics of NGS – https://youtu.be/jQuShWX0ERU
  2. Next Generation Sequencing Technology 2: Applications of NGS in Genomics – https://youtu.be/89KFBHER5cM

Week 3: Importing and organizing high throughput data

  1. Bioconductor Infrastructure: ExpressionSet and SummarizedExperiment – https://youtu.be/bokZn6qd1kk

Week 3: Importing microarray data

  1. Lab: Reading Microarray Raw Data – https://youtu.be/f7QuRhKD_t8 – Covers Affymetrix and Agilent arrays

Week 3: Importing next generation sequencing data

  1. Creating a count table from a BAM file – https://youtu.be/d9RgQ5xbi94 – In the Loraine Lab, we use the command-line version of featureCounts
  2. Mapping Algorithms and Software – https://youtu.be/n7gbw4DjE9o

Week 3: Transformations and exploratory data analysis

  1. Examples from Genomics – https://youtu.be/IfO9OdNyEvE
  2. Detecting Quality Problems with EDA – https://youtu.be/lggp8EzR5nY
  3. Log Transformation – https://youtu.be/G56StIbkMlA
  4. EDA for Next Generation Sequencing – https://youtu.be/2AuDMC6APvM

Week 3: Modeling microarray data

  1. Introduction to Microarray Background Noise – https://youtu.be/5r0PWL1aakk
  2. Probe Level Model – https://youtu.be/ToFaVtyGoP4
  3. Different Approaches to Background Adjustment – https://youtu.be/9BCIgRr1KJc

Week 3: Normalization

  1. The Need for Normalization – https://youtu.be/DnLZOaERolg
  2. Motivating Normalization – https://youtu.be/HmokevW15QI
  3. Local Regression: loess – https://youtu.be/l81ooxzbz6M
  4. Loess Normalization Applied to Data – https://youtu.be/_r0cmw2VjBU
  5. Quantile Normalization – https://youtu.be/_jd4q17tRAI
  6. Quantile Normalization Applied to Data – https://youtu.be/UVjgMTG1nIc
  7. Variance Stabilizing Normalization – https://youtu.be/ExQkVM4WrYs
  8. When Not to Normalize – https://youtu.be/91xrH7cWw8Q
  9. Subset Quantile Normalization – https://youtu.be/YdxszCUhalY

Week 3: Visualizing and normalizing NGS data

  1. Visualizing NGS data part 1 – https://youtu.be/wxeAJ26PeHg
  2. Visualizing NGS data part 2 – https://youtu.be/DPPXZXagLEw
  3. Visualizing NGS data part 3 – https://youtu.be/hAyUp5hruug
  4. Normalization for RNA-seq – https://youtu.be/HnMJHzDwK4U

Week 4: Inference

  1. Biological versus Technical Variability – https://youtu.be/7GQprJ-F73Y
  2. t-tests in Genomics – https://youtu.be/bkP7YZqYvmM
  3. Moderated t-test with the limma package – https://youtu.be/5iNUucEn68w
  4. Gene Sets – https://youtu.be/p3JzmaeqiCQ
  5. Summary Statistics for Gene Sets – https://youtu.be/Ra3JeXSRda0
  6. Hypothesis Testing for Gene Sets – https://youtu.be/pZbicSgjLqw
  7. Permutations for Gene Set Inference – https://youtu.be/_a2fV_kXcBk
  8. Gene Set Testing in R Part I – https://youtu.be/bfp99aDhjqU
  9. Gene Set Testing in R Part II – https://youtu.be/IW_YBOe-E0A

Week 4: Architecture: fostering integrative analysis of genome-scale data

  1. Visualizing genomic features: ggbio’s autoplot – https://youtu.be/5yC-KU3HH4s
  2. R package building – https://youtu.be/SALAd-Lr55g
  3. Installing and checking a new package – https://youtu.be/Nsjf9XomEFk
  4. Creating and installing an integrative annotation package – https://youtu.be/zSHim9YzWKg
  5. External data: Relational Data Base Management Systems (RDBMS) – https://youtu.be/Q8_dGb7BKTM
  6. Tabix for random access to genomic flat files – https://youtu.be/OfEO3rwzM2Y
  7. Combining Expression and ChIP-chip data in yeast – https://youtu.be/cY_LxK_1Dms
  8. GRanges for an annotated GWAS catalog – https://youtu.be/tNInpIdMIF8
  9. GWAS hits in ESRRA binding peaks – https://youtu.be/TuMiU4dtULU
  10. Querying GEOmetadb – https://youtu.be/NSIMr5FIjW0

Week 4: Parallel computing and software engineering in Bioconductor

  1. A view of sequential and concurrent iteration – https://youtu.be/X5ZP3XXhLxk
  2. Concurrent counting of RNA-seq reads – https://youtu.be/OA4vyEF9pyQ
  3. Bioconductor’s AMI with StarCluster – https://youtu.be/cZZBP5vwW64
  4. BatchJobs for distributed counting – https://youtu.be/ReOkMpYiSzw
  5. Software quality control and continuous integration – https://youtu.be/h7MVHtjGSbk


HarvardX: PH525.7x Case Study: ChIP-seq data analysis

Week 1: Introduction to ChIP-Seq

  1. Introduction to transcription regulation – https://youtu.be/yNtahPOVdEE
  2. ChIP-seq technique – https://youtu.be/tn_SZElBDOA
  3. ChIP-Seq peak calling – https://youtu.be/933kKxGdD90
  4. ChIP-seq quality control 1 – https://youtu.be/nDrsyvntCw0
  5. Hands-on tutorial on running macs2 – https://youtu.be/r9zxzskkJ7Y

Week 2: Advanced ChIP-Seq analysis

  1. ChIP-Seq quality control 2 – https://youtu.be/auUj6i-5sO4
  2. ChIP-Seq target genes – https://youtu.be/MfIrL5hPVqU
  3. ChIP-Seq example – https://youtu.be/_yAnodofGc0
  4. Cistrome – https://youtu.be/4dVej4drfOA
  5. Cistrome analysis pipeline hands on – https://youtu.be/2-2XpNiz7VE
  6. BETA software suite – https://youtu.be/CRsflDp0XBQ