DESeq, voom and vst

  1. DESeq, voom and vst. Qiang Kou (qkou@umail.iu.edu), April 28, 2014
  2. Background: Advantages of RNA-seq Compared to Microarray
      • Detection of novel transcripts and isoforms
      • High reproducibility and low background
      • Detection of gene fusions and SNPs
  3. Background: Differential Expression Analysis
      • Steps: normalization, dispersion estimation, statistical testing
      • Methods to be presented:
          • DESeq: negative binomial distribution [1]
          • voom: variance modelling at the observational level [2]
          • vst: variance-stabilizing transformation [1, 3]
  4. Background: Timeline. [Figure: timeline with an axis from 2002 to 2016 showing, in order, vst, limma, cufflinks, DESeq and edgeR, baySeq, voom.]
  5. Background: Why different models?
  6. Background: RNA-seq is Discrete. [Figure from Garber et al. (2011) Nature Methods 8:469-477.]
  7. Background: Length Normalization
      • Within sample: gene length
      • Between samples: library size
      • RPKM and FPKM: reads/fragments per kilobase per million mapped reads
      • Both normalize for gene length and library size
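The RPKM definition on this slide can be sketched in base R. This is a minimal illustration, assuming a genes-by-samples count matrix and a vector of gene lengths in base pairs; the function name `rpkm` and the toy data are hypothetical and not part of the original slides.

```r
# RPKM: reads per kilobase of gene length per million mapped reads.
# counts: genes x samples matrix; gene_length_bp: one length per gene (row).
rpkm <- function(counts, gene_length_bp) {
  lib_size <- colSums(counts)                           # mapped reads per sample
  per_million <- sweep(counts, 2, lib_size / 1e6, "/")  # library-size scaling
  per_million / (gene_length_bp / 1e3)                  # gene-length scaling (row-wise)
}

# Toy data: sample 2 is sequenced twice as deeply as sample 1
toy_counts <- matrix(c(100, 900,
                       200, 1800), nrow = 2,
                     dimnames = list(c("geneA", "geneB"), c("s1", "s2")))
toy_lengths <- c(1000, 2000)
toy_rpkm <- rpkm(toy_counts, toy_lengths)
```

In this toy case the two samples differ only in sequencing depth, so their RPKM columns agree.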
  8. Background: Different Distribution. [Figure: density plots. (a) Microarray: density vs. expression. (b) RNA-seq: density vs. log10(fpkm), by condition (Untreated, CG8144_RNAi).]
  9. Background: Differential Expression as a Function of Transcript Length. [Figure: %DE vs. transcript length (bp). Panels: (a) sequencing data (Sultan), (b) array data (Sultan), (c) sequencing data (Cloonan), (d) sequencing data (Marioni), (e) array data (Marioni), (f) sequencing data (Marioni), (g) array data (Marioni). From Oshlack et al. (2009) Biology Direct 4:14.]
  10. Background: Poisson and Negative Binomial Distribution
  11. Background: Poisson Distribution (graph from Wikipedia)
      • $\Pr(X = k) = \dfrac{\lambda^k e^{-\lambda}}{k!}$, with $\mathrm{E}(X) = \mathrm{Var}(X) = \lambda$
      • A list of genes $g_1, g_2, \ldots, g_n$
      • $X \sim \mathrm{Poisson}(\lambda)$: a random variable representing the number of reads falling in $g_i$
      • Likelihood ratio test
  12. Background: Negative Binomial Distribution (graph from Wikipedia)
      • $X \sim \mathrm{NB}(r; p)$
      • $\Pr(X = k) = \binom{k + r - 1}{k} p^k (1 - p)^r$
      • $p$: probability of success
      • $r$: predefined number of failures
      • $X$: number of successes before the $r$-th failure
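For reference, the mean and variance implied by this $(r; p)$ parameterization can be written out as follows. These are standard results for the negative binomial distribution rather than content from the slides; they show why the variance always exceeds the mean, which is what the later $\mathrm{NB}(\mu, \sigma^2)$ notation relies on.

```latex
\begin{align}
\mathrm{E}(X)   &= \frac{pr}{1 - p}, \\
\mathrm{Var}(X) &= \frac{pr}{(1 - p)^2}
                 = \mathrm{E}(X) + \frac{\mathrm{E}(X)^2}{r}.
\end{align}
```

The extra term $\mathrm{E}(X)^2 / r$ vanishes as $r \to \infty$, in which case the Poisson relation $\mathrm{Var}(X) = \mathrm{E}(X)$ is recovered.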
  13. DESeq, voom and vst
  14. DESeq, voom and vst: Normalization in DESeq
      • Assumptions:
          • Most genes are not differentially expressed
          • Differentially expressed genes are divided equally between up- and down-regulation
      • Steps:
          • For each gene, compute the geometric mean of its counts across all samples
          • Divide each gene's count in a sample by that gene's geometric mean
          • The normalization factor of a sample is the median of these ratios over all genes
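The three normalization steps above can be sketched in base R. This is an illustrative re-implementation, not the DESeq source code; the function name `median_of_ratios` and the toy matrix are hypothetical. Genes with a zero count in any sample are skipped here because their geometric mean is zero.

```r
# Median-of-ratios size factors for a genes x samples count matrix.
median_of_ratios <- function(counts) {
  log_counts <- log(counts)
  log_geo_mean <- rowMeans(log_counts)     # log of each gene's geometric mean
  keep <- is.finite(log_geo_mean)          # drop genes containing a zero count
  apply(log_counts[keep, , drop = FALSE], 2, function(sample_log) {
    exp(median(sample_log - log_geo_mean[keep]))  # median ratio, back on count scale
  })
}

# Toy data: sample s2 has exactly twice the depth of s1 (gene4 contains a zero)
toy_counts <- matrix(c(10, 20, 30, 0,
                       20, 40, 60, 5), ncol = 2,
                     dimnames = list(paste0("gene", 1:4), c("s1", "s2")))
size_factors <- median_of_ratios(toy_counts)
normalized <- sweep(toy_counts, 2, size_factors, "/")
```

Because the median is used, a minority of strongly differentially expressed genes has little influence on the size factors, which is where the two assumptions on the slide come in.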
  15. Model in DESeq
      • The read count for gene $i$ in sample $j$ follows a negative binomial distribution: $K_{ij} \sim \mathrm{NB}(\mu_{ij}, \sigma_{ij}^2)$
      • Why not the Poisson distribution? In RNA-seq, the variance is larger than the mean
      • $\mu_{ij}$ and $\sigma_{ij}^2$ are very difficult to estimate
      • Parameter estimation is the main difference between methods based on the NB distribution
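The point that the variance exceeds the mean can be illustrated with a small simulation in base R. The mean of 100 and the NB `size` of 5 are arbitrary values chosen for illustration, not estimates from any data set in the talk.

```r
set.seed(42)
n_draws <- 1e5
mu <- 100

# Poisson counts: variance equals the mean
pois_counts <- rpois(n_draws, lambda = mu)

# Negative binomial counts with the same mean: variance = mu + mu^2 / size
nb_counts <- rnbinom(n_draws, mu = mu, size = 5)

# Variance-to-mean ratio: close to 1 for Poisson, much larger for NB
ratio_pois <- var(pois_counts) / mean(pois_counts)
ratio_nb <- var(nb_counts) / mean(nb_counts)
```

A Poisson model fitted to counts like `nb_counts` would understate the variance and so overstate the evidence for differential expression.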
  16. Model in DESeq
      • Count sum for gene $i$ in condition A: $a$
      • Count sum for gene $i$ in condition B: $b$
      • Sum: $\kappa = a + b$
      • Probabilities $p(a)$, $p(b)$ and $p(a, b)$
      • p-value, where $i$ and $j$ in the sums range over all pairs of counts adding up to $\kappa$: $p = \dfrac{\sum_{i + j = \kappa,\ p(i, j) \le p(a, b)} p(i, j)}{\sum_{i + j = \kappa} p(i, j)}$
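The p-value formula can be sketched in base R under simplifying assumptions that are not in the slides: both condition sums are taken to follow the same negative binomial distribution with a known mean `mu` and `size` under the null hypothesis, and they are treated as independent so that $p(i, j) = p(i)\,p(j)$. DESeq itself estimates these parameters from the data; the function name `nb_exact_pvalue` is hypothetical.

```r
# Exact-test p-value: total probability of all splits (i, j) of kappa = a + b
# that are no more likely than the observed split (a, b).
nb_exact_pvalue <- function(a, b, mu, size) {
  kappa <- a + b
  i <- 0:kappa
  p_ij <- dnbinom(i, mu = mu, size = size) *
    dnbinom(kappa - i, mu = mu, size = size)
  p_obs <- dnbinom(a, mu = mu, size = size) *
    dnbinom(b, mu = mu, size = size)
  # Small relative tolerance so the observed split itself is always included
  sum(p_ij[p_ij <= p_obs * (1 + 1e-9)]) / sum(p_ij)
}

p_balanced <- nb_exact_pvalue(50, 50, mu = 50, size = 10)
p_skewed <- nb_exact_pvalue(10, 90, mu = 50, size = 10)
```

A balanced split is the most likely outcome under these assumptions, so it gets a p-value of 1, while a strongly skewed split gets a small one.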
  17. Model in DESeq: R code for DESeq

          library(DESeq)
          # Build the count data set from the simulated counts and treatment labels
          DESeq.cds = newCountDataSet(countData = data.sim$counts,
                                      conditions = factor(data.sim$treatment))
          # Normalization (size factors) and dispersion estimation
          DESeq.cds = estimateSizeFactors(DESeq.cds)
          DESeq.cds = estimateDispersions(DESeq.cds, fitType = "local")
          # Negative binomial test between conditions "1" and "2"
          DESeq.test = nbinomTest(DESeq.cds, "1", "2")
          DESeq.pvalues = DESeq.test$pval
          # Benjamini-Hochberg adjustment for multiple testing
          DESeq.adjpvalues = p.adjust(DESeq.pvalues, method = "BH")
  18. Model in limma
      • Linear Models for Microarray Data: lmFit()
      • Classical t-test: $t_j = \dfrac{\mu_{1j} - \mu_{2j}}{\sqrt{\sigma_j^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$
      • It is very hard to estimate $\sigma_j^2$ from a small sample size
      • limma uses a moderated t-test that borrows information from other genes
      • Prior: $\sigma_j^2 \sim \text{Inverse Gamma}(\alpha, \beta)$
      • Empirical Bayes parameter estimation: eBayes()
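The idea behind the moderated t-test can be sketched for a single gene in base R. The shrinkage formula follows Smyth [3]: the gene-wise variance is pulled toward a prior variance `s0_sq` with prior degrees of freedom `d0`. In limma both prior values are estimated from all genes by eBayes(); here they are supplied by hand, and the function name `moderated_t` and the example values are hypothetical.

```r
# Classical vs. moderated two-sample t-statistic for one gene.
moderated_t <- function(x1, x2, d0, s0_sq) {
  n1 <- length(x1)
  n2 <- length(x2)
  d <- n1 + n2 - 2                                       # residual degrees of freedom
  s_sq <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / d  # pooled variance
  s_tilde_sq <- (d0 * s0_sq + d * s_sq) / (d0 + d)       # shrunken (posterior) variance
  scale <- sqrt(1 / n1 + 1 / n2)
  diff <- mean(x1) - mean(x2)
  list(t_classical = diff / (sqrt(s_sq) * scale),
       t_moderated = diff / (sqrt(s_tilde_sq) * scale),
       df_moderated = d0 + d,
       s_sq = s_sq,
       s_tilde_sq = s_tilde_sq)
}

# A gene whose sample variance is unusually small with three replicates per group
res <- moderated_t(c(5.1, 5.0, 5.2), c(6.0, 6.1, 5.9), d0 = 4, s0_sq = 0.05)
```

For this gene the shrunken variance is larger than the sample variance, so the moderated statistic is less extreme than the classical one, while its degrees of freedom increase from `d` to `d0 + d`.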
  19. Model in voom
      • voom: variance modelling at the observational level
      • Locally weighted regression estimates the relation between count and variance
      • Moderated t-test in limma
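The mean-variance modelling step can be sketched in base R, loosely following the description in Law et al. [2]. This is a simplified illustration and not the limma implementation: the counts are simulated, the design is intercept-only (a single group), and the smoother span `f = 0.5` is an arbitrary choice.

```r
set.seed(1)
n_genes <- 500
n_samples <- 6

# Simulated counts (hypothetical data): NB with gene-specific means
gene_mean <- rexp(n_genes, rate = 1 / 200) + 5
counts <- matrix(rnbinom(n_genes * n_samples, mu = gene_mean, size = 10),
                 nrow = n_genes)
lib_size <- colSums(counts)

# log2 counts per million, with an offset of 0.5 on the counts
log_cpm <- t(log2((t(counts) + 0.5) / (lib_size + 1) * 1e6))

# Intercept-only design: fitted value = gene mean, residual sd = sample sd
fitted_log_cpm <- rowMeans(log_cpm)
resid_sd <- apply(log_cpm, 1, sd)

# Locally weighted regression of sqrt(sd) on the average log2 count
avg_log_count <- fitted_log_cpm + mean(log2(lib_size + 1)) - log2(1e6)
trend <- lowess(avg_log_count, sqrt(resid_sd), f = 0.5)
trend_fun <- approxfun(trend$x, trend$y, rule = 2, ties = mean)

# Predict sqrt(sd) for every observation and convert to precision weights
fitted_log_count <- outer(fitted_log_cpm, log2(lib_size + 1) - log2(1e6), "+")
weights <- matrix(1 / trend_fun(fitted_log_count)^4, nrow = n_genes)
```

The resulting `weights` matrix has one entry per observation; weights of this kind are what voom passes on to lmFit() so that the limma pipeline can be applied to log-counts.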
  20. Model in voom. [Figure: voom mean-variance trend, panels a to c. x-axes: average log2(count size + 0.5) in panels a and b, fitted log2(count size + 0.5) in panel c; y-axis: sqrt(standard deviation). From Law et al. Genome Biology 2014, 15:R29.]
  21. Model in voom: R code for voom

          library(limma)
          library(edgeR)  # provides calcNormFactors()
          # data.matrix: count matrix (genes x samples); conditions: sample labels
          group = factor(conditions)
          # TMM normalization factors, then voom with effective library sizes
          nf = calcNormFactors(data.matrix, method = "TMM")
          voom.data = voom(data.matrix, design = model.matrix(~group),
                           lib.size = colSums(data.matrix) * nf)
          voom.data$genes = rownames(data.matrix)
          # Linear model fit followed by the empirical Bayes moderated t-test
          voom.fitlimma = lmFit(voom.data, design = model.matrix(~group))
          voom.fitbayes = eBayes(voom.fitlimma)
          voom.pvalues = voom.fitbayes$p.value[, 2]
          voom.adjpvalues = p.adjust(voom.pvalues, method = "BH")
          # Genes with a BH-adjusted p-value of at most 0.05
          voom.genes <- data.matrix[which(voom.adjpvalues <= 0.05), ]
  22. Model in vst
      • Variance-stabilizing transformation
      • Goal: find a simple function $f$ such that the variability of the new values $y = f(x)$ is not related to their mean
      • A method used in microarray data analysis [4]
      • Moderated t-test in limma
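What "variability not related to the mean" looks like can be sketched with a simulation in base R. The example assumes a negative binomial variance of the form $\mu + \alpha \mu^2$ with a single known $\alpha$, for which a closed-form stabilizing function exists; DESeq's getVarianceStabilizedData() instead works from the fitted dispersion-mean relation, so this is an illustration of the idea only. The name `vst_nb` and the value of `alpha` are hypothetical.

```r
set.seed(7)
alpha <- 0.1   # assumed dispersion: Var = mu + alpha * mu^2

# Closed-form stabilizer for this variance function:
# f'(mu) = 1 / sqrt(mu + alpha * mu^2)
vst_nb <- function(x, alpha) (2 / sqrt(alpha)) * asinh(sqrt(alpha * x))

means <- c(50, 500, 5000)

# Variance of the raw counts grows steeply with the mean
raw_var <- sapply(means, function(m) {
  var(rnbinom(1e5, mu = m, size = 1 / alpha))
})

# Variance of the transformed counts stays roughly constant
vst_var <- sapply(means, function(m) {
  var(vst_nb(rnbinom(1e5, mu = m, size = 1 / alpha), alpha))
})
```

Once the variance no longer depends on the mean, methods that assume roughly constant variance, such as the limma pipeline on the slide, become applicable.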
  23. Model in vst: R code for vst

          library(DESeq)
          library(limma)
          # data.matrix: count matrix (genes x samples); conditions: sample labels
          group = factor(conditions)
          DESeq.cds = newCountDataSet(countData = data.matrix, conditions = group)
          DESeq.cds = estimateSizeFactors(DESeq.cds)
          # Blind dispersion estimation, then the variance-stabilizing transformation
          DESeq.cds = estimateDispersions(DESeq.cds, method = "blind",
                                          fitType = "local")
          DESeq.vst = getVarianceStabilizedData(DESeq.cds)
          # Linear model fit followed by the empirical Bayes moderated t-test
          DESeq.vst.fitlimma = lmFit(DESeq.vst, design = model.matrix(~group))
          DESeq.vst.fitbayes = eBayes(DESeq.vst.fitlimma)
          DESeq.vst.pvalues = DESeq.vst.fitbayes$p.value[, 2]
          DESeq.vst.adjpvalues = p.adjust(DESeq.vst.pvalues, method = "BH")
          # Genes with a BH-adjusted p-value of at most 0.05
          DESeq.vst.genes <- data.matrix[which(DESeq.vst.adjpvalues <= 0.05), ]
  24. Results from Simulation: AUC Results. [Figure: AUC (about 0.5 to 0.8) vs. number of samples per condition (5 to 15) for baySeq, DESeq, EBSeq, edgeR, NBPSeq, SAMseq, ShrinkSeq, TSPM, voom and vst.]
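The slides do not say how the AUC values were computed. One common way to obtain an AUC in a simulation, where the truly differentially expressed genes are known, is the rank-based (Mann-Whitney) formula sketched below in base R; the function name `auc_from_pvalues` and the toy inputs are hypothetical.

```r
# AUC from p-values and known truth: the probability that a randomly chosen
# truly DE gene gets a smaller p-value than a randomly chosen non-DE gene.
auc_from_pvalues <- function(pvalues, is_de) {
  score <- 1 - pvalues        # higher score = stronger evidence for DE
  r <- rank(score)            # ties receive average ranks
  n_pos <- sum(is_de)
  n_neg <- sum(!is_de)
  (sum(r[is_de]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

truth <- c(TRUE, TRUE, FALSE, FALSE)
auc_perfect <- auc_from_pvalues(c(0.01, 0.02, 0.50, 0.90), truth)
auc_reversed <- auc_from_pvalues(c(0.90, 0.50, 0.02, 0.01), truth)
auc_uninformative <- auc_from_pvalues(rep(0.5, 4), truth)
```

An AUC of 0.5 corresponds to a ranking no better than chance, which gives a reference point for reading the values reported for each method.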
  25. Results from Simulation: Differential Expression Gene Number. [Figure: numbers of correct and incorrect differentially expressed genes per software: baySeq, DESeq, NBPSeq, voom, vst, edgeR, ShrinkSeq, TSPM, EBSeq, SAMSeq.]
  26. Results from Simulation: Running Time. [Figure: running time in seconds (0 to 500) vs. number of samples per condition (5 to 15) for baySeq, DESeq, EBSeq, edgeR, NBPSeq, SAMseq, ShrinkSeq, TSPM, voom and vst.]
  27. Results from Simulation: Running Time with 15 Samples per Condition

          Software     AUC      Time (sec)
          edgeR        0.810       0.630
          DESeq        0.652      48.388
          NBPSeq       0.767      24.942
          baySeq       0.495     210.781
          EBSeq        0.769      12.666
          TSPM         0.836       7.486
          SAMseq       0.827       1.801
          voom         0.835       0.264
          vst          0.830       0.138
          ShrinkSeq    0.796     343.260
  28. Results from Simulation: Venn Diagram for Drosophila melanogaster. [Figure: Venn diagram of the genes called by DESeq, voom and vst; region counts in extraction order: 4, 7, 13, 11, 310, 178, 17.]
  29. Conclusions
      • Each method relies on many assumptions
      • The negative binomial model has relatively better specificity and sensitivity
      • voom and vst perform well in both accuracy and running time, with no difference between them
      • All methods perform better with larger samples, but sample size is very limited in practice
      • cuffdiff normalizes differently: it accounts for both alternative isoforms and transcript length
  30. References
      [1] Simon Anders and Wolfgang Huber. Differential expression analysis for sequence count data. Genome Biology, 11:R106, 2010.
      [2] Charity W Law, Yunshun Chen, Wei Shi, and Gordon K Smyth. Voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology, 15(2):R29, 2014.
      [3] Gordon K Smyth. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology, 3:Article 3, 2004.
      [4] Blythe P Durbin, Johanna S Hardin, Douglas M Hawkins, and David M Rocke. A variance-stabilizing transformation for gene-expression microarray data. Bioinformatics, pages S105–S110, 2002.
  31. Thanks: Thank you for your time! Qiang Kou (qkou@umail.iu.edu)
