Summary slides by Prabhakar Chalise of the Oberg et al. 2012 article "Technical and biological variance structure in mRNA-Seq data: life in the real world".

  1. Technical and biological variance structure in mRNA-Seq data: life in the real world
     Paper by Ann Oberg, et al.
     October 2, 2013
  2. Concept
     Suppose x is helpful in predicting y:
     $y = \beta_0 + \beta_1 x + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2)$  (1)
     No variation, no model. A deterministic relationship leaves nothing to estimate:
     $^\circ\mathrm{C} = (^\circ\mathrm{F} - 32) \times \frac{5}{9}$  (2)
  3. Concept
     RNA-Seq studies: sources of variation
     Technical variation: flow cell, replication in lanes, library preparation, etc.
     Biological variation: person to person
     Observed count data: a combination of both types of variation
  4. Concept
     Technical variation follows a Poisson distribution: $\mathrm{Var}(Y) = \mu$
     Total variation is over-dispersed: $\mathrm{Var}(Y) > \mu$
     Within-sample variation ∼ Poisson distribution
     Between-sample variation ∼ Gamma distribution
     Together these give rise to the negative binomial distribution.
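The step from the Poisson and Gamma layers to over-dispersion can be made explicit with the law of total variance. The sketch below assumes the Gamma layer is parameterized with mean $\mu$ and variance $\phi\mu^2$; that parameterization is an assumption chosen here so that the result matches the negative binomial variance form used in these slides.

```latex
\begin{align*}
Y \mid \lambda &\sim \mathrm{Poisson}(\lambda), \qquad
\lambda \sim \mathrm{Gamma}\ \text{with}\ E[\lambda] = \mu,\ \mathrm{Var}(\lambda) = \phi\mu^2 \\
E[Y] &= E\big[E[Y \mid \lambda]\big] = E[\lambda] = \mu \\
\mathrm{Var}(Y) &= E\big[\mathrm{Var}(Y \mid \lambda)\big] + \mathrm{Var}\big(E[Y \mid \lambda]\big)
             = E[\lambda] + \mathrm{Var}(\lambda) = \mu + \phi\mu^2
\end{align*}
```

With $\phi = 0$ the Gamma layer collapses to a constant and the Poisson case $\mathrm{Var}(Y) = \mu$ is recovered.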
  5. Purpose of the paper
     Describe the mean-variance relationship in mRNA-Seq data:
     1. $\mathrm{Var}(Y) = \mu$: Poisson
     2. $\mathrm{Var}(Y) = k\mu$: over-dispersed Poisson (OD)
     3. $\mathrm{Var}(Y) = \mu + \phi\mu^2$: negative binomial distribution
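The three mean-variance relationships can be checked numerically. The following is a minimal sketch, not code from the paper: it simulates counts with and without a Gamma layer and compares the sample variance with the model variances. The function names and the values of `mu`, `k` and `phi` are illustrative assumptions.

```python
import random
import statistics


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate by counting unit-rate arrivals before time lam."""
    count = 0
    t = rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_counts(mu, phi, n, seed=0):
    """Simulate n counts with mean mu; phi = 0 gives Poisson, phi > 0 gives NB."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        if phi > 0:
            # Gamma with shape 1/phi and scale mu*phi has mean mu and variance phi*mu^2.
            lam = rng.gammavariate(1.0 / phi, mu * phi)
        else:
            lam = mu
        counts.append(poisson_draw(rng, lam))
    return counts


def model_variances(mu, k, phi):
    """Variance implied by each of the three mean-variance models."""
    return {
        "poisson": mu,
        "overdispersed_poisson": k * mu,
        "negative_binomial": mu + phi * mu ** 2,
    }


if __name__ == "__main__":
    mu, phi = 20.0, 0.25  # illustrative values, not estimates from the paper
    for label, p in (("technical only", 0.0), ("technical + biological", phi)):
        y = simulate_counts(mu, p, n=20000, seed=1)
        print(label, round(statistics.mean(y), 2), round(statistics.variance(y), 2))
    print(model_variances(mu, k=2.0, phi=phi))
```

Under these assumptions the Poisson-only sample variance should sit close to the mean, while the Gamma-Poisson sample variance should sit close to `mu + phi * mu ** 2`.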
  6. Purpose of the paper
     Estimating the dispersion $\phi$ is a crucial step. Three estimates are considered:
     1. Per gene: the glm.nb function in MASS
     2. Local: an empirical Bayes estimate that shrinks the per-gene estimate towards the global one (edgeR)
     3. Global: quantile-adjusted conditional maximum likelihood (edgeR)
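To make the role of $\phi$ concrete, here is a rough sketch of a per-gene method-of-moments estimate obtained by solving Var(Y) = µ + φµ² for φ, followed by a crude shrinkage towards a global value. This is deliberately not what `glm.nb` or edgeR do (they use likelihood-based and empirical Bayes methods); the function names, the `weight` parameter and the example counts are all hypothetical.

```python
import statistics


def moment_dispersion(counts):
    """Method-of-moments estimate of phi from Var(Y) = mu + phi * mu^2.

    Returns 0.0 when the mean is zero or when the sample variance does not
    exceed the mean (no evidence of over-dispersion).
    """
    mean = statistics.mean(counts)
    if mean == 0:
        return 0.0
    var = statistics.variance(counts)
    return max(0.0, (var - mean) / mean ** 2)


def shrink_towards_global(per_gene, weight):
    """Illustrative shrinkage: weighted average of each estimate and their mean.

    `weight` in [0, 1] is a hypothetical tuning constant, not an edgeR parameter.
    """
    global_phi = statistics.mean(list(per_gene.values()))
    return {g: (1 - weight) * phi + weight * global_phi for g, phi in per_gene.items()}


if __name__ == "__main__":
    # Made-up counts for two hypothetical genes across six subjects.
    per_gene = {
        "gene_a": moment_dispersion([10, 12, 9, 11, 10, 8]),
        "gene_b": moment_dispersion([2, 30, 5, 18, 40, 25]),
    }
    print(per_gene)
    print(shrink_towards_global(per_gene, weight=0.5))
```

The per-gene estimate uses only that gene's counts and is noisy with few subjects; the shrunken version trades some gene-specific detail for stability, which is the intuition behind a local estimate.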
  7. Data and Statistical Experimental Design, Figure 1
     25 study subjects (all female Caucasians): 12 high and 13 low antibody responders
     13 flow cells, each with 8 lanes: 4 for high response, 4 for low response
     For each response group, two specimens: unstimulated and stimulated
     2 replicates each for the unstimulated and stimulated specimens
     2 subjects from the high-response group failed, leaving 10 high and 13 low responders
     Only the unstimulated specimens were used, to avoid correlation
  8. Figure 2
  9. Statistical Analysis
     Models were fit to the unstimulated specimens only, to focus on biological variation.
     Counts for the two technical replicates were summed for the models.
     Normalization constant: none, the total count per lane pair, or the 75th percentile count per lane pair.
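The preprocessing described on this slide can be sketched as follows, assuming each sample is a plain list of per-gene counts. The function names are hypothetical, and the quantile interpolation rule and the treatment of zero counts are assumptions, since the slides do not specify them.

```python
import math
import statistics


def sum_replicates(rep1, rep2):
    """Add the two technical-replicate count vectors gene by gene."""
    if len(rep1) != len(rep2):
        raise ValueError("replicates must cover the same genes")
    return [a + b for a, b in zip(rep1, rep2)]


def upper_quartile(counts):
    """75th percentile of a sample's counts (zeros included; an assumption)."""
    return statistics.quantiles(counts, n=4, method="inclusive")[2]


def log_offsets(samples):
    """Log of each sample's 75th-percentile constant, usable as a model offset."""
    offsets = {}
    for name, counts in samples.items():
        q75 = upper_quartile(counts)
        if q75 <= 0:
            raise ValueError(f"75th percentile is not positive for {name}")
        offsets[name] = math.log(q75)
    return offsets


if __name__ == "__main__":
    # Made-up counts for one hypothetical subject with two technical replicates.
    summed = sum_replicates([1, 0, 4, 7, 2], [2, 0, 6, 9, 1])
    print(summed)
    print(log_offsets({"subject_1": summed}))
```

Using the log of the normalization constant as an offset in a count model rescales the expected count per sample without changing the observed counts themselves.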
  10. Technical variation
      Representative scatter plot of technical replicate 1 versus technical replicate 2 for one subject. The Spearman correlation was 0.9941 for this pair.
      Figure: Supplementary plot
  11. Technical variation
      The vertical axis is the difference between the counts in the two replicates on the log2 scale, and the horizontal axis is the average of the two counts on the log2 scale.
      Figure: Bland-Altman plot (supplementary plot)
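The coordinates of such a Bland-Altman plot can be computed as in the sketch below. The `pseudocount` argument is an assumption added so that zero counts have a finite logarithm; the slides do not say how zeros were handled, and the function name is hypothetical.

```python
import math


def bland_altman_log2(rep1, rep2, pseudocount=1.0):
    """Return (average, difference) pairs on the log2 scale for two replicates.

    `pseudocount` is added to every count before taking logs (an assumption).
    """
    points = []
    for a, b in zip(rep1, rep2):
        la = math.log2(a + pseudocount)
        lb = math.log2(b + pseudocount)
        points.append(((la + lb) / 2, la - lb))
    return points


if __name__ == "__main__":
    # Made-up counts for two genes in two technical replicates.
    for average, difference in bland_altman_log2([3, 7], [3, 1]):
        print(average, difference)
```

Genes whose replicates agree fall near a difference of zero; if technical variation is roughly Poisson, the spread of the differences should narrow as the average count grows.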
  12. Technical variation
      QQ plots assuming a Poisson distribution are in the additional files.
      Technical variation in general follows a Poisson distribution.
  13. Biological variation, Figure 3
      A. Plot of mean ($\bar{x}$) and variance ($S^2$)
      B. Local estimates of $\phi$ and per-group mean count
  14. Goodness of fit
      QQ plots:
      1. Standard Poisson
      2. NB with global estimate of $\phi$
      3. NB with per-gene estimate of $\phi$
      4. NB with local estimate of $\phi$
  15. Figure 4
  16. Experimental variation
      Potential sources of experimental variation, examined by including experimental factors in the model: flow cell, lane pair, and library preparation batch (Figure 5).
  17. Figure 5
  18. Flow cell: the observed counts as a whole were smaller than the expected counts.
      The reason was a software upgrade midway through the experiment.
      The number of reads increased with the software upgrade (Figure 6A).
      After the 75th percentile offset was used, there was no clear flow-cell effect.
  19. Figure 6
  20. Characterizing genes with poor model fit
      Effect of genes with small counts:
      1. Smallest GOF statistics: indicative of overfitting
      2. Largest GOF statistics: indicative of underfitting (not explaining enough variance)
      Filtering out genes with total counts up to 10,000 had only a minor impact.
      GOF statistics for genes with an average count < 5 per subject were distributed throughout the range (Figure 7A).
  21. Figure 7
  22. Data records of genes with very small GOF statistics:
      1. All zero counts in one response group and non-zero counts in the other
      2. Counts very consistent, with small variance
      Data records of genes with very large GOF statistics:
      1. The variance is very high. An example of one such gene is shown in Figure 7B.
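The slides do not define the GOF statistic, so the following is only an illustration of why very consistent counts give small values and highly variable counts give large ones. It assumes a Pearson-type statistic for a single group under the negative binomial variance Var(Y) = µ + φµ²; the function name, the value of `phi` and the example counts are hypothetical.

```python
import statistics


def pearson_statistic(counts, phi):
    """Pearson-type statistic for one group of counts under Var(Y) = mu + phi * mu^2.

    The group mean is used as the fitted value. A group of all zeros
    contributes 0.0, since the fitted mean then matches every count exactly.
    """
    mu = statistics.mean(counts)
    if mu == 0:
        return 0.0
    model_var = mu + phi * mu ** 2
    return sum((y - mu) ** 2 for y in counts) / model_var


if __name__ == "__main__":
    phi = 0.1  # illustrative dispersion, not an estimate from the paper
    consistent = [10, 10, 11, 10, 9, 10]  # made-up counts with little spread
    variable = [2, 30, 5, 18, 40, 25]     # made-up counts with a large spread
    print(pearson_statistic(consistent, phi))
    print(pearson_statistic(variable, phi))
```

With six subjects the reference value is roughly the 5 degrees of freedom: a statistic far below it means the model allows more variance than the data show, and a statistic far above it means the model explains too little of the variance.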
