Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.

Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.

Successfully reported this slideshow.

Like this presentation? Why not share!

- Comparison between RNASeq and Micro... by Yaoyu Wang 6336 views
- Introduction to RNA-seq and RNA-seq... by VHIR Vall d’Hebro... 13414 views
- RNA-seq differential expression ana... by mikaelhuss 13974 views
- RNA-seq Analysis by COST action BM1006 24809 views
- An introduction to RNA-seq data ana... by AGRF_Ltd 15830 views
- Ngs ppt by Archa Dave 38376 views

15,336 views

Published on

Sonia Tarazona -

Massive sequencing data analysis workshop -

Granada 2011

Published in:
Technology

No Downloads

Total views

15,336

On SlideShare

0

From Embeds

0

Number of Embeds

415

Shares

0

Downloads

642

Comments

0

Likes

12

No embeds

No notes for slide

- 1. Introduction Normalization Diﬀerential Expression Diﬀerential Expression in RNA-Seq Sonia Tarazona starazona@cipf.esData analysis workshop for massive sequencing data Granada, June 2011 Sonia Tarazona Diﬀerential Expression in RNA-Seq
- 2. Introduction Normalization Diﬀerential ExpressionOutline 1 Introduction 2 Normalization 3 Diﬀerential Expression Sonia Tarazona Diﬀerential Expression in RNA-Seq
- 3. Introduction Some questions Normalization Some deﬁnitions Diﬀerential Expression RNA-seq expression dataOutline 1 Introduction Some questions Some deﬁnitions RNA-seq expression data 2 Normalization 3 Diﬀerential Expression Sonia Tarazona Diﬀerential Expression in RNA-Seq
- 4. Introduction Some questions Normalization Some deﬁnitions Diﬀerential Expression RNA-seq expression dataSome questions How do we measure expression? What is diﬀerential expression? Experimental design in RNA-Seq Can we used the same statistics as in microarrays? Do I need any “normalization”? Sonia Tarazona Diﬀerential Expression in RNA-Seq
- 5. Introduction Some questions Normalization Some deﬁnitions Diﬀerential Expression RNA-seq expression dataSome deﬁnitions Expression level RNA-Seq: The number of reads (counts) mapping to the biological feature of interest (gene, transcript, exon, etc.) is considered to be linearly related to the abundance of the target feature. Microarrays: The abundance of each sequence is a function of the ﬂuorescence level recovered after the hybridization process. Sonia Tarazona Diﬀerential Expression in RNA-Seq
- 6. Introduction Some questions Normalization Some deﬁnitions Diﬀerential Expression RNA-seq expression dataSome deﬁnitions Sequencing depth: Total number of reads mapped to the genome. Library size. Gene length: Number of bases. Gene counts: Number of reads mapping to that gene (expression measurement). Sonia Tarazona Diﬀerential Expression in RNA-Seq
- 7. Introduction Some questions Normalization Some deﬁnitions Diﬀerential Expression RNA-seq expression dataSome deﬁnitions What is diﬀerential expression? A gene is declared diﬀerentially expressed if an observed diﬀerence or change in read counts between two experimental conditions is statistically signiﬁcant, i.e. whether it is greater than what would be expected just due to natural random variation. Statistical tools are needed to make such a decision by studying counts probability distributions. Sonia Tarazona Diﬀerential Expression in RNA-Seq
- 8. Introduction Some questions Normalization Some deﬁnitions Diﬀerential Expression RNA-seq expression dataRNA-Seq expression data Experimental design Pairwise comparisons: Only two experimental conditions or groups are to be compared. Multiple comparisons: More than two conditions or groups. Replication Biological replicates. To draw general conclusions: from samples to population. Technical replicates. Conclusions are only valid for compared samples. Sonia Tarazona Diﬀerential Expression in RNA-Seq
- 9. Introduction Why? Normalization Methods Diﬀerential Expression ExamplesOutline 1 Introduction 2 Normalization Why? Methods Examples 3 Diﬀerential Expression Sonia Tarazona Diﬀerential Expression in RNA-Seq
- 10. Introduction Why? Normalization Methods Diﬀerential Expression ExamplesWhy Normalization? RNA-seq biases Inﬂuence of sequencing depth: The higher sequencing depth, the higher counts. Sonia Tarazona Diﬀerential Expression in RNA-Seq
- 11. Introduction Why? Normalization Methods Diﬀerential Expression ExamplesWhy Normalization? RNA-seq biases Dependence on gene length: Counts are proportional to the transcript length times the mRNA expression level. Sonia Tarazona Diﬀerential Expression in RNA-Seq
- 12. Introduction Why? Normalization Methods Diﬀerential Expression ExamplesWhy Normalization? RNA-seq biases Diﬀerences on the counts distribution among samples. Sonia Tarazona Diﬀerential Expression in RNA-Seq
- 13. Introduction Why? Normalization Methods Diﬀerential Expression ExamplesWhy Normalization? RNA-seq biases Inﬂuence of sequencing depth: The higher sequencing depth, the higher counts. Dependence on gene length: Counts are proportional to the transcript length times the mRNA expression level. Diﬀerences on the counts distribution among samples. Options 1 Normalization: Counts should be previously corrected in order to minimize these biases. 2 Statistical model should take them into account. Sonia Tarazona Diﬀerential Expression in RNA-Seq
- 14. Introduction Why? Normalization Methods Diﬀerential Expression ExamplesSome Normalization Methods RPKM (Mortazavi et al., 2008): Counts are divided by the transcript length (kb) times the total number of millions of mapped reads. number of reads of the region RPKM = total reads × region length 1000000 1000 Upper-quartile (Bullard et al., 2010): Counts are divided by upper-quartile of counts for transcripts with at least one read. TMM (Robinson and Oshlack, 2010): Trimmed Mean of M values. Quantiles, as in microarray normalization (Irizarry et al., 2003). FPKM (Trapnell et al., 2010): Instead of counts, Cuﬄinks software generates FPKM values (Fragments Per Kilobase of exon per Million fragments mapped) to estimate gene expression, which are analogous to RPKM. Sonia Tarazona Diﬀerential Expression in RNA-Seq
- 15. Introduction Why? Normalization Methods Diﬀerential Expression ExamplesExample on RPKM normalization Sequencing depth Marioni et al., 2008 Kidney: 9.293.530 reads Liver: 8.361.601 reads RPKM normalization Length: 1500 bases. 620 Kidney: RPKM = 9293530 × 1500 = 44,48 106 1000 746 Liver: RPKM = 8361601 × 1500 = 59,48 106 1000 Sonia Tarazona Diﬀerential Expression in RNA-Seq
- 16. Introduction Why? Normalization Methods Diﬀerential Expression ExamplesExample Sonia Tarazona Diﬀerential Expression in RNA-Seq
- 17. Methods Introduction NOISeq Normalization Exercises Diﬀerential Expression ConcludingOutline 1 Introduction 2 Normalization 3 Diﬀerential Expression Methods NOISeq Exercises Concluding Sonia Tarazona Diﬀerential Expression in RNA-Seq
- 18. Methods Introduction NOISeq Normalization Exercises Diﬀerential Expression ConcludingDiﬀerential Expression Parametric approaches Counts are modeled using known probability distributions such as Binomial, Poisson, Negative Binomial, etc. R packages in Bioconductor: edgeR (Robinson et al., 2010): Exact test based on Negative Binomial distribution. DESeq (Anders and Huber, 2010): Exact test based on Negative Binomial distribution. DEGseq (Wang et al., 2010): MA-plots based methods (MATR and MARS), assuming Normal distribution for M|A. baySeq (Hardcastle et al., 2010): Estimation of the posterior likelihood of diﬀerential expression (or more complex hypotheses) via empirical Bayesian methods using Poisson or NB distributions. Sonia Tarazona Diﬀerential Expression in RNA-Seq
- 19. Methods Introduction NOISeq Normalization Exercises Diﬀerential Expression ConcludingDiﬀerential Expression Non-parametric approaches No assumptions about data distribution are made. Fisher’s exact test (better with normalized counts). cuﬀdiﬀ (Trapnell et al., 2010): Based on entropy divergence for relative transcript abundances. Divergence is a measurement of the ”distance”between the relative abundances of transcripts in two diﬀerence conditions. NOISeq (Tarazona et al., coming soon) −→ http://bioinfo.cipf.es/noiseq/ Sonia Tarazona Diﬀerential Expression in RNA-Seq
- 20. Methods Introduction NOISeq Normalization Exercises Diﬀerential Expression ConcludingDiﬀerential Expression Drawbacks of diﬀerential expression methods Parametric assumptions: Are they fulﬁlled? Need of replicates. Problems to detect diﬀerential expression in genes with low counts. Sonia Tarazona Diﬀerential Expression in RNA-Seq
- 21. Methods Introduction NOISeq Normalization Exercises Diﬀerential Expression ConcludingDiﬀerential Expression Drawbacks of diﬀerential expression methods Parametric assumptions: Are they fulﬁlled? Need of replicates. Problems to detect diﬀerential expression in genes with low counts. NOISeq Non-parametric method. No need of replicates. Less inﬂuenced by sequencing depth or number of counts. Sonia Tarazona Diﬀerential Expression in RNA-Seq
- 22. Methods Introduction NOISeq Normalization Exercises Diﬀerential Expression ConcludingNOISeq Outline Signal distribution. Computing changes in expression of each gene between the two experimental conditions. Noise distribution. Distribution of changes in expression values when comparing replicates within the same condition. Diﬀerential expression. Comparing signal and noise distributions to determine diﬀerentially expressed genes. Sonia Tarazona Diﬀerential Expression in RNA-Seq
- 23. Methods Introduction NOISeq Normalization Exercises Diﬀerential Expression ConcludingNOISeq NOISeq-real Replicates are available for each condition. Compute M-D in noise by comparing each pair of replicates within the same condition. Sonia Tarazona Diﬀerential Expression in RNA-Seq
- 24. Methods Introduction NOISeq Normalization Exercises Diﬀerential Expression ConcludingNOISeq NOISeq-real Replicates are available for each condition. Compute M-D in noise by comparing each pair of replicates within the same condition. NOISeq-sim No replicates are available at all. NOISeq simulates technical replicates for each condition. The replicates are generated from a multinomial distribution taking the counts in the only sample as the probabilities for the distribution. Compute M-D in noise by comparing each pair of simulated replicates within the same condition. Sonia Tarazona Diﬀerential Expression in RNA-Seq
- 25. Methods Introduction NOISeq Normalization Exercises Diﬀerential Expression ConcludingNOISeq: Signal distribution 1 Calculate the expression of each gene at each experimental condition (expressioni , for i = 1, 2). Technical replicates: expressioni = sum(valuesi ) Biological replicates: expressioni = mean(valuesi ) No replicates: expressioni = valuei 2 Compute for each gene the statistics measuring changes in expression: expression M = log2 expression1 2 D = |expression1 –expression2 | Sonia Tarazona Diﬀerential Expression in RNA-Seq
- 26. Methods Introduction NOISeq Normalization Exercises Diﬀerential Expression ConcludingNOISeq: Noise distribution With replicates (NOISeq-real): For each condition, all the possible comparisons among the available replicates are used to compute M and D values. All the M-D values for all the comparisons, genes and for both conditions are pooled together to create the noise distribution. If the number of pairwise comparisons for a certain condition is higher than 30, only 30 randomly selected comparisons are made. Without replicates (NOISeq-sim): Replicates are simulated and the same procedure is used to derive noise distribution. Sonia Tarazona Diﬀerential Expression in RNA-Seq
- 27. Methods Introduction NOISeq Normalization Exercises Diﬀerential Expression ConcludingNOISeq: Diﬀerential expression Probability for each gene of being diﬀerentially expressed: It is obtained by comparing M-D values of that gene against noise distribution and computing the number of cases in which the values in noise are lower than the values for signal. A gene is declared as diﬀerentially expressed if this probability is higher than q. The threshold q is set to 0.8 by default, since this value is equivalent to an odds of 4:1 (the gene in 4 times more likely to be diﬀerentially expressed than not). Sonia Tarazona Diﬀerential Expression in RNA-Seq
- 28. Methods Introduction NOISeq Normalization Exercises Diﬀerential Expression ConcludingNOISeq Input Data: datos1, datos2 Features length: long (only if length correction is to be applied) Normalization: norm = {“rpkm”, “uqua”, “tmm”, “none”}; lc = “length correction” Replicates: repl = {“tech”, “bio”} Simulation: nss = “number of replicates to be simulated”; pnr = “total counts in each simulated replicate”; v = variability for pnr Probability cutoﬀ: q (≥ 0,8) Others: k = 0,5 Sonia Tarazona Diﬀerential Expression in RNA-Seq
- 29. Methods Introduction NOISeq Normalization Exercises Diﬀerential Expression ConcludingNOISeq Output Diﬀerential expression probability. For each feature, probability of being diﬀerentially expressed. Diﬀerentially expressed features. List of features names which are diﬀerentially expressed according to q cutoﬀ. M-D values. For signal (between conditions and for each feature) and for noise (among replicates within the same condition, pooled). Sonia Tarazona Diﬀerential Expression in RNA-Seq
- 30. Methods Introduction NOISeq Normalization Exercises Diﬀerential Expression ConcludingExercises Execute in an R terminal the code provided in exerciseDEngs.r Reading data simCount <- readData(file = "simCount.txt", cond1 = c(2:6), cond2 = c(7:11), header = TRUE) lapply(simCount, head) depth <- as.numeric(sapply(simCount, colSums)) Diﬀerential Expression by NOISeq NOISeq-real res1noiseq <- noiseq(simCount[[1]], simCount[[2]], nss = 0, q = 0.8, repl = ‘‘tech’’) NOISeq-sim res2noiseq <- noiseq(as.matrix(rowSums(simCount[[1]])), as.matrix(rowSums(simCount[[2]])), q = 0.8, pnr = 0.2, nss = 5) See the tutorial in http://bioinfo.cipf.es/noiseq/ for more information. Sonia Tarazona Diﬀerential Expression in RNA-Seq
- 31. Methods Introduction NOISeq Normalization Exercises Diﬀerential Expression ConcludingSome remarks Experimental design is decisive to answer correctly your biological questions. Diﬀerential expression methods for RNA-Seq data must be diﬀerent to microarray methods. Normalization should be applied to raw counts, at least a library size correction. Sonia Tarazona Diﬀerential Expression in RNA-Seq
- 32. Methods Introduction NOISeq Normalization Exercises Diﬀerential Expression ConcludingFor Further Reading Oshlack, A., Robinson, M., and Young, M. (2010) From RNA-seq reads to diﬀerential expression results. Genome Biology , 11, 220+. Review on RNA-Seq, including diﬀerential expression. Auer, P.L., and Doerge R.W. (2010) Statistical Design and Analysis of RNA Sequencing Data. Genetics, 185, 405-416. Bullard, J. H., Purdom, E., Hansen, K. D., and Dudoit, S. (2010) Evaluation of statistical methods for normalization and diﬀerential expression in mRNA-Seq experiments. BMC Bioinformatics, 11, 94+. Normalization methods (including Upper Quartile) and diﬀerential expression. Tarazona, S., Garc´ ıa-Alcalde, F., Dopazo, J., Ferrer A., and Conesa, A. (submitted) Diﬀerential expression in RNA-seq: a matter of depth. Sonia Tarazona Diﬀerential Expression in RNA-Seq

No public clipboards found for this slide

×
### Save the most important slides with Clipping

Clipping is a handy way to collect and organize the most important slides from a presentation. You can keep your great finds in clipboards organized around topics.

Be the first to comment