Applied Bioinformatics Journal Club
Wednesday, March 5
RNASeq DE methods review Applied Bioinformatics Journal Club
Published in: Education
Transcript of "RNASeq DE methods review Applied Bioinformatics Journal Club"

  1. Applied Bioinformatics Journal Club, Wednesday, March 5
  2. Background
     • Comparison of commonly used DE software packages
       – Cuffdiff
       – edgeR
       – DESeq
       – PoissonSeq
       – baySeq
       – limma
     • Two benchmark datasets
       – Sequencing Quality Control (SEQC) dataset
         • Includes qRT-PCR for 1,000 genes
       – Biological replicates from 3 cell lines as part of the ENCODE project
  3. Focus of paper: comparison of relevant measures for DE detection
     • Normalization of count data
     • Sensitivity and specificity of DE detection
     • Genes expressed in one condition but with no expression in the other condition
     • Sequencing depth and number of replicates
  4. Theoretical background
     • Count matrix: number of reads assigned to gene i in sequencing experiment j
     • Length bias when measuring gene expression by RNA-seq
       – Reduces the ability to detect differential expression among shorter genes
     • Differential gene expression analysis consists of 3 components:
       – Normalization of counts
       – Parameter estimation of the statistical model
       – Tests for differential expression
  5. Normalization
     • Commonly used
       – RPKM
       – FPKM
       – Biases: the proportional representation of each gene depends on the expression levels of other genes
     • DESeq: scaling-factor-based normalization
       – Median, across genes, of the ratio of each gene's read count to its geometric mean across all samples
     • Cuffdiff: extension of DESeq normalization
       – Intra-condition library scaling
       – Second scaling between conditions
       – Also accounts for changes in isoform levels
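The RPKM and median-of-ratios ideas on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the DESeq implementation: the function names and the `counts[i][j]` layout (gene i, sample j) are assumptions made here, and genes with a zero count in any sample are simply skipped.

```python
import math
from statistics import median


def rpkm(count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return count * 1e9 / (gene_length_bp * total_mapped_reads)


def median_of_ratios_size_factors(counts):
    """Size factor per sample from a count matrix counts[i][j].

    For each gene, take the geometric mean across samples; the size
    factor of a sample is the median, over genes, of count / geometric mean.
    Genes with a zero count in any sample are skipped (assumption of this
    sketch), so at least one gene must be nonzero in every sample.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if min(row) <= 0:
            continue  # geometric mean would be zero
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]
```

Dividing each sample's counts by its size factor puts the samples on a common scale. Unlike RPKM, the result does not depend on the total read count, which a few highly expressed genes can dominate.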
  6. Normalization
     • edgeR
       – Trimmed mean of M values (TMM)
       – Weighted average of a subset of genes (excluding genes with high average read counts and genes with large differences in expression)
     • baySeq
       – Sums gene counts up to the upper 25% quantile to normalize library size
     • PoissonSeq
       – Goodness-of-fit estimate to define a gene set that is least differentiated between 2 conditions, which is then used to compute library normalization factors
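The trimming idea behind TMM can be illustrated with a simplified sketch. It is deliberately not edgeR's algorithm: the real method weights each gene by an estimate of its precision and rescales the factors across samples, both of which are omitted here. The function name, the default trim fractions, and the unweighted mean are assumptions of this example.

```python
import math


def simplified_tmm_factor(sample, reference, trim_m=0.3, trim_a=0.05):
    """Unweighted trimmed mean of M values for one sample vs. a reference.

    M is the log2 ratio of library-size-scaled counts, A the average log2
    abundance. Genes in the extreme tails of M or A are dropped before
    averaging M. Returns a factor that scales the sample's library size.
    """
    n_s, n_r = sum(sample), sum(reference)
    m_a = []
    for y_s, y_r in zip(sample, reference):
        if y_s > 0 and y_r > 0:  # log ratios are undefined for zeros
            p_s, p_r = y_s / n_s, y_r / n_r
            m_a.append((math.log2(p_s / p_r), 0.5 * math.log2(p_s * p_r)))
    n = len(m_a)
    cut_m, cut_a = int(n * trim_m), int(n * trim_a)
    by_m = sorted(range(n), key=lambda g: m_a[g][0])
    by_a = sorted(range(n), key=lambda g: m_a[g][1])
    keep = set(by_m[cut_m:n - cut_m]) & set(by_a[cut_a:n - cut_a])
    if not keep:
        raise ValueError("no genes left after trimming")
    mean_m = sum(m_a[g][0] for g in keep) / len(keep)
    return 2 ** mean_m
```

If one gene is strongly up-regulated in the sample while the rest are unchanged, that gene is trimmed away. The factor then shrinks the sample's effective library size back toward that of the reference.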
  7. Normalization
     • limma (2 normalization procedures)
       – Quantile normalization: sorts counts from each sample and sets the values equal to the quantile mean across all samples
       – Voom: LOWESS regression to estimate the mean-variance relation; transforms read counts to log form for linear modeling
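Quantile normalization as described on this slide can be sketched as follows. The function name and the list-of-lists input (one list of values per sample) are assumptions of this example, and ties are broken by position rather than averaged.

```python
def quantile_normalize(samples):
    """Quantile-normalize samples[j][g] (sample j, gene g).

    Each sample is sorted, the values at each rank are averaged across
    samples, and every original value is replaced by the mean for its rank.
    """
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[r] for s in sorted_samples) / len(samples) for r in range(n_genes)
    ]
    normalized = []
    for s in samples:
        order = sorted(range(n_genes), key=lambda g: s[g])  # gene index by rank
        out = [0.0] * n_genes
        for rank, g in enumerate(order):
            out[g] = rank_means[rank]
        normalized.append(out)
    return normalized
```

After this step every sample has exactly the same distribution of values, and only the ranking of genes within each sample is preserved.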
  8. Statistical modeling of gene expression
     • edgeR and DESeq
       – Negative binomial distribution (estimation of dispersion factor)
     • edgeR
       – Estimation of dispersion factor as a weighted combination of 2 components
         • Gene-specific dispersion effect and common dispersion effect calculated for all genes
  9. Statistical modeling of gene expression
     • DESeq
       – Variance estimate decomposed into a combination of a Poisson estimate and a second term that models biological variability
     • Cuffdiff
       – Separate variance models for single-isoform and multiple-isoform genes
         • Single isoform: similar to DESeq
         • Multiple isoform: mixed model of negative binomial and beta distributions
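The split of the variance into a Poisson term and a biological term can be made concrete with the common negative binomial parameterization, variance = mean + dispersion × mean². This sketch assumes that form. DESeq itself estimates the second term from the data rather than taking a fixed dispersion, and the function name is hypothetical.

```python
def nb_variance(mean, dispersion):
    """Variance of a negative binomial count with the given mean.

    The first term is the Poisson (sampling) variance; the second grows
    with the square of the mean and stands in for biological variability.
    """
    poisson_term = mean
    biological_term = dispersion * mean ** 2
    return poisson_term + biological_term
```

With a dispersion of 0 the model reduces to Poisson. For highly expressed genes the quadratic term dominates, which is why a pure Poisson model tends to understate the variability between biological replicates.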
  10. Statistical modeling of gene expression
      • baySeq
        – Full Bayesian model of negative binomial distributions
        – Prior probability parameters are estimated by numerical sampling of the data
      • PoissonSeq
        – Models gene counts as a Poisson variable
        – Mean of the distribution represented by a log-linear relationship of library size, expression of the gene, and correlation of the gene with condition
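One way to read the log-linear relationship for PoissonSeq is log(mean) = log(library size) + log(gene expression) + gamma × condition, where gamma is the gene's association with the condition. The sketch below assumes that form. The parameter names and the numeric coding of the condition are illustrative choices, not taken from the slides.

```python
import math


def poisson_loglinear_mean(library_size, gene_expression, gamma, condition):
    """Expected count under a log-linear Poisson model.

    library_size and gene_expression must be positive; gamma = 0 means the
    gene's expected count does not depend on the condition.
    """
    log_mean = (
        math.log(library_size) + math.log(gene_expression) + gamma * condition
    )
    return math.exp(log_mean)
```

Under this reading, testing a gene for differential expression amounts to testing whether gamma differs from 0.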
  11. Test for differential expression
      • edgeR and DESeq
        – Variation of the Fisher exact test modified for the negative binomial distribution
        – Returns an exact P value from derived probabilities
      • Cuffdiff
        – Ratio of normalized counts between 2 conditions (follows a normal distribution)
        – t-test to calculate the P value
  12. Test for differential expression
      • limma
        – Moderated t-statistic with modified standard error and degrees of freedom
      • baySeq
        – Estimates 2 models for every gene
          • No differential expression
          • Differential expression
        – Posterior likelihood of DE given the data is used to identify differentially expressed genes (no P value)
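The "modified standard error and degrees of freedom" of the moderated t-statistic can be sketched as a shrinkage of each gene's variance toward a prior value. In limma the prior variance and prior degrees of freedom are estimated from all genes by empirical Bayes. In this sketch they are plain inputs, and the function and parameter names are assumptions.

```python
import math


def moderated_variance(sample_var, df, prior_var, prior_df):
    """Shrink a gene's sample variance toward a prior variance.

    The result is a degrees-of-freedom-weighted average of the two.
    """
    return (prior_df * prior_var + df * sample_var) / (prior_df + df)


def moderated_t(effect, sample_var, df, prior_var, prior_df, unscaled_sd=1.0):
    """Return (moderated t-statistic, total degrees of freedom).

    unscaled_sd is the design-dependent factor that turns the residual
    standard deviation into the standard error of the effect estimate.
    """
    s2 = moderated_variance(sample_var, df, prior_var, prior_df)
    t = effect / (math.sqrt(s2) * unscaled_sd)
    return t, prior_df + df
```

With `prior_df = 0` this reduces to the ordinary t-statistic. A larger prior makes the variance estimates of genes with few replicates more stable.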
  13. Test for differential expression
      • PoissonSeq
        – Test for significance of the correlation term
        – Evaluated by score statistics, which follow a chi-squared distribution (used to derive P values)
      • Multiple hypothesis corrections
        – Benjamini-Hochberg
        – PoissonSeq: permutation-based FDR
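The Benjamini-Hochberg correction named on this slide can be written out directly. This is a minimal sketch with a hypothetical function name; it returns adjusted P values in the original gene order.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values (step-up procedure).

    Each P value is scaled by m / rank, and a running minimum taken from
    the largest P value downward keeps the adjusted values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Calling genes with an adjusted value below 0.05 differentially expressed controls the expected false discovery rate at 5%, under the usual independence or positive-dependence assumptions.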
  14. Results
      • Normalization and log expression correlation
      • Differential expression analysis
      • Evaluation of type I errors
      • Evaluation of genes expressed in one condition
      • Impact of sequencing depth and replication on DE detection