RNASeq DE methods review Applied Bioinformatics Journal Club

Applied Bioinformatics Journal Club
Wednesday, March 5

Background
• Comparison of commonly used DE software packages
– Cuffdiff
– edgeR
– DESeq
– PoissonSeq
– baySeq
– limma

• Two benchmark datasets
– Sequencing Quality Control (SEQC) dataset
• Includes qRT-PCR for 1,000 genes

– Biological replicates from 3 cell lines as part of the ENCODE project

Focus of the paper:
Comparison of relevant measures for DE detection

• Normalization of count data

• Sensitivity and specificity of DE detection
• Genes expressed in one condition but not expressed in the other
• Sequencing depth and number of replicates

Theoretical background
• Count matrix: number of reads assigned to gene i in sequencing experiment j
• Length bias when measuring gene expression by RNA-seq
– Reduces the ability to detect differential expression among shorter genes

• Differential expression analysis consists of 3 components:
– Normalization of counts
– Parameter estimation of the statistical model
– Tests for differential expression

Normalization
• Commonly used measures
– RPKM
– FPKM
– Bias: the proportional representation of each gene depends on the expression levels of other genes
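RPKM divides each count by gene length in kilobases and by library size in millions of mapped reads; FPKM applies the same formula to fragments (read pairs) instead of reads. A minimal Python sketch, assuming a genes × samples count matrix and gene lengths in base pairs (function and argument names are illustrative, not from any package):

```python
import numpy as np

def rpkm(counts, lengths_bp):
    """Reads per kilobase of gene length per million mapped reads.

    counts: genes x samples array of read counts (the count matrix).
    lengths_bp: gene lengths in base pairs, one per row of counts.
    """
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float)[:, None] / 1e3
    # library size per sample, in millions of reads
    library_millions = counts.sum(axis=0)[None, :] / 1e6
    return counts / lengths_kb / library_millions
```

Because the library size in the denominator is the total over all genes, a large increase in a few highly expressed genes lowers the RPKM of every other gene in that sample, which is the bias described above.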

• DESeq: scaling-factor-based normalization
– The size factor of each sample is the median, over genes, of the ratio of the gene's read count to its geometric mean across all samples
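The median-of-ratios size factors can be sketched in a few lines of Python. This is an illustrative reimplementation, not DESeq code; it assumes a genes × samples count matrix and skips genes with a zero count in any sample, whose geometric mean is zero.

```python
import numpy as np

def size_factors(counts):
    """Median-of-ratios size factors, one per sample (column)."""
    counts = np.asarray(counts, dtype=float)
    # genes with a zero in any sample have geometric mean 0 and are skipped
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    log_geo_means = log_counts.mean(axis=1, keepdims=True)  # log geometric mean per gene
    # log ratio of each count to its gene's geometric mean; median over genes per sample
    return np.exp(np.median(log_counts - log_geo_means, axis=0))
```

Dividing each column of the count matrix by its size factor puts the samples on a common scale.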

• Cuffdiff: extension of DESeq normalization
– Intra-condition library scaling
– Second scaling between conditions
– Also accounts for changes in isoform levels

Normalization
• edgeR
– Trimmed mean of M values (TMM)
– Weighted average of log ratios over a subset of genes (excluding genes with high average read counts and genes with large differences in expression)
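A simplified TMM calculation for one sample against a reference sample is sketched below. It assumes raw counts for the same genes in both samples, trims both tails of the log ratios (M) and of the average log expression (A) by quantile, and weights the remaining log ratios by approximate inverse variances. The 30% and 5% trim fractions and the quantile-based trimming are assumptions of this sketch; edgeR's own implementation differs in details.

```python
import numpy as np

def tmm_factor(sample, ref, trim_m=0.30, trim_a=0.05):
    """Simplified TMM scaling factor of `sample` relative to `ref`.

    sample, ref: 1-D arrays of raw counts for the same genes.
    trim_m, trim_a: fraction trimmed from EACH end of the M and A values
    (assumed values for this sketch).
    """
    sample = np.asarray(sample, dtype=float)
    ref = np.asarray(ref, dtype=float)
    n_k, n_r = sample.sum(), ref.sum()
    keep = (sample > 0) & (ref > 0)  # M and A are undefined for zero counts
    y_k, y_r = sample[keep], ref[keep]
    p_k, p_r = y_k / n_k, y_r / n_r
    m = np.log2(p_k / p_r)  # M: log ratio of expression proportions
    a = 0.5 * np.log2(p_k * p_r)  # A: average log expression
    # inverse of the approximate (delta-method) variance of each M value
    w = 1.0 / ((n_k - y_k) / (n_k * y_k) + (n_r - y_r) / (n_r * y_r))

    def untrimmed(x, frac):
        lo, hi = np.quantile(x, [frac, 1.0 - frac])
        return (x >= lo) & (x <= hi)

    use = untrimmed(m, trim_m) & untrimmed(a, trim_a)
    if not use.any():
        raise ValueError("trimming removed every gene")
    # weighted mean of the remaining M values, back-transformed from log2
    return 2.0 ** (np.sum(w[use] * m[use]) / np.sum(w[use]))
```

For example, if the reference has 20 genes at 100 reads each and the sample is identical except that one gene rises to 100,000 reads, the outlier is trimmed and the factor is 2000/101900, so the effective library size (total reads × factor) matches the reference.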

• baySeq
– Library size estimated as the sum of gene counts up to the upper quartile (75th percentile)
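The upper-quartile library size can be sketched as follows. This sketch assumes zero counts are dropped before the 75th percentile is computed; treat it as an illustration of the idea rather than baySeq's exact behavior.

```python
import numpy as np

def upper_quartile_libsizes(counts, q=0.75):
    """Per-sample library size: sum of the counts at or below the q-th quantile.

    counts: genes x samples array. Zero counts are ignored (assumption).
    """
    counts = np.asarray(counts, dtype=float)
    sizes = []
    for column in counts.T:
        nonzero = column[column > 0]
        cutoff = np.quantile(nonzero, q)
        # genes above the cutoff, e.g. a few very highly expressed ones, are excluded
        sizes.append(nonzero[nonzero <= cutoff].sum())
    return np.array(sizes)
```

For a sample with nonzero counts 1, 2, 3, 4 and 1000, the cutoff is 4 and the library size is 10; the single very large count does not contribute.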

• PoissonSeq
– A goodness-of-fit estimate defines the gene set that is least differentiated between 2 conditions; this set is then used to compute library normalization factors
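The iterative gene-set selection can be sketched as follows, assuming a genes × samples count matrix. Each pass computes expected counts from the current depths, scores each gene's Poisson goodness of fit, and recomputes the depths from genes in the middle of the goodness-of-fit distribution. The 25% to 75% window and the fixed number of iterations are assumptions of this sketch, not documented PoissonSeq defaults.

```python
import numpy as np

def poissonseq_depths(counts, n_iter=10, lower=0.25, upper=0.75):
    """Goodness-of-fit based sequencing depths (PoissonSeq-style sketch).

    counts: genes x samples array. Returns one depth per sample, summing to 1.
    """
    counts = np.asarray(counts, dtype=float)
    counts = counts[counts.sum(axis=1) > 0]  # genes with no reads carry no information
    gene_totals = counts.sum(axis=1)

    def depths_from(mask):
        # share of the selected genes' reads that falls in each sample
        return counts[mask].sum(axis=0) / counts[mask].sum()

    depths = depths_from(np.ones(len(counts), dtype=bool))  # start from all genes
    for _ in range(n_iter):
        expected = np.outer(gene_totals, depths)
        gof = ((counts - expected) ** 2 / expected).sum(axis=1)  # Poisson GOF per gene
        lo, hi = np.quantile(gof, [lower, upper])
        # genes in the middle of the GOF distribution form the "least differentiated" set
        depths = depths_from((gof >= lo) & (gof <= hi))
    return depths
```

Genes that differ strongly between conditions end up with the largest goodness-of-fit statistics, so they fall outside the window and stop influencing the depth estimates.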

Normalization
• limma (2 normalization procedures)
– Quantile normalization: sorts the counts from each sample and sets the values equal to the quantile means across all samples
– Voom: estimates the mean-variance relationship by LOWESS regression and transforms read counts to log scale for linear modeling
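Quantile normalization, and the log counts per million (log-CPM) transform that voom starts from, can be sketched in Python as below. Ties are broken by position rather than averaged, and the +0.5 and +1 offsets follow voom's log-CPM definition; the LOWESS mean-variance fit itself is not shown.

```python
import numpy as np

def quantile_normalize(x):
    """Give every sample (column) the same distribution: the mean of the sorted columns."""
    x = np.asarray(x, dtype=float)
    # rank of each value within its column; stable sort breaks ties by position
    ranks = np.argsort(np.argsort(x, axis=0, kind="stable"), axis=0)
    reference = np.sort(x, axis=0).mean(axis=1)  # quantile means across samples
    return reference[ranks]

def log_cpm(counts):
    """log2 counts per million, with a +0.5 count offset and +1 library-size offset."""
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)
    return np.log2((counts + 0.5) / (lib_sizes + 1.0) * 1e6)
```

For the matrix [[5, 4], [2, 1], [3, 6]], the sorted columns average to 1.5, 3.5 and 5.5, and quantile normalization returns [[5.5, 3.5], [1.5, 1.5], [3.5, 5.5]].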

Statistical modeling of gene expression
• edgeR and DESeq
– Negative binomial distribution (estimation of a dispersion parameter)
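In the common mean-dispersion parameterization, the count for gene $i$ in sample $j$ has a variance that grows quadratically with its mean. The symbols $K_{ij}$, $\mu_{ij}$ and $\phi_i$ are notation chosen here for illustration:

$$
K_{ij} \sim \mathrm{NB}(\mu_{ij}, \phi_i), \qquad \mathbb{E}[K_{ij}] = \mu_{ij}, \qquad \operatorname{Var}(K_{ij}) = \mu_{ij} + \phi_i\, \mu_{ij}^2
$$

Setting $\phi_i = 0$ recovers the Poisson model, so the dispersion captures variability beyond Poisson sampling noise.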

• edgeR
– Dispersion estimated as a weighted combination of 2 components
• A gene-specific dispersion and a common dispersion calculated across all genes
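The idea of combining a gene-specific and a common dispersion can be illustrated with a simplified, moment-based Python sketch. It assumes normalized counts from replicates of one condition, estimates each gene's dispersion from Var = mu + phi * mu^2, takes the common dispersion as the average of the gene-wise values, and blends the two with a hypothetical fixed weight (`prior_weight`). edgeR itself uses a weighted likelihood approach, so this shows only the shape of the calculation.

```python
import numpy as np

def shrunk_dispersions(norm_counts, prior_weight=10.0):
    """Blend gene-wise and common NB dispersions (simplified, moment-based).

    norm_counts: genes x replicates array of normalized counts from ONE condition.
    prior_weight: hypothetical weight on the common dispersion (illustrative only).
    """
    x = np.asarray(norm_counts, dtype=float)
    mean = x.mean(axis=1)
    var = x.var(axis=1, ddof=1)
    ok = mean > 0
    # method of moments: Var = mu + phi * mu^2  =>  phi = (Var - mu) / mu^2, floored at 0
    gene_wise = np.zeros_like(mean)
    gene_wise[ok] = np.maximum((var[ok] - mean[ok]) / mean[ok] ** 2, 0.0)
    common = gene_wise[ok].mean()  # a single dispersion shared by all genes (assumption)
    # weighted combination: each gene's estimate is pulled toward the common value
    return (gene_wise + prior_weight * common) / (1.0 + prior_weight)
```

A larger `prior_weight` pulls every gene more strongly toward the common value, which stabilizes estimates when there are few replicates.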

• DESeq
– Variance modeled as a combination of a Poisson term and a second term that models biological variability
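One way to write this decomposition, with $s_j$ the size factor of sample $j$, $q_{i,\rho(j)}$ the expression strength of gene $i$ in the condition $\rho(j)$ of sample $j$, and $v_{i,\rho(j)}$ the biological variance term (fitted as a smooth function of the mean across genes), is:

$$
K_{ij} \sim \mathrm{NB}\left(\mu_{ij}, \sigma_{ij}^2\right), \qquad \mu_{ij} = s_j\, q_{i,\rho(j)}, \qquad \sigma_{ij}^2 = \underbrace{\mu_{ij}}_{\text{Poisson}} + \underbrace{s_j^2\, v_{i,\rho(j)}}_{\text{biological}}
$$

The first term is the Poisson (shot-noise) component and the second models biological variability between replicates.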

• Cuffdiff
– Separate variance models for single-isoform and multiple-isoform genes
• Single isoform: similar to DESeq
• Multiple isoform: mixed model of negative binomial and beta distributions

• baySeq
– Full Bayesian model of negative binomial distributions
– Prior probability parameters are estimated by numerical sampling from the data

• PoissonSeq
– Models gene counts as Poisson variables
– The mean of the distribution is a log-linear function of library size, gene expression, and the gene's correlation with condition
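The log-linear model can be written as below, where $d_j$ is the depth of sample $j$, $\beta_i$ the expression level of gene $i$, $y_j$ the condition of sample $j$, and $\gamma_i$ the gene's association with condition (notation chosen here for illustration):

$$
N_{ij} \sim \mathrm{Poisson}(\mu_{ij}), \qquad \log \mu_{ij} = \log d_j + \log \beta_i + \gamma_i\, y_j
$$

Testing gene $i$ for differential expression then amounts to testing $\gamma_i = 0$.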

Test for differential expression
• edgeR and DESeq
– Variation of the Fisher exact test adapted to the negative binomial distribution
– Returns an exact P value from the derived probabilities
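A minimal Python sketch of the exact test for one gene follows. It assumes counts already normalized to equal library sizes and summed within each condition, plus a known dispersion shared by both conditions. It uses the fact that a sum of n independent NB variables with mean q and dispersion phi is NB with mean n·q and dispersion phi/n. edgeR and DESeq handle unequal library sizes and dispersion estimation in their own ways, which are not reproduced here.

```python
import numpy as np
from scipy.stats import nbinom

def nb_exact_test(k_a, k_b, n_a, n_b, phi):
    """Two-condition exact test for one gene under a negative binomial model.

    k_a, k_b: integer summed (normalized) counts in conditions A and B.
    n_a, n_b: number of replicates per condition; phi: known shared dispersion.
    """
    total = k_a + k_b
    q = total / (n_a + n_b)  # per-replicate mean under H0 (no DE)
    a = np.arange(total + 1)  # every possible split of the total between A and B
    # scipy's nbinom takes size r = 1/dispersion and p = r / (r + mean);
    # the condition sums have dispersion phi/n, so size n/phi and mean n*q
    r_a, r_b = n_a / phi, n_b / phi
    p_a = r_a / (r_a + n_a * q)
    p_b = r_b / (r_b + n_b * q)
    probs = nbinom.pmf(a, r_a, p_a) * nbinom.pmf(total - a, r_b, p_b)
    observed = probs[k_a]
    # P value: total probability of splits no more likely than the observed one
    return probs[probs <= observed * (1 + 1e-7)].sum() / probs.sum()
```

With 3 replicates per condition and phi = 0.1, a 50/50 split of 100 reads is the most likely split and gives a P value of 1, while a 100/0 split gives a P value far below 0.01.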

• Cuffdiff
– Test statistic based on the ratio of normalized counts between 2 conditions (assumed approximately normal)
– t-test to calculate the P value

• limma
– Moderated t-statistic, using an empirical Bayes moderated standard error and correspondingly adjusted degrees of freedom
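The moderated t-statistic can be sketched as follows. The prior variance `s2_prior` and prior degrees of freedom `df_prior` are taken as inputs here; limma estimates them from all genes, which this sketch does not do.

```python
import numpy as np
from scipy.stats import t as t_dist

def moderated_t(coef, s2, unscaled_sd, df_resid, s2_prior, df_prior):
    """Empirical Bayes moderated t-statistic for one coefficient per gene.

    coef: estimated log fold changes; s2: residual variances per gene;
    unscaled_sd: sqrt of the matching diagonal element of (X'X)^-1;
    df_resid: residual degrees of freedom; s2_prior, df_prior: given prior.
    """
    coef = np.asarray(coef, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    # posterior variance: each gene's variance shrunk toward the prior value
    s2_post = (df_prior * s2_prior + df_resid * s2) / (df_prior + df_resid)
    t_stat = coef / (np.sqrt(s2_post) * unscaled_sd)
    df_total = df_resid + df_prior  # degrees of freedom gained from the prior
    p_values = 2 * t_dist.sf(np.abs(t_stat), df_total)
    return t_stat, p_values
```

With residual variance 0.01 on 4 degrees of freedom and a prior of 0.09 on 4 degrees of freedom, the posterior variance is 0.05, so a log fold change of 1 gives t ≈ 4.47 instead of the ordinary t of 10. Genes with implausibly small variances are therefore less likely to be called DE.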

• baySeq
– Estimates 2 models for every gene
• No differential expression
• Differential expression
– The posterior probability of DE given the data is used to identify differentially expressed genes (no P value)
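The final step, turning the two model likelihoods into a posterior probability of DE, is Bayes' rule. The sketch below assumes the log marginal likelihoods and the prior probability of DE are already available; how baySeq estimates them (priors from numerical sampling of the data) is not reproduced.

```python
import numpy as np

def posterior_de(loglik_de, loglik_nde, prior_de):
    """Posterior probability of the DE model for each gene (Bayes' rule).

    loglik_de, loglik_nde: log marginal likelihoods of each gene's counts
    under the DE and no-DE models; prior_de: prior probability of DE.
    """
    log_de = np.log(prior_de) + np.asarray(loglik_de, dtype=float)
    log_nde = np.log(1.0 - prior_de) + np.asarray(loglik_nde, dtype=float)
    # normalize in log space so very small likelihoods do not underflow
    return np.exp(log_de - np.logaddexp(log_de, log_nde))
```

Genes are then ranked by this posterior probability, and a threshold on it replaces a P value cutoff.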

• PoissonSeq
– Test for significance of the correlation term
– Evaluated by score statistics, which follow a chi-squared distribution (used to derive P values)
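A sketch of the score test for the gene-wise correlation term under the log-linear Poisson model with known depths is shown below. It assumes genes with no reads have been removed (their fitted means would be zero) and returns chi-squared P values with 1 degree of freedom; PoissonSeq's permutation-based FDR is not included.

```python
import numpy as np
from scipy.stats import chi2

def poisson_score_test(counts, depths, y):
    """Score test of gamma_i = 0 in log(mu_ij) = log d_j + log beta_i + gamma_i * y_j.

    counts: genes x samples; depths: per-sample depths d_j;
    y: condition label per sample (e.g. 0/1).
    """
    counts = np.asarray(counts, dtype=float)
    d = np.asarray(depths, dtype=float)
    y = np.asarray(y, dtype=float)
    beta_hat = counts.sum(axis=1) / d.sum()  # MLE of beta_i when gamma_i = 0
    mu = np.outer(beta_hat, d)  # fitted means under the null
    score = ((counts - mu) * y).sum(axis=1)
    # Fisher information for gamma_i, adjusted for estimating beta_i
    info = (mu * y**2).sum(axis=1) - (mu * y).sum(axis=1) ** 2 / mu.sum(axis=1)
    stat = score**2 / info
    return stat, chi2.sf(stat, df=1)
```

For a gene with counts 5, 5, 20, 20 in two conditions of two equal-depth samples each, the score is 15, the information is 12.5, and the statistic is 18 (P ≈ 2e-5).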

• Multiple hypothesis corrections
– Benjamini-Hochberg
– PoissonSeq: permutation-based FDR
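The Benjamini-Hochberg adjustment can be sketched as below; it is intended to give the same values as R's `p.adjust(p, method = "BH")`.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values (step-up FDR control)."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)  # p_(k) * n / k for sorted P values
    # enforce monotonicity from the largest P value downward, then cap at 1
    adjusted_sorted = np.minimum(np.minimum.accumulate(ranked[::-1])[::-1], 1.0)
    adjusted = np.empty(n)
    adjusted[order] = adjusted_sorted  # return values in the original order
    return adjusted
```

For P values 0.01, 0.04, 0.03 and 0.005, the adjusted values are 0.02, 0.04, 0.04 and 0.02.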

Results
• Normalization and log expression correlation
• Differential expression analysis

• Evaluation of type I errors
• Evaluation of genes expressed in one condition
• Impact of sequencing depth and replication on DE detection

More from Jennifer Shelton
