Background - DESeq
• Modelling the number of reads sequenced from a gene X
– Can use a Binomial B(n, p), n=total number of reads, p=prob. from gene X
– Can approximate with a Poisson(np) as n large, p small
– Poisson model works ok for a gene’s variation between technical replicates
– However, Poisson underestimates variation between biological replicates
– edgeR and DESeq use a negative binomial instead (for gene i in sample j)
Equation (1): Kij ~ NB(mu_ij, sigma^2_ij)
– Negative binomial has two parameters, mean mu and variance sigma^2
– Number of replicates is usually too small to estimate both for a gene X
• edgeR
– Assumes sigma^2 = mu + alpha*mu^2, where alpha is the same for all genes
– Just needs to estimate mu for a gene, then calculate sigma^2 from that
• DESeq
– For each condition, fits a local regression of sigma^2 versus mu
– Given mu for gene X, uses the local regression to estimate sigma^2
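To make the difference between the Poisson and negative binomial models concrete, here is a minimal sketch using only the Python standard library. It parameterises the NB by mean and variance, as in Equation (1), and uses the edgeR-style variance sigma^2 = mu + alpha*mu^2. The function names and the values mu = 100 and alpha = 0.1 are assumptions chosen for illustration; they are not taken from the paper.

```python
import math

def poisson_pmf(k, mu):
    """P(K = k) for a Poisson distribution with mean mu."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def nb_pmf(k, mu, var):
    """P(K = k) for a negative binomial with mean mu and variance var (needs var > mu)."""
    if var <= mu:
        raise ValueError("negative binomial needs var > mu")
    p = mu / var               # success probability
    r = mu * mu / (var - mu)   # size parameter, may be non-integer
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)

def edger_variance(mu, alpha):
    """Variance under the single-dispersion assumption sigma^2 = mu + alpha*mu^2."""
    return mu + alpha * mu ** 2

# Illustrative values only (assumed, not from the paper)
mu, alpha = 100.0, 0.1
var = edger_variance(mu, alpha)

# Probability of seeing 150 or more reads under each model
tail_poisson = sum(poisson_pmf(k, mu) for k in range(150, 1000))
tail_nb = sum(nb_pmf(k, mu, var) for k in range(150, 3000))
print("variance:", var)
print("P(K >= 150) Poisson:", tail_poisson)
print("P(K >= 150) NB:     ", tail_nb)
```

With the same mean, the NB puts far more probability on large counts, which is why a Poisson test treats ordinary biological variation as if it were a significant difference.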
Results & Discussion
• DESeq’s model - makes three assumptions
– Equation (2): mu_ij = qi,rho(j) * sj
mu_ij = expected value of mean count (no. reads) for gene i in sample j
qi,rho(j) = proportional to concentration of fragments from gene i in sample j
sj = coverage (sampling depth) of library j
– Equation (3): sigma^2_ij = mu_ij + sj^2 * vi,rho(j)
sigma^2_ij = variance of no. reads for gene i in sample j
mu_ij = variance due to Poisson model (technical variation) = “shot noise”
sj^2 * vi,rho(j) = variance due to biological variation(?) = “raw variance”
– Equation (4): vi,rho(j) = vrho ( qi,rho(j) )
i.e. vi,rho(j) is a function of qi,rho(j)
So we can make a regression of vi,rho(j) against qi,rho(j) for lots of genes (i)
Then estimate vi,rho(j) for gene X, based on qi,rho(j) and the regression line
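A small worked example of Equations (2) and (3) may help. The sketch below computes the expected count and splits the variance into its shot-noise and raw-variance parts. The function names and the numbers (q = 50, s = 2, v = 25) are made up for illustration.

```python
def expected_count(q, s):
    """Equation (2): mu_ij = q_i,rho(j) * s_j."""
    return q * s

def count_variance(q, s, v):
    """Equation (3): sigma^2_ij = mu_ij + s_j^2 * v_i,rho(j).

    Returns (shot_noise, raw_variance, total_variance).
    """
    shot_noise = expected_count(q, s)   # Poisson part, equal to the mean
    raw_variance = s ** 2 * v           # extra variance between biological replicates
    return shot_noise, raw_variance, shot_noise + raw_variance

# Hypothetical gene: expression strength 50, size factor 2, per-gene raw variance 25
shot, raw, total = count_variance(q=50.0, s=2.0, v=25.0)
print("mean:", expected_count(50.0, 2.0))
print("shot noise:", shot, "raw variance:", raw, "total variance:", total)
```

Here the mean is 100 and the total variance is 200, so a Poisson model (variance = mean) would understate the spread by a factor of two.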
• DESeq’s model – estimating parameters
– sj : coverage (sampling depth) of library j
The total number of reads in library j is not a good measure of depth.
Instead, take the median (over all genes i) of the ratio of each observed count to the gene’s geometric mean across the m samples:
Equation (5): sj = median_over_i ( kij / [ Prod_over_v kiv ]^(1/m) )
– qi,rho(j) = “expression strength” parameter for gene i in condition rho
Proportional to concentration of fragments from gene i in sample j.
Use the average of the depth-normalised counts from the samples j of condition rho:
Equation (6): qi,rho = 1/m_rho * Sum_over_j_in_rho (kij / sj), where m_rho = number of samples in condition rho
– vrho = function describing how vi,rho(j) depends on qi,rho(j)
Estimate the sample variance for each gene i, wi(rho) (Equation 7)
Fit a local regression line to wi(rho) versus qi(rho)
For a particular qi(rho) value, predict w=wi(rho) from the regression line
Also calculate zi(rho) for gene i (Equation 8)
Then use v = w – zi(rho) as an unbiased estimate of the raw variance vi,rho for gene i (Equation 9)
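The estimation steps above can be sketched in plain Python. The size factor follows Equation (5) with the geometric mean across samples as the denominator, and the expression strength follows Equation (6). The slide does not spell out Equations (7) and (8); the forms used below (sample variance of the normalised counts, and a shot-noise correction z = q/m * Sum(1/sj)) are my reading of the DESeq paper and should be checked against it. The local regression step is left out, and the toy count matrix is invented.

```python
import math
from statistics import median

def size_factors(counts):
    """Equation (5). counts[i][j] = reads for gene i in sample j."""
    n_samples = len(counts[0])
    # Genes with any zero count have a geometric mean of zero, so skip them here
    usable = [row for row in counts if all(k > 0 for k in row)]
    geo_means = [math.exp(sum(math.log(k) for k in row) / n_samples) for row in usable]
    return [
        median([row[j] / g for row, g in zip(usable, geo_means)])
        for j in range(n_samples)
    ]

def expression_strength(row, s):
    """Equation (6): mean of depth-normalised counts over the samples of one condition."""
    return sum(k / sj for k, sj in zip(row, s)) / len(row)

def variance_terms(row, s):
    """Assumed forms of Equations (7)-(9) for one gene in one condition.

    Returns (w, z, w - z), where w - z is the per-gene raw variance estimate.
    """
    m = len(row)
    q = expression_strength(row, s)
    w = sum((k / sj - q) ** 2 for k, sj in zip(row, s)) / (m - 1)
    z = q / m * sum(1.0 / sj for sj in s)
    return w, z, w - z

# Toy data: sample 2 was sequenced exactly twice as deeply as sample 1
counts = [[10, 20], [100, 200], [30, 60]]
s = size_factors(counts)
print("size factors:", s)
print("q for gene 1:", expression_strength(counts[0], s))
print("w, z, w - z for gene 1:", variance_terms(counts[0], s))
```

In this toy case the normalised counts agree perfectly, so w is zero and w - z is negative. Per-gene estimates from a couple of replicates are this noisy, which is the motivation for pooling genes of similar expression strength through the regression.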
• DESeq’s model – testing for differential expression
– Null hypothesis: qiA = qiB
qiA = expression strength parameter for gene i in the samples of condition A,
mA = number of samples for condition A
– Test statistic: total counts in each condition
Equation (10): KiA = counts in condition A = Sum_over_A ( Kij)
– P-value for test of null hypothesis
Under the null hypothesis, can compute prob(KiA = a, KiB = b) = p(a,b)
Equation (11): P-value for observed counts (kiA, kiB) =
[ Sum of probabilities p(a,b) where p(a,b) ≤ p(kiA,kiB), a+b = kiA+kiB ]
/ [ Sum of probabilities p(a,b) where a+b = kiA+kiB ]
– Computing p(a,b) values
p(a,b) = Prob(KiA = a) * Prob(KiB = b), assuming samples are independent
KiA is the sum of mA NB-distributed variables
We approximate its distribution by a NB(mu, sigma^2) distribution
whose parameters mu, sigma^2 are estimated using Equations 12,13,14
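The P-value in Equation (11) can be sketched as follows. The code takes the NB mean and variance of the condition totals KiA and KiB as given inputs, since Equations 12 to 14 are not reproduced on the slide. The function names, the small tolerance used when comparing probabilities, and the example numbers are assumptions for illustration.

```python
import math

def nb_pmf(k, mu, var):
    """P(K = k) for a negative binomial with mean mu and variance var (needs var > mu)."""
    p = mu / var
    r = mu * mu / (var - mu)
    return math.exp(math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
                    + r * math.log(p) + k * math.log1p(-p))

def nb_test_pvalue(k_a, k_b, mu_a, var_a, mu_b, var_b):
    """Equation (11): conditional on the total k_a + k_b, sum the probabilities of
    all splits (a, b) that are no more likely than the observed one."""
    total = k_a + k_b
    # p(a, b) = Prob(KiA = a) * Prob(KiB = b), assuming independent samples
    probs = [nb_pmf(a, mu_a, var_a) * nb_pmf(total - a, mu_b, var_b)
             for a in range(total + 1)]
    p_obs = probs[k_a]
    # Small relative tolerance so that ties are not lost to rounding
    numerator = sum(p for p in probs if p <= p_obs * (1 + 1e-9))
    return numerator / sum(probs)

# Hypothetical gene with the same null mean and variance in both conditions
print(nb_test_pvalue(100, 100, 100.0, 1100.0, 100.0, 1100.0))  # balanced counts
print(nb_test_pvalue(200, 20, 100.0, 1100.0, 100.0, 1100.0))   # very unbalanced counts
```

The test is analogous in spirit to Fisher's exact test: only splits of the observed total are considered, and the P-value is the share of probability carried by splits at least as extreme as the one observed.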
Applications
• Variance estimation
– Use RNA-seq data from fly embryos: ‘A’ and ‘B’ samples, 2 replicates each
Figure 1: estimated variances wi(rho) plotted against qi(rho) for fly sample A
Distance between orange and purple lines is noise due to biological sampling
[Figure 1 line labels: regression; edgeR; “shot noise” (technical variation)]
• Testing for differential expression
– Compared the 2 replicates for fly sample A
Figure 2: the empirical cumulative distribution functions of the P-values
The ECDF curve (blue line) should be below the diagonal (gray line)
Type I error is controlled by edgeR and DESeq, but not by a Poisson-based test
edgeR has an excess of small P-values for low counts, but is more conservative for high counts
[Figure 2 panel labels: DESeq, edgeR, Poisson; Low, High, All]
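The check behind Figure 2 can be sketched in a few lines. When two replicates of the same condition are compared, every gene is a true null, so the fraction of P-values at or below a threshold should not exceed that threshold. The P-values below are invented purely to show the calculation.

```python
def ecdf(p_values, threshold):
    """Fraction of P-values that are less than or equal to the threshold."""
    return sum(p <= threshold for p in p_values) / len(p_values)

# Hypothetical P-values from a replicate-versus-replicate comparison
p_values = [0.01, 0.2, 0.5, 0.9]

for threshold in (0.05, 0.25, 0.5):
    fraction = ecdf(p_values, threshold)
    # Under a well-calibrated test, fraction should stay at or below threshold
    status = "ok" if fraction <= threshold else "too many small P-values"
    print(threshold, fraction, status)
```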
• Testing for differential expression
– Compared fly A & B samples
Figure 3: obtained fold changes and P-values
The ability to detect differential expression depends on overall counts
The strong shot noise (technical variation) for low counts causes the testing
procedure to call only very high fold changes as significant
Red: significant p-value
• Comparison with EdgeR
– Ran edgeR with 4 settings:
(i) “Common-dispersion” or “tagwise-dispersion” modes for estimating variance
(ii) Size factors estimated by DESeq, or total number of reads
Results were very similar for the 4 settings
edgeR’s single-value dispersion estimate gives a variance that is lower than DESeq’s for
weakly expressed genes and higher for strongly expressed genes (Figure 1)
As a result, edgeR is anti-conservative for weakly expressed genes, but more
conservative for strongly expressed genes
This biases the list of discoveries by EdgeR
Figure 4 shows that weakly expressed genes seem to be over-represented
Few genes with high average level are called differentially expressed by EdgeR
DESeq produced results which were more balanced over the dynamic range
[Figure 4 legend: all fly data; DESeq hits; edgeR hits]
• Working without replicates
– DESeq can work if there are no replicates in one or both conditions
If there are just replicates from one condition, fit regression line using that one
If there are no replicates, treat the samples as replicates to fit the regression
For neural cell data, variability between replicates ≈ variability between conditions
However, for fly data, variability between replicates << variability between conditions
• Variance-stabilising transformation (VST)
– Given a variance-mean regression, a VST transforms the values so the
variance is independent of the mean (Equation 15)
This yields (transformed) count values whose variances are approximately
the same throughout the dynamic range
This is useful for sample clustering, since clustering assumes all genes have
roughly the same variance
Figure 5 shows clustering for neural cell samples, using VST-transformed data
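A variance-stabilising transformation can be sketched numerically. Equation 15 is not reproduced on the slide; the form assumed here is the standard one, the integral of 1/sqrt(w(q)) up to the count value, where w(q) is the fitted variance-mean function. The variance function w(q) = q + 0.1*q^2, the lower integration limit of 1 and the step count are all assumptions for illustration.

```python
import math

def vst(kappa, variance_fn, lower=1.0, steps=10000):
    """Integrate 1/sqrt(variance_fn(q)) from lower to kappa with the trapezoid rule."""
    h = (kappa - lower) / steps
    total = 0.5 * (1 / math.sqrt(variance_fn(lower)) + 1 / math.sqrt(variance_fn(kappa)))
    for i in range(1, steps):
        total += 1 / math.sqrt(variance_fn(lower + i * h))
    return total * h

def poisson_variance(q):
    """Pure shot noise: variance equals the mean."""
    return q

def overdispersed_variance(q, alpha=0.1):
    """Hypothetical variance-mean relationship with extra biological variance."""
    return q + alpha * q ** 2

# For pure Poisson noise the transform is 2*sqrt(kappa) up to an additive constant
print(vst(100.0, poisson_variance), 2 * math.sqrt(100.0) - 2 * math.sqrt(1.0))

# With overdispersion, high counts are compressed more strongly
for kappa in (10.0, 100.0, 1000.0):
    print(kappa, vst(kappa, overdispersed_variance))
```

The Poisson case recovers the familiar square-root transform, which is a useful sanity check on the numerical integration.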
• ChIP-Seq data
– Compared HapMap IDs GM12878 and GM12891
DESeq does not give false positives when comparing replicates for 1 individual
Using a Poisson-based model, you would get many false positives
[Figure panel labels: DESeq, Poisson; same individual, different individuals]
Summary
• A Poisson model underestimates the variance between
biological samples; this leads to false positives in differential
expression analyses
• A Negative Binomial distribution is much better
• This is especially true for highly expressed genes
• DESeq and EdgeR use the Negative Binomial
• However, DESeq estimates the sequencing depth differently
• Also DESeq estimates the variance for a gene by assuming
it has similar variance to genes of similar expression level
• DESeq and EdgeR have similar sensitivity, but EdgeR calls a
greater number of weakly expressed genes as significant,
and fewer highly expressed genes as significant

DESeq Paper Journal club (SlideShare upload by avrilcoghlan)
