Detecting differentially expressed genes
RNA-seq for DE analysis training
Joachim Jacob
20 and 27 January 2014

This presentation is available under the Creative Commons Attribution-ShareAlike 3.0 Unported License. Please refer to
http://www.bits.vib.be/ if you use this presentation or parts hereof.
Goal
Based on a count table, we want to detect genes that are differentially expressed between the conditions of interest.
We will assign each gene a p-value (between 0 and 1), which tells us how surprised we should be to see the observed difference if we assume that there is no real difference.

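p-value scale, from 0 to 1:
0: the observed difference would be very surprising if there were no real difference, so there is strong evidence for a difference.
1: the observed difference is not surprising at all, so there is very little evidence for a real difference.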
Goal
Every decision taken in the previous analysis steps was made to improve this outcome: the detection of differentially expressed (DE) genes.
Goal
Algorithms under active development

http://wiki.bits.vib.be/index.php/RNAseq_toolbox#Detecting_differential_expression_by_count_analysis
Algorithms under active development

http://genomebiology.com/2010/11/10/r106
Intuition
gene_id: CAF0006876

Condition A           Condition B
sample1    23171      sample9    19527
sample2    22903      sample10   26898
sample3    29227      sample11   18880
sample4    24072      sample12   24237
sample5    23151      sample13   26640
sample6    26336      sample14   22315
sample7    25252      sample15   20952
sample8    24122      sample16   25629

Given the variability within each condition (variability X and variability Y), compare the mean levels and conclude: are they similar or not?
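A minimal sketch of this intuition, using only the Python standard library: it summarizes the counts of CAF0006876 from the table above by their mean and sample variance per condition. It only describes the data; the actual comparison is done by the NB model and the test on the following slides.

```python
from statistics import mean, variance

# Raw counts for gene CAF0006876, copied from the table above
condition_a = [23171, 22903, 29227, 24072, 23151, 26336, 25252, 24122]
condition_b = [19527, 26898, 18880, 24237, 26640, 22315, 20952, 25629]

for name, counts in [("A", condition_a), ("B", condition_b)]:
    # mean level and spread (sample variance, n - 1 in the denominator)
    print(f"condition {name}: mean = {mean(counts):.2f}, "
          f"variance = {variance(counts):.1f}")
```

The means differ by about 1600 counts, but condition B is also clearly more variable, which is exactly why the decision has to take the variability into account.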
Intuition – model is fitted
A negative binomial (NB) model is estimated. It needs two parameters: the mean and the dispersion.
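As a rough sketch of what "mean and dispersion" means, the snippet below fits an NB to each condition by the method of moments, using the parametrization Var = mu + alpha * mu^2 (alpha is the dispersion). This is an assumption for illustration: DESeq2 estimates the dispersion by maximum likelihood on normalized counts, not with this shortcut, and this sketch ignores library size.

```python
from statistics import mean, variance


def nb_moments_fit(counts):
    """Method-of-moments NB fit with Var = mu + alpha * mu**2.

    Returns (mu, alpha); alpha is clipped at 0 when the counts vary
    less than a Poisson distribution would (variance <= mean)."""
    mu = mean(counts)
    var = variance(counts)
    alpha = max((var - mu) / mu ** 2, 0.0)
    return mu, alpha


condition_a = [23171, 22903, 29227, 24072, 23151, 26336, 25252, 24122]
condition_b = [19527, 26898, 18880, 24237, 26640, 22315, 20952, 25629]

for name, counts in [("A", condition_a), ("B", condition_b)]:
    mu, alpha = nb_moments_fit(counts)
    print(f"condition {name}: mean = {mu:.1f}, dispersion = {alpha:.4f}")
```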
Intuition – difference is quantified
The difference is converted into a p-value.
But counts depend on more than the condition
The read count of a gene in different conditions depends on (see first part):
1. Chance (NB model)
2. Expression level
3. Library size (number of reads in that library)
4. Length of the transcript
5. GC content of the gene
Normalize for library size
Assumption: most genes are not DE between samples. For every sample, DESeq calculates an 'effective library size' by means of a scale factor.
[Figure: read composition of two libraries, each summing to 100%, comparing the share taken by the rest of the genes]

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3426807/
Normalize for library size
Original library size * scale factor = effective library size
To obtain normalized counts, DESeq divides the original counts of each sample by that sample's scaling factor.
DESeq: This normalization method [14] is included in the DESeq Bioconductor package (version 1.6.0) [14] and is based on the hypothesis that most genes are not DE. A DESeq scaling factor for a given lane is computed as the median of the ratio, for each gene, of its read count over its geometric mean across all lanes. The underlying idea is that non-DE genes should have similar read counts across samples, leading to a ratio of 1. Assuming most genes are not DE, the median of this ratio for the lane provides an estimate of the correction factor that should be applied to all read counts of this lane to fulfill the hypothesis.
In short: DESeq computes the scaling factor of a sample as the median, over all genes, of the ratio of each gene's read count to its geometric mean across all samples. Because most genes are assumed not to be DE, this median of ratios is used as the scaling factor of that sample.
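The same procedure written as a formula (notation ours: k_ij is the raw count of gene i in sample j, m is the number of samples, s_j is the scaling factor of sample j). Genes with a zero count in any sample have a geometric mean of 0 and are left out of the median.

```latex
\hat{k}_i = \Bigl( \prod_{v=1}^{m} k_{iv} \Bigr)^{1/m},
\qquad
s_j = \operatorname*{median}_{i:\ \hat{k}_i > 0} \frac{k_{ij}}{\hat{k}_i},
\qquad
\tilde{k}_{ij} = \frac{k_{ij}}{s_j}
```

Here \tilde{k}_{ij} is the normalized count that enters the subsequent dispersion estimation and testing.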

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3426807/
Normalize for library size
[Figure: count table with a column of per-gene geometric means (24595 for the first gene) and the resulting ratio for every gene and sample, mostly between 0,8 and 1,2]

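1. Divide each count by the geometric mean of its gene (row).
2. Take the median of these ratios per column (sample): this median is the scaling factor of that sample.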
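A minimal sketch of these two steps in plain Python, under the assumption that the count table is a list of rows (one per gene) with the samples in the same order. It is meant to show the logic only; DESeq's own R implementation is what the training tool runs. The toy counts are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factor for every sample (column).

    counts: one list of raw counts per gene (row), same sample order in every row.
    """
    n_samples = len(counts[0])
    # a gene with a zero count has geometric mean 0, so it cannot be used
    usable = [row for row in counts if all(c > 0 for c in row)]
    # geometric mean per gene, computed in log space to avoid overflow
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # step 1: ratio of each count to the geometric mean of its gene
        ratios = [math.exp(math.log(row[j]) - lgm)
                  for row, lgm in zip(usable, log_geo_means)]
        # step 2: the median over all genes is the factor of this sample
        factors.append(median(ratios))
    return factors


def normalize(counts, factors):
    """Normalized counts: every raw count divided by the size factor of its sample."""
    return [[c / s for c, s in zip(row, factors)] for row in counts]


# hypothetical toy data: sample 2 was sequenced twice as deeply as sample 1
toy = [[10, 20], [100, 200], [50, 100]]
factors = size_factors(toy)
print(factors)
print(normalize(toy, factors))
```

After dividing by the factors, both samples of every toy gene end up with the same normalized count, which is what the "most genes are not DE" assumption expects.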
Other normalisations
● edgeR: TMM (trimmed mean of M-values)
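A simplified sketch of the TMM idea, for comparison with DESeq's median of ratios: for each gene, M is the log2 ratio of its share of the sample library to its share of a reference library, and A is its average log2 share. The most extreme genes by M (30% per side) and by A (5% per side) are trimmed, and the factor is 2 to the power of the mean of the remaining M-values. This is an assumption-laden simplification: edgeR's calcNormFactors also weights genes by precision and rescales the factors, and neither is done here. The toy counts are hypothetical.

```python
import math


def tmm_factor(sample, ref, trim_m=0.3, trim_a=0.05):
    """Simplified TMM normalization factor of `sample` relative to `ref`.

    Both arguments are raw counts for the same genes, in the same order.
    Unlike edgeR, no precision weights are used and factors are not rescaled.
    """
    n_s, n_r = sum(sample), sum(ref)
    genes = []
    for ys, yr in zip(sample, ref):
        if ys > 0 and yr > 0:                  # log ratios need non-zero counts
            ps, pr = ys / n_s, yr / n_r        # share of each library
            m_val = math.log2(ps / pr)         # M: log2 ratio of the shares
            a_val = 0.5 * math.log2(ps * pr)   # A: average log2 share
            genes.append((m_val, a_val))
    n = len(genes)
    # trim the most extreme genes by M and by A (per side)
    cut_m, cut_a = int(n * trim_m), int(n * trim_a)
    by_m = sorted(range(n), key=lambda i: genes[i][0])
    by_a = sorted(range(n), key=lambda i: genes[i][1])
    kept = set(by_m[cut_m:n - cut_m]) & set(by_a[cut_a:n - cut_a])
    # factor = 2 ** (mean M of the genes that survive both trims)
    return 2 ** (sum(genes[i][0] for i in kept) / len(kept))


# hypothetical toy data: one gene takes a huge share of the sample library
ref = [100] * 10
sample = [100] * 9 + [1000]
f = tmm_factor(sample, ref)
print(f, sum(sample) * f)   # factor and effective library size of the sample
```

In the toy data the single highly expressed gene inflates the raw library size of the sample to 1900 reads, while the other nine genes are unchanged; the trimmed factor brings the effective library size back to 1000, the same as the reference.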
Dispersion estimation
● For every gene, an NB distribution is fitted to the counts. The most important parameter to estimate in that model is the dispersion.
● DESeq2 applies three steps:
  ● it estimates the dispersion parameter for each gene;
  ● it plots these estimates against the mean of the normalized counts and fits a curve;
  ● it adjusts each gene's dispersion parameter towards the curve ('shrinking').
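The three steps can be sketched as follows, under clearly simplified assumptions: gene-wise dispersions come from the method of moments on normalized counts, the curve alpha(mu) = a0 + a1 / mu is fitted by ordinary least squares, and shrinking is a weighted average in log space with a fixed, hypothetical weight. DESeq2 itself fits this curve with a gamma-family GLM and shrinks with an empirical Bayes prior whose strength is estimated from the data. Gene names and counts are hypothetical.

```python
import math
from statistics import mean, variance


def genewise_dispersion(norm_counts):
    """Step 1: method-of-moments dispersion of one gene, from Var = mu + alpha * mu**2."""
    mu = mean(norm_counts)
    alpha = (variance(norm_counts) - mu) / mu ** 2
    return mu, max(alpha, 1e-8)               # keep alpha positive so its log exists


def fit_trend(mus, alphas):
    """Step 2: least-squares fit of the curve alpha(mu) = a0 + a1 / mu."""
    xs = [1 / mu for mu in mus]
    x_bar, y_bar = mean(xs), mean(alphas)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, alphas))
    a1 = sxy / sxx
    a0 = y_bar - a1 * x_bar
    return lambda mu: max(a0 + a1 / mu, 1e-8)


def shrink(alpha_gene, alpha_trend, weight=0.5):
    """Step 3: pull the gene-wise value towards the curve, in log space.

    `weight` is a hypothetical fixed value; DESeq2 derives the amount of
    shrinkage from the data instead."""
    log_alpha = weight * math.log(alpha_gene) + (1 - weight) * math.log(alpha_trend)
    return math.exp(log_alpha)


# hypothetical normalized counts for three genes
genes = {
    "geneA": [8, 12, 10, 15, 5, 11],
    "geneB": [95, 120, 80, 110, 130, 70],
    "geneC": [900, 1100, 1000, 1300, 700, 950],
}
fits = {g: genewise_dispersion(c) for g, c in genes.items()}
trend = fit_trend([mu for mu, _ in fits.values()], [a for _, a in fits.values()])
for g, (mu, alpha) in fits.items():
    print(g, round(alpha, 4), round(shrink(alpha, trend(mu)), 4))
```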
Dispersion estimation
1. Black dots: gene-wise dispersion estimates from the normalized data.
2. Red line: the fitted curve.
3. Blue dots: the final dispersion parameter assigned to each gene.

The model is fitted!
Test is run between conditions
If two conditions are compared, an NB model is fitted for each gene with a separate mean for each condition, and a test (Wald test) decides whether the difference between the conditions is significant (red in the plot).
MA-plot: mean of the normalized counts versus the log2 fold change between the two conditions.
Red: significant (p-value < 0,01)
Other points: not significant
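A minimal sketch of the test step, assuming the log2 fold change and its standard error (SE) have already been estimated; in DESeq2 both come from the fitted NB model, which this sketch does not reproduce. The Wald statistic is the estimate divided by its SE, and its two-sided p-value comes from the standard normal distribution. The second helper gives the MA-plot coordinates of one gene. All input numbers are hypothetical.

```python
import math


def wald_test(log2_fc, se):
    """Two-sided Wald test of H0: log2 fold change = 0.

    Returns the z statistic and its p-value under a standard normal distribution."""
    z = log2_fc / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return z, p_value


def ma_point(mean_a, mean_b):
    """MA-plot coordinates for one gene: overall mean of the normalized counts
    (assuming equal group sizes) and the log2 fold change of B over A."""
    return (mean_a + mean_b) / 2, math.log2(mean_b / mean_a)


# hypothetical estimates for one gene
print(wald_test(1.2, 0.4))
print(ma_point(100.0, 200.0))
```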

Because a test is run for every gene, we perform thousands of tests.

If we set a p-value cut-off of 0,01 and perform 20000 tests (= genes), and none of the genes is truly DE, about 200 genes (20000 × 0,01) are still expected to appear significant by chance alone.
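A quick simulation makes the arithmetic concrete. It assumes that no gene is DE, so every p-value is drawn from a uniform distribution between 0 and 1, which is how p-values behave under the null hypothesis. The seed is arbitrary.

```python
import random

random.seed(1)                       # fixed seed: the run is reproducible
n_tests, cutoff = 20000, 0.01
# if no gene is DE, p-values are uniformly distributed between 0 and 1
p_values = [random.random() for _ in range(n_tests)]
false_hits = sum(p < cutoff for p in p_values)
expected = n_tests * cutoff
print(f"expected {expected:.0f}, observed {false_hits}")
```

The observed number fluctuates around 200 from run to run, and every one of these hits is a false positive.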
Check the distribution of p-values
An enrichment (smaller or bigger) should be seen at low p-values, while the other p-values should be roughly uniformly distributed and show no trend.

If the histogram of the p-values does not match this profile, the test is not reliable. Perhaps the NB fitting step did not succeed, or confounding variables are present.
Confounded distribution of p-values
Improve test results

[Figure: p-value histogram with a cut-off of 0,05. Of the genes below the cut-off, a fraction is correctly identified as DE and a fraction is false positive.]
Improve test results
We can improve testing with two measures:
● avoid unnecessary tests: apply a filter before testing (independent filtering);
● apply a multiple testing correction.
Independent filtering
Some scientists simply remove genes with a mean count below 10 across the samples. There is, however, a more formal method to remove genes and so reduce the number of tests.

http://www.bioconductor.org/help/course-materials/2012/Bressanone2012/
From this collection, read 2012-07-04-Huber-Multiple-testing-independent-filtering.pdf
Independent filtering
On the left: a scatter plot of mean counts versus transformed p-values. The red line depicts a cut-off of 0,1. Note that genes with low counts do not reach the p-value threshold; some of them are safe to exclude from testing.
Independent filtering

If we filter out increasingly bigger portions of genes based on their mean counts, the number of significant genes increases, up to a point.
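A small sketch of why filtering helps: fewer tests means a milder multiple testing correction for the genes that remain. The helper drops the given fraction of genes with the lowest mean count, applies the Benjamini-Hochberg adjustment to the rest and counts adjusted p-values at or below 0,1. The function names and the ten genes are hypothetical; DESeq2's results() performs independent filtering automatically, using the mean of normalized counts as filter statistic and choosing the threshold that maximizes the number of rejections.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down, keeping the adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def rejections_after_filter(mean_counts, pvalues, fraction, alpha=0.1):
    """Drop the `fraction` of genes with the lowest mean count, apply BH
    to the rest and return how many adjusted p-values are <= alpha."""
    n_drop = int(len(mean_counts) * fraction)
    keep = sorted(range(len(mean_counts)), key=lambda i: mean_counts[i])[n_drop:]
    adjusted = bh_adjust([pvalues[i] for i in keep])
    return sum(p <= alpha for p in adjusted)


# hypothetical genes: five low-count genes without signal, one real signal
mean_counts = [2, 3, 4, 5, 6, 50, 80, 120, 300, 1000]
pvalues = [0.3, 0.4, 0.5, 0.55, 0.65, 0.015, 0.6, 0.7, 0.8, 0.9]
for fraction in (0.0, 0.5):
    print(fraction, rejections_after_filter(mean_counts, pvalues, fraction))
```

Without filtering, the gene with p = 0,015 is adjusted to 0,15 and missed; after removing the five low-count genes it is adjusted to 0,075 and detected.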
Independent filtering

See later (slide 30)
Choose the variable of interest.
You can run it once on all to check the outcome.

Independent filtering
Multiple testing correction
The results automatically include the Benjamini-Hochberg correction, which controls the false discovery rate (FDR).
The FDR is the expected fraction of false positives among the genes that are classified as DE.
If we set a threshold α of 0,05 on the adjusted p-values, we expect on average at most 5% of the genes called DE to be false positives.
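A sketch of the Benjamini-Hochberg step-up rule and a simulation of what "FDR of 0,05" means in practice. The rule sorts the p-values, finds the largest rank k whose p-value is at most k / m × α, and rejects the k smallest. The simulated data are hypothetical: 18000 genes without an effect (uniform p-values) and 2000 DE genes whose p-values are pushed towards 0. In R, p.adjust(p, method = "BH") gives the corresponding adjusted p-values.

```python
import random


def bh_reject(pvalues, alpha):
    """Indices of the hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            k_max = rank              # largest rank that passes its threshold
    return set(order[:k_max])


random.seed(7)
m_null, m_de = 18000, 2000
null_p = [random.random() for _ in range(m_null)]
# hypothetical DE genes: p-values concentrated near 0
de_p = [random.random() ** 10 for _ in range(m_de)]
p_values = null_p + de_p
rejected = bh_reject(p_values, alpha=0.05)
false_pos = sum(1 for i in rejected if i < m_null)   # indices < m_null are null genes
fdp = false_pos / max(len(rejected), 1)
print(len(rejected), false_pos, round(fdp, 3))
```

The observed fraction of false positives among the rejected genes stays close to, and typically below, 5%, not 20%.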
Including different factors
Always remember your experimental setup and the
goal. Summarize the setup in the sample
descriptions file.
[Figure: experimental design with GDA (=G) and GDA + vit C (=AG), yeast (=WT) and yeast mutant (=UPC), sampled on Day 1 and Day 2; the day is additional metadata (batch factor).]
Including different factors

● We provide a combination of factors (the model, a GLM) that influence the counts. Every factor should match a column name in the sample descriptions file.
● The levels of the factors that correspond to the 'base' or 'no perturbation' condition.
● The fraction of genes filtered out, as determined by the independent filtering tool.
● The adjusted p-value cut-off.
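To illustrate what the model and the base levels do, the sketch below builds a treatment-coded design matrix: an intercept plus one 0/1 column for every level of every factor except its base level, so each coefficient measures a change relative to the 'no perturbation' samples. The column names strain, medium and day and the full combination of samples are hypothetical; in R, DESeq2 takes a design formula such as ~ day + strain + medium and builds the matrix itself.

```python
from itertools import product


def design_matrix(samples, factors, base_levels):
    """Treatment-coded design matrix: an intercept column plus one 0/1 column
    for every level of every factor except its base level."""
    # non-base levels of each factor, in sorted order
    levels = {
        f: sorted({s[f] for s in samples} - {base_levels[f]}) for f in factors
    }
    columns = ["intercept"] + [f"{f}_{lvl}" for f in factors for lvl in levels[f]]
    rows = [
        [1] + [int(s[f] == lvl) for f in factors for lvl in levels[f]]
        for s in samples
    ]
    return columns, rows


# hypothetical sample descriptions: every combination of strain, medium and day
samples = [
    {"strain": strain, "medium": medium, "day": day}
    for strain, medium, day in product(["WT", "UPC"], ["G", "AG"], ["1", "2"])
]
columns, rows = design_matrix(samples, ["day", "strain", "medium"],
                              {"day": "1", "strain": "WT", "medium": "G"})
print(columns)
for s, row in zip(samples, rows):
    print(s, row)
```

The base samples (WT, G, Day 1) get only the intercept; every other sample switches on the columns of the factors in which it differs from the base.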
Including different factors
The 'detect differential expression' tool gives you four results:
1. the report, including graphs;
2. only the genes below the cut-off, with independent filtering applied;
3. all genes, with independent filtering applied;
4. the complete DESeq results, without independent filtering applied.
[Figure: results table with the Log2(FC) column and the standard error (SE) of the log2 fold change]
Including different factors
Volcano plot: shows the DE genes at our chosen cut-off.
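A volcano plot places every gene at x = log2 fold change and y = -log10 of its adjusted p-value, so genes that are both strongly changed and highly significant end up in the upper corners. The sketch below only computes those coordinates and the DE flag; the gene names, values and the default cut-off of 0,05 are hypothetical.

```python
import math


def volcano_points(results, padj_cutoff=0.05):
    """Volcano-plot coordinates from (gene, log2FC, adjusted p-value) tuples.

    x = log2 fold change, y = -log10(adjusted p-value); DE if padj < cut-off."""
    points = []
    for gene, lfc, padj in results:
        y = -math.log10(padj) if padj > 0 else math.inf
        points.append((gene, lfc, y, padj < padj_cutoff))
    return points


# hypothetical results for three genes
pts = volcano_points([("g1", 2.5, 0.001), ("g2", -0.2, 0.6), ("g3", -1.8, 0.02)])
for point in pts:
    print(point)
```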
Comparing different conditions

Which genes are DE between UPC and WT?
Which genes are DE between G and AG?
Which genes are DE in WT between G and AG?
Comparing different conditions
Adjust the sample descriptions file and the model:
1. Which genes are DE between UPC and WT?
2. Which genes are DE between G and AG?
3. Which genes are DE in WT between G and AG?
[Figure: the adjusted sample descriptions files for questions 1, 2 and 3, with the samples to leave out marked 'Remove these']
Congratulations!
We have reached our goal!
Overview

http://www.nature.com/nprot/journal/v8/n9/full/nprot.2013.099.html
Reads
Keywords
Effective library size
dispersion
shrinking
Significantly differentially expressed
MA-plot
Alpha cut-off
Independent filtering
FDR
p-value
Write in your own words what the terms mean
Break

MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupJonathanParaisoCruz
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...jaredbarbolino94
 

Recently uploaded (20)

Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...
 

RNA-seq for DE analysis: detecting differential expression - part 5

  • 1. Detecting differentially expressed genes RNA-seq for DE analysis training Joachim Jacob 20 and 27 January 2014 This presentation is available under the Creative Commons Attribution-ShareAlike 3.0 Unported License. Please refer to http://www.bits.vib.be/ if you use this presentation or parts hereof.
  • 2. Goal Based on a count table, we want to detect differentially expressed (DE) genes between conditions of interest. We will assign to each gene a p-value (between 0 and 1), which shows 'how surprised we should be' to see the observed difference if we assume there is no real difference. [Figure: p-value scale from 0 to 1. Close to 0: strong evidence for a real difference. Close to 1: no evidence for a real difference.]
  • 3. Goal Every decision taken in the previous analysis steps was made to improve this outcome: the detection of DE genes.
  • 5. Algorithms under active development http://wiki.bits.vib.be/index.php/RNAseq_toolbox#Detecting_differential_expression_by_count_analysis
  • 6. Algorithms under active development http://genomebiology.com/2010/11/10/r106
  • 9. Intuition – model is fitted Counts for gene CAF0006876. Condition A (sample1 to sample8): 23171, 22903, 29227, 24072, 23151, 26336, 25252, 24122. Condition B (sample9 to sample16): 19527, 26898, 18880, 24237, 26640, 22315, 20952, 25629. An NB (negative binomial) model is estimated; it needs 2 parameters: the mean and the dispersion.
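The two NB parameters can be illustrated on the counts of CAF0006876. This is a minimal sketch, assuming the parameterisation var = mu + alpha * mu² and a simple method-of-moments estimator; DESeq2 itself estimates the dispersion by maximum likelihood, shrinks it, and first corrects for library size, all of which this sketch skips. The function name `nb_moments` is hypothetical.

```python
import numpy as np

# Raw counts for gene CAF0006876 (slide 9); no library-size correction applied here.
cond_a = np.array([23171, 22903, 29227, 24072, 23151, 26336, 25252, 24122], dtype=float)
cond_b = np.array([19527, 26898, 18880, 24237, 26640, 22315, 20952, 25629], dtype=float)


def nb_moments(counts):
    """Method-of-moments NB fit: mean mu and dispersion alpha, with var = mu + alpha * mu**2."""
    mu = counts.mean()
    var = counts.var(ddof=1)               # unbiased sample variance
    alpha = max((var - mu) / mu**2, 0.0)   # clip at 0: no overdispersion means Poisson
    return mu, alpha


for name, counts in [("A", cond_a), ("B", cond_b)]:
    mu, alpha = nb_moments(counts)
    print(f"condition {name}: mean = {mu:.2f}, dispersion = {alpha:.4f}")
```

For counts this large the Poisson part (mu) is negligible and the dispersion is dominated by the biological variability between replicates.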
  • 10. Intuition – difference is quantified Counts for gene CAF0006876. Condition A (sample1 to sample8): 23171, 22903, 29227, 24072, 23151, 26336, 25252, 24122. Condition B (sample9 to sample16): 19527, 26898, 18880, 24237, 26640, 22315, 20952, 25629. An NB model is estimated; it needs 2 parameters: the mean and the dispersion. The difference between the conditions is then expressed as a p-value.
  • 11. BUT counts depend on more The read count of a gene in a given condition depends on (see part 1): 1. chance (NB model); 2. expression level; 3. library size (number of reads in that library); 4. transcript length; 5. GC content of the gene.
  • 12. Normalize for library size Assumption: most genes are not DE between samples. DESeq calculates for every sample an 'effective library size' by means of a scale factor. [Figure: two libraries, each 100% of their reads, both made up mostly of 'the rest of the genes'.] http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3426807/
  • 13. Normalize for library size Original library size × scale factor = effective library size. To obtain normalized counts, DESeq divides the original counts by the sample's scaling factor. From the reference below: "DESeq: This normalization method [14] is included in the DESeq Bioconductor package (version 1.6.0) [14] and is based on the hypothesis that most genes are not DE. A DESeq scaling factor for a given lane is computed as the median of the ratio, for each gene, of its read count over its geometric mean across all lanes. The underlying idea is that non-DE genes should have similar read counts across samples, leading to a ratio of 1. Assuming most genes are not DE, the median of this ratio for the lane provides an estimate of the correction factor that should be applied to all read counts of this lane to fulfill the hypothesis." http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3426807/
  • 14. Normalize for library size [Figure: worked example. For each gene the geometric mean of its counts across samples is computed (e.g. 24595); each count is divided by that geometric mean, giving ratios around 1 (between 0,8 and 1,2).] Steps: divide each count by the gene's geometric mean; take the median per column (sample).
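The two steps above (divide by the geometric mean, take the median per column) can be sketched as follows. This is a minimal illustration, not DESeq's code; the toy count table and the function name `size_factors` are hypothetical. Genes with a zero in any sample are skipped because their geometric mean is 0; DESeq's median-of-ratios estimator also leaves them out.

```python
import numpy as np


def size_factors(counts):
    """Median-of-ratios size factors; genes in rows, samples in columns."""
    counts = np.asarray(counts, dtype=float)
    keep = np.all(counts > 0, axis=1)                 # geometric mean needs non-zero counts
    log_counts = np.log(counts[keep])
    log_geo_means = log_counts.mean(axis=1)           # log of each gene's geometric mean
    log_ratios = log_counts - log_geo_means[:, None]  # count / geometric mean, in log space
    return np.exp(np.median(log_ratios, axis=0))      # median per column (sample)


# Hypothetical toy table: sample 2 is sequenced twice as deeply as sample 1,
# and the last gene is genuinely up in sample 2.
toy = np.array([[10, 20], [30, 60], [50, 100], [40, 400]])
sf = size_factors(toy)
normalized = toy / sf  # normalized counts = raw counts divided by the size factor
print(sf)
print(normalized)
```

Because the median ignores the single DE gene, the factors reflect sequencing depth only, and after division the non-DE genes have equal counts in both samples.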
  • 15. Other normalisations ● edgeR: TMM (trimmed mean of M-values)
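The idea behind TMM can be sketched in a few lines: compute per-gene log ratios of read proportions (M) and average log proportions (A), trim the extremes of both, and take a weighted mean of the remaining M values. This is a simplified sketch, assuming edgeR's default trims (30% of M, 5% of A from each tail) and inverse-variance weights; it omits edgeR's choice of reference sample, its tie handling and the final rescaling of the factors. The function name `tmm_factor` is hypothetical.

```python
import numpy as np


def tmm_factor(sample, ref, m_trim=0.3, a_trim=0.05):
    """Simplified TMM normalization factor of `sample` relative to `ref`."""
    sample = np.asarray(sample, dtype=float)
    ref = np.asarray(ref, dtype=float)
    lib_s, lib_r = sample.sum(), ref.sum()
    ok = (sample > 0) & (ref > 0)            # M and A are undefined for zero counts
    y_s, y_r = sample[ok], ref[ok]
    p_s, p_r = y_s / lib_s, y_r / lib_r
    m = np.log2(p_s / p_r)                   # M: log2 ratio of read proportions
    a = 0.5 * np.log2(p_s * p_r)             # A: average log2 proportion
    # Approximate variance of M, used for inverse-variance weighting.
    v = (lib_s - y_s) / (lib_s * y_s) + (lib_r - y_r) / (lib_r * y_r)

    def untrimmed(values, trim):
        """True for genes whose rank lies outside both trimmed tails."""
        n = values.size
        lo = int(np.floor(n * trim))
        rank = np.argsort(np.argsort(values, kind="stable"), kind="stable")
        return (rank >= lo) & (rank < n - lo)

    keep = untrimmed(m, m_trim) & untrimmed(a, a_trim)
    w = 1.0 / v[keep]
    return float(2.0 ** (np.sum(w * m[keep]) / np.sum(w)))


# Hypothetical example: one gene is strongly up in the sample and inflates its library size.
ref = [100] * 10
sample = [100] * 9 + [1000]
f = tmm_factor(sample, ref)
print(f, sum(sample) * f)  # factor and effective library size
```

Here the sample's library (1900 reads) is larger only because of one DE gene; TMM gives an effective library size equal to the reference (1000), so the other nine genes are not wrongly called down-regulated.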
  • 16. Dispersion estimation For every gene, an NB model is fitted to the counts. The most important parameter to estimate in that model is the dispersion. DESeq2 applies three steps: ● it estimates a dispersion parameter for each gene; ● it plots these estimates and fits a curve; ● it adjusts each gene's dispersion towards the curve ('shrinking').
  • 17. Dispersion estimation 1. Black dots: dispersion estimated per gene from the normalized data. 2. Red line: fitted curve. 3. Blue dots: final dispersion assigned to each gene. The model is now fitted.
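The three dispersion steps (gene-wise estimates, fitted curve, shrinking) can be sketched as below. This is a simplified illustration, not DESeq2's procedure: it fits the same parametric curve form, alpha(mu) = a0 + a1 / mu, but by ordinary least squares instead of a gamma GLM, and it shrinks with a hypothetical fixed weight, whereas DESeq2 derives the amount of shrinkage per gene from the data and the number of samples. The simulated gene-wise values and function names are hypothetical.

```python
import numpy as np


def fit_dispersion_trend(means, dispersions):
    """Least-squares fit of the curve alpha(mu) = a0 + a1 / mu."""
    X = np.column_stack([np.ones_like(means), 1.0 / means])
    (a0, a1), *_ = np.linalg.lstsq(X, dispersions, rcond=None)
    return a0, a1


def shrink_towards_trend(dispersions, trend, weight=0.5):
    """Move each gene-wise estimate towards the curve on the log scale (fixed weight)."""
    return np.exp(weight * np.log(trend) + (1.0 - weight) * np.log(dispersions))


# Hypothetical gene-wise estimates (black dots): a curve plus log-normal noise.
rng = np.random.default_rng(0)
means = rng.uniform(5, 5000, size=1000)
gene_wise = (0.05 + 2.0 / means) * rng.lognormal(mean=0.0, sigma=0.5, size=1000)

a0, a1 = fit_dispersion_trend(means, gene_wise)
trend = np.maximum(a0 + a1 / means, 1e-8)       # red line, kept strictly positive
final = shrink_towards_trend(gene_wise, trend)  # blue dots
```

Each final value lies between the gene's own estimate and the curve, which is why the blue dots sit closer to the red line than the black dots do.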
  • 18. Test is run between conditions If 2 conditions are compared, for each gene 2 NB models (one per condition) are made, and a test (Wald test) decides whether the difference is significant (red in the plot). MA-plot: mean of the counts versus the log2 fold change between the 2 conditions. Red: significant (p-value < 0,01); other points: not significant.
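A Wald test compares an estimated log2 fold change with its standard error. The sketch below is a simplified two-group version, assuming NB means per condition and one common dispersion, with the standard error from the delta method: Var(ln of a mean of n NB counts) ≈ (1/mu + alpha) / n. DESeq2 instead fits one GLM per gene, including size factors and shrunken dispersions, and tests the condition coefficient. The function name `wald_test` and the common dispersion 0.013 used below are hypothetical.

```python
import math


def wald_test(mean_a, mean_b, n_a, n_b, alpha):
    """Simplified two-group Wald test on the log2 fold change of NB means."""
    lfc = math.log2(mean_b / mean_a)
    # Delta method: Var(ln mean) = (1/mu + alpha) / n for each group.
    var_ln = (1.0 / mean_a + alpha) / n_a + (1.0 / mean_b + alpha) / n_b
    se = math.sqrt(var_ln) / math.log(2)   # standard error on the log2 scale
    z = lfc / se
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value from N(0, 1)
    return lfc, se, p


# Gene CAF0006876: condition means from slide 9, 8 samples each.
lfc, se, p = wald_test(24779.25, 23134.75, 8, 8, alpha=0.013)
print(f"log2FC = {lfc:.3f}, SE = {se:.3f}, p = {p:.3f}")
```

With a small fold change and this much replicate variability, the p-value stays far above 0,01, so this gene would be plotted as not significant.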
  • 19. Test is run between conditions If 2 conditions are compared, for each gene 2 NB models (one per condition) are made, and a test (Wald test) decides whether the difference is significant (red in the plot). This means that we perform thousands of tests, one per gene. If we set a p-value cut-off of 0,01 and perform 20000 tests (= genes), about 200 genes (20000 × 0,01) are expected to appear significant by chance alone.
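The expected number of false positives can be checked with a quick simulation. This sketch assumes that no gene is DE, in which case p-values are uniformly distributed between 0 and 1; the random seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)
n_genes, cutoff = 20000, 0.01

# Under the null hypothesis (no gene is DE) p-values are uniform on [0, 1].
null_p = rng.uniform(0.0, 1.0, size=n_genes)

expected = n_genes * cutoff                # expected number of false positives
observed = int(np.sum(null_p < cutoff))    # genes 'significant' by chance
print(f"expected {expected:.0f}, observed {observed}")
```

The observed count fluctuates around 200 from run to run, even though not a single gene is truly DE.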
  • 20. Check the distribution of p-values The histogram of the p-values should show an enrichment at low p-values, while the other p-values should be roughly uniform and show no trend. If the histogram does not match this profile, the test is not reliable: perhaps the NB fitting step did not succeed, or confounding variables are present.
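The expected histogram profile can be reproduced with simulated p-values. This sketch assumes a hypothetical mixture: 18000 non-DE genes with uniform p-values and 2000 DE genes whose p-values pile up near 0, modelled here with a Beta(0.1, 1) distribution purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
p_values = np.concatenate([
    rng.uniform(size=18000),          # non-DE genes: flat part of the histogram
    rng.beta(0.1, 1.0, size=2000),    # DE genes: enrichment at low p-values
])

counts, edges = np.histogram(p_values, bins=20, range=(0.0, 1.0))
first_bin, rest = counts[0], counts[1:]
print("first bin:", first_bin, "| other bins: min", rest.min(), "max", rest.max())
```

A healthy result shows a peak in the first bin and roughly equal heights elsewhere; a slope or a hump in the other bins hints at a poor model fit or confounding variables.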
  • 22. Improve test results With a cut-off of 0,05, a fraction of the genes called DE is correctly identified as DE, and a fraction is false positive.
  • 23. Improve test results We can improve testing by 2 measures: ● avoid unnecessary tests by filtering genes before testing (independent filtering); ● apply a multiple testing correction.
  • 24. Independent filtering Some scientists simply remove genes with a mean count below 10 across the samples. There is, however, a more formal method to decide which genes to remove, in order to reduce the number of tests. http://www.bioconductor.org/help/course-materials/2012/Bressanone2012/ From this collection, read 2012-07-04-Huber-Multiple-testing-independent-filtering.pdf
  • 25. Independent filtering Left: a scatter plot of mean counts versus transformed p-values. The red line depicts a cut-off of 0,1. Note that genes with low counts do not reach the p-value threshold; some of them are safe to exclude from testing. http://www.bioconductor.org/help/course-materials/2012/Bressanone2012/ From this collection, read 2012-07-04-Huber-Multiple-testing-independent-filtering.pdf
  • 26. Independent filtering If we filter out increasingly large portions of genes based on their mean counts, the number of significant genes increases.
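Filtering on mean counts can be sketched as keeping only the genes above a chosen quantile. This is a minimal sketch, assuming the filter statistic is the mean normalized count over all samples, computed without looking at the condition labels (which is what keeps it independent of the test under the null hypothesis). The function name `independent_filter` and the toy means are hypothetical; `theta` is the fraction of genes filtered out.

```python
import numpy as np


def independent_filter(mean_counts, theta):
    """Return a mask keeping genes whose mean count lies above the theta-quantile."""
    mean_counts = np.asarray(mean_counts, dtype=float)
    cutoff = np.quantile(mean_counts, theta)   # linear interpolation between values
    return mean_counts > cutoff


# Hypothetical mean normalized counts for 10 genes; filter out the lowest 30%.
means = np.arange(1, 11)
keep = independent_filter(means, 0.3)
print(keep, "tests left:", int(keep.sum()))
```

Only the retained genes are tested and passed to the multiple testing correction, so the correction is applied to fewer tests.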
  • 27. Independent filtering Choose the variable of interest (see slide 30). You can first run it once on all of them to check the outcome. http://www.bioconductor.org/help/course-materials/2012/Bressanone2012/ From this collection, read 2012-07-04-Huber-Multiple-testing-independent-filtering.pdf
  • 29. Multiple testing correction Applied automatically in the results: the Benjamini-Hochberg correction, which controls the false discovery rate (FDR). The FDR is the fraction of false positives among the genes classified as DE. If we set a threshold α of 0,05 on the adjusted p-values, we expect at most about 5% of the DE genes to be false positives.
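The Benjamini-Hochberg adjustment can be written in a few lines: sort the p-values, scale the i-th smallest by n / i, and make the result monotone from the largest p-value downwards. This sketch follows the same definition as R's p.adjust(method = "BH"); the function name `bh_adjust` and the four p-values are hypothetical.

```python
import numpy as np


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)           # p_(i) * n / i
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]  # monotone from the largest p down
    adjusted = np.minimum(adjusted, 1.0)
    out = np.empty(n)
    out[order] = adjusted                                 # back to the original gene order
    return out


padj = bh_adjust([0.01, 0.04, 0.03, 0.005])
print(padj, padj < 0.05)
```

Genes are then called DE when their adjusted p-value is below α, rather than their raw p-value.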
  • 30. Including different factors Always remember your experimental setup and the goal. Summarize the setup in the sample descriptions file. Example setup: medium GDA (=G) or GDA + vit C (=AG); strain yeast (=WT) or yeast mutant (=UPC); Day 1 or Day 2; plus additional metadata (batch factor).
  • 31. Including different factors We provide a combination of factors (the model, a GLM) that influence the counts. The tool asks for: the factors, each of which should match a column name in the sample descriptions file; for each factor, the level corresponding to the 'base' or 'no perturbation' condition; the fraction of genes filtered out, as determined by the independent filtering tool; the adjusted p-value cut-off.
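How factors and base levels turn into a GLM can be illustrated with a treatment-coded design matrix: an intercept for the all-base-levels condition plus one 0/1 column per non-base level, comparable to an additive R formula such as ~ strain + medium + day. This is a sketch under assumptions: the factor names, the one-sample-per-combination table and the function name `design_matrix` are hypothetical, not the tool's actual input format.

```python
import numpy as np


def design_matrix(samples, factors, base):
    """Treatment-coded design: intercept plus one 0/1 column per non-base level."""
    columns = ["Intercept"]
    for f in factors:
        for level in sorted({s[f] for s in samples}):
            if level != base[f]:
                columns.append(f"{f}_{level}")
    X = np.zeros((len(samples), len(columns)))
    X[:, 0] = 1.0
    for i, s in enumerate(samples):
        for f in factors:
            if s[f] != base[f]:
                X[i, columns.index(f"{f}_{s[f]}")] = 1.0
    return columns, X


# Hypothetical sample descriptions for the setup of slide 30.
samples = [
    {"strain": st, "medium": md, "day": d}
    for st in ("WT", "UPC") for md in ("G", "AG") for d in ("Day1", "Day2")
]
cols, X = design_matrix(samples, ["strain", "medium", "day"],
                        base={"strain": "WT", "medium": "G", "day": "Day1"})
print(cols)
print(X)
```

Each non-intercept coefficient then measures the log fold change of one level against its base level, with the other factors held fixed.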
  • 32. Including different factors The 'detect differential expression' tool gives you four results: 1. a report including graphs; 2. only the genes below the cut-off, with independent filtering applied; 3. all genes, with independent filtering applied; 4. the complete DESeq results, without independent filtering.
  • 33. Including different factors [Figure: results showing, for each gene, the log2 fold change (Log2(FC)) and the standard error (SE) of the log2 fold change.]
  • 34. Including different factors Volcano plot: shows the DE genes at the chosen cut-off.
  • 35. Comparing different conditions Setup: GDA (=G) or GDA + vit C (=AG); yeast (=WT) or yeast mutant (=UPC); Day 1 or Day 2. Which genes are DE between UPC and WT? Which genes are DE between G and AG? Which genes are DE in WT between G and AG?
  • 36. Comparing different conditions Adjust the sample descriptions file and the model to answer: 1. Which genes are DE between UPC and WT? 2. Which genes are DE between G and AG? 3. Which genes are DE in WT between G and AG? [Figure: the sample descriptions file adjusted for each question, with samples marked 'Remove these' where they must be left out.]
  • 39. Reads
  • 40. Keywords Effective library size, dispersion, shrinking, significantly differentially expressed, MA-plot, alpha cut-off, independent filtering, FDR, p-value. Write in your own words what these terms mean.
  • 41. Break