1 of 44
This presentation is available under the Creative Commons
Attribution-ShareAlike 3.0 Unported License. Please refer to
http://www.bits.vib.be/ if you use this presentation or parts
hereof.
RNA-seq for DE analysis training
Detecting differentially
expressed genes
Joachim Jacob
22 and 24 April 2014
2 of 44
Bioinformatics analysis will take most of your time
Quality control (QC) of raw reads
Preprocessing: filtering of reads and read parts, to help our goal of differential detection
QC of preprocessing
Mapping to a reference genome (alternative: to a transcriptome)
QC of the mapping
Count table extraction
QC of the count table
DE test
Biological insight
3 of 44
Goal: get me some DE genes!
Based on a raw count table, we want to detect
differentially expressed genes between conditions
of interest.
We will assign to each gene a p-value (0-1), which
shows us 'how surprised we should be' to see this
difference, when we assume there is no difference.
p-value close to 0: such a difference would be very surprising if there were no real difference.
p-value close to 1: such a difference is not surprising at all if there is no real difference.
4 of 44
Goal
Every single decision we have taken in
previous analysis steps was made to
improve this outcome of detecting DE
genes.
5 of 44
Raw counts to DE genes
6 of 44
DE detection tools from count tables
http://wiki.bits.vib.be/index.php/RNAseq_toolbox#Detecting_differential_expression_by_count_analysis
7 of 44
Algorithms under active development
http://genomebiology.com/2010/11/10/r106
8 of 44
Intuition: how to detect DE?
gene_id CAF0006876
sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8
23171 22903 29227 24072 23151 26336 25252 24122
sample9 sample10 sample11 sample12 sample13 sample14 sample15 sample16
19527 26898 18880 24237 26640 22315 20952 25629
Condition A: variability X
Condition B: variability Y
Compare and conclude, given a mean (or 'base') level: similar or not?
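The comparison on this slide can be tried out on the counts shown. The sketch below is only an illustration: it assumes that samples 1-8 belong to condition A and samples 9-16 to condition B (the slide does not state this explicitly), and it uses the sample standard deviation and the coefficient of variation as simple stand-ins for 'variability'.

```python
from statistics import mean, stdev

# Counts for gene CAF0006876, copied from the slide.
counts = [23171, 22903, 29227, 24072, 23151, 26336, 25252, 24122,
          19527, 26898, 18880, 24237, 26640, 22315, 20952, 25629]

# Assumption (not stated on the slide): samples 1-8 are condition A,
# samples 9-16 are condition B.
cond_a, cond_b = counts[:8], counts[8:]

def summarize(values):
    """Return mean, sample standard deviation and coefficient of variation."""
    m = mean(values)
    sd = stdev(values)
    return m, sd, sd / m

for name, values in (("A", cond_a), ("B", cond_b)):
    m, sd, cv = summarize(values)
    print(f"Condition {name}: mean={m:.0f} sd={sd:.0f} cv={cv:.3f}")
```

If the difference between the two means is small compared to the spread within each condition, the gene is unlikely to be called DE.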
9 of 44
Intuition
10 of 44
Intuition – model is fitted
NB model is estimated
11 of 44
Intuition – difference is quantified
NB model is estimated:
2 parameters are needed: mean and
dispersion.
The difference is expressed as a p-value.
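To make the two parameters concrete: in the usual negative binomial parameterization the variance equals mean + dispersion × mean². The sketch below uses that relation and a simple method-of-moments estimate of the dispersion. This is an illustrative shortcut, not the estimation procedure of DESeq2, and treating samples 1-8 as one condition is an assumption.

```python
from statistics import mean, variance

def nb_variance(mu, alpha):
    """Variance of a negative binomial with mean mu and dispersion alpha."""
    return mu + alpha * mu ** 2

def moment_dispersion(counts):
    """Method-of-moments dispersion estimate, floored at 0 (Poisson case)."""
    mu = mean(counts)
    var = variance(counts)
    return max((var - mu) / mu ** 2, 0.0)

# Samples 1-8 of gene CAF0006876 (assumed to be one condition).
counts_a = [23171, 22903, 29227, 24072, 23151, 26336, 25252, 24122]
alpha_a = moment_dispersion(counts_a)
print(f"mean={mean(counts_a):.0f} dispersion={alpha_a:.4f}")
```

A dispersion of 0 reduces the model to a Poisson distribution; larger values mean more variability between replicates than sampling chance alone would give.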
12 of 44
But counts are dependent on
The read counts of a gene in different
conditions depend on (see first part):
1. Chance (NB model)
2. Expression level
3. Library size (number of reads in that library)
4. Length of transcript
5. GC content of the genes
13 of 44
Normalize for library size
Assumption: most genes are not DE between
samples. DESeq calculates for every sample the
'effective library size' by a scale factor.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3426807/
14 of 44
Normalize for library size
DESeq computes a scaling factor for a given sample by computing the median of the ratio, for each gene, of its
read count over its geometric mean across all samples. It then uses the assumption that most genes are not DE
and uses this median of ratios to obtain the scaling factor associated with this sample.
Original library size * scale factor = effective library size
DESeq divides the original counts by the sample
scaling factor.
DESeq: This normalization method [14] is included in the DESeq Bioconductor package (version 1.6.0)
[14] and is based on the hypothesis that most genes are not DE. A DESeq scaling
factor for a given lane is computed as the median of the ratio, for each gene, of its read count
over its geometric mean across all lanes. The underlying idea is that non-DE genes should have
similar read counts across samples, leading to a ratio of 1. Assuming most genes are not DE, the
median of this ratio for the lane provides an estimate of the correction factor that should be
applied to all read counts of this lane to fulfill the hypothesis
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3426807/
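The median-of-ratios idea described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the DESeq code: the function name `size_factors`, the toy matrix and the choice to skip genes with a zero count (whose geometric mean is zero) are assumptions made for the example. The median is taken on the log scale.

```python
import numpy as np

def size_factors(counts):
    """Median-of-ratios scaling factors for a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Only genes with a count > 0 in every sample have a usable geometric mean.
    keep = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[keep])
    # Log of the geometric mean of each gene across all samples.
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Per sample: median over genes of log(count / geometric mean).
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

# Toy data: sample 2 is sequenced twice as deep as sample 1, sample 3 half as deep.
toy = np.array([[100, 200, 50],
                [ 10,  20,  5],
                [ 30,  60, 15],
                [  0,   4,  1]])
sf = size_factors(toy)
normalized = toy / sf  # counts divided by the scaling factor of their sample
print(sf)
```

After division, the non-DE genes in the toy table have the same normalized count in every sample, which is exactly the hypothesis the method relies on.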
15 of 44
Normalize for library size
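[Figure: count table with a 'Geom Mean' column (24595 for the example gene).]
Step 1: divide each count by the geometric mean of its gene. This gives a table of ratios (one row per gene, one column per sample), for example:
1,1 1,2 0,9 1,0 1,2 0,8 0,85 1,0
0,9 1,0 1,2 0,8 0,85 1,0 1,1 1,2
1,1 1,2 0,9 1,0 1,2 0,8 0,85 1,0
1,0 1,2 0,8 0,85 1,0 1,2 1,1 0,8
1,1 1,0 1,2 0,8 0,85 1,0 0,85 1,0
… … … … … … … …
1,2 0,8 1,1 0,9 0,85 1,1 1,1 1,0
Step 2: take the median per column. This is the scaling factor of that sample.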
16 of 44
Other normalisations
● edgeR: TMM, trimmed mean of M-values
In the end, the algorithms perform the normalization
internally and simply continue with the analysis.
17 of 44
Dispersion estimation
● For every gene, a negative binomial (NB) model is
fitted based on the counts. The most important
parameter to be estimated in that model is the
dispersion.
● DESeq2 applies three steps:
● Estimates the dispersion parameter for each gene
● Plots the estimates and fits a curve
● Adjusts the dispersion parameter towards the
curve ('shrinking')
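The three steps can be mimicked with a deliberately simplified sketch. Everything in it is an assumption made for illustration: the gene-wise values are made up, the curve has the form dispersion = a0 + a1 / mean and is fitted by ordinary least squares, and the shrinkage is a fixed 50/50 average on the log scale. DESeq2 itself decides per data set how strongly to shrink, so this is not its actual procedure.

```python
import numpy as np

def fit_trend(mean_counts, gene_disp):
    """Least-squares fit of dispersion = a0 + a1 / mean; returns (a0, a1)."""
    X = np.column_stack([np.ones_like(mean_counts), 1.0 / mean_counts])
    coef, *_ = np.linalg.lstsq(X, gene_disp, rcond=None)
    return coef

def shrink(gene_disp, trend_disp, weight=0.5):
    """Move each gene-wise estimate towards the curve on the log scale.

    weight is a hypothetical fixed value: 1 keeps the gene-wise estimate,
    0 replaces it by the curve.
    """
    return np.exp(weight * np.log(gene_disp) + (1 - weight) * np.log(trend_disp))

# Step 1: made-up gene-wise estimates (mean count and dispersion per gene).
mean_counts = np.array([10.0, 50.0, 100.0, 500.0, 1000.0, 5000.0])
gene_disp = np.array([0.60, 0.10, 0.12, 0.03, 0.06, 0.04])

# Step 2: fit the curve through the estimates.
a0, a1 = fit_trend(mean_counts, gene_disp)
trend_disp = np.maximum(a0 + a1 / mean_counts, 1e-8)

# Step 3: shrink each estimate towards the curve.
final_disp = shrink(gene_disp, trend_disp)
print(np.round(final_disp, 3))
```

Each final value lies between the gene-wise estimate and the curve, which is the behaviour the slide calls 'shrinking'.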
18 of 44
Dispersion estimation
1. Black dots: dispersion estimated from normalized data.
2. Red line: fitted curve.
3. Blue dots: final assigned dispersion parameter for that gene.
The model is fitted!
19 of 44
Test is run between conditions
If 2 conditions are
compared, for each
gene 2 NB models (one
for each condition) are
made, and a test (Wald
test) decides whether
the difference is
significant (red in plot).
Significant (p-value < 0,01)
Not significant
MA-plot: mean of counts
versus the log2 fold change
between 2 conditions.
20 of 44
Test is run between conditions
This means that we are
going to perform thousands
of tests.
If we set a cut-off on the p-value
of 0,01 and we have performed
20000 tests (= genes), about 200 genes
(20000 × 0,01) that do not differ are
expected to turn up significant by chance alone.
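The arithmetic behind this number is a single multiplication. The small helper below (a hypothetical name, written only for this example) assumes the worst case in which none of the tested genes is truly DE.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of genes passing the cut-off if no gene is truly DE."""
    return n_tests * alpha

for alpha in (0.05, 0.01, 0.001):
    print(alpha, expected_false_positives(20000, alpha))
```

Lowering the cut-off reduces the number of chance findings, but also makes it harder for real differences to pass, which is why the following slides look at filtering and multiple testing correction instead.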
21 of 44
Check the distribution of p-values
A peak (smaller or bigger) should be seen at low p-values.
Other p-values should not
show a trend.
The histogram of the p-values should
look like this: a peak near 0 and a flat
distribution elsewhere. If not, the
test is not reliable. Perhaps the NB
fitting step did not succeed, or
confounding variables are present.
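A quick numerical version of this check could look as follows. It is a sketch on deterministic toy data: the 1000 evenly spread p-values stand in for non-DE genes and the 200 very small ones for DE genes. The function names and the 'enrichment' ratio are assumptions for illustration, not a standard diagnostic.

```python
import numpy as np

def pvalue_histogram(pvalues, n_bins=20):
    """Counts of p-values in equal-width bins between 0 and 1."""
    counts, edges = np.histogram(pvalues, bins=n_bins, range=(0.0, 1.0))
    return counts, edges

def low_bin_enrichment(counts):
    """Count in the lowest bin divided by the mean count of the other bins."""
    return counts[0] / counts[1:].mean()

# Toy data: evenly spread p-values (no trend) plus a group of very small ones.
null_like = (np.arange(1000) + 0.5) / 1000
de_like = (np.arange(200) + 0.5) / 200 * 0.01
hist_counts, _ = pvalue_histogram(np.concatenate([null_like, de_like]))

print(hist_counts)
print(low_bin_enrichment(hist_counts))
```

A ratio clearly above 1 together with roughly equal counts in the remaining bins matches the expected shape. A slope or a bump elsewhere in the histogram is a reason to look again at the model.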
22 of 44
Confounded distribution of p-values
23 of 44
Improve test results
You set a cut-off of 0,05.
Of the genes below the cut-off, a fraction is
correctly identified as DE and a fraction is
false positive.
24 of 44
Improve test results
We can improve testing by 2 measures:
● avoid testing: apply a filtering before testing, an
independent filtering.
● apply a multiple testing correction
25 of 44
Avoid testing by independent filtering
Some scientists just
remove genes with
mean counts in the
samples <10. But there
is a more formal
method to remove
genes, in order to
reduce the testing.
http://www.bioconductor.org/help/course-materials/2012/Bressanone2012/
From this collection, read 2012-07-04-Huber-Multiple-testing-independent-filtering.pdf
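The simple rule mentioned on this slide (drop genes with a mean count below 10) can be sketched as below. The function name and the toy counts are made up for illustration. The more formal independent filtering from the referenced course material chooses the threshold from the data instead of fixing it at 10.

```python
import numpy as np

def filter_low_counts(counts, min_mean=10):
    """Keep genes (rows) whose mean count over all samples is at least min_mean."""
    counts = np.asarray(counts)
    keep = counts.mean(axis=1) >= min_mean
    return counts[keep], keep

# Toy genes x samples table.
toy = np.array([[  0,  3,   1,   2],
                [120, 98, 143, 110],
                [  9, 12,   8,  11],
                [ 15,  7,  20,  30]])
filtered, keep = filter_low_counts(toy)
print(keep)
```

The filter only looks at the overall mean and never at the condition labels, which is what makes it 'independent' of the test that follows.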
26 of 44
Avoid testing by independent filtering
Left: a scatter plot of
mean counts versus
transformed p-values.
The red line depicts a
cut-off of 0,1. Note that
genes with lower counts
do not reach the p-value
threshold. Some of them
are safe to exclude from
testing.
27 of 44
Avoid testing by independent filtering
If we filter out increasingly large portions of genes based on their
mean counts, the number of significant genes increases.
28 of 44
Avoid testing by independent filtering
See later (slide 30)
Choose the variable of interest.
You can run it once on all to check the outcome.
29 of 44
Avoid testing by independent filtering
30 of 44
Packages for independent filtering
HTSFilter is a package
especially developed for
independent filtering in a
non-arbitrary way.
In our Galaxy, during the
exercises, you will be
using another approach.
http://www.bioconductor.org/packages/release/bioc/html/HTSFilter.html
31 of 44
Multiple testing correction
Automatically performed and reported in the results:
Benjamini-Hochberg correction, to control the false
discovery rate (FDR).
FDR is the expected fraction of false positives among the genes that
are classified as DE.
If we only set a threshold α of 0,05 on the raw p-values, a much larger
fraction of the genes in the final list (20% in this example) can be false
positives. If we apply an FDR cut-off of 0,05, about 5% of the genes in
the final list are expected to be false positives.
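The Benjamini-Hochberg adjustment itself is short enough to write out. The sketch below is a plain NumPy version for illustration (the function name and the ten example p-values are made up). In practice the DE tools report these adjusted p-values for you.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    # p-value times (number of tests / rank), for the sorted p-values.
    ranked = p[order] * n / np.arange(1, n + 1)
    # Make the adjusted values non-decreasing, working down from the largest.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
padj = benjamini_hochberg(pvals)
print(int((np.array(pvals) < 0.05).sum()), "genes pass the raw cut-off of 0.05")
print(int((padj < 0.05).sum()), "genes pass the adjusted cut-off of 0.05")
```

Fewer genes pass the adjusted cut-off than the raw one. That is the price paid for keeping the expected fraction of false positives in the final list under control.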
32 of 44
Including influencing factors
Through a generalized linear model (GLM), the
influencing factors are modeled to predict the counts.
The factors come from the sample descriptions file.
[Figure: experimental design]
Yeast (=WT), yeast mutant (=UPC)
GDA (=G), GDA + vit C (=AG)
Additional metadata (batch factor): Day 1, Day 2
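How factors from a sample descriptions file turn into a model can be illustrated with a small design matrix. The sketch is hypothetical throughout: the column names `strain`, `treatment` and `day`, the sample names and the fully crossed layout of eight samples are assumptions, since the slide only shows the levels WT/UPC, G/AG and Day 1/Day 2.

```python
# Hypothetical sample descriptions: names, columns and layout are assumptions.
samples = [
    {"sample": f"s{i + 1}", "strain": strain, "treatment": treatment, "day": day}
    for i, (strain, treatment, day) in enumerate(
        (s, t, d) for s in ("WT", "UPC") for t in ("G", "AG") for d in ("1", "2")
    )
]

def design_matrix(samples, factors, base_levels):
    """Build an intercept plus one 0/1 column per non-base level of each factor."""
    non_base = {
        f: sorted({s[f] for s in samples} - {base_levels[f]}) for f in factors
    }
    columns = ["intercept"] + [f"{f}_{lvl}" for f in factors for lvl in non_base[f]]
    rows = [
        [1] + [int(s[f] == lvl) for f in factors for lvl in non_base[f]]
        for s in samples
    ]
    return columns, rows

columns, rows = design_matrix(
    samples,
    factors=["strain", "treatment", "day"],
    base_levels={"strain": "WT", "treatment": "G", "day": "1"},
)
print(columns)
for s, row in zip(samples, rows):
    print(s["sample"], row)
```

The base levels get no column of their own: a sample at the base level of every factor is described by the intercept alone, and each other column measures the change relative to that base.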
33 of 44
DESeq2 to detect DE genes
We provide a combination of factors
(the model, GLM) which influence the
counts. Every factor should match a
column name in the sample
descriptions file.
The levels of the factors corresponding
to the 'base' or 'no perturbation' condition.
The fraction filtered out, as determined
by the independent filter tool.
Adjusted p-value cut-off.
34 of 44
The output of DESeq2
The 'detect differential
expression' tool gives you four
results:
● The report, including graphs.
● Only the genes below the cut-off, with independent filtering applied.
● All genes, with independent filtering applied.
● Complete DESeq results, without independent filtering applied.
35 of 44
Effect of variance on DE detection
[Figure: two plots, one with all genes and their logFC, one with only the DE genes. Axes: Log2(FC) and standard error (SE) of LogFC.]
36 of 44
Volcano plot is often asymmetric
Volcano plot: shows
the DE genes with
our given cut-off.
Axes: log10(FC), with marks at -0.3 and 0.3 (about a 2-fold change), versus -log10(pvalue).
37 of 44
Comparing different conditions
Yeast (=WT), yeast mutant (=UPC)
GDA (=G), GDA + vit C (=AG)
Day 1, Day 2
Which genes are DE between UPC and WT?
Which genes are DE between G and AG?
Which genes are DE in WT between G and AG?
38 of 44
Comparing different conditions
Adjust the sample descriptions file and the model (remove the samples that are not part of the comparison):
1. Which genes are DE between UPC and WT?
2. Which genes are DE between G and AG?
3. Which genes are DE in WT between G and AG?
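For question 3, adjusting the sample descriptions comes down to keeping only the WT samples and then comparing G with AG. A minimal sketch of that selection is shown below. The table, its column names and the helper `subset` are hypothetical and only illustrate the idea.

```python
# Hypothetical sample descriptions; names and columns are assumptions.
samples = [
    {"sample": "s1", "strain": "WT", "treatment": "G"},
    {"sample": "s2", "strain": "WT", "treatment": "AG"},
    {"sample": "s3", "strain": "UPC", "treatment": "G"},
    {"sample": "s4", "strain": "UPC", "treatment": "AG"},
]

def subset(samples, **conditions):
    """Keep the samples whose columns match all given values."""
    return [s for s in samples if all(s[k] == v for k, v in conditions.items())]

# Question 3: within WT only, compare G with AG.
wt_only = subset(samples, strain="WT")
print([s["sample"] for s in wt_only])
```

With the remaining samples, the model only needs the treatment factor, because the strain no longer varies.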
39 of 44
Congratulations!
We have reached our goal!
40 of 44
Overview
http://www.nature.com/nprot/journal/v8/n9/full/nprot.2013.099.html
41 of 44
Reads
42 of 44
Keywords
Effective library size
dispersion
shrinking
Significantly differentially expressed
MA-plot
Alpha cut-off
Independent filtering
FDR
p-value
Write in your own words what the terms mean
43 of 44
Exercises
● Detecting differential expression from a
count table
44 of 44
Break

More Related Content

What's hot

Part 6 of RNA-seq for DE analysis: Detecting biology from differential expres...
Part 6 of RNA-seq for DE analysis: Detecting biology from differential expres...Part 6 of RNA-seq for DE analysis: Detecting biology from differential expres...
Part 6 of RNA-seq for DE analysis: Detecting biology from differential expres...Joachim Jacob
 
Dgaston dec-06-2012
Dgaston dec-06-2012Dgaston dec-06-2012
Dgaston dec-06-2012Dan Gaston
 
RNASeq Experiment Design
RNASeq Experiment DesignRNASeq Experiment Design
RNASeq Experiment DesignYaoyu Wang
 
wings2014 Workshop 1 Design, sequence, align, count, visualize
wings2014 Workshop 1 Design, sequence, align, count, visualizewings2014 Workshop 1 Design, sequence, align, count, visualize
wings2014 Workshop 1 Design, sequence, align, count, visualizeAnn Loraine
 
RNA-seq: Mapping and quality control - part 3
RNA-seq: Mapping and quality control - part 3RNA-seq: Mapping and quality control - part 3
RNA-seq: Mapping and quality control - part 3BITS
 
Talk ABRF 2015 (Gunnar Rätsch)
Talk ABRF 2015 (Gunnar Rätsch)Talk ABRF 2015 (Gunnar Rätsch)
Talk ABRF 2015 (Gunnar Rätsch)Gunnar Rätsch
 
RNA-seq differential expression analysis
RNA-seq differential expression analysisRNA-seq differential expression analysis
RNA-seq differential expression analysismikaelhuss
 
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisSo you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisUniversity of California, Davis
 
Catalyzing Plant Science Research with RNA-seq
Catalyzing Plant Science Research with RNA-seqCatalyzing Plant Science Research with RNA-seq
Catalyzing Plant Science Research with RNA-seqManjappa Ganiger
 
RNASeq DE methods review Applied Bioinformatics Journal Club
RNASeq DE methods review Applied Bioinformatics Journal ClubRNASeq DE methods review Applied Bioinformatics Journal Club
RNASeq DE methods review Applied Bioinformatics Journal ClubJennifer Shelton
 
An introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAn introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAGRF_Ltd
 
Assembly and finishing
Assembly and finishingAssembly and finishing
Assembly and finishingNikolay Vyahhi
 
BITS - Comparative genomics: the Contra tool
BITS - Comparative genomics: the Contra toolBITS - Comparative genomics: the Contra tool
BITS - Comparative genomics: the Contra toolBITS
 

What's hot (20)

Part 6 of RNA-seq for DE analysis: Detecting biology from differential expres...
Part 6 of RNA-seq for DE analysis: Detecting biology from differential expres...Part 6 of RNA-seq for DE analysis: Detecting biology from differential expres...
Part 6 of RNA-seq for DE analysis: Detecting biology from differential expres...
 
Dgaston dec-06-2012
Dgaston dec-06-2012Dgaston dec-06-2012
Dgaston dec-06-2012
 
RNASeq Experiment Design
RNASeq Experiment DesignRNASeq Experiment Design
RNASeq Experiment Design
 
wings2014 Workshop 1 Design, sequence, align, count, visualize
wings2014 Workshop 1 Design, sequence, align, count, visualizewings2014 Workshop 1 Design, sequence, align, count, visualize
wings2014 Workshop 1 Design, sequence, align, count, visualize
 
RNA-seq: Mapping and quality control - part 3
RNA-seq: Mapping and quality control - part 3RNA-seq: Mapping and quality control - part 3
RNA-seq: Mapping and quality control - part 3
 
Rna seq pipeline
Rna seq pipelineRna seq pipeline
Rna seq pipeline
 
Cn presentation
Cn presentationCn presentation
Cn presentation
 
presentation
presentationpresentation
presentation
 
Talk ABRF 2015 (Gunnar Rätsch)
Talk ABRF 2015 (Gunnar Rätsch)Talk ABRF 2015 (Gunnar Rätsch)
Talk ABRF 2015 (Gunnar Rätsch)
 
Rna seq
Rna seqRna seq
Rna seq
 
RNA-seq differential expression analysis
RNA-seq differential expression analysisRNA-seq differential expression analysis
RNA-seq differential expression analysis
 
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisSo you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
 
Catalyzing Plant Science Research with RNA-seq
Catalyzing Plant Science Research with RNA-seqCatalyzing Plant Science Research with RNA-seq
Catalyzing Plant Science Research with RNA-seq
 
RNASeq DE methods review Applied Bioinformatics Journal Club
RNASeq DE methods review Applied Bioinformatics Journal ClubRNASeq DE methods review Applied Bioinformatics Journal Club
RNASeq DE methods review Applied Bioinformatics Journal Club
 
Use of NCBI Databases in qPCR Assay Design
Use of NCBI Databases in qPCR Assay DesignUse of NCBI Databases in qPCR Assay Design
Use of NCBI Databases in qPCR Assay Design
 
An introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAn introduction to RNA-seq data analysis
An introduction to RNA-seq data analysis
 
State-of-the-Art Normalization of RT-qPCR Data
State-of-the-Art Normalization of RT-qPCR Data State-of-the-Art Normalization of RT-qPCR Data
State-of-the-Art Normalization of RT-qPCR Data
 
ChipSeq Data Analysis
ChipSeq Data AnalysisChipSeq Data Analysis
ChipSeq Data Analysis
 
Assembly and finishing
Assembly and finishingAssembly and finishing
Assembly and finishing
 
BITS - Comparative genomics: the Contra tool
BITS - Comparative genomics: the Contra toolBITS - Comparative genomics: the Contra tool
BITS - Comparative genomics: the Contra tool
 

Viewers also liked

Deep learning with Tensorflow in R
Deep learning with Tensorflow in RDeep learning with Tensorflow in R
Deep learning with Tensorflow in Rmikaelhuss
 
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...VHIR Vall d’Hebron Institut de Recerca
 
Detecting Somatic Mutations in Impure Cancer Sample - Ensemble Approach
Detecting Somatic Mutations in Impure Cancer Sample - Ensemble ApproachDetecting Somatic Mutations in Impure Cancer Sample - Ensemble Approach
Detecting Somatic Mutations in Impure Cancer Sample - Ensemble ApproachHong ChangBum
 
Detecting Somatic Mutation - Ensemble Approach
Detecting Somatic Mutation - Ensemble ApproachDetecting Somatic Mutation - Ensemble Approach
Detecting Somatic Mutation - Ensemble ApproachHong ChangBum
 
DESeq Paper Journal club
DESeq Paper Journal club DESeq Paper Journal club
DESeq Paper Journal club avrilcoghlan
 
DEseq, voom and vst
DEseq, voom and vstDEseq, voom and vst
DEseq, voom and vstQiang Kou
 
Differential expression in RNA-Seq
Differential expression in RNA-SeqDifferential expression in RNA-Seq
Differential expression in RNA-SeqcursoNGS
 
Kogo 2013 RNA-seq analysis
Kogo 2013 RNA-seq analysisKogo 2013 RNA-seq analysis
Kogo 2013 RNA-seq analysisJunsu Ko
 
A Comparison of NGS Platforms.
A Comparison of NGS Platforms.A Comparison of NGS Platforms.
A Comparison of NGS Platforms.mkim8
 

Viewers also liked (10)

Deep learning with Tensorflow in R
Deep learning with Tensorflow in RDeep learning with Tensorflow in R
Deep learning with Tensorflow in R
 
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
 
RNA-seq Analysis
RNA-seq AnalysisRNA-seq Analysis
RNA-seq Analysis
 
Detecting Somatic Mutations in Impure Cancer Sample - Ensemble Approach
Detecting Somatic Mutations in Impure Cancer Sample - Ensemble ApproachDetecting Somatic Mutations in Impure Cancer Sample - Ensemble Approach
Detecting Somatic Mutations in Impure Cancer Sample - Ensemble Approach
 
Detecting Somatic Mutation - Ensemble Approach
Detecting Somatic Mutation - Ensemble ApproachDetecting Somatic Mutation - Ensemble Approach
Detecting Somatic Mutation - Ensemble Approach
 
DESeq Paper Journal club
DESeq Paper Journal club DESeq Paper Journal club
DESeq Paper Journal club
 
DEseq, voom and vst
DEseq, voom and vstDEseq, voom and vst
DEseq, voom and vst
 
Differential expression in RNA-Seq
Differential expression in RNA-SeqDifferential expression in RNA-Seq
Differential expression in RNA-Seq
 
Kogo 2013 RNA-seq analysis
Kogo 2013 RNA-seq analysisKogo 2013 RNA-seq analysis
Kogo 2013 RNA-seq analysis
 
A Comparison of NGS Platforms.
A Comparison of NGS Platforms.A Comparison of NGS Platforms.
A Comparison of NGS Platforms.
 

Similar to Part 5 of RNA-seq for DE analysis: Detecting differential expression

Genome wide association studies---In genomics, a genome-wide association stud...
Genome wide association studies---In genomics, a genome-wide association stud...Genome wide association studies---In genomics, a genome-wide association stud...
Genome wide association studies---In genomics, a genome-wide association stud...DrAmitJoshi9
 
Genome in a bottle for next gen dx v2 180821
Genome in a bottle for next gen dx v2 180821Genome in a bottle for next gen dx v2 180821
Genome in a bottle for next gen dx v2 180821GenomeInABottle
 
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...Elia Brodsky
 
Giab for jax long read 190917
Giab for jax long read 190917Giab for jax long read 190917
Giab for jax long read 190917GenomeInABottle
 
Limit of Detection of Rare Targets Using Digital PCR | ESHG 2015 Poster PS14.031
Limit of Detection of Rare Targets Using Digital PCR | ESHG 2015 Poster PS14.031Limit of Detection of Rare Targets Using Digital PCR | ESHG 2015 Poster PS14.031
Limit of Detection of Rare Targets Using Digital PCR | ESHG 2015 Poster PS14.031Thermo Fisher Scientific
 
Genome in a Bottle- reference materials to benchmark challenging variants and...
Genome in a Bottle- reference materials to benchmark challenging variants and...Genome in a Bottle- reference materials to benchmark challenging variants and...
Genome in a Bottle- reference materials to benchmark challenging variants and...GenomeInABottle
 
Q biomarkersomaticmutation
Q biomarkersomaticmutationQ biomarkersomaticmutation
Q biomarkersomaticmutationElsa von Licy
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshopGenomeInABottle
 
GIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM ForumGIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM ForumGenomeInABottle
 
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923GenomeInABottle
 
Basics of bioinformatics
Basics of bioinformaticsBasics of bioinformatics
Basics of bioinformaticsAbhishek Vatsa
 
BMI214_Assignment2_S..
BMI214_Assignment2_S..BMI214_Assignment2_S..
BMI214_Assignment2_S..butest
 
GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015GenomeInABottle
 
Classification of Gene Expression Data by Gene Combination using Fuzzy Logic
Classification of Gene Expression Data by Gene Combination using Fuzzy LogicClassification of Gene Expression Data by Gene Combination using Fuzzy Logic
Classification of Gene Expression Data by Gene Combination using Fuzzy LogicIJARIIE JOURNAL
 
Solving non linear programming minimization problem using genetic algorithm
Solving non linear programming minimization problem using genetic algorithmSolving non linear programming minimization problem using genetic algorithm
Solving non linear programming minimization problem using genetic algorithmLahiru Dilshan
 

Similar to Part 5 of RNA-seq for DE analysis: Detecting differential expression (20)

Genome wide association studies---In genomics, a genome-wide association stud...
Genome wide association studies---In genomics, a genome-wide association stud...Genome wide association studies---In genomics, a genome-wide association stud...
Genome wide association studies---In genomics, a genome-wide association stud...
 
Genome in a bottle for next gen dx v2 180821
Genome in a bottle for next gen dx v2 180821Genome in a bottle for next gen dx v2 180821
Genome in a bottle for next gen dx v2 180821
 
Design of experiments(
Design of experiments(Design of experiments(
Design of experiments(
 
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...
 
Giab for jax long read 190917
Giab for jax long read 190917Giab for jax long read 190917
Giab for jax long read 190917
 
Limit of Detection of Rare Targets Using Digital PCR | ESHG 2015 Poster PS14.031
Limit of Detection of Rare Targets Using Digital PCR | ESHG 2015 Poster PS14.031Limit of Detection of Rare Targets Using Digital PCR | ESHG 2015 Poster PS14.031
Limit of Detection of Rare Targets Using Digital PCR | ESHG 2015 Poster PS14.031
 
Genome in a Bottle- reference materials to benchmark challenging variants and...
Genome in a Bottle- reference materials to benchmark challenging variants and...Genome in a Bottle- reference materials to benchmark challenging variants and...
Genome in a Bottle- reference materials to benchmark challenging variants and...
 
Q biomarkersomaticmutation
Q biomarkersomaticmutationQ biomarkersomaticmutation
Q biomarkersomaticmutation
 
Lecture 7 gwas full
Lecture 7 gwas fullLecture 7 gwas full
Lecture 7 gwas full
 
Validaternai
ValidaternaiValidaternai
Validaternai
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
 
report
reportreport
report
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
 
GIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM ForumGIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM Forum
 
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
 
Basics of bioinformatics
Basics of bioinformaticsBasics of bioinformatics
Basics of bioinformatics
 
BMI214_Assignment2_S..
BMI214_Assignment2_S..BMI214_Assignment2_S..
BMI214_Assignment2_S..
 
GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015
 
Classification of Gene Expression Data by Gene Combination using Fuzzy Logic
Classification of Gene Expression Data by Gene Combination using Fuzzy LogicClassification of Gene Expression Data by Gene Combination using Fuzzy Logic
Classification of Gene Expression Data by Gene Combination using Fuzzy Logic
 
Solving non linear programming minimization problem using genetic algorithm
Solving non linear programming minimization problem using genetic algorithmSolving non linear programming minimization problem using genetic algorithm
Solving non linear programming minimization problem using genetic algorithm
 

More from Joachim Jacob

Korte handleiding van de Partago app
Korte handleiding van de Partago appKorte handleiding van de Partago app
Korte handleiding van de Partago appJoachim Jacob
 
Blaas nieuw leven in je PC met Linux
Blaas nieuw leven in je PC met LinuxBlaas nieuw leven in je PC met Linux
Blaas nieuw leven in je PC met LinuxJoachim Jacob
 
Part 5 of "Introduction to Linux for Bioinformatics": Working the command lin...
Part 5 of "Introduction to Linux for Bioinformatics": Working the command lin...Part 5 of "Introduction to Linux for Bioinformatics": Working the command lin...
Part 5 of "Introduction to Linux for Bioinformatics": Working the command lin...Joachim Jacob
 
Part 6 of "Introduction to linux for bioinformatics": Productivity tips
Part 6 of "Introduction to linux for bioinformatics": Productivity tipsPart 6 of "Introduction to linux for bioinformatics": Productivity tips
Part 6 of "Introduction to linux for bioinformatics": Productivity tipsJoachim Jacob
 
Part 4 of 'Introduction to Linux for bioinformatics': Managing data
Part 4 of 'Introduction to Linux for bioinformatics': Managing data Part 4 of 'Introduction to Linux for bioinformatics': Managing data
Part 4 of 'Introduction to Linux for bioinformatics': Managing data Joachim Jacob
 
Part 2 of 'Introduction to Linux for bioinformatics': Installing software
Part 2 of 'Introduction to Linux for bioinformatics': Installing softwarePart 2 of 'Introduction to Linux for bioinformatics': Installing software
Part 2 of 'Introduction to Linux for bioinformatics': Installing softwareJoachim Jacob
 
Part 1 of 'Introduction to Linux for bioinformatics': Introduction
Part 1 of 'Introduction to Linux for bioinformatics': IntroductionPart 1 of 'Introduction to Linux for bioinformatics': Introduction
Part 1 of 'Introduction to Linux for bioinformatics': IntroductionJoachim Jacob
 

More from Joachim Jacob (8)

Korte handleiding van de Partago app
Korte handleiding van de Partago appKorte handleiding van de Partago app
Korte handleiding van de Partago app
 
Blaas nieuw leven in je PC met Linux
Blaas nieuw leven in je PC met LinuxBlaas nieuw leven in je PC met Linux
Blaas nieuw leven in je PC met Linux
 
The Galaxy toolshed
The Galaxy toolshedThe Galaxy toolshed
The Galaxy toolshed
 
Part 5 of "Introduction to Linux for Bioinformatics": Working the command lin...
Part 5 of "Introduction to Linux for Bioinformatics": Working the command lin...Part 5 of "Introduction to Linux for Bioinformatics": Working the command lin...
Part 5 of "Introduction to Linux for Bioinformatics": Working the command lin...
 
Part 6 of "Introduction to linux for bioinformatics": Productivity tips
Part 6 of "Introduction to linux for bioinformatics": Productivity tipsPart 6 of "Introduction to linux for bioinformatics": Productivity tips
Part 6 of "Introduction to linux for bioinformatics": Productivity tips
 
Part 4 of 'Introduction to Linux for bioinformatics': Managing data
Part 4 of 'Introduction to Linux for bioinformatics': Managing data Part 4 of 'Introduction to Linux for bioinformatics': Managing data
Part 4 of 'Introduction to Linux for bioinformatics': Managing data
 
Part 2 of 'Introduction to Linux for bioinformatics': Installing software
Part 2 of 'Introduction to Linux for bioinformatics': Installing softwarePart 2 of 'Introduction to Linux for bioinformatics': Installing software
Part 2 of 'Introduction to Linux for bioinformatics': Installing software
 
Part 1 of 'Introduction to Linux for bioinformatics': Introduction
Part 1 of 'Introduction to Linux for bioinformatics': IntroductionPart 1 of 'Introduction to Linux for bioinformatics': Introduction
Part 1 of 'Introduction to Linux for bioinformatics': Introduction
 

Part 5 of RNA-seq for DE analysis: Detecting differential expression

  • 1. This presentation is available under the Creative Commons Attribution-ShareAlike 3.0 Unported License. Please refer to http://www.bits.vib.be/ if you use this presentation or parts hereof. RNA-seq for DE analysis training Detecting differentially expressed genes Joachim Jacob 22 and 24 April 2014
  • 2. Bioinformatics analysis will take most of your time. Workflow: quality control (QC) of raw reads; preprocessing (filtering of reads and read parts, to help our goal of differential detection); QC of preprocessing; mapping to a reference genome (alternative: to a transcriptome); QC of the mapping; count table extraction; QC of the count table; DE test; biological insight.
  • 3. Goal: get me some DE genes! Based on a raw count table, we want to detect differentially expressed genes between conditions of interest. We will assign to each gene a p-value (between 0 and 1), which shows us 'how surprised we should be' to see this difference when we assume there is no difference. On the p-value scale, a value close to 0 points to a very big chance that there is a difference; a value close to 1 points to a very small chance that there is a real difference.
  • 4. Goal. Every single decision we have taken in previous analysis steps was made to improve this outcome: detecting DE genes.
  • 5. Raw counts to DE genes
  • 6. DE detection tools from count tables: http://wiki.bits.vib.be/index.php/RNAseq_toolbox#Detecting_differential_expression_by_count_analysis
  • 7. Algorithms under active development: http://genomebiology.com/2010/11/10/r106
  • 8. Intuition: how to detect DE? gene_id CAF0006876. Counts for sample1 to sample8: 23171, 22903, 29227, 24072, 23151, 26336, 25252, 24122. Counts for sample9 to sample16: 19527, 26898, 18880, 24237, 26640, 22315, 20952, 25629. Condition A shows variability X, Condition B shows variability Y. Compare and conclude, given a mean (or 'base') level: similar or not?
  • 9. Intuition. gene_id CAF0006876: the same counts for sample1 to sample16 as on slide 8, split into Condition A and Condition B.
  • 10. Intuition: model is fitted. gene_id CAF0006876: the same counts as on slide 8 (Condition A, Condition B). An NB (negative binomial) model is estimated.
  • 11. Intuition: difference is quantified. gene_id CAF0006876: the same counts as on slide 8 (Condition A, Condition B). The NB model is estimated; it needs 2 parameters: mean and dispersion. The difference is put into a p-value.
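The two NB parameters named on slide 11 can be illustrated with a minimal Python sketch. It uses the counts shown on the slides and assumes (this is not stated explicitly) that sample1 to sample8 belong to Condition A and sample9 to sample16 to Condition B. The dispersion is estimated here with a simple method of moments, which is only a stand-in for illustration: DESeq2 itself uses a likelihood-based estimate followed by shrinkage.

```python
from statistics import mean, variance

# Counts for gene CAF0006876, copied from the slides.
# Assumption: sample1-8 are condition A, sample9-16 are condition B.
condition_a = [23171, 22903, 29227, 24072, 23151, 26336, 25252, 24122]
condition_b = [19527, 26898, 18880, 24237, 26640, 22315, 20952, 25629]


def nb_moments(counts):
    """Return (mean, dispersion) using the method of moments.

    For a negative binomial with mean mu and dispersion alpha,
    variance = mu + alpha * mu**2, so alpha = (variance - mu) / mu**2.
    """
    mu = mean(counts)
    var = variance(counts)  # sample variance (n - 1 denominator)
    alpha = max((var - mu) / mu ** 2, 0.0)  # clip at 0 for Poisson-like genes
    return mu, alpha


mu_a, alpha_a = nb_moments(condition_a)
mu_b, alpha_b = nb_moments(condition_b)
print(f"A: mean={mu_a:.2f} dispersion={alpha_a:.4f}")
print(f"B: mean={mu_b:.2f} dispersion={alpha_b:.4f}")
```

Under these assumptions the two means are close, while Condition B is noticeably more variable than Condition A, which is the situation the 'similar or not?' question is about.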
  • 12. But counts are dependent on... The read count of a gene in different conditions depends on (see first part): 1. Chance (NB model) 2. Expression level 3. Library size (number of reads in that library) 4. Length of the transcript 5. GC content of the gene
  • 13. Normalize for library size. Assumption: most genes are not DE between samples. DESeq calculates for every sample the 'effective library size' by means of a scale factor. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3426807/ [Figure: two libraries, each shown as 100%, each with a block labelled 'Rest of the genes'.]
  • 14. Normalize for library size. DESeq computes a scaling factor for a given sample as the median, over all genes, of the ratio of the gene's read count in that sample to its geometric mean across all samples. Under the assumption that most genes are not DE, this median of ratios is the scaling factor associated with the sample. Original library size * scale factor = effective library size. To normalize, DESeq divides the original counts by the sample scaling factor. Quoted from the paper: "DESeq: This normalization method [14] is included in the DESeq Bioconductor package (version 1.6.0) [14] and is based on the hypothesis that most genes are not DE. A DESeq scaling factor for a given lane is computed as the median of the ratio, for each gene, of its read count over its geometric mean across all lanes. The underlying idea is that non-DE genes should have similar read counts across samples, leading to a ratio of 1. Assuming most genes are not DE, the median of this ratio for the lane provides an estimate of the correction factor that should be applied to all read counts of this lane to fulfill the hypothesis." http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3426807/
  • 15. Normalize for library size. [Figure: worked example with a table of ratios. Each count is divided by the gene's geometric mean across samples (for example, a geometric mean of 24595), and the median of the resulting ratios is taken per column (sample).]
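The two steps on slide 15 (divide each count by the gene's geometric mean, then take the median per column) can be sketched in plain Python. The function name `size_factors` and the toy count table are hypothetical and only serve as an illustration; this is not the DESeq code itself.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios scaling factor per sample.

    counts: list of genes, each a list of raw counts (one per sample).
    Genes with a zero count in any sample are skipped, because their
    geometric mean is zero and the ratio is undefined.
    """
    n_samples = len(counts[0])
    ratios_per_sample = [[] for _ in range(n_samples)]
    for gene in counts:
        if min(gene) <= 0:
            continue
        # Geometric mean of this gene across all samples, via logs.
        geo_mean = math.exp(sum(math.log(c) for c in gene) / n_samples)
        for j, c in enumerate(gene):
            ratios_per_sample[j].append(c / geo_mean)
    # Median per column (sample) of the count / geometric-mean ratios.
    return [median(r) for r in ratios_per_sample]


# Hypothetical toy table: 4 genes x 3 samples. Sample 3 was sequenced
# twice as deeply as sample 1, and the last gene is truly DE.
toy_counts = [
    [100, 150, 200],
    [10, 15, 20],
    [1000, 1500, 2000],
    [50, 75, 800],
]
factors = size_factors(toy_counts)
# Normalized counts: raw counts divided by the sample's scaling factor.
normalized = [[c / f for c, f in zip(gene, factors)] for gene in toy_counts]
print([round(f, 3) for f in factors])
```

Because the median is used, the single DE gene in the toy table does not distort the factors: the three non-DE genes end up with equal normalized counts across the samples.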
  • 16. Other normalisations. ● edgeR: TMM, trimmed mean of M-values. In the end, the algorithms perform the normalization internally and simply continue.
  • 17. Dispersion estimation. ● For every gene, an NB model is fitted based on the counts. The most important parameter to be estimated in that model is the dispersion. ● DESeq2 applies three steps: it estimates the dispersion parameter for each gene; it plots these estimates and fits a curve; it adjusts each dispersion parameter towards the curve ('shrinking').
  • 18. Dispersion estimation. 1. Black dots: dispersion estimated from the normalized data. 2. Red line: fitted curve. 3. Blue dots: final dispersion parameter assigned to that gene. Model is fit!
  • 19. Test is run between conditions. If 2 conditions are compared, 2 NB models (one for each condition) are made for each gene, and a test (Wald test) decides whether the difference is significant (red in the plot). MA-plot: mean of counts versus the log2 fold change between 2 conditions. Plot legend: significant (p-value < 0.01) versus not significant.
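As a rough illustration of how a Wald test turns a difference into a p-value, the sketch below divides an estimated log2 fold change by its standard error and looks up the result in a standard normal distribution. The function name and the example numbers are hypothetical; in practice both the fold change and its standard error come out of the fitted NB model.

```python
import math


def wald_p_value(log2_fold_change, standard_error):
    """Two-sided p-value for the null hypothesis: log2 fold change = 0."""
    z = log2_fold_change / standard_error
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical estimates: two genes with the same fold change.
print(wald_p_value(1.0, 0.2))  # small standard error
print(wald_p_value(1.0, 0.8))  # large standard error
```

The same fold change gives a much smaller p-value when the standard error is small, which is why the variability of a gene matters as much as the size of the difference.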
  • 20. Test is run between conditions. If 2 conditions are compared, 2 NB models (one for each condition) are made for each gene, and a test (Wald test) decides whether the difference is significant (red in the plot). This means that we are going to perform thousands of tests. If we set a cut-off on the p-value of 0.01 and we have performed 20000 tests (= genes), about 200 genes that do not differ will turn up significant by chance alone.
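The arithmetic behind the 200 genes can be written out as a small Python sketch. The function names are hypothetical, and the second function assumes that the tests are independent, which is a simplification for real genes.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of significant calls when no gene truly differs."""
    return n_tests * alpha


def prob_at_least_one(n_tests, alpha):
    """Chance of one or more false positives, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Numbers from the slide: 20000 genes tested at a p-value cut-off of 0.01.
print(expected_false_positives(20000, 0.01))
print(prob_at_least_one(20000, 0.01))
```

With this many tests, at least one false positive is practically certain, which motivates the filtering and correction steps that follow.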
  • 21. Check the distribution of p-values. An enrichment (smaller or bigger) should be seen at low p-values. The other p-values should not show a trend. The histogram of the p-values must look like the one shown on the slide. If not, the test is not reliable: perhaps the NB fitting step did not succeed, or confounding variables are present.
  • 22. Confounded distribution of p-values
  • 23. Improve test results. You set a cut-off of 0.05. Of the genes that pass the cut-off, a fraction is correctly identified as DE and a fraction is false positive.
  • 24. Improve test results. We can improve testing by 2 measures: ● avoid testing: apply a filter before testing (independent filtering) ● apply a multiple testing correction
  • 25. Avoid testing by independent filtering. Some scientists simply remove genes with a mean count across the samples below 10. But there is a more formal method to remove genes, in order to reduce the number of tests. http://www.bioconductor.org/help/course-materials/2012/Bressanone2012/ From this collection, read 2012-07-04-Huber-Multiple-testing-independent-filtering.pdf
  • 26. Avoid testing by independent filtering. Left on the slide: a scatter plot of mean counts versus transformed p-values. The red line depicts a cut-off of 0.1. Note that genes with lower counts do not reach the p-value threshold. Some of them are safe to exclude from testing.
  • 27. Avoid testing by independent filtering. If we filter out increasingly bigger portions of genes based on their mean counts, the number of significant genes increases.
  • 28. Avoid testing by independent filtering. See later (slide 30). Choose the variable of interest. You can run it once on all to check the outcome. http://www.bioconductor.org/help/course-materials/2012/Bressanone2012/ From this collection, read 2012-07-04-Huber-Multiple-testing-independent-filtering.pdf
  • 29. Avoid testing by independent filtering
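The simplest form of the filtering idea from these slides, dropping the genes with the lowest mean counts before any test is run, might look like the sketch below. The function name, the gene ids and the counts are made up for illustration, and this is not the method used by HTSFilter or by the Galaxy tool.

```python
from statistics import mean


def independent_filter(count_table, fraction):
    """Drop the given fraction of genes with the lowest mean count.

    count_table: dict mapping gene id -> list of counts per sample.
    The filter uses only the overall mean, not the condition labels,
    so it does not look at the difference that is tested afterwards.
    """
    ranked = sorted(count_table, key=lambda g: mean(count_table[g]))
    n_drop = int(len(ranked) * fraction)
    return {g: count_table[g] for g in ranked[n_drop:]}


# Hypothetical toy table; gene ids and counts are made up.
toy = {
    "geneA": [0, 1, 0, 2],
    "geneB": [3, 5, 2, 4],
    "geneC": [120, 90, 300, 280],
    "geneD": [1500, 1400, 1600, 1550],
}
kept = independent_filter(toy, 0.5)
print(sorted(kept))
```

Only the genes that are kept are tested, so fewer tests enter the multiple testing correction.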
  • 30. Packages for independent filtering. HTSFilter is a package specifically developed for independent filtering in a non-arbitrary way. In our Galaxy, during the exercises, you will be using another approach. http://www.bioconductor.org/packages/release/bioc/html/HTSFilter.html
  • 31. Multiple testing correction. Automatically performed and reported in the results: Benjamini/Hochberg correction, to control the false discovery rate (FDR). FDR is the fraction of false positives among the genes that are classified as DE. If we only set a threshold α of 0.05 on the uncorrected p-values, the fraction of false positives in the list is not controlled and can be much higher than 5% (20% in the slide's example). If we apply an FDR correction with a cut-off of 0.05, about 5% of the genes in the final list are expected to be false positives.
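A minimal sketch of the Benjamini/Hochberg adjustment in plain Python is given below. The function name and the five p-values are hypothetical; the DE tools report these adjusted p-values for you, so this only illustrates what the correction does.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for five genes.
p = [0.001, 0.008, 0.039, 0.041, 0.6]
padj = benjamini_hochberg(p)
significant = [i for i, q in enumerate(padj) if q < 0.05]
print(padj, significant)
```

In this toy example four genes have an uncorrected p-value below 0.05, but only the first two remain below 0.05 after the adjustment.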
  • 32. Including influencing factors. Through a generalized linear model (GLM), the influencing factors are modeled to predict the counts. The factors come from the sample descriptions file. Example: yeast (=WT) or yeast mutant (=UPC); GDA (=G) or GDA + vit C (=AG); additional metadata (batch factor): Day 1 or Day 2.
  • 33. DESeq2 to detect DE genes. We provide a combination of factors (the model, GLM) which influence the counts. Every factor should match a column name in the sample descriptions file. Other settings: the levels of the factors corresponding to the 'base' or 'no perturbation' state; the fraction filtered out, as determined by the independent filter tool; the adjusted p-value cut-off.
  • 34. The output of DESeq2. The 'detect differential expression' tool gives you four results. The first is the report including graphs. The others are: only the genes below the cut-off, with independent filtering applied; all genes, with independent filtering applied; the complete DESeq results, without independent filtering applied.
  • 35. Effect of variance on DE detection. [Figure: two plots of the standard error (SE) of the log fold change against log2(FC): all genes with their logFC, and only the DE genes.]
  • 36. Volcano plot is often asymmetric. Volcano plot: shows the DE genes with our given cut-off. [Figure axes: log10(FC), with marks at -0.3 and 0.3, versus -log10(pvalue).]
  • 37. Comparing different conditions. Yeast (=WT) or yeast mutant (=UPC); GDA (=G) or GDA + vit C (=AG); Day 1 or Day 2. Which genes are DE between UPC and WT? Which genes are DE between G and AG? Which genes are DE in WT between G and AG?
  • 38. Comparing different conditions. Adjust the sample descriptions file and the model for each question (the slide marks the rows to remove with 'Remove these'): 1. Which genes are DE between UPC and WT? 2. Which genes are DE between G and AG? 3. Which genes are DE in WT between G and AG?
  • 39. Congratulations! We have reached our goal!
  • 42. Keywords: effective library size; dispersion; shrinking; significantly differentially expressed; MA-plot; alpha cut-off; independent filtering; FDR; p-value. Write in your own words what the terms mean.
  • 43. Exercises: detecting differential expression from a count table.