The biology behind
expression differences
RNA-seq for DE analysis training
Joachim Jacob
20 and 27 January 2014

This presentation is available under the Creative Commons Attribution-ShareAlike 3.0 Unported License. Please refer to
http://www.bits.vib.be/ if you use this presentation or parts hereof.
Overview

http://www.nature.com/nprot/journal/v8/n9/full/nprot.2013.099.html
Analyzing the DE analysis results
The 'detect differential expression' tool gives you four results: the first is the report, including graphs. The other three are tables:

Only the genes with a p-value lower than the cut-off, with independent filtering applied.

All genes, with independent filtering applied.

Complete DESeq results, without independent filtering applied.
Setting a cut-off
You choose a cut-off!
You can go over the genes one by one, look for 'interesting' genes, and try to link them to the experimental conditions.

Alternative: we can take all genes, ranked by their p-value (which stands for a 'level of surprise'). Pro: we don't need our arbitrary cut-off.
Analysis of the list of DE genes

All genes (6666 yeast genes)
Genes sensible to test (after filtering out the 10% of genes with the lowest counts) (5830 yeast genes)
DE genes with a p-value cut-off of 0.01 (637 genes)
Gene set enrichment
We use the knowledge already available on biology. We construct lists of genes for:
● Pathways
● Biological processes
● Cellular components
● Molecular functions
● Transcription factor binding sites
● ...
http://wiki.bits.vib.be/index.php/Gene_set_enrichment_analysis
Getting lists of genes
● Gene Ontology consortium
● Reactome:
A many-to-many relation
Linking gene IDs to molecular function.
… to binding partners
... to transcription factor
binding sites.
Biomart can help you fetch sets
Contingency approach
DE results: 637/5830
Gene set 1: 15/56
Equal? (hypergeometric test)
Contingency approach
DE results: 637/5830
Gene set 2: 5/30
Contingency approach
DE results: 637/5830
Gene set 3: 34/78
! Gene set enriched
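The 'Equal?' question on these slides can be made concrete with a hypergeometric tail probability. The sketch below assumes the 5830 tested genes as background and uses only the Python standard library; it computes the chance of drawing at least as many DE genes in a set as were observed. The function and argument names are illustrative and do not come from any of the tools mentioned here.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_de, set_size, de_in_set):
    """P(X >= de_in_set) when set_size genes are drawn without replacement
    from n_background genes, of which n_de are differentially expressed."""
    all_draws = comb(n_background, set_size)
    max_hits = min(set_size, n_de)
    tail = sum(
        comb(n_de, k) * comb(n_background - n_de, set_size - k)
        for k in range(de_in_set, max_hits + 1)
    )
    return tail / all_draws

# Counts taken from the slides: 637 DE genes among 5830 tested genes.
gene_sets = {"Gene set 1": (15, 56), "Gene set 2": (5, 30), "Gene set 3": (34, 78)}
p_values = {
    name: hypergeom_enrichment_p(5830, 637, size, hits)
    for name, (hits, size) in gene_sets.items()
}
for name, p in p_values.items():
    print(f"{name}: p = {p:.3g}")
```

Changing the background (5830) or the number of DE genes that pass the cut-off (637) changes these p-values, which is why both choices matter for this approach.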
Artificial?
DE results

But the cut-off remains artificial and arbitrarily chosen. Rerun with a different cut-off and you will detect other significant sets!
The background needs to be carefully chosen.
This approach favors gene sets with genes whose expression differs a lot ('high level of surprise', i.e. a low p-value).
Contingency table approach tools

http://wiki.bits.vib.be/index.php/Gene_set_enrichment_analysis
Cut-off free approach
No cut-off needs to be chosen when using GSEA and derived methods!
We take into account all genes for which we get a reliable p-value (see the p-value histogram chart).
The genes are sorted/ranked according to 'level of surprise', i.e. by their p-value (other options are test statistics (T, ...)).
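The ranking step can be sketched in a few lines. This is a minimal illustration, assuming the DE results are available as a mapping from gene ID to p-value, with None for genes that did not get a reliable p-value; the gene IDs and values are made up.

```python
def rank_genes(pvalues):
    """Return gene IDs sorted from most to least surprising (smallest p-value first).

    Genes whose p-value is None (for example, removed by independent
    filtering) are left out, since no reliable p-value exists for them.
    """
    tested = {gene: p for gene, p in pvalues.items() if p is not None}
    # Ties are broken alphabetically so the ranking is reproducible.
    return sorted(tested, key=lambda gene: (tested[gene], gene))

# Hypothetical gene IDs and p-values, for illustration only.
example_pvalues = {"geneA": 0.20, "geneB": 0.001, "geneC": None, "geneD": 0.03}
ranked = rank_genes(example_pvalues)
print(ranked)
```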
Intuition of GSEA
Gene set 1

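Running sum: walking down the ranked gene list, every occurrence of a gene from the set increases the sum, every absence decreases the sum. The maximum is the MES (maximum enrichment score), the final score.

(Figure: running sum along the gene list, ranked by p-value from 0 to 1.)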

Mootha et al. http://www.nature.com/ng/journal/v34/n3/full/ng1180.html
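The running sum can be sketched as follows. This is a simplified illustration rather than the code of any GSEA tool; it assumes the step sizes sqrt((N-G)/G) for a hit and sqrt(G/(N-G)) for a miss, with N ranked genes and G set members, so that the sum returns to zero at the end of the list. The gene IDs are made up.

```python
from math import sqrt

def running_sum(ranked_genes, gene_set):
    """Return the running sum after each position of the ranked gene list."""
    n_total = len(ranked_genes)
    n_hits = sum(1 for gene in ranked_genes if gene in gene_set)
    if n_hits == 0 or n_hits == n_total:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    up = sqrt((n_total - n_hits) / n_hits)    # step for a gene in the set
    down = sqrt(n_hits / (n_total - n_hits))  # step for a gene not in the set
    total, trace = 0.0, []
    for gene in ranked_genes:
        total += up if gene in gene_set else -down
        trace.append(total)
    return trace

def max_enrichment_score(ranked_genes, gene_set):
    """The MES is the maximum reached by the running sum."""
    return max(running_sum(ranked_genes, gene_set))

# Hypothetical ranked list g1..g10, most surprising gene first.
ranked = [f"g{i}" for i in range(1, 11)]
mes_top = max_enrichment_score(ranked, {"g1", "g2", "g3"})      # set at the top
mes_spread = max_enrichment_score(ranked, {"g2", "g6", "g10"})  # set spread out
print(mes_top, mes_spread)
```

A set whose genes cluster at the top of the ranking reaches a higher maximum than a set of the same size spread over the whole list.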
Intuition of GSEA
Gene set 2: high running sum MES
Gene set 3: median running sum MES
Gene set 4: low running sum MES

The scores are compared to those of permuted/shuffled gene sets (sample label versus gene label permutation).

(Figure: gene sets placed along the gene list, ranked by p-value from 0 to 1.)
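The comparison against shuffled gene sets can be sketched as a gene label permutation test. The code below is a simplified illustration under stated assumptions: random gene sets of the same size are drawn from the ranked list, and the p-value is the fraction of random sets whose MES is at least the observed one. The function names, the number of permutations and the seed are illustrative choices. Sample label permutation is not shown, because it requires rerunning the DE test on the shuffled samples.

```python
import random
from math import sqrt

def mes(ranked_genes, gene_set):
    """Maximum of the running sum for gene_set along ranked_genes."""
    n = len(ranked_genes)
    hits = sum(1 for gene in ranked_genes if gene in gene_set)
    if hits == 0 or hits == n:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    up, down = sqrt((n - hits) / hits), sqrt(hits / (n - hits))
    total, best = 0.0, float("-inf")
    for gene in ranked_genes:
        total += up if gene in gene_set else -down
        best = max(best, total)
    return best

def gene_permutation_pvalue(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Empirical p-value of the observed MES under random gene sets."""
    rng = random.Random(seed)
    observed = mes(ranked_genes, gene_set)
    size = sum(1 for gene in ranked_genes if gene in gene_set)
    at_least = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(ranked_genes, size))
        if mes(ranked_genes, random_set) >= observed:
            at_least += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (at_least + 1) / (n_perm + 1)

# Hypothetical ranked list of 200 genes, most surprising first.
ranked = [f"g{i}" for i in range(200)]
p_top = gene_permutation_pvalue(ranked, set(ranked[:10]))       # set at the top
p_spread = gene_permutation_pvalue(ranked, set(ranked[::20]))   # set spread out
print(p_top, p_spread)
```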
Cut-off free approach
The advantages:
● Robust against mapping errors that influence the counts.
● The set can be detected even if some genes are not present.
● Tolerant of gene sets that contain incorrect genes.
● Strong signal even if all genes are only slightly overexpressed.
With cut-off applied
Genes involved in
oxidative phosphorylation

Significant DE genes (p-value < 0.05)

Mootha et al. http://www.nature.com/ng/journal/v34/n3/full/ng1180.html
Cut-off free approach

Genes involved in oxidative
phosphorylation are nearly
all slightly overexpressed.
This can be detected by
gene set analysis.

Mootha et al. http://www.nature.com/ng/journal/v34/n3/full/ng1180.html
GSEA has inspired others.
Different methods exist to rank the genes, to
calculate the running sum, and to check
significance of the running sum. In addition,
directionality of the changes can be incorporated.

Varemo et al. http://nar.oxfordjournals.org/content/early/2013/02/26/nar.gkt111
GSEA has inspired many
Piano

SPIA
Piano provides a consensus output
Piano combines different methods and calculates a consensus score. It does this for 5 different 'directionality classes'. The main output is a heatmap of the gene sets that are significantly enriched, depleted or just changed.

The sets

Ranks! Lower is
'more important'
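The idea of a rank-based consensus can be sketched as follows. This is not Piano's own code: it assumes, for illustration, that each method returns a p-value per gene set and that the consensus is simply the median of the per-method ranks. The method names, set names and p-values are made up.

```python
from statistics import median

def consensus_ranks(pvalues_by_method):
    """Median rank of each gene set across methods (lower is 'more important').

    pvalues_by_method maps a method name to {gene set name: p-value}.
    """
    ranks = {}
    for pvalues in pvalues_by_method.values():
        # Rank 1 is the smallest p-value; ties are broken alphabetically.
        ordered = sorted(pvalues, key=lambda name: (pvalues[name], name))
        for position, name in enumerate(ordered, start=1):
            ranks.setdefault(name, []).append(position)
    return {name: median(positions) for name, positions in ranks.items()}

# Hypothetical p-values from three hypothetical methods.
example = {
    "method1": {"setA": 0.001, "setB": 0.20, "setC": 0.04},
    "method2": {"setA": 0.010, "setB": 0.50, "setC": 0.03},
    "method3": {"setA": 0.030, "setB": 0.60, "setC": 0.02},
}
consensus = consensus_ranks(example)
print(consensus)
```

Working with ranks rather than raw p-values makes methods comparable even when their p-values are on very different scales.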
Piano provides a consensus output

1) distinct-directional down: the gene set as a whole is downregulated.
2) mixed-directional down: a subset of the set is significantly downregulated.
3) non-directional: the set is enriched in significant DE genes, without taking directionality into account.
4) mixed-directional up: a subset of the set is significantly upregulated.
5) distinct-directional up: the gene set as a whole is upregulated.
Keywords
Gene set
Contingency approach
T-statistic
P-value histogram
GSEA
heatmap
Directionality of expression changes

Write in your own words what the terms mean
Break

RNA-seq for DE analysis: the biology behind observed changes - part 6
