The Functional and Pathway Analysis talk, given in March 2010 at the CRUK CRI, Cambridge, UK.
It was designed to introduce wet-lab researchers to web-based tools for functional analysis of gene lists, such as those from microarray experiments.
1. Functional and Pathway Analysis
Stewart MacArthur
Bioinformatics Core
March 18th, 2010
Stewart MacArthur (Bioinformatics Core) Functional and Pathway Analysis March 18th, 2010 1 / 19
2. Introduction The Problem
The Problem
• High-throughput genomics methods:
• microarrays
• next generation sequencing
• Generate large lists of “interesting” genes
• How do we summarize?
• What are the themes of the lists?
3. Introduction The Solution
The Solution
• Functional Analysis
• Determine common functions
• Find groups of functionally related genes
• Pathways Analysis
• Determine common pathways
• Determine potential upstream/downstream regulators
4. Enrichment Analysis Methods
The Methods
Enrichment Analysis
Are there more of the genes in my list in functional category X than we would
expect by chance?
• SEA - Singular Enrichment Analysis
• MEA - Modular Enrichment Analysis
• GSEA - Gene Set Enrichment Analysis
5. Enrichment Analysis Methods Hypergeometric
Brief Aside: Hypergeometric Test
The hypergeometric test calculates the probability that at least the observed
number of genes in our gene list fall in functional category/pathway X
by chance alone
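
The calculation can be sketched in a few lines of Python. This is a minimal illustration, not the code used by any of the tools in this talk; the function name `hypergeom_enrichment_p` and the toy numbers are assumptions made up for the example. It sums the upper tail of the hypergeometric distribution, i.e. the chance of seeing the observed overlap or more when the gene list is drawn at random from all genes measured.

```python
from math import comb

def hypergeom_enrichment_p(universe_size, category_size, list_size, overlap):
    """P(X >= overlap) when list_size genes are drawn without replacement
    from universe_size genes, of which category_size belong to category X."""
    max_overlap = min(category_size, list_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(category_size, k) * comb(universe_size - category_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total

# Toy numbers (invented for illustration): 20,000 genes measured,
# 100 annotated to category X, 200 genes in our list, 5 of those in X.
p = hypergeom_enrichment_p(20000, 100, 200, 5)
print(f"p = {p:.4g}")
```

With these toy numbers only about one gene from category X would be expected in the list by chance, so an overlap of five gives a small p-value.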
16. Enrichment Analysis Methods SEA - Singular Enrichment Analysis
SEA - Singular Enrichment Analysis
Inputs:
• List of “interesting” genes, e.g. DE genes
• List of functional annotations e.g. GO annotations
Method:
For each annotation
• Are more of the genes in our list present than would be expected by chance?
• Calculate p-value
Next annotation
• Correction for multiple testing
Output:
• Ranked list of annotations
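
The loop above can be sketched as follows. This is a minimal sketch under stated assumptions: the hypergeometric tail is used as the per-annotation test, Benjamini-Hochberg is assumed as the multiple-testing correction (the slide does not name one), and all function names, gene identifiers and term names are hypothetical.

```python
from math import comb

def hypergeom_tail(universe_size, category_size, list_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap)."""
    upper = min(category_size, list_size)
    total = comb(universe_size, list_size)
    return sum(
        comb(category_size, k) * comb(universe_size - category_size, list_size - k)
        for k in range(overlap, upper + 1)
    ) / total

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def singular_enrichment(gene_list, annotations, universe):
    """Test each annotation separately, correct, and rank by raw p-value."""
    universe = set(universe)
    genes = set(gene_list) & universe
    terms, pvals = [], []
    for term, members in annotations.items():      # "For each annotation"
        members = set(members) & universe
        overlap = len(genes & members)
        terms.append(term)
        pvals.append(hypergeom_tail(len(universe), len(members), len(genes), overlap))
    adjusted = benjamini_hochberg(pvals)            # correction for multiple testing
    return sorted(zip(terms, pvals, adjusted), key=lambda row: row[1])

# Toy data, invented for illustration.
universe = [f"g{i}" for i in range(100)]
annotations = {
    "term_A": [f"g{i}" for i in range(10)],
    "term_B": [f"g{i}" for i in range(50, 70)],
}
interesting = [f"g{i}" for i in range(8)] + ["g50", "g90"]
ranked = singular_enrichment(interesting, annotations, universe)
for term, p, p_adj in ranked:
    print(term, f"{p:.3g}", f"{p_adj:.3g}")
```

The output is the ranked list of annotations described on the slide, with the most over-represented term first.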
17. Enrichment Analysis Methods MEA - Modular Enrichment Analysis
MEA - Modular Enrichment Analysis
• Extension of SEA
• Incorporates network discovery algorithms
• Considers term-to-term relationships
• Terms not treated as separate tests
• Uses co-occurrences of terms
• More closely related to biology
• Based on assumption that related functional groups have similar member genes
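
The idea that related terms share member genes can be illustrated with a small sketch. The similarity measure (Jaccard overlap), the 0.5 threshold, the single-linkage grouping and all names below are assumptions chosen for simplicity; they are not the method of any particular MEA tool.

```python
from itertools import combinations

def jaccard(a, b):
    """Shared genes divided by all genes annotated to either term."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def group_terms(annotations, threshold=0.5):
    """Group terms whose gene sets overlap at or above the threshold."""
    parent = {term: term for term in annotations}

    def find(term):
        # Follow parent links to the representative of the group.
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    for a, b in combinations(annotations, 2):
        if jaccard(annotations[a], annotations[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for term in annotations:
        groups.setdefault(find(term), []).append(term)
    return sorted(sorted(group) for group in groups.values())

# Toy annotations, invented for illustration.
annotations = {
    "apoptosis": ["g1", "g2", "g3", "g4"],
    "cell death": ["g1", "g2", "g3", "g5"],
    "translation": ["g7", "g8", "g9"],
}
modules = group_terms(annotations)
print(modules)
```

Here "apoptosis" and "cell death" share three of five genes and fall into one module, so they would be considered together rather than as two separate tests.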
18. Enrichment Analysis Methods GSEA - Gene Set Enrichment Analysis
GSEA
No cutoff; uses the full ranked list of genes
e.g. a microarray experiment ranked by fold change or differential expression
For each functional annotation
• Are genes randomly distributed in ranked list?
or
• Are genes distributed towards the top/bottom of the list?
• Calculate enrichment score (ES)
• Calculate significance of ES
Next annotation
• Correct for multiple testing
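
The steps above can be sketched with a simplified running-sum score. This is an illustration only: it uses an unweighted Kolmogorov-Smirnov-style statistic and shuffles the gene ranking to assess significance, which is a simplification and an assumption rather than the procedure of any specific GSEA implementation. Function names and toy data are hypothetical.

```python
import random

def enrichment_score(ranked_genes, gene_set):
    """Walk down the ranked list: step up at set members, down otherwise.
    The ES is the largest deviation of the running sum from zero."""
    gene_set = set(gene_set) & set(ranked_genes)
    n_hit = len(gene_set)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        running += 1.0 / n_hit if gene in gene_set else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

def permutation_p(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Estimate significance of the ES by shuffling the gene ranking."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, gene_set)
    shuffled = list(ranked_genes)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(enrichment_score(shuffled, gene_set)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)

# Toy ranking, invented for illustration: g0 is the most up-regulated gene.
ranked = [f"g{i}" for i in range(100)]
top_set = [f"g{i}" for i in range(10)]             # clustered at the top
spread_set = [f"g{i}" for i in range(5, 100, 10)]  # evenly spread

es_top, p_top = permutation_p(ranked, top_set)
es_spread, p_spread = permutation_p(ranked, spread_set)
print(f"top set:    ES = {es_top:.2f}, p = {p_top:.3g}")
print(f"spread set: ES = {es_spread:.2f}, p = {p_spread:.3g}")
```

A set concentrated at the top of the list gives an ES close to 1 and a small p-value, while a set spread evenly through the list gives an ES near 0. In a real analysis the p-values from all annotations would then be corrected for multiple testing.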
23. Annotation Resources
Annotation Resources
Where do the gene sets come from?
• GO - Gene Ontology
• KEGG - Kyoto Encyclopedia of Genes and Genomes
• MSigDB - Molecular Signatures Database
• Pathway Commons
• ...
• ...
Choice of annotation often dictated by choice of tool
24. Web based tools
Tools
• Approximately 68 enrichment tools
• Mainly web based
• Mainly hypergeometric based
28. Web based tools
Recommended Tools
• SEA - ClueGO, GOStat
• MEA - DAVID, GOToolBox
• GSEA - GeneTrail, FatiScan (Babelomics)
See Bioinformatics Core Wiki Page for more tools
http://criwiki.cancerresearchuk.org/
29. Web based tools David
DAVID http://david.abcc.ncifcrf.gov
The Database for Annotation, Visualization and Integrated Discovery
• Over 1,600 DAVID citations
• 37 Nature-branded citations to date
• Daily usage: 1200 gene lists/sublists
• Daily usage: 400 unique researchers
30. Web based tools David
DAVID http://david.abcc.ncifcrf.gov
The Database for Annotation, Visualization and Integrated Discovery
• Identify enriched biological themes
• Discover enriched functional-related gene groups
• Cluster redundant annotation terms
• Visualize genes on BioCarta & KEGG pathway maps
• Search for other functionally related genes not in the list
• Convert gene identifiers from one type to another
• And more
31. Web based tools GeneTrail
GeneTrail
Annotations include
• KEGG
• TRANSPATH
• TRANSFAC
• GO
Methods:
• Over-Representation Analysis (ORA)
• Gene Set Enrichment Analysis (GSEA)
32. Commercial Tools
Ingenuity Pathways Analysis (IPA)