Step-by-step tutorial for conducting GO enrichment analysis and then creating a network from the results.
Material from the UC Davis 2014 Proteomics Workshop.
See more at: http://sourceforge.net/projects/teachingdemos/files/2014%20UC%20Davis%20Proteomics%20Workshop/
2. Download all material for the tutorial
https://sourceforge.net/projects/teachingdemos/files/
Choose 2014 UC Davis Proteomics Workshop or use the full URL below
https://sourceforge.net/projects/teachingdemos/files/2014%2
3. • decrease
• increase
Use functional analysis to determine whether the changed variables are enriched (over-represented compared to random chance) in some biological pathway, domain or ontological category.
5. Major Tasks
Using the proteins listed in the Excel workbook ‘proteomic data for analysis.xlsx’, worksheet ‘protein IDs’:
1. Conduct Gene Ontology (GO) enrichment analysis using DAVID Bioinformatics Resources http://david.abcc.ncifcrf.gov/home.jsp
2. Investigate enriched terms using QuickGO http://www.ebi.ac.uk/QuickGO/
3. Summarize and visualize the results using REVIGO http://revigo.irb.hr/
4. Create and modify a GO network using Cytoscape http://www.cytoscape.org/
6. Protein IDs
Common protein identifier: UniProt/SwissProt Accession (default in Scaffold)
http://www.uniprot.org/
Use BioMart to translate to other database IDs, e.g. gene symbols
http://www.biomart.org/
18. Overview Results
Modified Fisher’s Exact Test p-value
Optionally, check in R:
x <- data.frame(user = c(13, 47), genome = c(690, 13528))
fisher.test(x) # p-value = 5.41e-06
(13/47) / (690/13528) # fold enrichment
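The check above can also be written with an explicit 2x2 table of hits and non-hits. This is a minimal sketch, assuming the slide's counts mean 13 of 47 user proteins and 690 of 13528 background genes annotated to the term. The table layout and the EASE-style adjustment (removing one hit from the user list, as DAVID describes its modified test) are assumptions, so the p-values will not match the slide's value exactly.

```r
# Counts taken from the slide; the 2x2 layout below is an assumption.
list_hits  <- 13     # user proteins annotated to the term
list_total <- 47     # user proteins in the list
pop_hits   <- 690    # background genes annotated to the term
pop_total  <- 13528  # background genes

# Fold enrichment: fraction of hits in the list vs. in the background.
fold_enrichment <- (list_hits / list_total) / (pop_hits / pop_total)

# Table of hits and non-hits for the user list and the background.
tab <- matrix(c(list_hits, list_total - list_hits,
                pop_hits,  pop_total - pop_hits),
              nrow = 2,
              dimnames = list(c("in_term", "not_in_term"),
                              c("user", "genome")))
fisher_p <- fisher.test(tab, alternative = "greater")$p.value

# EASE-style variant (assumed): drop one hit from the user list before
# testing, which makes the result more conservative.
ease_tab <- tab
ease_tab["in_term", "user"] <- list_hits - 1
ease_p <- fisher.test(ease_tab, alternative = "greater")$p.value

print(c(fold = fold_enrichment, fisher = fisher_p, ease = ease_p))
```

The one-sided alternative is used because enrichment only asks whether the term is over-represented in the user list.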
19. Alternative to Fisher Exact Test:
Hypergeometric Test
How to calculate statistics to determine enrichment?
hit.num = 51 # number of significantly changed pathway variables
set.num = 1455 # number of variables in pathway
full = 3358 # all possible variables in organism
q.size = 72 # number of significantly changed variables
phyper(hit.num - 1, set.num, full - set.num, q.size, lower.tail = FALSE)
# enrichment p-value = 1.717553e-06
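The relationship between the two tests can be sketched as follows. The helper name enrichment_p and its argument names are hypothetical; the counts are the ones from the slide. The sketch assumes the 2x2 table is built from hits and non-hits, in which case the upper-tail hypergeometric p-value equals the one-sided Fisher exact p-value.

```r
# Hypothetical helper: upper-tail hypergeometric p-value, P(X >= hits).
enrichment_p <- function(hits, set_size, universe, query_size) {
  phyper(hits - 1, set_size, universe - set_size, query_size,
         lower.tail = FALSE)
}

hit.num <- 51    # significantly changed variables in the pathway
set.num <- 1455  # variables in the pathway
full    <- 3358  # all possible variables in the organism
q.size  <- 72    # significantly changed variables

p_hyper <- enrichment_p(hit.num, set.num, full, q.size)

# Same question as a one-sided Fisher exact test.
# Column 1: in pathway, column 2: not in pathway;
# row 1: changed, row 2: not changed.
tab <- matrix(c(hit.num, set.num - hit.num,
                q.size - hit.num, full - set.num - (q.size - hit.num)),
              nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(c(hypergeometric = p_hyper, fisher = p_fisher))
```

Subtracting 1 from the hit count is what turns the default P(X > hits) of the upper tail into P(X >= hits).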
21. Use REVIGO to filter redundant terms
http://revigo.irb.hr/
Prepare input (term, p-value)
1. Upload to REVIGO
2. Run
Supek F, Bošnjak M, Škunca N, Šmuc T. "REVIGO summarizes and visualizes long lists of Gene Ontology terms" PLoS ONE 2011. doi:10.1371/journal.pone.0021800
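Preparing the two-column input (term, p-value) might look like the sketch below. The data frame stands in for an exported enrichment table. Its column names (Term, PValue), the "GO:id~name" term pattern and the p-values are illustrative assumptions, not values from the workshop data.

```r
# Hypothetical rows mimicking an exported enrichment chart;
# column names, term pattern and p-values are made up for illustration.
david <- data.frame(
  Term   = c("GO:0006412~translation", "GO:0006457~protein folding"),
  PValue = c(1.2e-05, 3.4e-03),
  stringsAsFactors = FALSE
)

# Keep only the GO accession (text before "~") and the p-value.
revigo_input <- data.frame(
  term    = sub("~.*$", "", david$Term),
  p.value = david$PValue,
  stringsAsFactors = FALSE
)

# Write a tab-delimited table without a header; its contents can be
# pasted into the REVIGO input box.
out_file <- file.path(tempdir(), "revigo_input.txt")
write.table(revigo_input, out_file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)
print(revigo_input)
```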
24. REVIGO: network
• Edges: the strongest 3% of pairwise GO term similarities
• Node size: generality of term (small = specific)
• Node color: p-value
Download network
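The edge rule can be illustrated with a small sketch. It assumes a symmetric matrix of pairwise term similarities is available; here the matrix is random and the term labels are placeholders. It is not REVIGO's own implementation, only an illustration of keeping the strongest 3% of pairs as edges.

```r
set.seed(1)
n_terms <- 10
terms <- sprintf("term_%02d", seq_len(n_terms))  # placeholder labels

# Hypothetical symmetric similarity matrix with values in [0, 1].
sim <- matrix(runif(n_terms^2), n_terms, n_terms,
              dimnames = list(terms, terms))
sim <- (sim + t(sim)) / 2
diag(sim) <- 1

# One row per unordered pair of terms (upper triangle only).
idx <- which(upper.tri(sim), arr.ind = TRUE)
pairs <- data.frame(source = terms[idx[, "row"]],
                    target = terms[idx[, "col"]],
                    similarity = sim[idx],
                    stringsAsFactors = FALSE)

# Keep the strongest 3% of pairwise similarities as edges.
cutoff <- quantile(pairs$similarity, probs = 0.97)
edges <- pairs[pairs$similarity >= cutoff, ]
print(edges)
```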
27. Cytoscape: map data to network properties
1. Set Edge width and color
2. Set Node labels, size and color
28. Cytoscape: overview network components
Download edge information, then view in Excel
Download node information, then view in Excel
29. Bonus: Modify Edge and Node Attributes to show
term to protein connections
See files ‘test edge.xlsx’ and ‘test node.xlsx’ for examples of upload formats
See detailed instructions at http://www.slideshare.net/dgrapov/demonstration-of-network-mapping
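Term-to-protein connections can be expressed as an edge table plus a node table. The sketch below is an assumption about what such tables might contain; it does not reproduce the layout of the workshop's example files. The GO accessions, protein names and column names (source, target, interaction, id, type) are placeholders.

```r
# Hypothetical enrichment results: one row per term, with its member
# proteins as a comma-separated string (placeholder identifiers).
enriched <- data.frame(
  term     = c("GO:0006412", "GO:0006457"),
  proteins = c("protein_A, protein_B", "protein_B, protein_C"),
  stringsAsFactors = FALSE
)

# Edge table: one row per term-to-protein connection.
members <- strsplit(enriched$proteins, ",\\s*")
edges <- data.frame(
  source      = rep(enriched$term, lengths(members)),
  target      = unlist(members),
  interaction = "annotates",
  stringsAsFactors = FALSE
)

# Node table: every term and protein once, with a type attribute that
# can be mapped to node shape or color.
nodes <- data.frame(id = unique(c(edges$source, edges$target)),
                    stringsAsFactors = FALSE)
nodes$type <- ifelse(nodes$id %in% enriched$term, "term", "protein")

# Save both tables as CSV for import.
write.csv(edges, file.path(tempdir(), "term_protein_edges.csv"),
          row.names = FALSE)
write.csv(nodes, file.path(tempdir(), "term_protein_nodes.csv"),
          row.names = FALSE)
print(edges)
print(nodes)
```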
30. See more Statistical and Multivariate Analysis Examples at
http://imdevsoftware.wordpress.com/tutorials/
Questions?
dgrapov@ucdavis.edu
This research was supported in part by NIH 1 U24 DK097154