EnrichNet: Graph-based statistic and web-application for gene/protein set enrichment analysis

EnrichNet is a web-application and web-service to identify and visualize functional associations between a user-defined list of genes/proteins and known cellular pathways. As a complement to classical overlap-based enrichment analysis methods, the EnrichNet approach integrates a novel graph-based statistic with a new interactive visualization of network sub-structures to enable a direct molecular interpretation of how a set of genes or proteins is related to a specific cellular pathway. Available at: http://www.enrichnet.org

Published in: Education, Technology
  1. EnrichNet: network-based gene set enrichment analysis
     Speaker: Enrico Glaab, Luxembourg Centre for Systems Biomedicine
     Authors: Enrico Glaab, Anaïs Baudot, Natalio Krasnogor, Reinhard Schneider, Alfonso Valencia
  2. Motivation
     How to identify and score functional associations between a gene/protein set of interest (target set) and a collection of known, annotated gene/protein sets (reference sets) representing cellular pathways, processes or complexes?
     Problem:
     • Experimentally derived gene/protein set (target set)
     • Functional annotation/pathway databases (reference sets)
  3. Previous approaches
     Three types of gene/protein set enrichment analysis approaches (see Huang et al., Nucleic Acids Res, 2009):
     • Over-representation analysis (ORA): generally applicable, but scores are often not discriminative and rankings are difficult to interpret biologically
     • Gene Set Enrichment Analysis (GSEA): quantitative measurements are required, and the molecular network neighbourhood is not taken into account
     • Integrative and modular enrichment analysis (MEA): mostly uses clustering of annotations or data from ontology graphs rather than molecular networks
     GOAL: Maximally exploit functional information from a molecular interaction network for association scoring and visualization
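For readers unfamiliar with the overlap-based baseline, classical ORA is commonly implemented as a one-sided hypergeometric (Fisher-type) test on the overlap between target and reference set. The following is a minimal sketch of that common formulation, not code from the slides; the function name, the toy gene identifiers and the universe size are all hypothetical.

```python
from math import comb


def ora_p_value(target, reference, universe_size):
    """Upper-tail hypergeometric p-value P(X >= k) for the observed overlap k.

    Assumes both sets are drawn from a common universe of `universe_size` genes.
    """
    target, reference = set(target), set(reference)
    k = len(target & reference)   # observed overlap
    n = len(target)               # target set size
    K = len(reference)            # reference set size
    N = universe_size
    total = comb(N, n)
    # Sum the probabilities of every overlap at least as large as the observed one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


# Toy example with made-up identifiers (not real data).
p = ora_p_value({"g1", "g2", "g3"}, {"g1", "g2", "g4", "g5"}, universe_size=10)
print(f"ORA p-value: {p:.4f}")
```

Because the score depends only on the overlap size, two reference sets with the same overlap and size receive the same p-value regardless of how closely their remaining members interact with the target set in the network, which is the limitation the slides set out to address.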
  4. EnrichNet: Design principles (1)
     Network association measure for mapped datasets:
     • account for distances in a molecular network and for the multiplicity and density of interactions between the datasets of interest (use random walk distances instead of shortest path distances)
     [Figure: example sub-networks. Case 1: dense inter-connections. Case 2: sparse inter-connections. Legend: reference node, target set node, other nodes.]
  5. EnrichNet: Design principles (2)
     Handling of overlapping nodes and long-distance outliers:
     • overlapping nodes and node pairs with small distances are expected to be over-represented in functionally associated datasets: assign higher weight to short-distance node pairs
     • account for outlier nodes: assign lower weight to long-distance node pairs
     [Figure: example sub-network. Legend: pathway node, target set node, other nodes; overlap (high weight); outliers (low weight).]
  6. EnrichNet: Procedure
     Input:
     • 10 or more human gene or protein identifiers of interest (= target set)
     • Selection of a reference database (gene sets from GO, KEGG, BioCarta, Reactome, WikiPathways, PID, etc.)
     Processing (details on next slides):
     • Target and reference datasets are mapped onto a human genome-scale molecular network (default: STRING confidence-weighted PPI network, optional: user-defined network)
     • Random walk with restart (RWR) algorithm is applied to compute node-specific association scores between the mapped target set and the reference sets
     • Integration of scores for each reference set and comparison against a background model
     Output:
     • Ranking table of reference pathways with association scores (optional: 60 tissue-specific scores)
     • For each reference dataset: interactive sub-network visualization of the association with the target set
  7. EnrichNet: Random walk with restart (RWR)
     RWR relevance scoring (Tong et al., 2006): simulate random walks via iterative matrix multiplications:
     $p^{t+1} = (1 - r)\,A\,p^{t} + r\,p^{0}$
     • $A$ := network adjacency matrix
     • $r$ := restart probability (here: r = 0.9)
     • $p_i^{t}$ := probability that the walker is at node i at time t
     Result: a vector of node relevance scores for each reference pathway (converted to distance scores and compared against a background model, see next slide)
     [Figure: example network showing the target set, pathway 1, pathway 2 and the target/pathway overlap.]
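The RWR iteration on this slide can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the EnrichNet implementation: it assumes the adjacency matrix is column-normalized before iterating (so that each step keeps the scores a probability distribution), that the start vector $p^{0}$ is uniform over the target set nodes, and that iteration stops once the L1 change falls below a tolerance. The function names and the 4-node toy network are hypothetical.

```python
def column_normalize(adj):
    """Scale each column of a square matrix to sum to 1 (zero columns stay zero)."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def random_walk_with_restart(adj, seeds, r=0.9, tol=1e-10, max_iter=1000):
    """Iterate p_next = (1 - r) * A * p + r * p0 until the L1 change is below tol."""
    n = len(adj)
    A = column_normalize(adj)
    # Assumption: the walker restarts uniformly over the seed (target set) nodes.
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - r) * sum(A[i][j] * p[j] for j in range(n)) + r * p0[i]
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy network: a chain 0 - 1 - 2 - 3, with node 0 as the only target set node.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(chain, seeds={0})
print([round(s, 5) for s in scores])
```

On the chain, the relevance score drops with every step away from the seed node, which is the behaviour the later conversion to distance scores relies on.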
  8. EnrichNet: Background model
     Pathway-based background model:
     • Gene/protein sets for the background model should have similar connectivity properties as pathway-representing reference nodes (not the case for random matched-size node sets), so the score distribution across the entire reference database is used as background
     • Apply the Xd-distance (Olmea et al., 1999) to compare foreground against background distances, giving a distance-dependent weighting that accounts for long-distance and high-degree outliers
       (n = number of equally spaced distance bins, default: n = 10; tissue-specific scores: pre-filter nodes by tissue label)
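The formula itself is not reproduced in the transcript. As a rough illustration of the idea, the sketch below assumes a form of the Xd-score often quoted from the contact-prediction literature, $X_d = \sum_{i=1}^{n} (P_i^{fg} - P_i^{bg}) / (d_i \cdot n)$, where $P_i^{fg}$ and $P_i^{bg}$ are the fractions of foreground and background distances falling into bin $i$ and $d_i$ is the upper edge of that bin. This form, the equal-width binning over $[0, \max]$, the function names and the toy distances are assumptions for illustration only and may differ from what EnrichNet computes.

```python
def bin_fractions(distances, n_bins, max_dist):
    """Fraction of distances in each of n_bins equal-width bins over [0, max_dist]."""
    width = max_dist / n_bins
    counts = [0] * n_bins
    for d in distances:
        # Clamp so that d == max_dist falls into the last bin.
        counts[min(int(d / width), n_bins - 1)] += 1
    return [c / len(distances) for c in counts]


def xd_score(foreground, background, n_bins=10, max_dist=None):
    """Distance-weighted difference between two binned distance distributions."""
    if max_dist is None:
        max_dist = max(max(foreground), max(background))
    width = max_dist / n_bins
    fg = bin_fractions(foreground, n_bins, max_dist)
    bg = bin_fractions(background, n_bins, max_dist)
    # Dividing by the upper bin edge d_i gives short-distance bins more weight.
    return sum((f - b) / ((i + 1) * width * n_bins)
               for i, (f, b) in enumerate(zip(fg, bg)))


# Toy distances (made up): the foreground is shifted toward short distances.
fg_dist = [0.1, 0.1, 0.2, 0.9]
bg_dist = [0.1, 0.5, 0.9, 0.9]
print(round(xd_score(fg_dist, bg_dist), 4))
```

A positive score indicates that short distances are over-represented in the foreground relative to the background; the single long-distance value in the foreground contributes little because its bin is divided by a large $d_i$.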
  9. EnrichNet: Comparative analysis
     Comparative analysis on benchmark microarray data:
     • compare EnrichNet against classical over-representation analysis using benchmark datasets from the Broad Institute of MIT and Harvard (5 gene expression datasets and 2 reference databases)
     Result: EnrichNet provides a consistently higher agreement with benchmark gene set rankings
  10. EnrichNet: Results
      Biological application on disease-related gene sets. EnrichNet is particularly suited to the following settings:
      1) The target gene/protein set of interest has no associated high-throughput experimental data. Examples:
         • Mutated genes in genetic diseases (OMIM, COSMIC, CGC)
         • Gene sets obtained from the literature (risk factors, animal model genes)
      2) Target and reference set share few members but are densely connected in the network. Examples:
         • Occurs often for differentially expressed genes (DEGs) in complex phenotypes (examples for Parkinson's disease on next slides)
         • Occurs often when integrating results from different studies or omics (e.g. comparing transcriptomics and proteomics data)
  11. DEGs for Parkinson's disease (PD) vs. KEGG PD pathway
      [Figure: sub-network visualization. Legend: DEGs in PD vs. control samples; KEGG Parkinson's disease pathway; overlap.]
      Annotations:
      • OPA1 mediates mitochondrial fusion
      • NR4A2 mutations have been associated with familial PD
  12. DEGs for PD vs. exocytosis regulation pathway
      [Figure: sub-network visualization. Legend: DEGs in PD vs. control samples; regulation of exocytosis process (Gene Ontology); overlap.]
  13. Summary
      • EnrichNet provides a new means to score and interpret gene/protein set associations by exploiting functional information captured in the graph structure of molecular networks
      • New functional associations are identified, and sub-network visualizations enable a biological interpretation at the level of single molecular interactions
  14. Availability
      Software, tutorials and examples are freely available at: www.enrichnet.org
      We acknowledge support by: [sponsor logos in the original slide]
  15. References
      1. E. Glaab, A. Baudot, N. Krasnogor, R. Schneider, A. Valencia. EnrichNet: network-based gene set enrichment analysis, Bioinformatics, 28(18):i451-i457, 2012
      2. E. Glaab, R. Schneider. PathVar: analysis of gene and protein expression variance in cellular pathways using microarray data, Bioinformatics, 28(3):446-447, 2012
      3. E. Glaab, J. Bacardit, J. M. Garibaldi, N. Krasnogor. Using rule-based machine learning for candidate disease gene prioritization and sample classification of cancer gene expression data, PLoS ONE, 7(7):e39932, 2012
      4. E. Glaab, A. Baudot, N. Krasnogor, A. Valencia. TopoGSA: network topological gene set analysis, Bioinformatics, 26(9):1271-1272, 2010
      5. E. Glaab, A. Baudot, N. Krasnogor, A. Valencia. Extending pathways and processes using molecular interaction networks to analyse cancer genome data, BMC Bioinformatics, 11(1):597, 2010
      6. H. O. Habashy, D. G. Powe, E. Glaab, N. Krasnogor, J. M. Garibaldi, E. A. Rakha, G. Ball, A. R. Green, C. Caldas, I. O. Ellis. RERG (Ras-related and oestrogen-regulated growth-inhibitor) expression in breast cancer: A marker of ER-positive luminal-like subtype, Breast Cancer Research and Treatment, 128(2):315-326, 2011
      7. E. Glaab, J. M. Garibaldi, N. Krasnogor. ArrayMining: a modular web-application for microarray analysis combining ensemble and consensus methods with cross-study normalization, BMC Bioinformatics, 10:358, 2009
      8. E. Glaab, J. M. Garibaldi, N. Krasnogor. Learning pathway-based decision rules to classify microarray cancer samples, German Conference on Bioinformatics 2010, Lecture Notes in Informatics (LNI), 173, 123-134
      9. E. Glaab, J. M. Garibaldi, N. Krasnogor. VRMLGen: An R-package for 3D Data Visualization on the Web, Journal of Statistical Software, 36(8):1-18, 2010