Integration of GO, Pathway data and Interaction data
Chris Mungall
Peter D’Eustachio

Go pathway-interaction-integration

  • The arrows appear to have shifted on uploading to SlideShare.

    The basic idea here is that, given nothing more than the statement 'FIGF binds VEGFR2' (which could come from BioGRID/IMEX, or from a generic annotation to GO 'protein binding' with VEGFR2 in column 8/16), we can automatically infer a 'VEGFR2 binding' annotation for FIGF. This relies on links between VEGFR and PRO, and on bp_xp_protein.
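The inference described in this comment can be sketched in a few lines of Python. This is a minimal illustration, not the authors' integration tool: the `CrossProduct` class, the `PROTEIN_TO_PRO` mapping and the function name are hypothetical, and only the GO and PRO identifiers are taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrossProduct:
    """Simplified cross-product definition: genus GO term + one relation to a filler class."""
    go_id: str
    name: str
    genus: str
    relation: str
    filler: str


# Definition copied from the slide on interaction databases.
XP_DEFS = [
    CrossProduct(
        "GO:0043184",
        "vascular endothelial growth factor receptor 2 binding",
        "GO:0005488",             # binding
        "results_in_binding_of",
        "PRO:000002112",          # VEGFR 2
    ),
]

# Hypothetical mapping from gene product symbols to PRO classes (assumption).
PROTEIN_TO_PRO = {"VEGFR2": "PRO:000002112"}


def infer_binding_annotations(interactions, xp_defs=XP_DEFS, protein_to_pro=PROTEIN_TO_PRO):
    """For each (subject, partner) 'binds' statement, return (subject, go_id)
    when a binding cross-product is defined for the partner's PRO class."""
    index = {(xp.genus, xp.relation, xp.filler): xp.go_id for xp in xp_defs}
    inferred = []
    for subject, partner in interactions:
        pro_id = protein_to_pro.get(partner)
        go_id = index.get(("GO:0005488", "results_in_binding_of", pro_id))
        if go_id is not None:
            inferred.append((subject, go_id))
    return inferred


print(infer_binding_annotations([("FIGF", "VEGFR2")]))
```

The lookup is an exact match on the filler class. A real implementation would presumably use a reasoner so that subclasses of the PRO class also match.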
  • The Gene Ontology was created in response to the need for interoperability among genomic databases in the wake of the sequencing of the first metazoan genomes. In the paper "Gene Ontology: tool for the unification of biology", published nearly ten years ago, Ashburner et al. state: "Progress in the way that biologists describe and conceptualize the shared biological elements has not kept pace with sequencing . . . Interoperability of genomic databases is limited by this lack of progress, and it is this major obstacle that the Gene Ontology (GO) Consortium was formed to address" [25]. The GO has since become the de facto terminological standard for functional annotation, and its success is evident in the popularity of GO-based class enrichment analyses. However, the intervening ten years have witnessed an explosion of interest in systems biology, with a concomitant increase in the number of databases providing information on interactions and pathways, including Reactome, Nature Signaling, PANTHER [26], BIND, BioGRID and HumanCyc (the EcoCyc metabolic pathway database preceded GO [27]). These databases each have their own data models and schemas, creating an interoperability problem. This has partly been mitigated by the adoption of BioPAX as a standard exchange format, which allows multiple pathway databases to be aggregated in single "one-stop shopping" warehouses, such as the Pathway Knowledge Base [28], Pathway Commons, and WikiPathways. However, the data are still only partially integrated, and a researcher who wishes to obtain a comprehensive view of a pathway must still examine multiple records, in addition to GO annotations.
  • loss in pathway db transition
  • their data models capture more.
  • also via col16
  • PRO does not yet have Ras etc
  • Go pathway-interaction-integration

    1. Integration of GO, Pathway data and Interaction data
       Chris Mungall
       Peter D’Eustachio
    2. The GO was originally intended to integrate databases
       How are we doing?
       "Interoperability of genomic databases is limited by this lack of progress, and it is this major obstacle that the Gene Ontology (GO) Consortium was formed to address"
       Gene Ontology: Tool for the Unification of Biology. Nat Genet 2000
       Diagram labels: SGD, FB, GOA
    3. The GO was originally intended to integrate databases
       How are we doing?
       Not as well as we could!
       Diagram labels: GO, SGD, FB, GOA, Pathway Commons, IMEX, Reactome, Cyc, …, BioGRID, IntAct, …
    4. Integration enhances analyses and reduces workload
       Division of labor
         leave specialized curation to specialized systems biology databases
         but data needs to be re-combined to prevent siloing
       GO is an invaluable one-stop shop for term enrichment etc.
       Can we quantify how integrating with systems biology databases helps users?
       Yes! We can do the experiment:
         GO term enrichment analysis on all MolSigDB
           with Reactome annotations
             Also include Reactome inputs/outputs, not currently in GOA
           without Reactome annotations
    5. Integration enhances analyses
       GOA+R: Many p-values are significantly improved
       Recapitulated biologically valid results that would have been suppressed had only a single resource been used
       Examples:
         Genes down-regulated in Alzheimer's
    6. How are we currently integrating systems biology datasets?
       Interaction data
         Currently IntAct, soon IMEX
         “protein binding” and “self-protein binding” only (+with)
       Pathway data
         Currently Reactome only
         Loses much of what is in Reactome
           E.g. inputs and outputs
         Manually curated GO<->Reactome links
           incomplete
           not always to the most specific term
           labor-intensive
           become stale over time
         other pathway databases?
       This can be improved!
    7. Automating integration using cross-product definitions – pathway databases
       [Term]
       id: GO:0015871
       name: choline transport
       intersection_of: GO:0006810 ! transport
       intersection_of: results_in_transport_of CHEBI:15354 ! choline
    8. Automating integration using cross-products – pathway databases
       We can also automatically map:
         catalysis terms [165*]
         transport [373]
         binding [133]
         phosphorylation and other modifications
         metabolism [278]
         signaling
         …
       All this relies on different cross-product files
       Any pathway database that exports BioPAX-OWL can be used
         E.g. HumanCyc, MouseCyc, Pathway Commons, …
       *Numbers for Reactome-human
    9. Automating integration using cross-products – interaction databases
       Diagram labels: FIGF, VEGFR, binds, has_function, is_a
       [Term]
       id: GO:0043184
       name: vascular endothelial growth factor receptor 2 binding
       intersection_of: GO:0005488 ! binding
       intersection_of: results_in_binding_of PRO:000002112 ! VEGFR 2
    10. Automated Integration: Results
        Reactome
          Evaluation in progress
          Many manually assigned equivalencies recapitulated
          Inferred equivalencies differed in some cases
            sometimes better than manually assigned
            sometimes required info not in BioPAX export
            ongoing discussions
        BioGRID
          not evaluated (all trivial)
          inferred annotations improve some enrichment results
            E.g. Brentani angiogenesis gene sets, increased enrichment for VEGFR binding
            Obvious but useful as proof of concept
    11. Conclusions and future work
        We can be more efficient:
          Coordinate with systems bio databases to divide labor
          Prevent siloing through semi-automated integration
          GO acts as a high-level ‘window’ on systems biology databases
        Still to be done:
          Make integration tool production-ready
          Reconcile existing mis-alignments, particularly signaling
            highly inconsistent between GO and Reactome
          Explore open questions, e.g. auto-generate terms?
          Finish cross-products, they are vital
            particularly PRO, CHEBI
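The cross-product approach on slides 7 and 8 can be illustrated with a short Python sketch. It parses the choline transport stanza from slide 7 and matches a pathway event against it. The parser handles only the tags shown on the slide, and the event representation (a genus GO term plus a relation-to-class dictionary) is a hypothetical stand-in for what would be extracted from a BioPAX export. Matching is exact, whereas a real tool would presumably use ontology reasoning.

```python
# Stanza copied from slide 7.
CHOLINE_STANZA = """\
[Term]
id: GO:0015871
name: choline transport
intersection_of: GO:0006810 ! transport
intersection_of: results_in_transport_of CHEBI:15354 ! choline
"""


def parse_xp_stanzas(obo_text):
    """Parse [Term] stanzas into {go_id: {"name", "genus", "differentia"}}.

    Only id, name and intersection_of tags are handled (simplifying assumption).
    """
    terms = {}
    current = None
    for raw in obo_text.splitlines():
        line = raw.split("!")[0].strip()  # drop the trailing '! label' comment
        if line == "[Term]":
            current = {"name": None, "genus": None, "differentia": []}
        elif current is not None and ":" in line:
            tag, value = line.split(":", 1)
            value = value.strip()
            if tag == "id":
                terms[value] = current
            elif tag == "name":
                current["name"] = value
            elif tag == "intersection_of":
                parts = value.split()
                if len(parts) == 1:
                    current["genus"] = parts[0]        # genus: a bare GO id
                else:
                    current["differentia"].append((parts[0], parts[1]))
    return terms


def match_event(event_genus, event_relations, terms):
    """Return GO ids whose genus and every differentia match the event exactly."""
    return [
        go_id
        for go_id, term in terms.items()
        if term["differentia"]
        and term["genus"] == event_genus
        and all(event_relations.get(rel) == filler for rel, filler in term["differentia"])
    ]


terms = parse_xp_stanzas(CHOLINE_STANZA)
# Hypothetical pathway event: a transport reaction whose transported entity is choline.
print(match_event("GO:0006810", {"results_in_transport_of": "CHEBI:15354"}, terms))
```

The same matching step would apply to the other categories on slide 8 (catalysis, binding, metabolism), given the corresponding cross-product files.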

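The "with and without Reactome annotations" experiment on slides 4 and 5 can be sketched as a per-term enrichment comparison. The slides do not say which statistical test was used. The one-sided hypergeometric test below is an assumption, chosen because it is a common choice for term enrichment. All gene identifiers and annotation sets are toy data invented for illustration, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_pvalue(population, annotated, sample, hits):
    """P(X >= hits) when drawing `sample` genes without replacement from
    `population` genes, of which `annotated` carry the term."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


def term_enrichment(gene_set, term_genes, population):
    """Enrichment p-value of one term in `gene_set`, given the genes annotated to it."""
    term_genes = term_genes & population
    hits = len(gene_set & term_genes)
    return hypergeom_pvalue(len(population), len(term_genes), len(gene_set), hits)


# Toy data (hypothetical): 100 genes, a 10-gene set of interest.
population = {f"g{i}" for i in range(100)}
gene_set = {f"g{i}" for i in range(10)}

# Genes annotated to one term by GOA alone, and after adding
# annotations inferred from a pathway database.
goa_only = {"g0", "g1", "g50"}
goa_plus_pathway = goa_only | {"g2", "g3", "g4"}

p_without = term_enrichment(gene_set, goa_only, population)
p_with = term_enrichment(gene_set, goa_plus_pathway, population)
print(f"without: {p_without:.3g}  with: {p_with:.3g}")
```

In this toy case the added annotations all fall inside the gene set, so the p-value drops. With real data the effect depends on where the new annotations land.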