Go pathway-interaction-integration

  1. Integration of GO, Pathway data and Interaction data
     Chris Mungall
     Peter D’Eustachio
  2. The GO was originally intended to integrate databases
     How are we doing?
     Interoperability of genomic databases is limited by this lack of progress, and it is this major obstacle that the Gene Ontology (GO) Consortium was formed to address
     Gene Ontology: Tool for the Unification of Biology. Nat Genet 2000
     SGD
     FB
     GOA
  3. The GO was originally intended to integrate databases
     How are we doing?
     Not as well as we could!
     GO
     SGD
     FB
     GOA
     Pathway Commons
     IMEX
     Reactome
     Cyc
     …
     BioGRID
     IntAct
     …
  4. Integration enhances analyses and reduces workload
     Division of labor
     Leave specialized curation to specialized systems biology databases
     But the data need to be re-combined to prevent siloing
     GO is an invaluable one-stop shop for term enrichment etc.
     Can we quantify how integrating with systems biology databases helps users?
     Yes! We can do the experiment:
     GO term enrichment analysis on all MolSigDB
     with Reactome annotations
     Also include Reactome inputs/outputs, not currently in GOA
     without Reactome annotations
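
The experiment on this slide can be sketched in a few lines. The slides do not say which enrichment statistic was used, so the one-sided hypergeometric test below is an assumption. The gene identifiers, the term ID `TERM:1`, and the helper names are all hypothetical toy values chosen only to show how adding pathway-derived annotations changes a p-value.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of at least k annotated genes in a sample of n,
    drawn from a background of N genes of which K are annotated."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def enrichment(gene_set, annotations, background):
    """Return {term: p-value} for one gene set against a term -> genes map."""
    genes = gene_set & background
    result = {}
    for term, annotated in annotations.items():
        annotated = annotated & background
        k = len(genes & annotated)
        result[term] = hypergeom_pvalue(k, len(genes), len(annotated), len(background))
    return result

def merge(a, b):
    """Union two term -> genes maps without mutating either input."""
    merged = {term: set(genes) for term, genes in a.items()}
    for term, genes in b.items():
        merged.setdefault(term, set()).update(genes)
    return merged

# Hypothetical toy data: 20 background genes and one term.
background = {f"g{i}" for i in range(20)}
goa_only = {"TERM:1": {"g0", "g1"}}          # stand-in for existing GOA annotations
reactome_derived = {"TERM:1": {"g2", "g3"}}  # stand-in for pathway-derived annotations
gene_set = {"g0", "g2", "g3", "g10"}         # stand-in for one expression gene set

p_without = enrichment(gene_set, goa_only, background)["TERM:1"]
p_with = enrichment(gene_set, merge(goa_only, reactome_derived), background)["TERM:1"]
print(f"without: {p_without:.4f}  with: {p_with:.4f}")
```

In this toy case the gene set overlaps the term in one gene without the extra annotations and in three genes with them, so the p-value drops. A real analysis would also need multiple-testing correction, which is omitted here.
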
  5. Integration enhances analyses
     GOA+R: many p-values improved significantly
     Recapitulated biologically valid results that would have been suppressed had only a single resource been used
     Examples:
     Genes down-regulated in Alzheimer's
  6. How are we currently integrating systems biology datasets?
     Interaction data
     Currently IntAct, soon IMEX
     “protein binding” and “self-protein binding” only (+with)
     Pathway data
     Currently Reactome only
     Loses much of what is in Reactome
     E.g. inputs and outputs
     Manually curated GO<->Reactome links
     incomplete
     not always to the most specific term
     labor-intensive
     become stale over time
     other pathway databases?
     This can be improved!
  7. Automating integration using cross-product definitions – pathway databases
     [Term]
     id: GO:0015871
     name: choline transport
     intersection_of: GO:0006810 ! transport
     intersection_of: results_in_transport_of CHEBI:15354 ! choline
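
A minimal sketch of how such a cross-product definition could drive automatic mapping, assuming a pathway event has already been reduced to a process class plus a set of (relation, entity) pairs. The event dictionary and the function names are hypothetical. The sketch matches definitions exactly; a real tool would need a reasoner to handle is_a hierarchies in GO and ChEBI.

```python
STANZA = """\
[Term]
id: GO:0015871
name: choline transport
intersection_of: GO:0006810 ! transport
intersection_of: results_in_transport_of CHEBI:15354 ! choline
"""

def parse_logical_definition(stanza):
    """Return (term_id, (genus, differentia)) from one OBO [Term] stanza."""
    term_id, genus, differentia = None, None, []
    for line in stanza.splitlines():
        line = line.split("!")[0].strip()  # drop trailing "! label" comments
        if line.startswith("id:"):
            term_id = line[len("id:"):].strip()
        elif line.startswith("intersection_of:"):
            parts = line[len("intersection_of:"):].split()
            if len(parts) == 1:      # genus: the parent class
                genus = parts[0]
            elif len(parts) == 2:    # differentia: (relation, filler)
                differentia.append(tuple(parts))
    return term_id, (genus, frozenset(differentia))

def build_index(stanzas):
    """Map each logical definition to the GO term it defines."""
    index = {}
    for stanza in stanzas:
        term_id, definition = parse_logical_definition(stanza)
        index[definition] = term_id
    return index

def map_event(event, index):
    """Look up the GO term whose definition exactly matches the event."""
    key = (event["process"], frozenset(event["relations"]))
    return index.get(key)

index = build_index([STANZA])

# Hypothetical pathway event: a transport reaction moving choline.
event = {
    "process": "GO:0006810",
    "relations": {("results_in_transport_of", "CHEBI:15354")},
}
mapped = map_event(event, index)
print(mapped)
```
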
  8. Automating integration using cross-products – pathway databases
     We can also automatically map:
     catalysis terms [165*]
     transport [373]
     binding [133]
     phosphorylation and other modifications
     metabolism [278]
     signaling
     …
     All this relies on different cross-product files
     Any pathway database that exports BioPAX-OWL can be used
     E.g. HumanCyc, MouseCyc, Pathway Commons, …
     *Numbers for Reactome-human
  9. Automating integration using cross-products – interaction databases
     FIGF
     VEGFR
     binds
     has_function
     is_a
     [Term]
     id: GO:0043184
     name: vascular endothelial growth factor receptor 2 binding
     intersection_of: GO:0005488 ! binding
     intersection_of: results_in_binding_of PRO:000002112 ! VEGFR 2
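
The inference pictured on this slide can be sketched as follows: if a protein binds a partner that belongs to a PRO class, and a GO binding term is defined by `results_in_binding_of` that class, the protein gets that GO term as a function annotation. The interaction record, the protein-to-PRO mapping, and the function name are hypothetical toy data; only the GO and PRO identifiers come from the stanza above.

```python
def infer_binding_annotations(interactions, protein_class, binding_term_for):
    """Infer (protein, GO term) pairs from pairwise binding interactions.

    interactions:     iterable of (protein_a, protein_b) pairs
    protein_class:    protein -> PRO class it belongs to
    binding_term_for: PRO class -> GO term defined as binding of that class
    """
    annotations = set()
    for a, b in interactions:
        # Binding is symmetric, so check both directions.
        for subject, partner in ((a, b), (b, a)):
            pro_class = protein_class.get(partner)
            go_term = binding_term_for.get(pro_class)
            if go_term:
                annotations.add((subject, go_term))
    return annotations

# From the cross-product definition: GO:0043184 = binding of PRO:000002112.
binding_term_for = {"PRO:000002112": "GO:0043184"}

# Hypothetical toy inputs standing in for an interaction database export.
interactions = [("FIGF", "VEGFR2")]
protein_class = {"VEGFR2": "PRO:000002112"}

inferred = infer_binding_annotations(interactions, protein_class, binding_term_for)
print(inferred)
```

Proteins whose partners have no matching cross-product definition receive no specific term here; they would fall back to the generic “protein binding” annotation.
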
  10. Automated Integration: Results
      Reactome
      Evaluation in progress
      Many manually assigned equivalencies recapitulated
      Inferred equivalencies differed in some cases
      sometimes better than manually assigned
      sometimes required info not in BioPAX export
      ongoing discussions
      BioGRID
      not evaluated (all trivial)
      inferred annotations improve some enrichment results
      E.g. Brentani angiogenesis gene sets, increased enrichment for VEGFR binding
      Obvious but useful as proof of concept
  11. Conclusions and future work
      We can be more efficient:
      Coordinate with systems bio databases to divide labor
      Prevent siloing through semi-automated integration
      GO acts as a high-level ‘window’ on systems biology databases
      Still to be done:
      Make integration tool production-ready
      Reconcile existing misalignments, particularly signaling
      highly inconsistent between GO and Reactome
      Explore open questions – e.g. auto-generate terms?
      Finish cross-products, they are vital
      particularly PRO, CHEBI