Integration of GO, Pathway Data and Interaction Data
Chris Mungall, Peter D'Eustachio
The GO was originally intended to integrate databases. How are we doing?
"Interoperability of genomic databases is limited by this lack of progress, and it is this major obstacle that the Gene Ontology (GO) Consortium was formed to address." (Gene Ontology: tool for the unification of biology. Nat Genet 2000)
[Diagram: GO, SGD, FB (FlyBase), GOA]
The GO was originally intended to integrate databases. How are we doing? Not as well as we could!
[Diagram: GO, SGD, FB, GOA alongside pathway and interaction databases: Pathway Commons, Reactome, Cyc …, IMEX, BioGRID, IntAct …]
Integration enhances analyses and reduces workload
- Division of labor: leave specialized curation to specialized systems biology databases, but the data need to be re-combined to prevent siloing
- GO is an invaluable one-stop shop for term enrichment etc.
- Can we quantify how integrating with systems biology databases helps users? Yes! We can do the experiment:
  - GO term enrichment analysis on all MSigDB gene sets, with and without Reactome annotations
  - Also include Reactome inputs/outputs, which are not currently in GOA
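The with/without comparison can be sketched with the one-sided hypergeometric test that is commonly used for GO term enrichment. The slide does not say which statistic was used, so the test is an assumption, and the gene names and annotation sets below are made up to show the comparison; they are not MSigDB or Reactome data.

```python
from math import comb


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: M genes in the background,
    n of them annotated to the term, N genes in the set being tested."""
    upper = min(n, N)
    # math.comb returns 0 when asked for more items than exist, so no guard needed
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def enrichment_p(gene_set: set, term_genes: set, background: set) -> float:
    """One-sided enrichment p-value of one GO term in one gene set."""
    gene_set = gene_set & background
    term_genes = term_genes & background
    k = len(gene_set & term_genes)  # genes in the set that carry the term
    return hypergeom_sf(k, len(background), len(term_genes), len(gene_set))


# Toy data with hypothetical gene names.
background = {f"G{i}" for i in range(100)}
goa_only = {f"G{i}" for i in range(5)}                           # term annotations from GOA alone
goa_plus_reactome = goa_only | {f"G{i}" for i in range(5, 10)}   # plus Reactome-derived annotations
gene_set = {"G0", "G1", "G2", "G5", "G6", "G7", "G8", "G9", "G50", "G51"}

p_without = enrichment_p(gene_set, goa_only, background)
p_with = enrichment_p(gene_set, goa_plus_reactome, background)
print(f"without Reactome: {p_without:.2e}  with Reactome: {p_with:.2e}")
```

In a real run over many terms and gene sets, the p-values would also need a multiple-testing correction such as Benjamini-Hochberg.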
Integration enhances analyses
- GOA+R: many p-values improved significantly
- Recapitulated biologically valid results that would have been suppressed had a single resource been used
- Example: genes down-regulated in Alzheimer's disease
How are we currently integrating systems biology datasets?
- Interaction data
  - Currently IntAct, soon IMEX
  - "protein binding" and "self-protein binding" only (plus the partner in the "with" field)
- Pathway data
  - Currently Reactome only
  - Loses much of what is in Reactome, e.g. inputs and outputs
  - Manually curated GO<->Reactome links are incomplete, not always to the most specific term, labor-intensive, and become stale over time
  - Other pathway databases?
- This can be improved!
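The current interaction route can be sketched as follows: each binary interaction becomes a "protein binding" annotation on both partners, each citing the other in the "with" field, and a self-interaction becomes a single self-binding annotation. This is an illustration, not the IntAct/GOA pipeline. Mapping the slide's "self-protein binding" to GO:0042802 (identical protein binding) and using the IPI evidence code are assumptions, and the protein identifiers and function name are hypothetical.

```python
from typing import NamedTuple

PROTEIN_BINDING = "GO:0005515"
# Assumption: the slide's "self-protein binding" corresponds to identical protein binding.
SELF_BINDING = "GO:0042802"


class Annotation(NamedTuple):
    gene: str
    go_id: str
    evidence: str   # assumed IPI (Inferred from Physical Interaction)
    with_from: str  # the interaction partner


def interactions_to_annotations(pairs, evidence="IPI"):
    """Turn binary interactions into de-duplicated GO binding annotations (hypothetical helper)."""
    seen, out = set(), []
    for a, b in pairs:
        if a == b:
            candidates = [(a, SELF_BINDING, a)]
        else:
            # annotate both partners, each pointing at the other
            candidates = [(a, PROTEIN_BINDING, b), (b, PROTEIN_BINDING, a)]
        for gene, go_id, partner in candidates:
            if (gene, go_id, partner) not in seen:  # A-B and B-A yield the same rows
                seen.add((gene, go_id, partner))
                out.append(Annotation(gene, go_id, evidence, partner))
    return out


for ann in interactions_to_annotations([("P1", "P2"), ("P2", "P1"), ("P3", "P3")]):
    print(ann)
```

Whatever else the interaction record contains, only binding terms come out, which is the limitation the slide points to.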
Automating integration using cross-product definitions: pathway databases
[Term]
id: GO:0015871
name: choline transport
intersection_of: GO:0006810 ! transport
intersection_of: results_in_transport_of CHEBI:15354 ! choline
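A cross-product definition such as the choline transport stanza is a genus (GO:0006810, transport) plus differentia (a relation and a ChEBI filler). The minimal sketch below pulls these parts out of one OBO stanza. It is a toy line parser written for this illustration (it ignores quoted strings, qualifiers and escapes), not a full OBO parser, and `parse_cross_product` is a hypothetical name.

```python
# Toy OBO line parser (hypothetical helper), not a complete OBO implementation.
STANZA = """
[Term]
id: GO:0015871
name: choline transport
intersection_of: GO:0006810 ! transport
intersection_of: results_in_transport_of CHEBI:15354 ! choline
"""


def parse_cross_product(text: str) -> dict:
    """Return id, name, genus list and (relation, filler) differentiae for one stanza."""
    term = {"genus": [], "differentia": []}
    for raw in text.strip().splitlines():
        line = raw.split(" !", 1)[0].strip()  # drop the trailing "! label" comment
        if not line or line.startswith("["):
            continue  # skip blank lines and the [Term] header
        tag, _, value = line.partition(":")  # split on the first colon only
        value = value.strip()
        if tag == "intersection_of":
            parts = value.split()
            if len(parts) == 1:
                term["genus"].append(parts[0])  # a bare class is the genus
            else:
                term["differentia"].append((parts[0], parts[1]))  # relation + filler
        else:
            term[tag] = value
    return term


xp = parse_cross_product(STANZA)
print(xp["id"], xp["genus"], xp["differentia"])
```

Read the other way round, the definition says any pathway step that transports CHEBI:15354 is a candidate for GO:0015871.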
Automating integration using cross-products: pathway databases
- We can also automatically map:
  - catalysis terms [165*]
  - transport
  - binding
  - phosphorylation and other modifications
  - metabolism
  - signaling
  - …
- All this relies on different cross-product files
- Any pathway database that exports BioPAX OWL can be used, e.g. HumanCyc, MouseCyc, Pathway Commons, …
*Numbers for human Reactome
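How a BioPAX OWL export can feed the cross-product mapping can be illustrated with Python's standard library. The sketch reads a hand-written BioPAX Level 3 fragment (not a real Reactome or Cyc export), treats a molecule whose ChEBI xref appears on both the left and right of a Transport as its cargo, and looks the resulting (genus, relation, filler) triple up in an index built from the choline transport cross-product. The cargo rule, the index layout and `map_transports` are assumptions for illustration; real exports usually attach ChEBI identifiers through entityReference and SmallMoleculeReference rather than directly on the molecule.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
BP = "{http://www.biopax.org/release/biopax-level3.owl#}"

# Hand-written toy fragment: choline moved from one compartment to another.
TOY_BIOPAX = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bp="http://www.biopax.org/release/biopax-level3.owl#">
  <bp:UnificationXref rdf:ID="chebi_15354">
    <bp:db>ChEBI</bp:db>
    <bp:id>CHEBI:15354</bp:id>
  </bp:UnificationXref>
  <bp:SmallMolecule rdf:ID="choline_out">
    <bp:xref rdf:resource="#chebi_15354"/>
  </bp:SmallMolecule>
  <bp:SmallMolecule rdf:ID="choline_in">
    <bp:xref rdf:resource="#chebi_15354"/>
  </bp:SmallMolecule>
  <bp:Transport rdf:ID="choline_uptake">
    <bp:left rdf:resource="#choline_out"/>
    <bp:right rdf:resource="#choline_in"/>
  </bp:Transport>
</rdf:RDF>"""

# Assumed index layout: (genus, relation, filler) -> GO term.
CROSS_PRODUCTS = {
    ("GO:0006810", "results_in_transport_of", "CHEBI:15354"): "GO:0015871",
}


def map_transports(xml_text: str, index: dict) -> dict:
    """Map each BioPAX Transport to GO terms via its (assumed) cargo."""
    root = ET.fromstring(xml_text)

    def ref(el):  # rdf:ID="x" is referenced elsewhere as rdf:resource="#x"
        return "#" + el.get(RDF + "ID")

    chebi_of_xref = {ref(x): x.findtext(BP + "id")
                     for x in root.iter(BP + "UnificationXref")
                     if x.findtext(BP + "db") == "ChEBI"}
    chebi_of_mol = {ref(m): {chebi_of_xref[x.get(RDF + "resource")]
                             for x in m.findall(BP + "xref")
                             if x.get(RDF + "resource") in chebi_of_xref}
                    for m in root.iter(BP + "SmallMolecule")}

    def chebi_on(conv, prop):  # ChEBI ids of all participants on one side
        ids = set()
        for p in conv.findall(BP + prop):
            ids |= chebi_of_mol.get(p.get(RDF + "resource"), set())
        return ids

    mapped = {}
    for t in root.iter(BP + "Transport"):
        cargo = chebi_on(t, "left") & chebi_on(t, "right")  # same chemical on both sides
        go_terms = set()
        for chebi in cargo:
            key = ("GO:0006810", "results_in_transport_of", chebi)
            if key in index:
                go_terms.add(index[key])
        mapped[t.get(RDF + "ID")] = sorted(go_terms)
    return mapped


print(map_transports(TOY_BIOPAX, CROSS_PRODUCTS))
```

Under the same assumptions, the other families on the slide (binding, phosphorylation and so on) would differ mainly in which BioPAX pattern is extracted and which genus and relation form the lookup key.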
Automated integration: results
- Reactome
  - Evaluation in progress
  - Many manually assigned equivalencies recapitulated
  - Inferred equivalencies differed in some cases: sometimes better than the manually assigned ones, sometimes requiring information not in the BioPAX export; discussions ongoing
- BioGRID
  - Not evaluated (all trivial)
- Inferred annotations improve some enrichment results
  - E.g. Brentani angiogenesis gene sets: increased enrichment for VEGFR binding
  - Obvious, but useful as a proof of concept
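The Reactome evaluation amounts to comparing two mappings from pathway steps to GO terms. A minimal sketch of that bookkeeping, with hypothetical step identifiers and made-up mappings (not Reactome data), sorts each step into the outcomes listed on the slide: recapitulated, differing (for curator discussion), or found by only one method. Telling a "better" inferred term from a wrong one would need the GO graph, which this sketch does not use.

```python
def compare_mappings(manual: dict[str, set[str]],
                     inferred: dict[str, set[str]]) -> dict[str, list[str]]:
    """Bucket pathway steps by how the inferred GO mapping compares to the manual one."""
    report = {"recapitulated": [], "differs": [], "manual_only": [], "inferred_only": []}
    for step in sorted(manual.keys() | inferred.keys()):
        m, i = manual.get(step, set()), inferred.get(step, set())
        if not m and not i:
            continue  # nothing mapped either way
        if m and not i:
            report["manual_only"].append(step)
        elif i and not m:
            report["inferred_only"].append(step)
        elif m == i:
            report["recapitulated"].append(step)
        else:
            report["differs"].append(step)  # candidates for curator review
    return report


# Hypothetical example: step2's inferred term is more specific than the manual one.
manual = {"step1": {"GO:0015871"}, "step2": {"GO:0006810"}, "step3": {"GO:0005515"}}
inferred = {"step1": {"GO:0015871"}, "step2": {"GO:0015871"}, "step4": {"GO:0005515"}}
print(compare_mappings(manual, inferred))
```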
Conclusions and future work
- We can be more efficient:
  - Coordinate with systems biology databases to divide labor
  - Prevent siloing through semi-automated integration
  - GO acts as a high-level "window" on systems biology databases
- Still to be done:
  - Make the integration tool production-ready
  - Reconcile existing misalignments; signaling in particular is highly inconsistent between GO and Reactome
  - Explore open questions, e.g. should terms be auto-generated?
  - Finish the cross-products; they are vital, particularly those referencing PRO and ChEBI