Go pathway-interaction-integration

  • The arrows appear to have shifted on uploading to Slideshare.

    The basic idea here is that, given nothing more than the statement 'FIGF binds VEGFR2' (which could come from BioGRID/IMEX, or from a generic annotation to GO 'protein binding' with VEGFR2 in col 8/16), we can automatically infer a 'VEGFR2 binding' annotation for FIGF. This relies on links between VEGFR and PRO, and on bp_xp_protein.
  • The Gene Ontology was created to address the need for interoperability among genomic databases in the wake of the sequencing of the first metazoan genomes. In the paper "Gene Ontology: tool for the unification of biology", published nearly ten years ago, Ashburner et al. state: "Progress in the way that biologists describe and conceptualize the shared biological elements has not kept pace with sequencing . . . Interoperability of genomic databases is limited by this lack of progress, and it is this major obstacle that the Gene Ontology (GO) Consortium was formed to address" [25]. The GO has since become the de facto terminological standard for functional annotation, and its success is evident in the popularity of GO-based class enrichment analyses. However, the intervening ten years have witnessed an explosion of interest in systems biology, with a concomitant increase in the number of databases providing information on interactions and pathways, including Reactome, Nature Signaling, PANTHER [26], BIND, BioGRID and HumanCyc (the EcoCyc metabolic pathway database preceded GO [27]). These databases each have their own data models and schemas, creating an interoperability problem. This has partly been mitigated by the adoption of BioPAX as a standard exchange format, which allows the aggregation of multiple pathway databases in single "one-stop shopping" warehouses, such as the Pathway Knowledge Base [28], Pathway Commons, and WikiPathways. However, the data is still only partially integrated, and a researcher who wishes to obtain a comprehensive view of a pathway must still examine multiple records, in addition to GO annotations.
  • loss in pathway db transition
  • their data models capture more.
  • also via col16
  • PRO does not yet have Ras etc

Transcript

  • 1. Integration of GO, Pathway data and Interaction data
    Chris Mungall
    Peter D’Eustachio
  • 2. The GO was originally intended to integrate databases
    How are we doing?
    Interoperability of genomic databases is limited by this lack of progress, and it is this major obstacle that the Gene Ontology (GO) Consortium was formed to address
    Gene Ontology: Tool for the Unification of Biology. Nat Genet 2000
    SGD
    FB
    GOA
  • 3. GO
    The GO was originally intended to integrate databases
    How are we doing?
    Not as well as we could!
    GO
    SGD
    FB
    GOA
    Pathway Commons
    IMEX
    Reactome
    Cyc

    BioGRID
    IntAct

  • 4. Integration enhances analyses and reduces workload
    Division of labor
    leave specialized curation to specialized systems biology databases
    but data needs to be re-combined to prevent siloing
    GO is an invaluable single-stop shop for term enrichment etc
    Can we quantify how integrating with systems biology databases helps users?
    Yes! We can do the experiment:
    GO term enrichment analysis on all MolSigDB
    with Reactome annotations
    Also include Reactome inputs/outputs, not currently in GOA
    without Reactome annotations
  • 5. Integration enhances analyses
    GOA+R: Many p-values significantly improved
    Recapitulated biologically valid results that would have been suppressed had one single resource been used
    Examples:
    Genes down-regulated in Alzheimer's
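The with/without comparison on slides 4 and 5 can be sketched with a one-sided hypergeometric test. This is only an illustration of the idea, not the analysis actually run: the gene names, the annotation sets and the helper names (`hypergeom_tail`, `enrichment_pvalue`) are all made up here, and a real run would also need multiple-testing correction.

```python
from math import comb


def hypergeom_tail(population, annotated, sample, hits):
    """Return P(X >= hits) for a hypergeometric draw (one-sided enrichment p-value)."""
    total = comb(population, sample)
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


def enrichment_pvalue(gene_set, term_genes, background):
    """p-value that `gene_set` is enriched for a term annotated to `term_genes`."""
    term_genes = term_genes & background
    gene_set = gene_set & background
    hits = len(gene_set & term_genes)
    return hypergeom_tail(len(background), len(term_genes), len(gene_set), hits)


# Toy data: every gene name and annotation below is hypothetical.
background = {f"g{i}" for i in range(20)}
gene_set = {"g0", "g1", "g2", "g3"}
goa_only = {"g0", "g5"}               # genes annotated to the term in GOA alone
reactome_derived = {"g1", "g2"}       # extra genes annotated via Reactome
merged = goa_only | reactome_derived  # the "GOA+R" condition

p_goa = enrichment_pvalue(gene_set, goa_only, background)
p_merged = enrichment_pvalue(gene_set, merged, background)
print(f"GOA only: {p_goa:.4f}  GOA+R: {p_merged:.4f}")
```

In this toy case the merged annotation set gives the smaller p-value, because the Reactome-derived genes add hits inside the gene set without adding many annotated genes outside it.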
  • 6. How are we currently integrating systems biology datasets?
    Interaction data
    Currently IntAct, soon IMEX
    “protein binding” and “self-protein binding” only (+with)
    Pathway data
    Currently Reactome only
    Loses much of what is in Reactome
    E.g. inputs and outputs
    Manually curated GO<->Reactome links
    incomplete
    not always to the most specific term
    labor-intensive
    become stale over time
    other pathway databases?
    This can be improved!
  • 7. Automating integration using cross-product definitions – pathway databases
    [Term]
    id: GO:0015871
    name: choline transport
    intersection_of: GO:0006810 ! transport
    intersection_of: results_in_transport_of CHEBI:15354 ! choline
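A cross-product definition like the one on slide 7 can be read mechanically and used to label a pathway event with a GO term. The sketch below is a minimal illustration under stated assumptions: the parser handles only the tags shown on the slide, the `event` dictionary is a hypothetical stand-in for a record derived from a BioPAX export, and matching is by exact ID, whereas a real integration tool would use a reasoner to handle subclasses.

```python
STANZA = """\
[Term]
id: GO:0015871
name: choline transport
intersection_of: GO:0006810 ! transport
intersection_of: results_in_transport_of CHEBI:15354 ! choline
"""


def parse_cross_product(stanza):
    """Parse one OBO [Term] stanza into its genus and differentia."""
    term = {"id": None, "name": None, "genus": None, "differentia": []}
    for line in stanza.splitlines():
        line = line.split("!", 1)[0].strip()  # drop the trailing "! label" comment
        if ":" not in line:
            continue  # skips "[Term]" and blank lines
        tag, value = line.split(":", 1)
        value = value.strip()
        if tag == "id":
            term["id"] = value
        elif tag == "name":
            term["name"] = value
        elif tag == "intersection_of":
            parts = value.split()
            if len(parts) == 1:
                term["genus"] = parts[0]  # e.g. GO:0006810
            else:
                term["differentia"].append((parts[0], parts[1]))
    return term


def infer_go_term(event, definitions):
    """Return the ID of the first definition whose genus and differentia all match."""
    for d in definitions:
        if event.get("type") != d["genus"]:
            continue
        if all(event.get(rel) == filler for rel, filler in d["differentia"]):
            return d["id"]
    return None


definitions = [parse_cross_product(STANZA)]

# Hypothetical pathway event: a transport reaction whose cargo is choline.
event = {"type": "GO:0006810", "results_in_transport_of": "CHEBI:15354"}
print(infer_go_term(event, definitions))
```

An event with a different cargo finds no match here, which is the case where a more general term or a new term would have to be considered.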
  • 8. Automating integration using cross-products – pathway databases
    We can also automatically map:
    catalysis terms [165*]
    transport [373]
    binding [133]
    phosphorylation and other modifications
    metabolism [278]
    signaling

    All this relies on different cross-product files
    Any pathway database that exports BioPax-OWL can be used
    E.g. humancyc, mousecyc, pathwaycommons, …
    *Numbers for Reactome-human
  • 9. Automating integration using cross-products – interaction databases
    FIGF
    VEGFR
    binds
    has_function
    is_a
    [Term]
    id: GO:0043184
    name: vascular endothelial growth factor receptor 2 binding
    intersection_of: GO:0005488 ! binding
    intersection_of: results_in_binding_of PRO:000002112 ! VEGFR 2
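The inference on slide 9 can be sketched as a two-step lookup: map the binding partner to its PRO class, then find the GO binding term whose cross-product definition names that class. The IDs below come from the stanza on the slide; the lookup tables and the function name `infer_binding_annotations` are assumptions made for illustration, not part of any existing tool.

```python
# GO binding terms indexed by the PRO class in their results_in_binding_of differentia.
BINDING_XPS = {
    "PRO:000002112": (
        "GO:0043184",
        "vascular endothelial growth factor receptor 2 binding",
    ),
}

# Hypothetical link table from gene product symbols to PRO classes.
PROTEIN_TO_PRO = {"VEGFR2": "PRO:000002112"}


def infer_binding_annotations(interactions):
    """Turn undirected 'A binds B' pairs into specific '<B> binding' annotations for A."""
    inferred = []
    for a, b in interactions:
        # An interaction is symmetric, so try both directions.
        for subject, partner in ((a, b), (b, a)):
            pro_id = PROTEIN_TO_PRO.get(partner)
            term = BINDING_XPS.get(pro_id)
            if term is not None:
                go_id, go_name = term
                inferred.append((subject, go_id, go_name))
    return inferred


# The single statement from the slide: 'FIGF binds VEGFR2'.
interactions = [("FIGF", "VEGFR2")]
annotations = infer_binding_annotations(interactions)
for subject, go_id, go_name in annotations:
    print(subject, go_id, go_name)
```

Nothing is inferred for VEGFR2 in this example, because the tables hold no PRO class for FIGF and no 'FIGF binding' cross-product.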
  • 10. Automated Integration: Results
    Reactome
    Evaluation in progress
    Many manually assigned equivalencies recapitulated
    Inferred equivalencies differed in some cases
    sometimes better than manually assigned
    sometimes required info not in biopax export
    ongoing discussions
    BioGRID
    not evaluated (all trivial)
    inferred annotations improve some enrichment results
    E.g. Brentani angiogenesis gene sets, increased enrichment for VEGFR binding
    Obvious but useful as proof of concept
  • 11. Conclusions and future work
    We can be more efficient:
    Coordinate with systems bio databases to divide labor
    Prevent siloing through semi-automated integration
    GO acts as a high-level ‘window’ on systems biology databases
    Still to be done:
    Make integration tool production-ready
    Reconcile existing mis-alignments, particularly signaling
    highly inconsistent between GO and Reactome
    Explore open questions – e.g. auto-generate terms?
    Finish cross-products, they are vital
    particularly PRO, CHEBI