Knowledge Discovery using an Integrated Semantic Web


Biohackathon 2012 keynote

Published in: Technology, Education

  1. Knowledge Discovery using an Integrated Semantic Web. Michel Dumontier; Department of Biology, School of Computer Science, Institute of Biochemistry, Ottawa Institute for Systems Biology, Ottawa-Carleton Institute for Biomedical Engineering; Carleton University, Ottawa, Canada. Chair, W3C Semantic Web for Health Care and Life Sciences Interest Group. BH2012
  4. uncovering a sufficient amount of evidence to support/refute a hypothesis is becoming increasingly difficult; it requires a lot of digging around
  5. continuous growth in research literature. Source:
  6. growing amount of biomedical data
  7. increasingly complex software & interfaces to predict, compare and evaluate
  8. ultimately, we answer questions by building sophisticated workflows
  9. What if we could just pose a hypothesis and have a system automatically use available data, ontologies and services?
  10. HyQue. HyQue is the Hypothesis query and evaluation system.
      • A platform for knowledge discovery
      • Facilitates hypothesis formulation and evaluation
      • Leverages Semantic Web technologies to provide access to facts, expert knowledge and web services
      • Conforms to a simplified event-based model
      • Supports evaluation against positive and negative findings
      • Transparent and reproducible evidence prioritization
      • Provenance across all elements of hypothesis testing – trace a hypothesis to its evaluation, including the data and rules used
      References:
      • Evaluating scientific hypotheses using the SPARQL Inferencing Notation. Extended Semantic Web Conference (ESWC 2012). Heraklion, Crete. May 27-31, 2012.
      • HyQue: evaluating hypotheses using Semantic Web technologies. J Biomed Semantics. 2011 May 17;2 Suppl 2:S3.
  11. HyQue Architecture (diagram labels: Ontologies, Services)
  12. Event-based data model. HyQue events denote a phenomenon involving two objects: an ‘agent’ and a ‘target’. In addition, we can specify the location of the event (e.g. located in the nucleus, or under some genetic background).
      Supported events:
      1. protein-protein binding
      2. protein-nucleic acid binding
      3. molecular activation
      4. molecular inhibition
      5. gene induction
      6. gene repression
      7. transport
      Event properties:
      • ‘has agent’: agent
      • ‘has target’: target
      • ‘is located in’: location
      • ‘is negated’: boolean
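The event model on this slide could be represented roughly as follows. This is only a sketch: the `Event` class, its field names, and the example values are hypothetical and are not taken from HyQue itself (which the talk describes in terms of Semantic Web technologies rather than Python objects).

```python
from dataclasses import dataclass
from typing import Optional

# The seven supported event types listed on the slide.
SUPPORTED_EVENT_TYPES = frozenset({
    "protein-protein binding",
    "protein-nucleic acid binding",
    "molecular activation",
    "molecular inhibition",
    "gene induction",
    "gene repression",
    "transport",
})


@dataclass(frozen=True)
class Event:
    """Hypothetical container mirroring the four event properties on the slide."""
    event_type: str
    agent: str                      # 'has agent'
    target: str                     # 'has target'
    location: Optional[str] = None  # 'is located in'
    negated: bool = False           # 'is negated'

    def __post_init__(self) -> None:
        # Reject anything outside the slide's list of supported events.
        if self.event_type not in SUPPORTED_EVENT_TYPES:
            raise ValueError(f"unsupported event type: {self.event_type!r}")


# Placeholder values for illustration only; not data from the talk.
example = Event("gene induction", agent="protein A", target="gene B", location="nucleus")
```

Keeping negation as a flag on the event, rather than as a separate event type, matches the slide's ‘is negated’ boolean and lets the same structure describe both positive and negative findings.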
  13. HyQue domain rules. CALCULATE a quantitative measure of evidence for an event.
      ‘induce’ rule (maximum score: 5):
      – Is event negated?
        • If yes, subtract 2
      – Is event of type ‘induce’ (GO:0010628)?
        • If yes, add 1; if no, subtract 1
      – Is agent of type ‘protein’ (CHEBI:36080) or ‘RNA’?
        • If yes, add 1; if type ‘gene’, subtract 1
      – Is target of type ‘gene’ (SO:0000236)?
        • If yes, add 1; if no, subtract 1
      – Does agent have known ‘transcription factor activity’ (GO:0003700)?
        • If yes, add 1
      – Is event located in the ‘nucleus’ (GO:0005634)?
        • If yes, add 1; if no, subtract 1
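The ‘induce’ rule above amounts to a small additive scoring function. The sketch below follows the slide's six checks literally; the function name, parameter names, and the use of plain strings for types are assumptions made for illustration, and the actual system evaluates these checks against ontology terms rather than string labels.

```python
def score_induce_event(
    *,
    negated: bool,
    event_type: str,
    agent_type: str,
    target_type: str,
    agent_is_transcription_factor: bool,
    location: str,
) -> int:
    """Hypothetical scorer for the slide's 'induce' rule (maximum score: 5)."""
    score = 0
    # Is event negated? If yes, subtract 2.
    if negated:
        score -= 2
    # Is event of type 'induce'? If yes, add 1; if no, subtract 1.
    score += 1 if event_type == "induce" else -1
    # Is agent a protein or RNA? If yes, add 1; if it is a gene, subtract 1.
    if agent_type in ("protein", "RNA"):
        score += 1
    elif agent_type == "gene":
        score -= 1
    # Is target of type 'gene'? If yes, add 1; if no, subtract 1.
    score += 1 if target_type == "gene" else -1
    # Does agent have known transcription factor activity? If yes, add 1.
    if agent_is_transcription_factor:
        score += 1
    # Is event located in the nucleus? If yes, add 1; if no, subtract 1.
    score += 1 if location == "nucleus" else -1
    return score


best = score_induce_event(
    negated=False,
    event_type="induce",
    agent_type="protein",
    target_type="gene",
    agent_is_transcription_factor=True,
    location="nucleus",
)
```

With every check satisfied and no negation, the five positive contributions sum to the stated maximum of 5; negating the same event lowers the score by 2.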
  14. Customization of rules/data sources will generate different evidence-based evaluations
  15. The Semantic Web is the new global web of knowledge. It involves standards for publishing, sharing and querying facts, expert knowledge and services. It is a scalable approach to the discovery of independently formulated and distributed knowledge.
  16. something you can search, look up, link to, check consistency of, and query for
  17. An ever-expanding web of linked data. “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch.”
  18. Bio2RDF provides a simple convention and infrastructure to provide linked data for the life sciences
  19. linked data for the life sciences. An Open Source Project for the Provision of Scalable, Decentralized Data with Global Mirroring and Customizable Query Resolution. Laval University, Carleton University, Queensland University of Technology
  20. provides billions of interconnections
  21. Towards universally-accepted identifiers
  22. $id
  23. (coming soon)
  24. engaging the BioPAX community to adopt
      Andrea Splendiani
      Pathwaycommons (level 2; download)
        <bp:unificationXref rdf:ID="CPATH-LOCAL-653">
          <bp:ID rdf:datatype="xsd:string">9606</bp:ID>
          <bp:DB rdf:datatype="xsd:string">NCBI_TAXONOMY</bp:DB>
        </bp:unificationXref>
      Pathwaycommons (level 3; web service)
        <bp:UnificationXref rdf:about="urn:biopax:UnificationXref:REACTOME+DATABASE+ID_109276">
          <bp:id rdf:datatype = "">109276</bp:id>
          <bp:db rdf:datatype = "">Reactome Database ID</bp:db>
        </bp:UnificationXref>
      Biomodels (level 3)
        <bp:UnificationXref rdf:about="">
          <bp:id rdf:datatype = "">GO:0004889</bp:id>
          <bp:db rdf:datatype = "">Gene Ontology</bp:db>
        </bp:UnificationXref>
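To see why these cross-references are hard to link as published, it can help to pull the database label and identifier out of each `UnificationXref`. The sketch below does that with Python's standard library for the two level 3 snippets. The wrapping `rdf:RDF` element, the `urn:example:xref-go` placeholder (the slide's `rdf:about` value is empty), the omission of the empty `rdf:datatype` attributes, and the `extract_xrefs` helper are all assumptions made so the example parses on its own.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BP3_NS = "http://www.biopax.org/release/biopax-level3.owl#"  # BioPAX level 3 namespace

# The two level 3 snippets from the slide, wrapped in a root element so they parse.
# "urn:example:xref-go" is a placeholder; the slide leaves that rdf:about value empty.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:bp="{BP3_NS}">
  <bp:UnificationXref rdf:about="urn:biopax:UnificationXref:REACTOME+DATABASE+ID_109276">
    <bp:id>109276</bp:id>
    <bp:db>Reactome Database ID</bp:db>
  </bp:UnificationXref>
  <bp:UnificationXref rdf:about="urn:example:xref-go">
    <bp:id>GO:0004889</bp:id>
    <bp:db>Gene Ontology</bp:db>
  </bp:UnificationXref>
</rdf:RDF>"""


def extract_xrefs(rdf_xml: str) -> list:
    """Return (db, id) pairs for every level 3 UnificationXref, in document order."""
    root = ET.fromstring(rdf_xml)
    pairs = []
    for xref in root.iter(f"{{{BP3_NS}}}UnificationXref"):
        db = xref.findtext(f"{{{BP3_NS}}}db")
        ident = xref.findtext(f"{{{BP3_NS}}}id")
        pairs.append((db, ident))
    return pairs


xrefs = extract_xrefs(SAMPLE)
```

The extracted pairs carry free-text database labels such as "Reactome Database ID" and "Gene Ontology", which each provider spells in its own way; that inconsistency is what a shared identifier convention is meant to remove.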
  25. More sophisticated OWL-based Data Integration, Consistency Checking and Discovery (Robert Hoehndorf)
      • Checking the consistency of semantic annotations [1]
        – Formalized semantic annotations in SBML models as OWL axioms. Automated reasoning uncovered inconsistencies in 16 models.
          • e.g. alpha-D-glucose phosphate is not the required ATP in an ATP-dependent reaction (GO + ChEBI + disjoint + closure axioms)
      • Finding significant biomedical associations [2] (initiated at BH11)
        – found significant associations between genes, drugs, diseases and pathways using DrugBank, PharmGKB, CTD, PID across categories of drugs (ChEBI, ATC, MeSH) and diseases (DO, MeSH)
        – 22,653 pathway-disease type associations (6,304 over; 16,349 under)
          • carcinosarcoma (DOID:4236) and (HIV RT) Zidovudine Pathway (PharmGKB:PA165859361)
        – 13,826 pathway-chemical type associations (12,564 over; 1,262 under)
          • drug clopidogrel (CHEBI:37941) with Endothelin signaling pathway (PharmGKB:PA164728163) -> (smooth muscle mitogenesis)
      http://pharmgkb-owl.googlecode.com
      [1] Integrating systems biology models and biomedical ontologies. BMC Systems Biology. 2011. 5:124.
      [2] Identifying aberrant pathways through integrated analysis of knowledge in pharmacogenomics. Bioinformatics. 2012. in press
  26. Personal Health Lens (Mark Wilkinson, Chris Baker)
      Observation: Patients often look up new/alternative drugs to treat their condition or alleviate side effects.
      Opportunity: A patient-centric health care application that identifies contraindications for drugs mentioned on web pages using the patient’s own health data.
      Components:
      • RDFized patient data
      • Bio2RDF semantically annotated data
      • SADI semantic web services to process the page and retrieve data
      • SHARE automatic workflow composition
  28. Matthias Samwald. We are developing a simple, cheap and ubiquitous solution for anchoring pharmacogenomics in medical practice. Curated set of 385+ essential markers, 50+ pharmacogenes and a rule system, unified under one standardized model: the Medicine Safety Code. W3C Task Force: Clinical Decision Support for Personalized Medicine
  29. Unified OWL Ontology (inferencing, consistency checking, mapping)
  30. At this Biohackathon
      • refine Bio2RDF RDFization Guide
      • complete Dataset Description (BH11)
      • dataspace statistics & visualization
      • SPARQL-based Enrichment Analysis
      • ontology-based Similarity Networks – see Rob’s email
  31. Website: Presentations: