From Biological Data to Clinical Applications: Positioning a digital infrastructure for the future of biomedicine.


In the quest to translate the results of life science research into effective clinical applications, many are now turning their attention to the large and rapidly growing body of biological and biomedical data, and trying to make sense of it. Keeping on top of the daily flood of new information, whether the latest clinical reviews, scientific reports, or raw data, is an ever-present and widely recognized challenge. Limited access to structured, integrated and citable data restricts our ability to exploit a rich source of scientific knowledge for clinical and translational research. With the dual goals of increasing our understanding of how living systems respond to chemical agents and translating our combined knowledge into clinical applications, I will discuss our efforts to leverage Semantic Web technologies to facilitate the formulation, publication, integration, and discovery of biological facts, expert knowledge and services of value to pharmaceutical and clinical research and, more recently, to the patient-centric delivery of health care.


  • 1. From biological data to clinical applications: positioning a digital infrastructure for the future of biomedicine. Michel Dumontier, Ph.D., Associate Professor of Bioinformatics, Department of Biology, School of Computer Science, Institute of Biochemistry, Carleton University; Professeur Associé, Université Laval; Ottawa Institute of Systems Biology; Ottawa-Carleton Institute of Biomedical Engineering. DERI::Digital Infrastructure for Biomedicine
  • 5. Uncovering sufficient evidence to support or refute a hypothesis is becoming increasingly difficult; it requires a lot of digging around.
  • 6. continuous growth in research literature. Source:
  • 7. access to increasing amounts of biomedical data
  • 8. access to the most effective software to predict, compare and evaluate
  • 9. ultimately, we answer questions by building sophisticated workflows
  • 10. What if we could automatically answer a question using available data and services?
  • 11. The Semantic Web is the new global web of knowledge. It involves standards for publishing, sharing and querying facts, expert knowledge and services. It is a scalable approach to the discovery of independently formulated and distributed knowledge.
  • 12. Link all the data!!!
  • 13. something you can search, look up, link to, query, and check for consistency and veracity
  • 14. an emerging linked data network. “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch.”
  • 15. Life Science Data Contributors
    * Bio2RDF
    * Chem2Bio2RDF
    * LODD (HCLS)
  • 16.
    * > 40 biological datasets from independent providers
    * > 3 billion triples
  • 17. Bio2RDF: linked data for the life sciences. An Open Source Project for the Provision of Scalable, Decentralized Data with Global Mirroring and Customizable Query Resolution
    * Francois Belleau, Laval University
    * Marc-Alexandre Nolin, Laval University
    * Peter Ansell, Queensland University of Technology
    * Michel Dumontier, Carleton University
  • 18. Bio2RDF resources are identified using IRIs
    * Data providers’ record identifiers are maintained from source
    * E.g.: DrugBank’s resource IRI for Leucovorin
  • 19. vocabulary and resource namespaces are used to describe auxiliary resources
    * Vocabulary namespaces are used for dataset-specific types and predicates
    * Entities arising from n-ary relations are identified in the resource namespace
  • 21. Every Bio2RDF dataset now contains provenance metadata
  • 22. Bio2RDF types include biological, information content & processual entities
    * CTD: Chemical, Disease, Chemical-Disease Interaction, Chemical-Gene Interaction
    * Entrez Gene: Gene, Model Organism, Publication
    * HGNC: Accession Number, Gene, Gene Symbol
    * iRefIndex: Protein Complex, Protein Interaction
    * MGI: Gene Marker, Gene Symbol
    * PharmGKB: Association, Disease, Drug, Gene
    * SGD: Enzyme, Pathway, Protein, RNA, Reaction, Location, Experiment
  • 23. Heterogeneous biological data on the semantic web is difficult to query. Question: Find all proteins that interact with beta amyloid (uniprot:P05067)
      SELECT * WHERE {
        ?protein a bio2rdf:Protein .
        ?protein bio2rdf:interacts_with uniprot:P05067 .
      }
    * Which protein type? UniProt Protein, PDB Protein, or iRefIndex Protein?
    * Which interaction? Physical interaction? Genetic interaction? Pathway interaction?
  • 24. Uncertainty in what is being said with a simple triple. Imagine a statement between two types, C1 and C2: C1 R C2 (e.g. nucleus part-of cell). Does it mean:
    * For every C1 there is a C2 that is related by R?
    * For every C2 there is a C1 that is related by R?
    * For some C1, there is a C2 that is related by R, or vice versa?
    * Every C1 is a kind of C2, or vice versa?
    * C1s and C2s are the same kind?
    * There is no C1 that is also a C2?
    We need to commit to a particular meaning that can be universally interpreted; this formalization will then hold across datasets.
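The candidate readings on slide 24 can be written out in first-order logic to make the differences explicit. The block below is an illustrative sketch and is not taken from the slides; the variable names x and y are arbitrary, and the "vice versa" variants are obtained by swapping C1 and C2.

```latex
\begin{align*}
&\text{(1) every } C_1 \text{ has some } C_2: && \forall x\,\bigl(C_1(x) \rightarrow \exists y\,(C_2(y) \wedge R(x,y))\bigr) \\
&\text{(2) every } C_2 \text{ has some } C_1: && \forall y\,\bigl(C_2(y) \rightarrow \exists x\,(C_1(x) \wedge R(x,y))\bigr) \\
&\text{(3) some } C_1 \text{ has some } C_2:  && \exists x\,\exists y\,\bigl(C_1(x) \wedge C_2(y) \wedge R(x,y)\bigr) \\
&\text{(4) } C_1 \text{ is a kind of } C_2:   && \forall x\,\bigl(C_1(x) \rightarrow C_2(x)\bigr) \\
&\text{(5) same kind}:                        && \forall x\,\bigl(C_1(x) \leftrightarrow C_2(x)\bigr) \\
&\text{(6) disjoint}:                         && \neg\exists x\,\bigl(C_1(x) \wedge C_2(x)\bigr)
\end{align*}
```

For "nucleus part-of cell", reading (1) says every nucleus is part of some cell, while reading (2) says every cell has some nucleus as a part; these are different claims, which is why a single agreed formalization is needed.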
  • 25. RDF-based Linked Data is a great first step, but it’s not enough. From linked data to linked knowledge through syntactic and semantic normalization.
  • 26. ontology as a strategy to formally represent and integrate knowledge
  • 27. Have you heard of OWL?
  • 28. The Web Ontology Language (OWL) Has Explicit Semantics. Can therefore be used to capture knowledge in a machine understandable way
  • 29. SIO provides an OWL ontology for the representation of diverse biomedical knowledge
  • 31. Semantic data integration, consistency checking and query answering over Bio2RDF with the Semanticscience Integrated Ontology (SIO)
    * Dataset: uniprot:P05067 is a uniprot:Protein; refseq:NP_009225.1 is a refseq:Protein
    * Ontology: uniprot:Protein is a sio:protein; refseq:Protein is a sio:protein
    * Dataset + ontology = Knowledge Base
    * Querying Bio2RDF Linked Open Data with a Global Schema. Alison Callahan, José Cruz-Toledo and Michel Dumontier. To be presented at Bio-ontologies 2012.
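The global-schema idea on slide 31 can be sketched in a few lines of Python: dataset-specific types are mapped to a shared SIO class, so a single query for sio:protein returns records from every mapped dataset. This is a toy illustration only, not the Bio2RDF or SIO implementation; the dictionaries and function names are hypothetical, and only the identifiers shown on the slide are used.

```python
# Hypothetical toy knowledge base built from the identifiers on slide 31.
# SUBCLASS_OF holds the ontology mappings; TYPE_OF holds the dataset assertions.
SUBCLASS_OF = {
    "uniprot:Protein": "sio:protein",
    "refseq:Protein": "sio:protein",
}
TYPE_OF = {
    "uniprot:P05067": "uniprot:Protein",
    "refseq:NP_009225.1": "refseq:Protein",
}


def superclasses(cls):
    """Return every ancestor of cls by following SUBCLASS_OF upward."""
    ancestors = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        if cls in ancestors:  # guard against accidental cycles
            break
        ancestors.append(cls)
    return ancestors


def instances_of(target):
    """Return all entities whose asserted type is target or a subclass of it."""
    return sorted(
        entity
        for entity, asserted in TYPE_OF.items()
        if asserted == target or target in superclasses(asserted)
    )


print(instances_of("sio:protein"))
```

Querying for a dataset-specific type such as uniprot:Protein still returns only that dataset's records, whereas the shared class spans both.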
  • 32. Use CTD & SGD to find all chemicals and proteins that participate in the same GO process
      SELECT * FROM <> WHERE {
        ?chemical a sio:SIO_010004 .              # chemical entity
        ?chemical rdfs:label ?chemicalLabel .
        ?chemical sio:SIO_000062 ?process .       # is participant in
        ?process rdfs:label ?processLabel .
        SERVICE <> {
          ?protein a sio:SIO_010043 .             # ‘protein’
          ?protein sio:SIO_000062 ?process .
          ?gene sio:SIO_010078 ?protein .         # ‘encodes’
          ?gene rdfs:label ?geneLabel .
        }
      }
  • 33. More sophisticated OWL-based Data Integration, Consistency Checking and Discovery
    * Checking the consistency of semantic annotations [1]: formalized semantic annotations in SBML models as OWL axioms. Automated reasoning uncovered inconsistencies in 16 models, e.g. alpha-D-glucose phosphate is not the required ATP in an ATP-dependent reaction (GO + ChEBI + disjoint + closure axioms).
    * Finding significant biomedical associations [2]: found significant associations between genes, drugs, diseases and pathways using Drugbank, PharmGKB, CTD, PID across categories of drugs (ChEBI, ATC, MeSH) and diseases (DO, MeSH).
    * 22,653 pathway-disease type associations (6,304 over; 16,349 under), e.g. carcinosarcoma (DOID:4236) and Zidovudine Pathway (PharmGKB:PA165859361)
    * 13,826 pathway-chemical type associations (12,564 over; 1,262 under), e.g. drug clopidogrel (CHEBI:37941) with Endothelin signaling pathway (PharmGKB:PA164728163)
    * http://pharmgkb-owl.googlecode.com
    * [1] Integrating systems biology models and biomedical ontologies. BMC Systems Biology. 2011. 5:124.
    * [2] Identifying aberrant pathways through integrated analysis of knowledge in pharmacogenomics. Bioinformatics. 2012. In press.
  • 34. Translational Medicine Requires Integration of Patient and Biomedical Data
  • 35. Integration of patient record data with Linked Open Data through the Translational Medicine Ontology. 223 mappings: 60 TMO classes to 201 target classes from over 40 ontologies and 8 datasets
  • 36. Formalization of the Dubois AD diagnostic criteria for decision support
      # the panel is a textual entity
      dubois:panel2 a iao:IAO_0000300 .
      dubois:panel2 rdfs:label "Alzheimer Disease diagnostic criteria as reported in panel 2 of dubois et al - pubmed:17616482 [dubois:panel2]" .
      # the panel is about alzheimer disease
      dubois:panel2 iao:is_about diseasome:74 .
      # the panel is from the article
      dubois:panel2 ro:part_of <> .
      # the panel is about diagnostic criterion
      dubois:panel2 iao:is_about tmo:TMO_0068 .
      # inclusion criterion
      dubois:10 rdfs:label "Proven AD autosomal dominant mutation within the immediate family [dubois:10]" ;
        a tmo:TMO_0069 ;
        ro:part_of dubois:panel2 ;
        iao:is_about diseasome:74 .
      # exclusion criterion
      dubois:16 rdfs:label "Major depression [dubois:16]" ;
        a tmo:TMO_0070 ;
        ro:part_of dubois:panel2 ;
        iao:is_about diseasome:74 .
  • 37. TMKB for pharmaceutical and clinical research, and health care
    * Pharmaceutical research: Which existing marketed drugs might potentially be re-purposed for AD because they are known to modulate genes that are implicated in the disease? Result: 57 compounds or classes of compounds that are used to treat 45 diseases, including AD, hyper/hypotension, diabetes and obesity.
    * Clinical research: Identify an AD clinical trial for a drug with a different mechanism of action (MOA) than the drug that the patient is currently taking. Result: of the 438 drugs linked to AD trials, only 58 are in active trials and only 2 (Doxorubicin and IL-2) have a documented MOA. 78 AD-associated drugs have an established MOA.
    * Health care: Have any of my AD patients been treated for other neurological conditions, as this might impact their diagnosis? Result: Patient 2 is also being treated for depression.
  • 38. Personal Health Lens
    * Observation: Patients often look up new/alternative drugs to treat their condition or alleviate side effects.
    * Opportunity: A patient-centric health care application that identifies contraindications for drugs mentioned on web pages using the patient’s own health data
    * Components: RDFized patient data; Bio2RDF semantically annotated data; SADI semantic web services to process the page and retrieve data; SHARE automatic workflow composition
  • 39. SADI enables discovery and access to Semantic Web Services. The Semantic Automated Discovery and Integration (SADI) framework makes it easy to create Semantic Web Services using OWL classes as service inputs and outputs. ~700 bioinformatic services as of May 29, 2012.
    * Mark Wilkinson, UBC
    * Michel Dumontier, Carleton University
    * Christopher Baker, UNB
  • 43. The SADI+SHARE workflow and reasoning were personalized to YOUR medical data. Screenshot annotations: uses the patient’s data; contraindication; rationale; sources.
  • 44. so how do we get at the supporting evidence?
  • 45. HyQue is the Hypothesis query and evaluation system
    * A platform for knowledge discovery
    * Facilitates hypothesis formulation and evaluation
    * Leverages Semantic Web technologies to provide access to facts, expert knowledge and web services
    * Conforms to a simplified event-based model
    * Supports evaluation against positive and negative findings
    * Transparent and reproducible evidence prioritization
    * Provenance across all elements of hypothesis testing: trace a hypothesis to its evaluation, including the data and rules used
    * Evaluating scientific hypotheses using the SPARQL Inferencing Notation. Extended Semantic Web Conference (ESWC 2012). Heraklion, Crete. May 27-31, 2012.
    * HyQue: evaluating hypotheses using Semantic Web technologies. J Biomed Semantics. 2011 May 17;2 Suppl 2:S3.
  • 46. HyQue Architecture (diagram labels: Ontologies, Services)
  • 47. Event-based data model. HyQue events denote a phenomenon involving two objects: ‘agent’ and ‘target’. In addition, we can specify the location of this event (e.g. located in nucleus, or under some genetic background).
    * Currently supported events: 1. protein-protein binding; 2. protein-nucleic acid binding; 3. molecular activation; 4. molecular inhibition; 5. gene induction; 6. gene repression; 7. transport
    * Event properties: ‘has agent’ (agent); ‘has target’ (target); ‘is located in’ (location); ‘is negated’ (boolean)
  • 48. HyQue domain rules CALCULATE a quantitative measure of evidence for an event. ‘induce’ rule (maximum score: 5):
    * Is event negated? If yes, subtract 2
    * Is event of type ‘induce’ (GO:0010628)? If yes, add 1; if no, subtract 1
    * Is agent of type ‘protein’ (CHEBI:36080) or ‘RNA’? If yes, add 1; if type ‘gene’, subtract 1
    * Is target of type ‘gene’ (SO:0000236)? If yes, add 1; if no, subtract 1
    * Does agent have known ‘transcription factor activity’ (GO:0003700)? If yes, add 1
    * Is event located in the ‘nucleus’ (GO:0005634)? If yes, add 1; if no, subtract 1
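The ‘induce’ rule on slide 48 is simple arithmetic over the event properties from slide 47, so it can be sketched as a plain function. This is a minimal Python illustration and not HyQue's implementation (the slides describe SPIN rules over RDF); the Event class, its field names, and the string labels are hypothetical, and the transcription factor flag and location in the example are assumed inputs rather than values from the slides.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical stand-in for a HyQue event (agent, target, location, negation)."""
    event_type: str                      # e.g. "induce"
    agent_type: str                      # e.g. "protein", "RNA", "gene"
    target_type: str                     # e.g. "gene"
    agent_is_transcription_factor: bool  # agent has known transcription factor activity
    location: str                        # e.g. "nucleus"
    is_negated: bool = False


def score_induce(event: Event) -> int:
    """Apply the 'induce' rule from slide 48 and return the evidence score."""
    score = 0
    if event.is_negated:
        score -= 2
    score += 1 if event.event_type == "induce" else -1
    if event.agent_type in ("protein", "RNA"):
        score += 1
    elif event.agent_type == "gene":
        score -= 1
    score += 1 if event.target_type == "gene" else -1
    if event.agent_is_transcription_factor:
        score += 1
    score += 1 if event.location == "nucleus" else -1
    return score


# Example: an induction event with a protein agent acting on a gene target.
# The transcription factor flag and nuclear location are assumed for illustration.
example = Event("induce", "protein", "gene", True, "nucleus")
print(score_induce(example))  # prints 5, the maximum score
```

Because each condition contributes independently, changing one rule weight or one input changes the score in a traceable way, which matches the emphasis on transparent evidence prioritization.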
  • 49. Combination of system and domain rules to retrieve and score data, and add new triples. Event (induction), evaluated by the SPIN induction rule:
      :e1 a go:0010628 ;
        hyque:agent sgd:Gal4p ;
        hyque:target sgd:GAL1 ;
        hyque:is_negated "0" .
  • 50. Customization of rules/data sources will generate different evidence-based evaluations
  • 51. Reproducible eScience: LOD for Hypothesis, Rules, Data and Evaluation
  • 53. A digital infrastructure for the future of biomedicine
    * Semantic Web technologies offer a powerful integrative platform across facts, expert knowledge and services
    * The ability to publish, link to, retrieve, check the consistency of, and query biomedical knowledge will yield an explosion of health-related applications.
    * By formalizing biomedical data, we can integrate molecular to clinical data, and gain insight into how living systems respond to chemical agents, with implications for drug discovery and the delivery of health care
  • 54. Acknowledgements
    * Bio2RDF: Peter Ansell, Francois Belleau, Allison Callahan, Jacques Corbeil, Jose Cruz-Toledo, Alex De Leon, Steve Etlinger, James Hogan, Nichealla Keath, Jean Morissette, Marc-Alexandre Nolin, Nicole Tourigny, Philippe Rigault and Paul Roe
    * OWL-Based Data Integration: Robert Hoehndorf, John Gennari, Sarah Wimalaratne, Bernard de Bono, Daniel Cook, and George Gkoutos
    * SADI: Christopher Baker, Melanie Courtot, Jose Cruz-Toledo, Steve Etlinger, Nichealla Keath, Artjom Klein, Luke McCarthy, Silvane Paixao, Ben Vandervalk, Natalia Villanueva-Rosales, Mark Wilkinson
    * HyQue: Alison Callahan
    * Lab: Glen Newton (NLP), Gordana Lenert (PGx), Dana Klassen @ DERI, Leonid Chepelev @ UoO, Natalia Villanueva-Rosales @ UoTexas, Xueying Chen @ IBM China, Mykola Konyk
    * W3C HCLS: J Luciano, B Andersson, C Batchelor, O Bodenreider, T Clark, C Denney, C Domarew, T Gambet, L Harland, A Jentzsch, V Kashyap, P Kos, J Kozlovsky, T Lebo, SM Marshall, JP McCusker, DL McGuinness, C Ogbuji, E Pichler, R Powers, E Prud'hommeaux, M Samwald, L Schriml, PJ Tonellato, PL Whetzel, J Zhao, S Stephens, C Denney, J Luciano, J McGurk, Lynn Schriml, and Peter J. Tonellato
  • 55. Website: Presentations: