From Biological Data to Clinical Applications: Positioning a digital infrastructure for the future of biomedicine.

In the quest to translate the results of life science research into effective clinical applications, many are now turning their attention to the large and rapidly growing body of biological and biomedical data and trying to make sense of it. Indeed, keeping on top of the daily flood of new information, whether the latest clinical reviews, scientific reports or raw data, is an ever-present and widely recognized challenge. Limited access to structured, integrated and citable data restricts our ability to exploit a rich source of scientific knowledge for clinical and translational research. With the dual goals of increasing our understanding of how living systems respond to chemical agents and translating our combined knowledge into clinical applications, I will discuss our efforts to leverage Semantic Web technologies. These efforts aim to facilitate the formulation, publication, integration and discovery of biological facts, expert knowledge and services of value to pharmaceutical and clinical research and, more recently, to the patient-centric delivery of health care.

  1. From biological data to clinical applications: positioning a digital infrastructure for the future of biomedicine. Michel Dumontier, Ph.D., Associate Professor of Bioinformatics, Department of Biology, School of Computer Science, and Institute of Biochemistry, Carleton University; Adjunct Professor (Professeur associé), Université Laval; Ottawa Institute of Systems Biology; Ottawa-Carleton Institute of Biomedical Engineering. DERI::Digital Infrastructure for Biomedicine
  5. uncovering a sufficient amount of evidence to support/refute a hypothesis is becoming increasingly difficult: it requires a lot of digging around
  6. continuous growth in research literature (source: http://www.nlm.nih.gov/bsd/stats/cit_added.html)
  7. access to increasing amounts of biomedical data
  8. access to the most effective software to predict, compare and evaluate
  9. ultimately, we answer questions by building sophisticated workflows
  10. What if we could automatically answer a question using available data and services?
  11. The Semantic Web is the new global web of knowledge. It involves standards for publishing, sharing and querying facts, expert knowledge and services. It is a scalable approach to the discovery of independently formulated and distributed knowledge.
  12. Link all the data!!!
  13. something you can search, look up, link to, query, and check the consistency and veracity of
  14. an emerging linked data network (“Linking Open Data cloud diagram”, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net/)
  15. Life science data contributors: Bio2RDF, Chem2Bio2RDF, LODD (HCLS)
  16. > 40 biological datasets from independent providers; > 3 billion triples
  17. linked data for the life sciences: an open source project for the provision of scalable, decentralized data with global mirroring and customizable query resolution. Francois Belleau (Laval University), Marc-Alexandre Nolin (Laval University), Peter Ansell (Queensland University of Technology), Michel Dumontier (Carleton University)
  18. Bio2RDF resources are identified using IRIs
      • Data providers’ record identifiers are maintained from source: http://bio2rdf.org/namespace:identifier
      • E.g., DrugBank’s resource IRI for Leucovorin: http://bio2rdf.org/drugbank:DB00650
  19. Vocabulary and resource namespaces are used to describe auxiliary resources
      • Vocabulary namespaces are used for dataset-specific types and predicates: http://bio2rdf.org/drugbank_vocabulary:Drug
      • Entities arising from n-ary relations are identified in the resource namespace: http://bio2rdf.org/drugbank_resource:DB00440_DB00650
  21. Every Bio2RDF dataset now contains provenance metadata
  22. Bio2RDF types include biological, information content and processual entities
      • CTD: Chemical, Disease, Chemical-Disease Interaction, Chemical-Gene Interaction
      • Entrez Gene: Gene, Model Organism, Publication
      • HGNC: Accession Number, Gene, Gene Symbol
      • iRefIndex: Protein Complex, Protein Interaction
      • MGI: Gene Marker, Gene Symbol
      • PharmGKB: Association, Disease, Drug, Gene
      • SGD: Enzyme, Pathway, Protein, RNA, Reaction, Location, Experiment
  23. Heterogeneous biological data on the semantic web is difficult to query
      Question: find all proteins that interact with beta amyloid (uniprot:P05067)
      [Diagram labels: UniProt Protein, PDB Protein, iRefIndex Protein, ?]
        SELECT * WHERE {
          ?protein a bio2rdf:Protein .
          ?protein bio2rdf:interacts_with uniprot:P05067 .
        }
      Physical interaction? Genetic interaction? Pathway interaction?
  24. Uncertainty in what is being said with a simple triple
      Imagine a statement between two types, C1 and C2: C1 R C2 (e.g., nucleus part-of cell). Does it mean:
      • For every C1 there is a C2 that is related by R?
      • For every C2 there is a C1 that is related by R?
      • For some C1, there is a C2 that is related by R, or vice versa?
      • Every C1 is a kind of C2? Or vice versa?
      • C1s and C2s are the same kind?
      • There is no C1 that is also a C2?
      We need to commit to a particular meaning that can be universally interpreted; this formalization will then hold across datasets.
  25. RDF-based Linked Data is a great first step, but it’s not enough: from linked data to linked knowledge through syntactic and semantic normalization
  26. ontology as a strategy to formally represent and integrate knowledge
  27. Have you heard of OWL?
  28. The Web Ontology Language (OWL) has explicit semantics, so it can be used to capture knowledge in a machine-understandable way
  29. SIO provides an OWL ontology for the representation of diverse biomedical knowledge
  31. Semantic data integration, consistency checking and query answering over Bio2RDF with the Semanticscience Integrated Ontology (SIO)
      [Diagram. Dataset layer: uniprot:P05067 is a uniprot:Protein; refseq:NP_009225.1 is a refseq:Protein. Ontology layer: uniprot:Protein is a sio:protein; refseq:Protein is a sio:protein. Together the layers form the knowledge base.]
      Querying Bio2RDF Linked Open Data with a Global Schema. Alison Callahan, José Cruz-Toledo and Michel Dumontier. To be presented at Bio-ontologies 2012.
  32. Use CTD & SGD to find all chemicals and proteins that participate in the same GO process
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX sio:  <http://semanticscience.org/resource/>
        SELECT *
        FROM <http://bio2rdf.org/ctd>
        WHERE {
          ?chemical a sio:SIO_010004 .            # chemical entity
          ?chemical rdfs:label ?chemicalLabel .
          ?chemical sio:SIO_000062 ?process .     # is participant in
          ?process rdfs:label ?processLabel .
          SERVICE <http://sgd.bio2rdf.org/sparql> {
            ?protein a sio:SIO_010043 .           # protein
            ?protein sio:SIO_000062 ?process .    # is participant in
            ?gene sio:SIO_010078 ?protein .       # encodes
            ?gene rdfs:label ?geneLabel .
          }
        }
  33. More sophisticated OWL-based data integration, consistency checking and discovery
      • Checking the consistency of semantic annotations [1]
        – Formalized semantic annotations in SBML models as OWL axioms; automated reasoning uncovered inconsistencies in 16 models
        – e.g., alpha-D-glucose phosphate is not the ATP required by an ATP-dependent reaction (GO + ChEBI + disjointness + closure axioms)
      • Finding significant biomedical associations [2]
        – Found significant associations between genes, drugs, diseases and pathways using DrugBank, PharmGKB, CTD and PID, across categories of drugs (ChEBI, ATC, MeSH) and diseases (DO, MeSH)
        – 22,653 pathway-disease type associations (6,304 over; 16,349 under), e.g., carcinosarcoma (DOID:4236) and Zidovudine Pathway (PharmGKB:PA165859361)
        – 13,826 pathway-chemical type associations (12,564 over; 1,262 under), e.g., the drug clopidogrel (CHEBI:37941) with the Endothelin signaling pathway (PharmGKB:PA164728163)
        – http://pharmgkb-owl.googlecode.com
      [1] Integrating systems biology models and biomedical ontologies. BMC Systems Biology. 2011;5:124.
      [2] Identifying aberrant pathways through integrated analysis of knowledge in pharmacogenomics. Bioinformatics. 2012 (in press).
  34. Translational Medicine Requires Integration of Patient and Biomedical Data
  35. Integration of patient record data with Linked Open Data through the Translational Medicine Ontology (TMO): 223 mappings from 60 TMO classes to 201 target classes from over 40 ontologies and 8 datasets
  36. Formalization of the Dubois AD diagnostic criteria for decision support
        # the panel is a textual entity
        dubois:panel2 a iao:IAO_0000300 .
        dubois:panel2 rdfs:label "Alzheimer Disease diagnostic criteria as reported in panel 2 of dubois et al - pubmed:17616482 [dubois:panel2]" .
        # the panel is about Alzheimer disease
        dubois:panel2 iao:is_about diseasome:74 .
        # the panel is from the article
        dubois:panel2 ro:part_of <http://bio2rdf.org/pubmed:17616482> .
        # the panel is about a diagnostic criterion
        dubois:panel2 iao:is_about tmo:TMO_0068 .
        # inclusion criterion
        dubois:10 rdfs:label "Proven AD autosomal dominant mutation within the immediate family [dubois:10]" ;
            a tmo:TMO_0069 ;
            ro:part_of dubois:panel2 ;
            iao:is_about diseasome:74 .
        # exclusion criterion
        dubois:16 rdfs:label "Major depression [dubois:16]" ;
            a tmo:TMO_0070 ;
            ro:part_of dubois:panel2 ;
            iao:is_about diseasome:74 .
  37. TMKB for pharmaceutical and clinical research, and health care
      • Pharmaceutical research: Which existing marketed drugs might potentially be re-purposed for AD because they are known to modulate genes that are implicated in the disease?
        – 57 compounds or classes of compounds that are used to treat 45 diseases, including AD, hyper/hypotension, diabetes and obesity
      • Clinical research: Identify an AD clinical trial for a drug with a different mechanism of action (MOA) than the drug that the patient is currently taking
        – Of the 438 drugs linked to AD trials, only 58 are in active trials and only 2 (Doxorubicin and IL-2) have a documented MOA; 78 AD-associated drugs have an established MOA
      • Health care: Have any of my AD patients been treated for other neurological conditions, as this might impact their diagnosis?
        – Patient 2 is also being treated for depression
      http://esw.w3.org/topic/HCLSIG/PharmaOntology/Queries
  38. Personal Health Lens
      Observation: Patients often look up new/alternative drugs to treat their condition or alleviate side effects.
      Opportunity: A patient-centric health care application that identifies contraindications for drugs mentioned on web pages using the patient’s own health data.
      Components:
      • RDFized patient data
      • Bio2RDF semantically annotated data
      • SADI semantic web services to process the page and retrieve data
      • SHARE automatic workflow composition
  39. SADI enables discovery of and access to Semantic Web Services
      The Semantic Automated Discovery and Integration (SADI) framework makes it easy to create Semantic Web Services using OWL classes as service inputs and outputs.
      http://sadiframework.org: ~700 bioinformatic services as of May 29, 2012
      Mark Wilkinson (UBC), Michel Dumontier (Carleton University), Christopher Baker (UNB)
  43. The SADI+SHARE workflow and reasoning were personalized to YOUR medical data [screenshot callouts: uses the patient’s data; contraindication rationale; sources]
  44. so how do we get at the supporting evidence?
  45. HyQue
      HyQue is the Hypothesis query and evaluation system
      • A platform for knowledge discovery
      • Facilitates hypothesis formulation and evaluation
      • Leverages Semantic Web technologies to provide access to facts, expert knowledge and web services
      • Conforms to a simplified event-based model
      • Supports evaluation against positive and negative findings
      • Transparent and reproducible evidence prioritization
      • Provenance across all elements of hypothesis testing: trace a hypothesis to its evaluation, including the data and rules used
      Evaluating scientific hypotheses using the SPARQL Inferencing Notation. Extended Semantic Web Conference (ESWC 2012), Heraklion, Crete, May 27-31, 2012.
      HyQue: evaluating hypotheses using Semantic Web technologies. J Biomed Semantics. 2011 May 17;2 Suppl 2:S3.
  46. HyQue Architecture [diagram labels: Ontologies, Services]
  47. Event-based data model
      HyQue events denote a phenomenon involving two objects: an ‘agent’ and a ‘target’. In addition, we can specify the location of the event (e.g., located in the nucleus, or under some genetic background).
      Event properties: ‘has agent’ (agent), ‘has target’ (target), ‘is located in’ (location), ‘is negated’ (boolean)
      Currently supported events: (1) protein-protein binding; (2) protein-nucleic acid binding; (3) molecular activation; (4) molecular inhibition; (5) gene induction; (6) gene repression; (7) transport
  48. HyQue domain rules CALCULATE a quantitative measure of evidence for an event
      ‘induce’ rule (maximum score: 5):
      – Is the event negated? If yes, subtract 2
      – Is the event of type ‘induce’ (GO:0010628)? If yes, add 1; if no, subtract 1
      – Is the agent of type ‘protein’ (CHEBI:36080) or ‘RNA’? If yes, add 1; if of type ‘gene’, subtract 1
      – Is the target of type ‘gene’ (SO:0000236)? If yes, add 1; if no, subtract 1
      – Does the agent have known ‘transcription factor activity’ (GO:0003700)? If yes, add 1
      – Is the event located in the ‘nucleus’ (GO:0005634)? If yes, add 1; if no, subtract 1
  49. Combination of system and domain rules to retrieve and score data, and add new triples [diagram labels: Event - induction; SPIN induction rule]
        :e1 a go:0010628 ;
            hyque:agent sgd:Gal4p ;
            hyque:target sgd:GAL1 ;
            hyque:is_negated "0" .
  50. Customization of rules/data sources will generate different evidence-based evaluations
  51. Reproducible eScience: LOD for Hypothesis, Rules, Data and Evaluation
  53. A digital infrastructure for the future of biomedicine
      • Semantic Web technologies offer a powerful integrative platform across facts, expert knowledge and services
      • The ability to publish, link to, retrieve, check the consistency of, and query biomedical knowledge will yield an explosion of health-related applications
      • By formalizing biomedical data, we can integrate molecular to clinical data and gain insight into how living systems respond to chemical agents, with implications for drug discovery and the delivery of health care
  54. Acknowledgements
      • Bio2RDF: Peter Ansell, Francois Belleau, Alison Callahan, Jacques Corbeil, Jose Cruz-Toledo, Alex De Leon, Steve Etlinger, James Hogan, Nichealla Keath, Jean Morissette, Marc-Alexandre Nolin, Nicole Tourigny, Philippe Rigault and Paul Roe
      • OWL-Based Data Integration: Robert Hoehndorf, John Gennari, Sarah Wimalaratne, Bernard de Bono, Daniel Cook, and George Gkoutos
      • SADI: Christopher Baker, Melanie Courtot, Jose Cruz-Toledo, Steve Etlinger, Nichealla Keath, Artjom Klein, Luke McCarthy, Silvane Paixao, Ben Vandervalk, Natalia Villanueva-Rosales, Mark Wilkinson
      • HyQue: Alison Callahan
      • Lab: Glen Newton (NLP), Gordana Lenert (PGx), Dana Klassen @ DERI, Leonid Chepelev @ UoO, Natalia Villanueva-Rosales @ UoTexas, Xueying Chen @ IBM China, Mykola Konyk
      • W3C HCLS: J Luciano, B Andersson, C Batchelor, O Bodenreider, T Clark, C Denney, C Domarew, T Gambet, L Harland, A Jentzsch, V Kashyap, P Kos, J Kozlovsky, T Lebo, SM Marshall, JP McCusker, DL McGuinness, C Ogbuji, E Pichler, R Powers, E Prud’hommeaux, M Samwald, L Schriml, PJ Tonellato, PL Whetzel, J Zhao, S Stephens, C Denney, J Luciano, J McGurk, Lynn Schriml, and Peter J. Tonellato
  55. Website: http://dumontierlab.com; email: michel_dumontier@carleton.ca; presentations: http://slideshare.com/micheldumontier
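Slides 18 and 19 describe three forms of Bio2RDF IRI:
- record IRIs of the form http://bio2rdf.org/namespace:identifier;
- dataset-specific types and predicates in a `<namespace>_vocabulary` namespace;
- n-ary relation entities in a `<namespace>_resource` namespace.

The sketch below builds each form from its parts. The helper names (`record_iri`, `vocabulary_iri`, `resource_iri`) are hypothetical, not a published Bio2RDF API. Joining the participating identifiers with an underscore is an assumption inferred from the single example IRI drugbank_resource:DB00440_DB00650.

```python
BASE = "http://bio2rdf.org/"


def record_iri(namespace: str, identifier: str) -> str:
    """IRI for a provider record, keeping the provider's own identifier."""
    return f"{BASE}{namespace}:{identifier}"


def vocabulary_iri(namespace: str, term: str) -> str:
    """IRI for a dataset-specific type or predicate (hypothetical helper)."""
    return f"{BASE}{namespace}_vocabulary:{term}"


def resource_iri(namespace: str, *identifiers: str) -> str:
    """IRI for an entity arising from an n-ary relation (hypothetical helper).

    Participating record identifiers are joined with '_' (assumption based
    on the drugbank_resource:DB00440_DB00650 example).
    """
    return f"{BASE}{namespace}_resource:{'_'.join(identifiers)}"


print(record_iri("drugbank", "DB00650"))
print(vocabulary_iri("drugbank", "Drug"))
print(resource_iri("drugbank", "DB00440", "DB00650"))
```

Because the provider's identifier is kept verbatim, anyone who knows a DrugBank accession can construct its Bio2RDF IRI without a lookup service.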
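Slide 24 asks what a bare triple C1 R C2 commits to. Each reading can be written in first-order logic and in the description-logic notation that underlies OWL, which makes the differences explicit. The table below is our own illustration in standard notation, not a formalization taken from the slides; $R^{-}$ denotes the inverse of $R$. The usual OBO reading of "nucleus part-of cell" is the first row: every nucleus is part of some cell.

```latex
\begin{align*}
&\text{every } C_1 \text{ has some } C_2 \text{ via } R:
  && \forall x\,\bigl(C_1(x) \rightarrow \exists y\,(R(x,y) \wedge C_2(y))\bigr)
  && C_1 \sqsubseteq \exists R.C_2 \\
&\text{every } C_2 \text{ has some } C_1 \text{ via } R:
  && \forall y\,\bigl(C_2(y) \rightarrow \exists x\,(C_1(x) \wedge R(x,y))\bigr)
  && C_2 \sqsubseteq \exists R^{-}.C_1 \\
&\text{some } C_1 \text{ is } R\text{-related to some } C_2:
  && \exists x\,\exists y\,\bigl(C_1(x) \wedge R(x,y) \wedge C_2(y)\bigr)
  && C_1(a),\ R(a,b),\ C_2(b) \text{ for individuals } a, b \\
&\text{every } C_1 \text{ is a kind of } C_2:
  && \forall x\,\bigl(C_1(x) \rightarrow C_2(x)\bigr)
  && C_1 \sqsubseteq C_2 \\
&C_1 \text{ and } C_2 \text{ are the same kind}:
  && \forall x\,\bigl(C_1(x) \leftrightarrow C_2(x)\bigr)
  && C_1 \equiv C_2 \\
&\text{no } C_1 \text{ is also a } C_2:
  && \neg\exists x\,\bigl(C_1(x) \wedge C_2(x)\bigr)
  && C_1 \sqcap C_2 \sqsubseteq \bot
\end{align*}
```

Only the third reading is a statement about particular individuals. The others are class-level axioms that a reasoner can apply to every dataset that uses $C_1$ and $C_2$.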
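Slide 31 shows records typed with dataset-specific classes (uniprot:Protein, refseq:Protein). Both classes are mapped to one SIO class (sio:protein), so a single query over sio:protein reaches both datasets.

The sketch below reproduces that inference with plain Python sets instead of an OWL reasoner. It treats the diagram's ‘is a’ links between the dataset and ontology layers as rdfs:subClassOf edges, which is an assumption about how the mapping is encoded. It then computes the inferred types by transitive closure.

```python
from __future__ import annotations

# Dataset layer (from the slide's diagram): record -> asserted dataset-specific type
type_assertions = [
    ("uniprot:P05067", "uniprot:Protein"),
    ("refseq:NP_009225.1", "refseq:Protein"),
]

# Ontology layer: dataset class -> SIO class(es), read here as rdfs:subClassOf
subclass_of = {
    "uniprot:Protein": ["sio:protein"],
    "refseq:Protein": ["sio:protein"],
}


def superclasses(cls: str) -> set[str]:
    """Return cls plus every class reachable from it via subClassOf."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def instances_of(target: str) -> set[str]:
    """Records whose asserted type is `target` or a subclass of it."""
    return {rec for rec, cls in type_assertions if target in superclasses(cls)}


# One query against the global schema reaches records from both datasets.
print(sorted(instances_of("sio:protein")))
```

A full OWL reasoner would also catch contradictions, such as a record typed with two disjoint classes. That consistency checking is the step slide 33 relies on, and this closure-only sketch does not perform it.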
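The query on slide 32 combines two sources through the shared variable ?process. The CTD bindings come from the local graph and the SGD bindings from the SERVICE endpoint. Results are solution mappings that agree on every variable they share.

The sketch below shows only those join semantics. The input bindings are made up: placeholder identifiers such as `ctd:chemA` and `go:P1` are not real records, and a real client would fetch the two binding sets from the endpoints.

```python
from __future__ import annotations


def join(left: list[dict], right: list[dict]) -> list[dict]:
    """SPARQL-style join: merge solution mappings that agree on shared variables."""
    results = []
    for l in left:
        for r in right:
            shared = l.keys() & r.keys()
            if all(l[v] == r[v] for v in shared):
                results.append({**l, **r})
    return results


# Hypothetical bindings from the CTD graph (chemical participates in process)
ctd = [
    {"chemical": "ctd:chemA", "process": "go:P1"},
    {"chemical": "ctd:chemB", "process": "go:P2"},
]
# Hypothetical bindings from the SGD SERVICE block (protein participates in process)
sgd = [
    {"protein": "sgd:protX", "gene": "sgd:geneX", "process": "go:P1"},
    {"protein": "sgd:protY", "gene": "sgd:geneY", "process": "go:P3"},
]

for row in join(ctd, sgd):
    print(row)
```

The nested loop keeps the example short. A query engine would typically hash one side on the shared variables instead.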
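Slide 36 types each Dubois criterion as either an inclusion criterion (tmo:TMO_0069) or an exclusion criterion (tmo:TMO_0070) and links it to panel 2. Once criteria are typed this way, a decision-support check can be very simple.

The sketch below assumes, hypothetically, that a patient's record has already been reduced to the set of criterion IRIs it satisfies. Matching raw patient data to criteria is the hard part and is not shown. The function name `assess` and its output keys are illustrative.

```python
from __future__ import annotations

INCLUSION = "tmo:TMO_0069"  # inclusion criterion class used on slide 36
EXCLUSION = "tmo:TMO_0070"  # exclusion criterion class used on slide 36

# Criteria from slide 36: criterion IRI -> criterion class
criteria = {
    "dubois:10": INCLUSION,  # proven AD autosomal dominant mutation in immediate family
    "dubois:16": EXCLUSION,  # major depression
}


def assess(satisfied: set[str]) -> dict[str, list[str]]:
    """Sort the criteria a patient satisfies into inclusion and exclusion hits."""
    return {
        "inclusion_met": sorted(c for c in satisfied if criteria.get(c) == INCLUSION),
        "exclusion_met": sorted(c for c in satisfied if criteria.get(c) == EXCLUSION),
    }


# Hypothetical patient already mapped to the criteria they satisfy.
print(assess({"dubois:16"}))
```

Unknown criterion IRIs are ignored rather than raising an error, so the check degrades gracefully when patient data mentions criteria from other panels.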
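Slide 39 describes SADI services whose inputs and outputs are OWL classes, and slide 38 credits SHARE with automatic workflow composition. The sketch below illustrates only the general idea: chaining services by matching one service's output class to the next service's input class, found by breadth-first search over a registry.

The registry, service names and `ex:` classes are all hypothetical. SHARE's actual algorithm is not reproduced here.

```python
from __future__ import annotations

from collections import deque

# Hypothetical registry: service name -> (input OWL class, output OWL class)
services = {
    "getDrugsMentionedInPage": ("ex:WebPage", "ex:DrugMention"),
    "mapMentionToDrug": ("ex:DrugMention", "ex:Drug"),
    "getContraindications": ("ex:Drug", "ex:Contraindication"),
    "getDrugTargets": ("ex:Drug", "ex:Protein"),
}


def compose(start: str, goal: str) -> list[str] | None:
    """Shortest chain of services turning an instance of `start` into one of `goal`."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        cls, path = queue.popleft()
        if cls == goal:
            return path
        for name, (inp, out) in services.items():
            if inp == cls and out not in visited:
                visited.add(out)
                queue.append((out, path + [name]))
    return None  # no chain of registered services reaches the goal class


print(compose("ex:WebPage", "ex:Contraindication"))
```

Breadth-first search returns the shortest chain, which is the fewest service calls. Exact class matching is a simplification: with OWL, a service accepting a superclass of the current output class would also qualify.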
