  1. 1. HCLS Workshop @ ISWC Eric Neumann and Tonya Hongsermeier University of Georgia, Nov 6, 2006
  2. 2. W3C Semantic Web for HealthCare and Life Sciences Interest Group <ul><li>Launched Nov 2005: sw / hcls </li></ul><ul><ul><li>Co-chairs: Dr. Tonya Hongsermeier (Partners HealthCare); Eric Neumann (Teranode) </li></ul></ul><ul><li>Chartered to develop and support the use of SW technologies and practices to improve collaboration, research and development, and innovation adoption in the Health Care and Life Science domains </li></ul><ul><li>Based on a foundation of semantically rich specifications that support process and information interoperability </li></ul><ul><li>HCLS Objectives: </li></ul><ul><ul><li>Core vocabularies and ontologies to support cross-community data integration and collaborative efforts </li></ul></ul><ul><ul><li>Guidelines and Best Practices for Resource Identification to support integrity and version control </li></ul></ul><ul><ul><li>Better integration of Scientific Publication with people, data, software, publications, and clinical trials </li></ul></ul>
  3. 3. HCLS Philosophy <ul><li>Share use-cases, applications, demonstrations, experiences </li></ul><ul><li>Expose collections as RDF using public tools </li></ul><ul><li>Develop (where appropriate) core vocabularies for data integration </li></ul>
  4. 4. HCLS Activities <ul><li>BioRDF - data + NLP as RDF </li></ul><ul><li>BioONT - ontology coordination </li></ul><ul><li>Adaptive Clinical Protocols and Pathways </li></ul><ul><li>Drug Safety and Efficacy </li></ul><ul><li>Scientific Publishing - evidence management </li></ul>
  5. 5. Outline <ul><li>Basic Informatics Challenges </li></ul><ul><li>Bench-to-Bedside Applications </li></ul><ul><li>What is the Semantic Web? </li></ul><ul><li>Current Activities… Case Studies </li></ul>
  6. 6. Drug Discovery and Medicine Hygieia, G. Klimt <ul><li>Health </li></ul><ul><li>Practice </li></ul><ul><li>Safety </li></ul><ul><li>Prevention </li></ul><ul><li>Privacy </li></ul><ul><li>Knowledge </li></ul>
  7. 7. Data Expansion <ul><li>Large Data Sets Variables >> Samples </li></ul><ul><li>Many New Data Types </li></ul><ul><ul><li>Which Formats? </li></ul></ul>Combine
  8. 8. Where Information Advances are Most Needed <ul><li>Supporting Innovative Applications in R&D </li></ul><ul><ul><li>Translational Medicine (Biomarkers) </li></ul></ul><ul><ul><li>Molecular Mechanisms (Systems) </li></ul></ul><ul><ul><li>Data Provenance, Rich Annotation </li></ul></ul><ul><li>Clinical Information </li></ul><ul><ul><li>eHealth Records, EDC, Clinical Submission Documents </li></ul></ul><ul><ul><li>Safety Information, Pharmacovigilance, Adverse Events, Biomarker data </li></ul></ul><ul><li>Standards </li></ul><ul><ul><li>Central Data Sources </li></ul></ul><ul><ul><ul><li>Genomics, Diseases, Chemistry, Toxicology </li></ul></ul></ul><ul><ul><li>MetaData </li></ul></ul><ul><ul><ul><li>Ontologies </li></ul></ul></ul><ul><ul><ul><li>Vocabularies </li></ul></ul></ul>
  9. 9. The Big Picture - Hard to understand from just a few Points of View
  10. 11. Complete view tells a very different Story
  11. 12. Distributed Nature of R&D <ul><li>Silos of Data… </li></ul>
  12. 13. Data Integration: Biology Requirements Disease Proteins Genes Papers Retention Policy Audit Trail Curation Tools Ontology Experiment Assays Compounds
  13. 14. New Regulatory Issues Confronting Pharmaceuticals from Innovation or Stagnation , FDA Report March 2004 ADME Optim Tox/Efficacy
  14. 15. Translational Medicine in Drug R&D In Vitro Studies Animal Studies Clinical Studies Toxicities Target/System Efficacy Early Middle Late Cellular Systems Human Disease Models (Therapeutic Relevance) $500K $5M $500M
  15. 16. Translational Research <ul><li>Improve communication between basic and clinical science so that more therapeutic insights may be derived from new scientific ideas - and vice versa. </li></ul><ul><li>Testing of theories emerging from preclinical experimentation on disease-affected human subjects. </li></ul><ul><li>Information obtained from preliminary human experimentation can be used to refine our understanding of the biological principles underpinning the heterogeneity of human disease and polymorphism(s). </li></ul><ul><li> </li></ul><ul><li>Reference NIH Digital Roadmap activity </li></ul>
  16. 17. HCLS Framework: Biomedical Research <ul><li>Molecular, Cellular and Systems Biology/Physiology </li></ul><ul><ul><li>Organism as an integrated and interacting network of genes, proteins and biochemical reactions </li></ul></ul><ul><ul><li>Human body as a system of interacting organs </li></ul></ul><ul><li>Molecular Cell Biology/Genomic and Proteomic Research </li></ul><ul><ul><li>Gene Sequencing, Genotyping, Protein Structures </li></ul></ul><ul><ul><li>Cell Signaling and other Pathways </li></ul></ul><ul><li>Biomarker Research </li></ul><ul><ul><li>Discovery of genes and gene products that can be used to measure disease progression or impacts of drugs </li></ul></ul><ul><li>Pharmaco-genomics </li></ul><ul><ul><li>Impact of genetic inheritance on </li></ul></ul><ul><li>Drug Discovery and Translational Research </li></ul><ul><ul><li>Use of preclinical research to identify promising drug candidates </li></ul></ul>
  17. 18. HCLS Framework: Clinical Research <ul><li>Clinical Trials </li></ul><ul><ul><li>Determination of efficacy, impact and safety of drugs for particular diseases </li></ul></ul><ul><li>Pharmaco-vigilance/ADE Surveillance </li></ul><ul><ul><li>Monitoring of impacts of drugs on patients, especially safety and adverse event related information </li></ul></ul><ul><li>Patient Cohort Identification and Management </li></ul><ul><ul><li>Identifying patient cohorts for drug trials is a challenging task </li></ul></ul><ul><li>Translational Research </li></ul><ul><ul><li>Test theories emerging from pre-clinical experimentation on disease affected human subjects </li></ul></ul><ul><li>Development of EHRs/EMRs for both clinical research and practice </li></ul><ul><ul><li>Currently EHRs/EMRs focussed on clinical workflow processes </li></ul></ul><ul><ul><li>Re-using that information for clinical research and trials is a challenging task </li></ul></ul>
  18. 19. Ecosystem: Goal State /* Need to expand this to include Healthcare and Biomedical Research Players as well… Show an integrated picture with “continuous” information flow */ /* Need to expand this with Biomedical Research + Clinical Practice */ Biomedical Research Clinical Practice
  19. 20. Use Case Flow: Drug Discovery and Development K D Qualified Targets Lead Generation Toxicity & Safety Biomarkers Pharmacogenomics Clinical Trials Molecular Mechanisms Lead Optimization
  20. 21. Drug Discovery & Development Knowledge Qualified Targets Lead Generation Toxicity & Safety Biomarkers Pharmacogenomics Clinical Trials Molecular Mechanisms Lead Optimization Launch
  21. 22. What is the Semantic Web ? <ul><li> </li></ul>It’s AI It’s Web 2.0 It’s Ontologies It’s Data Tracking It’s a Global Conspiracy It’s Semantic Webs It’s Text Extraction
  22. 23. The Current Web <ul><li>What the computer sees: “Dumb” links </li></ul><ul><li>No semantics - <a href> treated just like <bold> </li></ul><ul><li>Minimal machine-processable information </li></ul>
  23. 24. The Semantic Web <ul><li>Machine-processable semantic information </li></ul><ul><li>Semantic context published – making the data more informative to both humans and machines </li></ul>
  24. 25. Understanding the Semantic Web <ul><li>Vision </li></ul><ul><ul><li>Some day in the future… </li></ul></ul><ul><ul><li>Today: describing data </li></ul></ul><ul><li>Core Concept: TRIPLES… </li></ul><ul><li>Specifications </li></ul><ul><ul><li>RDF, OWL, GRDDL </li></ul></ul><ul><ul><li>Coming soon: SPARQL, RIF </li></ul></ul><ul><li>Applications </li></ul><ul><ul><li>Data Aggregation: Recombinant Data </li></ul></ul><ul><ul><li>Statements: Annotating things </li></ul></ul><ul><li>Practices </li></ul><ul><ul><li>Everything gets a URI… </li></ul></ul><ul><ul><li>New definition of Data Interoperability: </li></ul></ul><ul><ul><ul><li>DTA : Data Transit Authority </li></ul></ul></ul>Subject Property Object <Patient HB2122> <shows_sign> <Disease Pneumococcal_Meningitis>
  25. 26. The Technologies: RDF <ul><li>Resource Description Framework </li></ul><ul><li>W3C standard for making statements of fact or belief about data or concepts </li></ul><ul><li>Descriptive statements are expressed as triples: (Subject, Verb, Object) </li></ul><ul><ul><li>We call verb a “predicate” or a “property” </li></ul></ul>Subject Property Object <Patient HB2122> <shows_sign> <Disease Pneumococcal_Meningitis>
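The triple idea on this slide can be sketched in a few lines of Python. This is only an illustration, not an RDF library: the `ex:` prefixed names, the `Triple` type, and the `match` helper are assumptions made for the example, with `None` acting as a wildcard in a pattern.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str  # the slide's "verb" / "property"
    obj: str


# The slide's statement plus two type statements, using hypothetical ex: names.
GRAPH = {
    Triple("ex:Patient_HB2122", "ex:shows_sign", "ex:Pneumococcal_Meningitis"),
    Triple("ex:Patient_HB2122", "rdf:type", "ex:Patient"),
    Triple("ex:Pneumococcal_Meningitis", "rdf:type", "ex:Disease"),
}


def match(
    graph: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None matches anything."""
    for t in graph:
        if s is not None and t.subject != s:
            continue
        if p is not None and t.predicate != p:
            continue
        if o is not None and t.obj != o:
            continue
        yield t


# Which signs does patient HB2122 show?
signs = [t.obj for t in match(GRAPH, s="ex:Patient_HB2122", p="ex:shows_sign")]
print(signs)
```

Because every statement has the same three-part shape, one pattern-matching function can serve any question asked of the data.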
  26. 27. London Underground App View
  27. 28. Application Space : Semantic Web Drug DD Genomics Therapeutics Biology HTS NDA Compound Opt safety eADME DMPK informatics manufacturing genes Clinical Studies Patent Chem Lib Production Critical Path
  28. 29. URI - A key element <ul><li>Uniform Resource Identifier </li></ul><ul><li>Specification used in HTML, XML, and RDF-OWL </li></ul><ul><li>Fundamental to RDF: It IS the only valid SW identifier! </li></ul><ul><li>Two forms: </li></ul><ul><ul><li>HTTP- </li></ul></ul><ul><ul><li>URN- </li></ul></ul><ul><li>Resolution </li></ul><ul><ul><li>Mapping retrievable data to a URI </li></ul></ul><ul><ul><li>Does not mean getting everything known about a URI </li></ul></ul><ul><ul><li>Not clear how to best handle versioning </li></ul></ul><ul><ul><li>See Alan’s slides… </li></ul></ul>
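As a rough illustration of the two URI forms, the sketch below classifies an identifier by its scheme using Python's standard `urllib.parse`. Both example identifiers are hypothetical (the `example.org` authority is a placeholder), and the `uri_form` helper is an assumption of this example, not part of any specification.

```python
from urllib.parse import urlparse


def uri_form(uri: str) -> str:
    """Return 'HTTP', 'URN', or 'other' depending on the URI scheme."""
    scheme = urlparse(uri).scheme.lower()
    if scheme in ("http", "https"):
        return "HTTP"
    if scheme == "urn":
        return "URN"
    return "other"


# Hypothetical identifiers for the same protein record, one in each form.
http_uri = "http://example.org/protein/P49841"
urn_uri = "urn:lsid:example.org:protein:P49841"

print(uri_form(http_uri), uri_form(urn_uri))
```

An HTTP URI can be resolved with an ordinary web request, while a URN needs a separate resolution service to map it to retrievable data.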
  29. 30. REST-fulness <ul><li>REST is a term coined by Roy Fielding to describe an architectural style of networked systems. REST is an acronym standing for Representational State Transfer. </li></ul><ul><ul><li> (get gene list) </li></ul></ul><ul><ul><li> (get gene info) </li></ul></ul><ul><li>Can REST == URI, and if so, when? </li></ul><ul><ul><li>Yes, if we agree return function is identical to URI resolution </li></ul></ul><ul><li>Issues: </li></ul><ul><ul><li>Should it return RDF always? - standardized </li></ul></ul><ul><ul><li>Resolution is only a subset of services, how do we handle non-resolution services: are these URIs as well? </li></ul></ul>
  30. 31. Google Graphs Ranking Sites based on Topology Associate Word frequencies with ranked sites
  31. 32. Opportunities for Semantics in HealthCare <ul><li>Enhanced interoperability via: </li></ul><ul><ul><li>Semantic Tagging </li></ul></ul><ul><ul><li>Grounding of concepts in Standardized Vocabularies </li></ul></ul><ul><ul><li>Complex Definitions </li></ul></ul><ul><li>Semantics-based Observation Capture </li></ul><ul><li>Inference on Diseases </li></ul><ul><ul><li>Phenotypes </li></ul></ul><ul><ul><li>Genetics </li></ul></ul><ul><ul><li>Mechanisms </li></ul></ul><ul><li>Semantics-based Clinical Decision Support </li></ul><ul><ul><li>Guided Data Interpretation </li></ul></ul><ul><ul><li>Guided Ordering </li></ul></ul><ul><li>Semantics-based Knowledge Management </li></ul>
  32. 33. Text Unstructured Data Types Structured and Complex Data Types Histology Profiling Data Semantics in the Life Sciences Publications Image + Text Publications + data Text + data items genomics Gene expression Data Items Data Items Clinical Findings Categorical Taxonomic Data Items Pathways, Biomarkers Complex Objects Clinical trials Complex Objects with Categorical/Taxonomic Data Items Systems Biology Composite Objects with Embedded “ process”
  33. 34. DB XML RDF-OWL Mapping from Current Formats
  34. 35. RDB => RDF Virtualized RDF
  36. 37. RDFa: Bridging the Hypertext and Semantic Webs <ul><li><div xmlns:cc=&quot; &quot; xmlns:dc=” ” about=” photo2.jpg ”> </li></ul><ul><li>This photo was taken by </li></ul><ul><li><span property=” dc:creator ”> Ben Adida </span> </li></ul><ul><li>and is licensed under a </li></ul><ul><li><a rel=” cc:license ” </li></ul><ul><li>href=” ”> </li></ul><ul><li>Creative Commons License </li></ul><ul><li></a>. </li></ul><ul><li></div> </li></ul>photo2.jpg Ben Adida licenses/by/2.5/ dc:creator cc:license
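A minimal sketch of how the markup on this slide could be read back as triples, using only Python's standard `html.parser`. It is not a conforming RDFa processor: it handles just the `about`, `property`, and `rel`/`href` attributes shown on the slide. The namespace URIs and the license URL were stripped from the transcript, so the values below are assumptions.

```python
from html.parser import HTMLParser

# Namespace and license URLs are assumed; the transcript elides them.
RDFA = """
<div xmlns:cc="http://creativecommons.org/ns#"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     about="photo2.jpg">
  This photo was taken by
  <span property="dc:creator">Ben Adida</span>
  and is licensed under a
  <a rel="cc:license" href="http://creativecommons.org/licenses/by/2.5/">
    Creative Commons License</a>.
</div>
"""


class RDFaSketch(HTMLParser):
    """Collect (subject, predicate, object) tuples from a small RDFa subset."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self.subject = None           # set by the nearest about="..."
        self.pending_property = None  # property waiting for its literal text

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "about" in a:
            self.subject = a["about"]
        if "property" in a:
            self.pending_property = a["property"]
        if "rel" in a and "href" in a:
            # A link: the object is the target resource.
            self.triples.append((self.subject, a["rel"], a["href"]))

    def handle_data(self, data):
        # A property: the object is the element's text content.
        if self.pending_property and data.strip():
            self.triples.append((self.subject, self.pending_property, data.strip()))
            self.pending_property = None


parser = RDFaSketch()
parser.feed(RDFA)
parser.close()
for triple in parser.triples:
    print(triple)
```

The same HTML that a browser renders for people therefore also carries the two statements drawn on the slide: the photo's creator and its license.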
  37. 38. Excel => RDF <ul><li>ls:indivCell ${ rdf:type ls:GE_Cell; ls:probeHub gl:CASP2 ; ls:GE_Expected_Ratio &quot;0.2726&quot; ; ls:conditionHub gl:BREAST_MALIGNANT } ; </li></ul><ul><li>ls:indivCell ${ rdf:type ls:GE_Cell; ls:probeHub gl:TNFRS ; ls:GE_Expected_Ratio &quot;0.0138&quot; ; ls:conditionHub gl:BREAST_MALIGNANT } ; </li></ul><ul><li>ls:indivCell ${ rdf:type ls:GE_Cell; ls:probeHub gl:CASP2 ; ls:GE_Expected_Ratio &quot;0.1275&quot; ; ls:conditionHub gl:BREAST_NORMAL } ; </li></ul>Casp2 TNFRS Breast Malig
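One way to picture the spreadsheet-to-RDF step is as a loop that emits one typed node per non-empty cell. The sketch below assumes the sheet was exported as CSV with one row per probe and one column per condition; that layout, the `_:cell_…` blank-node labels, and the `cells_to_triples` helper are assumptions of this example. The `ls:`/`gl:` names and the three ratios are the ones shown on the slide.

```python
import csv
import io

# Assumed CSV export of the sheet; the empty cell has no value on the slide.
SHEET = """probe,BREAST_MALIGNANT,BREAST_NORMAL
CASP2,0.2726,0.1275
TNFRS,0.0138,
"""


def cells_to_triples(text):
    """Turn each non-empty data cell into four triples describing it."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        probe = row.pop("probe")
        for condition, ratio in row.items():
            if not ratio:
                continue  # empty cell: make no statement
            cell = f"_:cell_{probe}_{condition}"  # hypothetical blank-node label
            triples += [
                (cell, "rdf:type", "ls:GE_Cell"),
                (cell, "ls:probeHub", f"gl:{probe}"),
                (cell, "ls:GE_Expected_Ratio", ratio),
                (cell, "ls:conditionHub", f"gl:{condition}"),
            ]
    return triples


triples = cells_to_triples(SHEET)
for t in triples:
    print(t)
```

Each cell now states explicitly which probe (row) and which condition (column) it belongs to, so the meaning no longer depends on its position in the sheet.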
  38. 39. Use-Case: Semantics of Multivariate Analysis Column Semantic <rdf:type Gene> Row Semantic <rdf:type Subject> <ul><li>Make the Row and Column types explicit and universal. </li></ul><ul><li>Link Entities to unique web resources </li></ul><ul><li>Include experimental Metadata </li></ul>
  39. 40. Use-Case: COSA Row Semantic <rdf:type Subject> Data Set Column Semantic <rdf:type Gene>
  40. 41. Courtesy of BG-Medicine Example: Knowledge Aggregation
  41. 42. Case Study: Omics <ul><li>ApoA1 … </li></ul><ul><li>… is produced by the Liver </li></ul><ul><li>… is expressed less in Atherosclerotic Liver </li></ul><ul><li>… is correlated with DKK1 </li></ul><ul><li>… is cited regarding Tangier’s disease </li></ul><ul><li>… has Tx Reg elements like HNFR1 </li></ul><ul><li>Subject  Verb  Object </li></ul>
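The ApoA1 statements above might come from different sources yet still merge cleanly, because they share one subject identifier. A small sketch, assuming hypothetical source names and `ex:` predicate names invented for this example:

```python
from collections import defaultdict

# The slide's five statements, grouped under hypothetical source labels.
SOURCES = {
    "anatomy": [("ex:ApoA1", "ex:producedBy", "ex:Liver")],
    "expression": [
        ("ex:ApoA1", "ex:expressedLessIn", "ex:Atherosclerotic_Liver"),
        ("ex:ApoA1", "ex:correlatedWith", "ex:DKK1"),
    ],
    "literature": [("ex:ApoA1", "ex:citedRegarding", "ex:Tangier_Disease")],
    "regulation": [("ex:ApoA1", "ex:hasTxRegElement", "ex:HNFR1")],
}


def aggregate(sources):
    """Merge all sources into one graph; a set drops exact duplicates."""
    merged = set()
    for triples in sources.values():
        merged.update(triples)
    return merged


def describe(graph, subject):
    """Group everything said about one subject by predicate."""
    summary = defaultdict(list)
    for s, p, o in graph:
        if s == subject:
            summary[p].append(o)
    return dict(summary)


merged = aggregate(SOURCES)
profile = describe(merged, "ex:ApoA1")
print(profile)
```

No schema mapping is needed for the merge itself; agreement on the identifier for ApoA1 is what makes the union meaningful.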
  42. 43. Knowledge Mining using Semantic Web <ul><li>“Gene Prioritization through Genomic Data Fusion” </li></ul><ul><li>Aerts et al., 2006, Nature Biotechnology </li></ul><ul><li>Use of quantitative and qualitative information for statistical ranking. </li></ul><ul><li>Can be used to identify novel genes involved in diseases </li></ul>
  43. 44. Potential Linked Clinical Ontologies Clinical Trials ontology RCRIM (HL7) Genomics CDISC ICD10 Pathways (BioPAX) Disease Models Extant ontologies Under development Bridge concept SNOMED Tox IRB Applications Molecules Clinical Obs Mechanisms Disease Descriptions
  44. 45. Case Study: BioPAX (Pathways) <ul><li><bp:PATHWAYSTEP rdf:ID=&quot;xDshToXGSK3bPathwayStep&quot;> </li></ul><ul><li><bp:next-step rdf:resource=&quot;#xGSK3bToBetaCateninPathwayStep&quot;/> </li></ul><ul><li><bp:step-interactions> </li></ul><ul><ul><li><bp:MODULATION rdf:ID=&quot;xDshToXGSK3b&quot;> </li></ul></ul><ul><ul><li><bp:left rdf:resource=&quot;#xDsh&quot;/> </li></ul></ul><ul><ul><li><bp:right rdf:resource=&quot;#xGSK-3beta&quot;/> </li></ul></ul><ul><ul><li><bp:participants rdf:resource=&quot;#xGSK-3beta&quot;/> </li></ul></ul><ul><ul><li><bp:name rdf:datatype=&quot;;> Dishevelled to GSK3beta</bp:name> </li></ul></ul><ul><ul><li><bp:direction rdf:datatype=&quot;;> IRREVERSIBLE-LEFT-TO-RIGHT</bp: direction > </li></ul></ul><ul><ul><li><bp:control-type rdf:datatype=&quot;;> INHIBITION</bp: control-type > </li></ul></ul><ul><ul><li><bp: participants rdf:resource=&quot;#xDsh&quot;/> </li></ul></ul><ul><li></bp: MODULATION > </li></ul><ul><li></bp: step-interactions > </li></ul><ul><li></bp: PATHWAYSTEP > </li></ul>
  45. 46. Case Study: BioPAX (Pathways) <ul><li><bp:PATHWAYSTEP rdf:ID=&quot;xDshToXGSK3bPathwayStep&quot;> </li></ul><ul><li><bp:next-step rdf:resource=&quot;#xGSK3bToBetaCateninPathwayStep&quot;/> </li></ul><ul><li><bp:step-interactions> </li></ul><ul><ul><li><bp:MODULATION rdf:ID=&quot;xDshToXGSK3b&quot;> </li></ul></ul><ul><ul><li><bp:left rdf:resource=&quot;#xDsh&quot;/> </li></ul></ul><ul><ul><li><bp:right rdf:resource=&quot;#xGSK-3beta&quot;/> </li></ul></ul><ul><ul><li><bp:participants rdf:resource=&quot;#xGSK-3beta&quot;/> </li></ul></ul><ul><ul><li><bp:name rdf:datatype=&quot;;> Dishevelled to GSK3beta</bp:name> </li></ul></ul><ul><ul><li><bp:direction rdf:datatype=&quot;;> IRREVERSIBLE-LEFT-TO-RIGHT</bp: direction > </li></ul></ul><ul><ul><li><bp:control-type rdf:datatype=&quot;;> INHIBITION</bp: control-type > </li></ul></ul><ul><ul><li><drug:affectedBy rdf:resource=”;/> </li></ul></ul><ul><ul><li><bp: participants rdf:resource=&quot;#xDsh&quot;/> </li></ul></ul><ul><li></bp: MODULATION > </li></ul><ul><li></bp: step-interactions > </li></ul><ul><li></bp: PATHWAYSTEP > </li></ul>Modulation CHIR99102 affectedBy
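To show how a fragment like this reads as triples, here is a sketch that parses a trimmed-down version of the modulation with Python's standard `xml.etree.ElementTree`. The transcript stripped the namespace and datatype URLs, so the BioPAX namespace below is an assumption, the `drug:` namespace is a placeholder, and `#CHIR99102` is a hypothetical local reference standing in for the elided drug resource.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BP_NS = "http://www.biopax.org/release/biopax-level1.owl#"  # assumed
DRUG_NS = "http://example.org/drug#"                        # hypothetical

# Trimmed-down version of the slide's fragment, made well-formed.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:bp="{BP_NS}" xmlns:drug="{DRUG_NS}">
  <bp:MODULATION rdf:ID="xDshToXGSK3b">
    <bp:left rdf:resource="#xDsh"/>
    <bp:right rdf:resource="#xGSK-3beta"/>
    <bp:control-type>INHIBITION</bp:control-type>
    <drug:affectedBy rdf:resource="#CHIR99102"/>
  </bp:MODULATION>
</rdf:RDF>"""


def modulation_triples(xml_text):
    """List (subject, property local name, object) for each MODULATION."""
    root = ET.fromstring(xml_text)
    triples = []
    for mod in root.iter(f"{{{BP_NS}}}MODULATION"):
        subject = "#" + mod.get(f"{{{RDF_NS}}}ID")
        for child in mod:
            prop = child.tag.split("}", 1)[1]  # drop the namespace part
            resource = child.get(f"{{{RDF_NS}}}resource")
            # Object is a resource reference if present, else literal text.
            obj = resource if resource is not None else (child.text or "").strip()
            triples.append((subject, prop, obj))
    return triples


pathway_triples = modulation_triples(DOC)
for t in pathway_triples:
    print(t)
```

The `drug:affectedBy` statement comes from a different vocabulary than the pathway statements, yet it attaches to the same node without changing the BioPAX data around it.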
  46. 47. Case Study: Drug Discovery Dashboards <ul><li>Dashboards and Project Reports </li></ul><ul><li>Next generation browsers for semantic information via Semantic Lenses </li></ul><ul><li>Renders OWL-RDF, XML, and HTML documents </li></ul><ul><li>Lenses act as information aggregators and logic style-sheets </li></ul><ul><ul><ul><li>add { ls:TheraTopic </li></ul></ul></ul><ul><ul><ul><li>hs:classView:TopicView </li></ul></ul></ul><ul><ul><ul><li>} </li></ul></ul></ul>
  47. 48. Drug Discovery Dashboard Topic: GSK3beta Topic Target: GSK3beta Disease: DiabetesT2 Alt Dis: Alzheimers Cmpd: SB44121 CE: DBP Team: GSK3 Team Person: John Related Set Path: WNT
  48. 49. Bridging Chemistry and Molecular Biology P49841 Semantic Lenses: Different Views of the same data Apply Correspondence Rule: if ?target.xref.lsid == ?bpx:prot.xref.lsid then ?target.correspondsTo.?bpx:prot BioPax Components Target Model
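The correspondence rule on this slide can be sketched as a join on a shared cross-reference. The record layout, the `xref_lsid` field name, the LSID strings, and the `apply_correspondence` helper are all assumptions made for illustration; only the accession P49841 and the rule itself come from the slide.

```python
# Hypothetical records from the target model and from the BioPAX components.
TARGETS = [
    {"id": "target:GSK3beta", "xref_lsid": "urn:lsid:example.org:uniprot:P49841"},
]
PROTEINS = [
    {"id": "bpx:xGSK-3beta", "xref_lsid": "urn:lsid:example.org:uniprot:P49841"},
    {"id": "bpx:xDsh", "xref_lsid": "urn:lsid:example.org:uniprot:UNKNOWN"},
]


def apply_correspondence(targets, proteins):
    """if target.xref.lsid == prot.xref.lsid then target correspondsTo prot."""
    by_lsid = {}
    for prot in proteins:
        by_lsid.setdefault(prot["xref_lsid"], []).append(prot["id"])
    return [
        (target["id"], "correspondsTo", prot_id)
        for target in targets
        for prot_id in by_lsid.get(target["xref_lsid"], [])
    ]


links = apply_correspondence(TARGETS, PROTEINS)
print(links)
```

The rule adds new `correspondsTo` statements rather than merging the two records, so each model keeps its own description of the protein.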
  49. 50. Bridging Chemistry and Molecular Biology <ul><li>Lenses can aggregate, accentuate, or even analyze new result sets </li></ul><ul><li>Behind the lens, the data can be persistently stored as RDF-OWL </li></ul><ul><li>Correspondence does not need to mean “same descriptive object”, but may mean objects with identical references </li></ul>
  50. 51. Pathway Polymorphisms <ul><li>Merge directly onto pathway graph </li></ul><ul><li>Identify targets with lowest chance of genetic variance </li></ul><ul><li>Predict parts of pathways with highest functional variability </li></ul><ul><li>Map genetic influence to potential pathway elements </li></ul><ul><li>Select mechanisms of action that are minimally impacted by polymorphisms </li></ul>Non-synonymous polymorphisms from db-SNP
  51. 52. BioRDF Neuro Tasks <ul><li>Aggregate facts and models around Parkinson’s Disease </li></ul><ul><li>BIRN / Human Brain Project </li></ul><ul><li>SWAN : scientific annotations and evidence </li></ul><ul><li>NeuroCommons </li></ul><ul><li>Use RDF and OWL to describe </li></ul><ul><ul><li>'Brain Connectivity' </li></ul></ul><ul><ul><li>Neuronal data in SenseLab </li></ul></ul>
  52. 53. BioRDF : <ul><li>The Neurocommons project, a collaboration between Science Commons and the Teranode Corporation, is creating a free, public Semantic Web for neurological research. The project has three distinct goals: </li></ul><ul><li>To demonstrate that scientific impact and innovation are directly related to the freedom to legally reuse and technically transform scientific information. </li></ul><ul><li>To establish a legal and technical framework that increases the impact of investment in neurological research in a public and clearly measurable manner. </li></ul><ul><li>To develop an open community of neuroscientists, funders of neurological research, technologists, physicians, and patients to extend the Neurocommons work in an open, collaborative, distributed manner. </li></ul>
  53. 54. BioRDF: Reagents <ul><li>RDF resources that describe various kinds of experimental reagents, starting with antibodies: </li></ul><ul><li>Initial RDF that captures: Gene, the fact that this is an antibody, various kinds of pages about the antibody, such as vendor documentation, and any other properties that are explicitly captured in the source material </li></ul><ul><li>Work with the Ontology task force to identify appropriate ontologies and vocabularies to use in the RDF. </li></ul><ul><li>Write queries against the RDF to answer questions of the sort posed on the Alzforum's </li></ul>
  54. 55. BioRDF: NCBI <ul><li>NCBI Data: as URIs and as RDF (Olivier Bodenreider) </li></ul><ul><li>Terminology Integration: NLM’s UMLS, MeSH </li></ul><ul><ul><li>SNOMED… </li></ul></ul>
  55. 56. Conclusions: Key Semantic Web Principles <ul><li>Plan for change </li></ul><ul><li>Free data from the application that created it </li></ul><ul><li>Lower reliance on overly complex Middleware </li></ul><ul><li>The value in &quot;as needed&quot; data integration </li></ul><ul><li>Big wins come from many little ones </li></ul><ul><li>The power of links - network effect </li></ul><ul><li>Open-world, open solutions are cost effective </li></ul><ul><li>Importance of &quot;Partial Understanding&quot; </li></ul>