Presentation to the J. Craig Venter Institute, Dec. 2014

This is largely a compilation of various other talks that I have posted here: a summary of the past 3+ years of work on SADI/SHARE. It includes the (now well-worn!!) slides about SHARE, as well as some of the more contemporary material about how we extended GALEN clinical classes with richer semantic descriptions and then used them to do automated clinical phenotype analysis. It also includes the slide deck on automated measurement-unit conversion (related to our work on semantically representing Framingham clinical risk assessment rules).

So... for anyone who regularly follows my uploads, there isn't much "new" in here, but at least it's all in one place now! :-)

Published in: Internet

  1. 1. “Shopping for data should be as easy as shopping for shoes!” Dr. Carole Goble Professor, Dept. of Computer Science University of Manchester
  2. 2. “A little bit of semantics goes a long way” Dr. James Hendler Artificial Intelligence Researcher Rensselaer Polytechnic Institute One of the originators of the Semantic Web
  3. 3. …but a lot of semantics goes a long, long way! Mark Wilkinson Isaac Peral Distinguished Researcher Director, Fundación BBVA Chair in Biological Informatics Center for Plant Biotechnology and Genomics Technical University of Madrid
  4. 4. Making the Web a biomedical research platform from hypothesis through to publication
  5. 5. Publication Discourse Interpretation Hypothesis Experiment
  6. 6. Publication Discourse Interpretation Hypothesis Experiment
  7. 7. Motivation: 3 intersecting trends in the Life Sciences that are now, or soon will be, extremely problematic
  8. 8. TREND #1 NON-REPRODUCIBLE SCIENCE & THE FAILURE OF PEER REVIEW
  9. 9. Trend #1 Multiple recent surveys of high-throughput biology reveal that upwards of 50% of published studies are not reproducible - Baggerly, 2009 - Ioannidis, 2009
  10. 10. Trend #1 Similar (if not worse!) in clinical studies - Begley & Ellis, Nature, 2012 - Booth, Forbes, 2012 - Huang & Gottardo, Briefings in Bioinformatics, 2012
  11. 11. Trend #1 “the most common errors are simple, the most simple errors are common” At least partially because the analytical methodology was inappropriate and/or not sufficiently described - Baggerly, 2009
  12. 12. Trend #1 These errors pass peer review The researcher is (sometimes) unaware of the error The process that led to the error is not recorded Therefore it cannot be detected during peer-review
  13. 13. Agencies have Noticed! In March, 2012, the US Institute of Medicine ~said “Enough is enough!”
  14. 14. Agencies have Noticed! Institute of Medicine Recommendations For Conduct of High-Throughput Research: 1. Rigorously-described, -annotated, and -followed data management and manipulation procedures 2. “Lock down” the computational analysis pipeline once it has been selected 3. Publish the analytical workflow in a formal manner, together with the full starting and result datasets. Source: Evolution of Translational Omics: Lessons Learned and the Path Forward. The Institute of Medicine of the National Academies, Report Brief, March 2012.
  15. 15. TREND #2 BIGGER, CHEAPER DATA
  16. 16. Trend #2 High-throughput technologies are becoming cheaper and easier to use
  17. 17. Trend #2 High-throughput technologies are becoming cheaper and easier to use But there are still very few experts trained in statistical analysis of high-throughput data
  18. 18. Trend #2 The number of job postings for data scientist positions increased by 15,000% between the summers of 2011 and 2012 -- Indeed.com job trends data reported by http://blogs.nature.com/naturejobs/2013/03/18/so-you-want-to-be-a-data-scientist
  19. 19. Trend #2 Therefore Even small, moderately-funded laboratories can now afford to produce more data than they can manage or interpret
  20. 20. Trend #2 Therefore Even small, moderately-funded laboratories can now afford to produce more data than they can manage or interpret These labs will likely never be able to afford a qualified data scientist
  21. 21. TREND #3 “THE SINGULARITY”
  22. 22. Trend #3 The Healthcare Singularity and the Age of Semantic Medicine, Michael Gillam, et al, The Fourth Paradigm: Data-Intensive Scientific Discovery Tony Hey (Editor), 2009 Slide adapted with permission from Joanne Luciano, Presentation at Health Web Science Workshop 2012, Evanston IL, USA June 22, 2012.
  23. 23. “The Singularity” The X-intercept is where, the moment a discovery is made, it is immediately put into practice The Healthcare Singularity and the Age of Semantic Medicine, Michael Gillam, et al, The Fourth Paradigm: Data-Intensive Scientific Discovery Tony Hey (Editor), 2009 Slide Borrowed with Permission from Joanne Luciano, Presentation at Health Web Science Workshop 2012, Evanston IL, USA June 22, 2012.
  24. 24. You Are Here Scientific research would have to be conducted within a medium that immediately interpreted and disseminated the results...
  25. 25. ...in a form that immediately (actively!) affected the results of other researchers... You Are Here
  26. 26. ...without requiring them to be aware of these new discoveries. You Are Here
  27. 27. 3 intersecting and problematic trends Non-reproducible science that passes peer-review Cheaper production of larger and more complex datasets that require specialized expertise to analyze properly Need to more rapidly disseminate and use new discoveries
  28. 28. We Want More!
  29. 29. I don’t just want to reproduce your experiment...
  30. 30. I want to re-use your experiment
  31. 31. In my own laboratory... On MY DATA!
  32. 32. When I do my analysis I want to draw on the knowledge of global domain-experts like statisticians and pathologists... ...as if they were mentors sitting in the chair beside me.
  33. 33. Please don’t make me find all of the data and knowledge that I require to do my experiment ...it simply isn’t possible anymore... Image from: Mark Smiciklas Intersection Consulting, cc-nca
  34. 34. Image from AJ Cann cc-by-a license I want to support peer review(ers) so that I do better science.
  35. 35. How do we get there from here?
  36. 36. To overcome these intersecting problems and to achieve the goals of transparent reproducible research
  37. 37. We must learn how to do research IN the Web Not OVER the Web
  38. 38. How we use The Web today
  39. 39. The Web is not a pigeon!
  40. 40. Semantic Web Technologies
  41. 41. The Web
  42. 42. The Semantic Web causally related to
  43. 43. This is the critical bit! The link is explicitly labeled! causally related to ???
  44. 44. http://semanticscience.org/resource/SIO_000243 SIO_000243: <owl:ObjectProperty rdf:about="&resource;SIO_000243"> <rdfs:label xml:lang="en"> is causally related with</rdfs:label> <rdf:type rdf:resource="&owl;SymmetricProperty"/> <rdf:type rdf:resource="&owl;TransitiveProperty"/> <dc:description xml:lang="en"> A transitive, symmetric, temporal relation in which one entity is causally related with another non-identical entity. </dc:description> <rdfs:subPropertyOf rdf:resource="&resource;SIO_000322"/> </owl:ObjectProperty> causally related with
  45. 45. http://semanticscience.org/resource/SIO_000243 SIO_000243: <owl:ObjectProperty rdf:about="&resource;SIO_000243"> <rdfs:label xml:lang="en"> is causally related with</rdfs:label> <rdf:type rdf:resource="&owl;SymmetricProperty"/> <rdf:type rdf:resource="&owl;TransitiveProperty"/> <dc:description xml:lang="en"> A transitive, symmetric, temporal relation in which one entity is causally related with another non-identical entity. </dc:description> <rdfs:subPropertyOf rdf:resource="&resource;SIO_000322"/> </owl:ObjectProperty> causally related with
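To make the "symmetric" and "transitive" declarations above concrete, here is a minimal Python sketch of the extra links a reasoner could infer from a labeled relationship of this kind. It is only an illustration, not how any particular OWL reasoner works; the entity names are placeholders, and the function name is made up for this example.

```python
from itertools import product


def symmetric_transitive_closure(pairs):
    """Return every (a, b) implied by treating `pairs` as a symmetric, transitive relation."""
    inferred = set(pairs)
    changed = True
    while changed:
        changed = False
        # Symmetry: if a relates to b, then b relates to a.
        new = {(b, a) for a, b in inferred}
        # Transitivity: if a relates to b and b relates to d, then a relates to d.
        # Reflexive pairs are skipped because the relation is between non-identical entities.
        new |= {
            (a, d)
            for (a, b), (c, d) in product(inferred, repeat=2)
            if b == c and a != d
        }
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred


# Placeholder entities; only two links are asserted explicitly.
asserted = {("entity_a", "entity_b"), ("entity_b", "entity_c")}
inferred = symmetric_transitive_closure(asserted)
```

Two asserted links become six once the semantics of the property are applied, which is the sense in which the labeled link carries meaning that a machine can act on.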
  46. 46. Semantic Web Technologies “deep semantics”
  47. 47. Deep Semantics?
  48. 48. Ontology Spectrum Catalog/ ID Selected Logical Constraints (disjointness, inverse, …) Terms/ glossary Thesauri “narrower term” relation Formal is-a Frames (Properties) Informal is-a Formal instance Value Restrs. General Logical constraints Originally from AAAI 1999- Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; – updated by McGuinness. Description in: www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html
  49. 49. Ontology Spectrum Catalog/ ID Selected Logical Constraints (disjointness, inverse, …) Terms/ glossary Thesauri “narrower term” relation Formal is-a Frames (Properties) Informal is-a Formal instance Value Restrs. General Logical constraints Most biomedical ontologies e.g. Gene Ontology Originally from AAAI 1999- Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; – updated by McGuinness. Description in: www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html
  50. 50. Ontology Spectrum Catalog/ ID Ontologies being used in today’s talk Selected Logical Constraints (disjointness, inverse, …) Terms/ glossary Thesauri “narrower term” relation Formal is-a Frames (Properties) Informal is-a Formal instance Value Restrs. General Logical constraints Most biomedical ontologies e.g. Gene Ontology Originally from AAAI 1999- Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; – updated by McGuinness. Description in: www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html
  51. 51. Ontology Spectrum Catalog/ ID Discovery & Interpretation systems – flexible! Selected Logical Constraints (disjointness, inverse, …) Terms/ glossary Thesauri “narrower term” relation Formal is-a Frames (Properties) Informal is-a Formal instance Value Restrs. General Logical constraints Categorization Systems Like library shelves, inflexible Originally from AAAI 1999- Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; – updated by McGuinness. Description in: www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html
  52. 52. Remember, this is the critical bit! causally related with http://semanticscience.org/resource/SIO_000243 It’s relationships that make the Semantic Web “Semantic”
  53. 53. Semantic Web Technologies “deep semantics”
  54. 54. Even with “deep semantics” a lot of important information cannot be represented on the Semantic Web For example, all of the data that results from analytical algorithms and statistical analyses
  55. 55. Varying estimates put the size of the Deep Web between 500 and 800 times larger than the surface Web
  56. 56. On the WWW “automation” of access to Deep Web data happens through “Web Services”
  57. 57. There are many suggestions for how to bring the Deep Web into the Semantic Web using Semantic Web Services (SWS)
  58. 58. There are many suggestions for how to bring the Deep Web into the Semantic Web using Semantic Web Services (SWS) Describe input data Describe output data Describe how the system manipulates the data Describe how the world changes as a result
  59. 59. There are many suggestions for how to bring the Deep Web into the Semantic Web using Semantic Web Services (SWS) Describe input data Describe output data Describe how the system manipulates the data Describe how the world changes as a result None, so far, has proven to be wildly successful (in my opinion)
  60. 60. There are many suggestions for how to bring the Deep Web into the Semantic Web using Semantic Web Services (SWS) Describe input data Describe output data Describe how the system manipulates the data Describe how the world changes as a result None, so far, has proven to be wildly successful (in my opinion) …because describing what a Service does is HARD!
  61. 61. Lord, Phillip, et al. The Semantic Web–ISWC 2004 (2004): 350-364.
  62. 62. Scientific Web Services are DIFFERENT! Lord, Phillip, et al. The Semantic Web–ISWC 2004 (2004): 350-364.
  63. 63. “The service interfaces within bioinformatics are relatively simple. An extensible or constrained interoperability framework is likely to suffice for current demands: a fully generic framework is currently not necessary.” Lord, Phillip, et al. The Semantic Web–ISWC 2004 (2004): 350-364.
  64. 64. Scientific Web Services are DIFFERENT! They’re simpler! So perhaps we can solve the Semantic Web Service problem as it pertains to this (important!) domain
  65. 65. With respect to the Semantic Web What is missing from this list? Describe input data Describe output data Describe how the system manipulates the data Describe how the world changes as a result
  66. 66. causally related with http://semanticscience.org/resource/SIO_000243
  67. 67. causally related with http://semanticscience.org/resource/SIO_000243 The Semantic Web gets its semantics from relationships
  68. 68. causally related with http://semanticscience.org/resource/SIO_000243 The Semantic Web gets its semantics from relationships In 2008 I published a set of design-patterns for scientific Semantic Web Services that focuses on the biological relationship that the Service “exposes”
  69. 69. Design Pattern for Web Services on the Semantic Web
  70. 70. AACTCTTCGTAGTG... Web Service BLAST
  71. 71. AACTCTTCGTAGTG... SADI BLAST has_seq_string has homology to Terminal Flower type gene species A. thal. has_seq_string sequence SADI requires you to explicitly declare as part of your analytical output, the biological relationship that your algorithm “exposed”. AACTCTTCGTAGTG... sequence
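As a rough illustration of the pattern on this slide, the sketch below models a service that returns its result as an explicit relationship attached to the input node, rather than as a bare list of hits. It is plain Python over tuples, not the SADI API; `toy_homology_service` and `fake_blast` are hypothetical names, and the hit identifier is invented.

```python
def fake_blast(sequence):
    # Stand-in for a real BLAST call; the hit identifier is hypothetical.
    return ["gene:hypothetical_hit"] if sequence.startswith("AACT") else []


def toy_homology_service(input_triples, lookup=fake_blast):
    """Attach an explicit 'has homology to' triple to each input node with a sequence string."""
    output = []
    for subject, predicate, obj in input_triples:
        if predicate == "has_seq_string":
            for hit in lookup(obj):
                # The relationship the algorithm "exposed" is stated in the output itself.
                output.append((subject, "has homology to", hit))
    return output


input_graph = [("sequence:1", "has_seq_string", "AACTCTTCGTAGTG")]
output_graph = toy_homology_service(input_graph)
```

Because the output is keyed on the same subject as the input, the result can be merged straight back into the graph that was sent to the service.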
  72. 72. I want to share several stories that demonstrate the cool things that happen when you use SADI + deep semantics
  73. 73. Story #1: SHARE The Semantic Health and Research Environment
  74. 74. A proof-of-concept workflow orchestrator + SADI Semantic Web Service registry Objective: answer biologists’ questions
  75. 75. The SHARE registry indexes all of the input/output/relationship triples that can be generated by all known services This is how SHARE discovers services
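One way to picture such an index is a lookup table keyed on the relationship each service generates. The sketch below is an assumption about the general shape of the idea, not the actual SHARE registry; the class name, the input and output class names, and the example.org URLs are all hypothetical.

```python
from collections import defaultdict


class ToyRegistry:
    """Index services by the predicate (relationship) they can generate."""

    def __init__(self):
        self._by_predicate = defaultdict(list)

    def register(self, service_url, input_class, predicate, output_class):
        # Each entry records an input/relationship/output triple for one service.
        self._by_predicate[predicate].append((service_url, input_class, output_class))

    def find_services(self, predicate):
        # Discovery: which services can produce triples with this predicate?
        return [url for url, _, _ in self._by_predicate.get(predicate, [])]


registry = ToyRegistry()
registry.register("http://example.org/allele-service", "Locus", "genetics:hasVariant", "Allele")
registry.register("http://example.org/image-service", "Allele", "info:visualizedByImage", "Image")

found = registry.find_services("genetics:hasVariant")
```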
  76. 76. SHARE demonstrations with increasing semantic complexity
  77. 77. What is the phenotype of every allele of the Antirrhinum majus DEFICIENS gene SELECT ?allele ?image ?desc WHERE { locus:DEF genetics:hasVariant ?allele . ?allele info:visualizedByImage ?image . ?image info:hasDescription ?desc }
  78. 78. What is the phenotype of every allele of the Antirrhinum majus DEFICIENS gene SELECT ?allele ?image ?desc WHERE { locus:DEF genetics:hasVariant ?allele . ?allele info:visualizedByImage ?image . ?image info:hasDescription ?desc } The query language here is SPARQL The W3C-approved, standard query language for the Semantic Web
  79. 79. What is the phenotype of every allele of the Antirrhinum majus DEFICIENS gene SELECT ?allele ?image ?desc WHERE { locus:DEF genetics:hasVariant ?allele . ?allele info:visualizedByImage ?image . ?image info:hasDescription ?desc } Note that there is no “FROM” clause! We don’t tell it where it should get the information, The machine has to figure that out by itself...
  80. 80. What is the phenotype of every allele of the Antirrhinum majus DEFICIENS gene SELECT ?allele ?image ?desc WHERE { locus:DEF genetics:hasVariant ?allele . ?allele info:visualizedByImage ?image . ?image info:hasDescription ?desc } Starting data: the locus “DEF” (Deficiens)
  81. 81. What is the phenotype of every allele of the Antirrhinum majus DEFICIENS gene SELECT ?allele ?image ?desc WHERE { locus:DEF genetics:hasVariant ?allele . ?allele info:visualizedByImage ?image . ?image info:hasDescription ?desc } Query: A series of relationships vis-à-vis DEF
  82. 82. Enter that query into SHARE
  83. 83. Click “Submit”...
  84. 84. ...and in a few seconds you get your answer. Based on the relationships in your query, SHARE queried its registry to automatically discover SADI Services capable of generating those triples
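The sketch below shows, in simplified form, how a query with no "FROM" clause could be answered by picking a service for each relationship in turn. It assumes every triple pattern has an unbound variable as its object, and the services are toy Python functions returning invented values; the real system reasons over ontologies and invokes Web services.

```python
def resolve(patterns, services):
    """Answer triple patterns left to right, using the service that generates each predicate."""
    solutions = [{}]
    for subject, predicate, obj in patterns:
        service = services[predicate]  # chosen purely by the relationship it exposes
        next_solutions = []
        for binding in solutions:
            # If the subject is a variable bound earlier, use its value; otherwise use it as-is.
            start = binding.get(subject, subject)
            for value in service(start):
                next_solutions.append({**binding, obj: value})
        solutions = next_solutions
    return solutions


# Toy services keyed by predicate; all returned identifiers are hypothetical.
toy_services = {
    "genetics:hasVariant": lambda locus: ["allele:1", "allele:2"] if locus == "locus:DEF" else [],
    "info:visualizedByImage": lambda allele: ["image:" + allele.split(":")[1]],
    "info:hasDescription": lambda image: ["description of " + image],
}

query = [
    ("locus:DEF", "genetics:hasVariant", "?allele"),
    ("?allele", "info:visualizedByImage", "?image"),
    ("?image", "info:hasDescription", "?desc"),
]
answers = resolve(query, toy_services)
```

The query never names a data source; the predicates alone determine which services get called.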
  85. 85. Because it is the Semantic Web The query results are live hyperlinks to the respective Database or images (The answer is IN the Web!)
  86. 86. What pathways does UniProt protein P47989 belong to? PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#> PREFIX ont: <http://ontology.dumontierlab.com/> PREFIX uniprot: <http://lsrn.org/UniProt:> SELECT ?gene ?pathway WHERE { uniprot:P47989 pred:isEncodedBy ?gene . ?gene ont:isParticipantIn ?pathway . }
  87. 87. What pathways does UniProt protein P47989 belong to? PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#> PREFIX ont: <http://ontology.dumontierlab.com/> PREFIX uniprot: <http://lsrn.org/UniProt:> SELECT ?gene ?pathway WHERE { uniprot:P47989 pred:isEncodedBy ?gene . ?gene ont:isParticipantIn ?pathway . }
  88. 88. What pathways does UniProt protein P47989 belong to? PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#> PREFIX ont: <http://ontology.dumontierlab.com/> PREFIX uniprot: <http://lsrn.org/UniProt:> SELECT ?gene ?pathway WHERE { uniprot:P47989 pred:isEncodedBy ?gene . ?gene ont:isParticipantIn ?pathway . } Note again that there is no “From” clause… I have not told SHARE where to look for the answer, I am simply asking my question
  89. 89. Enter that query into SHARE
  90. 90. Two different providers of gene information (KEGG & NCBI) were found & accessed. Two different providers of pathway information (KEGG and GO) were found & accessed.
  91. 91. The results are all links to the original data (The answer is IN the Web!)
  92. 92. Show me the latest Blood Urea Nitrogen and Creatinine levels of patients who appear to be rejecting their transplants PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#> PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#> SELECT ?patient ?bun ?creat FROM <http://sadiframework.org/ontologies/patients.rdf> WHERE { ?patient rdf:type patient:LikelyRejecter . ?patient l:latestBUN ?bun . ?patient l:latestCreatinine ?creat . }
  93. 93. Show me the latest Blood Urea Nitrogen (BUN) and Creatinine levels of patients who appear to be rejecting their transplants PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#> PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#> SELECT ?patient ?bun ?creat FROM <http://sadiframework.org/ontologies/patients.rdf> WHERE { ?patient rdf:type patient:LikelyRejecter . ?patient l:latestBUN ?bun . ?patient l:latestCreatinine ?creat . }
  94. 94. Likely Rejecter: A patient who has creatinine levels that are increasing over time - - Mark D Wilkinson’s definition
  95. 95. Likely Rejecter: …but there is no “likely rejecter” column or table in our database… only blood chemistry measurements at various time-points
  96. 96. Likely Rejecter: So the data required to answer this question DOESN’T EXIST!
  97. 97. My definition of a Likely Rejecter is encoded in a machine-readable document written in the OWL Ontology language Basically: “the regression line over creatinine measurements should have an increasing slope”
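The "increasing slope" rule can be written out numerically as follows. This is a plain-Python sketch of ordinary least-squares slope, not the OWL definition itself; the function names and the sample measurements are invented for illustration, and it assumes at least two measurements at distinct time points.

```python
def regression_slope(points):
    """Least-squares slope of y over x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def is_likely_rejecter(creatinine_over_time):
    # The class definition, paraphrased: the regression line has an increasing slope.
    return regression_slope(creatinine_over_time) > 0


# Hypothetical (time point, creatinine) measurements for two patients.
rising = [(0, 1.0), (1, 1.2), (2, 1.5)]
falling = [(0, 1.5), (1, 1.2), (2, 1.0)]
```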
  98. 98. Our ontology refers to other ontologies (possibly published by other people) to learn about what the properties of “regression models” are e.g. that regression models have slopes and intercepts and that slopes and intercepts have decimal values
  99. 99. ?
  100. 100. Enter that query into SHARE
  101. 101. SHARE examines the query Burrows around the Web reading the various ontologies then uses the discovered Class definitions as a template to map a path from what it has, to what it needs, using SADI services
  102. 102. Based on the Class definition SHARE decides that it needs to do a Linear Regression analysis on the blood creatinine measurements
  103. 103. ?
  104. 104. The conversation between SHARE and the registry reveals the use of “Deep Semantics” Q: Is there a SADI service that will consume instances of Patient and give me instances of LikelyRejecter? A: No. Q: Okay... So LikelyRejecters need a regression model of increasing slope over their BloodCreatinine, so... Is there a SADI service that will consume BloodCreatinine over time and give me its linear regression model? A: No. Q: Okay... Blood Creatinine over time is a subclass of data of type X/Y coordinate, so is there a service that consumes X/Y data and returns its regression model? A: Yes, here’s the URL.
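The last step of that conversation, falling back from a specific class to its superclass, can be sketched as a walk up a class hierarchy. The class names, the hierarchy, and the example.org URL below are hypothetical stand-ins; a real implementation would get the subclass relationships from an OWL reasoner.

```python
# Hypothetical class hierarchy: child class -> parent class.
SUPERCLASS = {"BloodCreatinineOverTime": "XYCoordinateData"}

# Hypothetical registry: (input class, output class) -> service URL.
SERVICES = {
    ("XYCoordinateData", "RegressionModel"): "http://example.org/services/linear-regression",
}


def find_service(input_class, output_class, superclass=SUPERCLASS, services=SERVICES):
    """Look for a service on the input class, then on each of its ancestors in turn."""
    current = input_class
    while current is not None:
        url = services.get((current, output_class))
        if url is not None:
            return url
        # No match: generalize the input and ask again.
        current = superclass.get(current)
    return None


regression_url = find_service("BloodCreatinineOverTime", "RegressionModel")
```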
  105. 105. The SHARE system utilizes SADI to discover analytical services on the Web that do linear regression analysis and sends the data to be analyzed
  106. 106. This happens iteratively (e.g. SHARE also has to examine the slope of the regression line using another service, find the “latest” in a series of time measurements, etc.). There is reasoning after every Service invocation (i.e. after every clause in the query). Once it is able to find instances (OWL Individuals) of the LikelyRejecter class, it continues with the rest of the query.
  107. 107. VOILA!
  108. 108. The way SHARE “interprets” data varies depending on the context of the query (i.e. which ontologies it reads – Mine? Yours?) and on what part of the query it is trying to answer at any given moment (which ontological concept is relevant to that clause)
  109. 109. Example? Blood Creatinine measurements were not dictated to be Blood Creatinine measurements
  110. 110. Example? The data had the ‘qualities/properties’ that allowed one machine to interpret that they were Blood Creatinine measurements (e.g. to determine which patients were rejecting)
  111. 111. Example? But the data also had the ‘qualities/properties’ that allowed another machine to interpret them as Simple X/Y coordinate data (e.g. the Linear Regression calculation tool)
  112. 112. Benefit of Deep Semantics Data is amenable to constant re-interpretation
  113. 113. http://www.flickr.com/people/faernworks/
  114. 114. Story #2: Measurement Units One example of the “little ways” that Semantics will help researchers day-by-day
  115. 115. Units must be harmonized Don’t leave this up to the researcher (it’s fiddly, time-consuming, and error-prone)
  116. 116. NASA Mars Climate Orbiter
  117. 117. Oops!
  118. 118. The Reality of Clinical Datasets (this is a small snapshot of a dataset we worked on, courtesy of Dr. Bruce McManus & Janet McManus, from the PROOF COE)
        ID   HEIGHT  WEIGHT  SBP   CHOL  HDL  BMI GR  SBP GR  CHOL GR  HDL GR
        pt1  1.82    177     128   227   55   0       0       1        0
        pt2  179     196     13.4  5.9   1.7  1       0       1        0
       Height in m and cm; Chol in mmol/l and mg/dl ...and other delicious weirdness. The clinical analyses described here were supported in part by the PROOF Center of Excellence for the Prevention of Organ Failure
  119. 119. GOAL: reduce the likelihood of errors by getting the clinical researcher “out of the loop” (as per the Institute of Medicine Recommendations)
  120. 120. Experiment: Reproduce a clinical study (from >10 years ago) by logically encoding the clinical diagnosis guidelines of the American Heart Association then ask SHARE to automatically analyse the patient clinical data
  121. 121. Semantically defining globally-accepted clinical phenotypes; Building on the expertise of others SystolicBloodPressure = GALEN:SystolicBloodPressure and ("sio:has measurement value" some ("sio:measurement" and ("sio:has unit" some “om:unit of measure”) and (“om:dimension” value “om:pressure or stress dimension”) and "sio:has value" some rdfs:Literal)) GALEN is a popular biomedical ontology but it is largely, like GO, a series of named but undefined Classes
  122. 122. Semantically defining globally-accepted clinical phenotypes; Building on the expertise of others SystolicBloodPressure = GALEN:SystolicBloodPressure and ("sio:has measurement value" some ("sio:measurement" and ("sio:has unit" some “om:unit of measure”) and (“om:dimension” value “om:pressure or stress dimension”) and "sio:has value" some rdfs:Literal)) So we use OWL to extend the GALEN Classes with rich, logical descriptors that take advantage of rich semantic relationships like “has measurement value” and “dimension” and “has unit”
  123. 123. Semantically defining globally-accepted clinical phenotypes; Building on the expertise of others SystolicBloodPressure = GALEN:SystolicBloodPressure and ("sio:has measurement value" some ("sio:measurement" and ("sio:has unit" some “om:unit of measure”) and (“om:dimension” value “om:pressure or stress dimension”) and "sio:has value" some rdfs:Literal)) Very general definition “some kind of pressure unit” (so that others can build on this as they wish!)
  124. 124. Semantically defining globally-accepted clinical phenotypes; Building on the expertise of others HighRiskSystolicBloodPressure (as defined by Framingham) SystolicBloodPressure and sio:hasMeasurement some (sio:Measurement and (“sio:has unit” value om:kilopascal) and (sio:hasValue some double[>= "18.7"^^double])) Now we are specific to our clinical study (Framingham definitions): MUST be in kilopascal and must be >= 18.7
  125. 125. Running the Clinical Analysis “Select the patients who are at-risk” SELECT ?record ?convertedvalue ?convertedunit FROM <./patient.rdf> WHERE { ?record rdf:type measure:HighRiskSystolicBloodPressure . ?record sio:hasMeasurement ?measurement. ?measurement sio:hasValue ?Pressure. } All measurements have now been automatically harmonized to KiloPascal, because we encoded the semantics in the model
        RecordID  Start Val  Start Unit  Pressure  End Unit
        Pt1       15         cmHg        19.998    KiloPascal
        Pt2       14.6       cmHg        19.465    KiloPascal
        Pt1       148        mmHg        19.731    KiloPascal
        Pt2       146        mmHg        19.465    KiloPascal
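The arithmetic behind the harmonized column can be sketched in a few lines of Python. This is not how SHARE performs the conversion (there it is driven by the unit semantics in the ontology); it only shows the calculation, assuming the standard factor of roughly 0.133322 kPa per mmHg. The function and constant names are made up for this example.

```python
# Conversion factors to kilopascal (1 mmHg is approximately 0.133322 kPa).
KPA_PER_UNIT = {"mmHg": 0.133322, "cmHg": 1.33322, "KiloPascal": 1.0}

# Threshold from the HighRiskSystolicBloodPressure definition on the earlier slide.
HIGH_RISK_SBP_KPA = 18.7


def to_kilopascal(value, unit):
    """Convert a pressure reading in a known unit to kilopascal."""
    return value * KPA_PER_UNIT[unit]


def is_high_risk_sbp(value, unit):
    # Harmonize first, then compare, so the researcher never converts by hand.
    return to_kilopascal(value, unit) >= HIGH_RISK_SBP_KPA


# Start values and units taken from the table on this slide.
readings = [(15, "cmHg"), (14.6, "cmHg"), (148, "mmHg"), (146, "mmHg")]
harmonized = [to_kilopascal(value, unit) for value, unit in readings]
```

All four readings come out between about 19.4 and 20.0 kPa, in line with the Pressure column, and all exceed the 18.7 kPa threshold.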
  126. 126. While doing this experiment, we noticed some interesting anomalies…
  127. 127. Visual inspection of our output data and the AHA guidelines showed that in many cases the clinician “tweaked” the guidelines when doing their analysis. AHA BMI risk threshold: BMI=25; in our dataset the clinical researcher used BMI=26. AHA HDL guideline: HDL<=1.03mmol/l; the dataset from our researcher: HDL<=0.89mmol/l.
  128. 128. Visual inspection of our output data and the AHA guidelines showed that in many cases the clinician “tweaked” the guidelines when doing their analysis These Alterations Were Not Recorded in Their Study Notes!
  129. 129. Adjusting our Semantic definitions and re-running the analysis resulted in nearly 100% correspondence with the clinical researcher HighRiskCholesterolRecord= PatientRecord and (sio:hasAttribute some (cardio:SerumCholesterolConcentration and sio:hasMeasurement some ( sio:Measurement and (sio:hasUnit value cardio:mili-mole-per-liter) and (sio:hasValue some double[>= 5.0])))) HighRiskCholesterolRecord= PatientRecord and (sio:hasAttribute some (cardio:SerumCholesterolConcentration and sio:hasMeasurement some ( sio:Measurement and (sio:hasUnit value cardio:mili-mole-per-liter) and (sio:hasValue some double[>= 5.2]))))
  130. 130. Reflect on this for a second... Because this is important! 1. We semantically encoded clinical guidelines 2. We found that clinical researchers did not follow the official guidelines 3. Their “personalization” of the guidelines was unreported 4. Nevertheless, we were able to create “personalized” Semantic Models 5. These models reflect the opinion of an individual domain-expert 6. These models are shared on the Web 7. Can be automatically re-used by others to interpret their own data using that clinical expert’s viewpoint
  131. 131. PREFIX AHA =http://americanheart.org/measurements/ PREFIX McManus=http://stpaulshospital.org/researchers/mcmanus/ AHA:HighRiskCholesterolRecord PatientRecord and (sio:hasAttribute some (cardio:SerumCholesterolConcentration and sio:hasMeasurement some ( sio:Measurement and (sio:hasUnit value cardio:mili-mole-per-liter) and (sio:hasValue some double[>= 5.0])))) McManus:HighRiskCholesterolRecord PatientRecord and (sio:hasAttribute some (cardio:SerumCholesterolConcentration and sio:hasMeasurement some ( sio:Measurement and (sio:hasUnit value cardio:mili-mole-per-liter) and (sio:hasValue some double[>= 5.2]))))
  132. 132. To do the analysis using AHA guidelines SELECT ?patient ?risk WHERE { ?patient rdf:type AHA:HighRiskCholesterolRecord . ?patient ex:hasCholesterolProfile ?risk }
  133. 133. To do the analysis using McManus’ expert-opinion SELECT ?patient ?risk WHERE { ?patient rdf:type McManus:HighRiskCholesterolRecord . ?patient ex:hasCholesterolProfile ?risk }
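The effect of swapping one viewpoint for the other can be illustrated with a small Python sketch. The two cutoffs (5.0 and 5.2 mmol/l) come from the class definitions on the earlier slide; the patient records and the function name are invented, and the real analysis is of course done by classifying RDF data against the OWL definitions, not by a lookup table.

```python
# Cholesterol cutoffs in mmol/l, one per published viewpoint.
THRESHOLDS_MMOL_PER_L = {"AHA": 5.0, "McManus": 5.2}


def high_risk_cholesterol(records, viewpoint):
    """Return the IDs of records at or above the chosen viewpoint's cutoff."""
    cutoff = THRESHOLDS_MMOL_PER_L[viewpoint]
    return [patient_id for patient_id, cholesterol in records if cholesterol >= cutoff]


# Hypothetical records: (patient ID, serum cholesterol in mmol/l).
records = [("ptA", 4.8), ("ptB", 5.1), ("ptC", 5.9)]

aha_result = high_risk_cholesterol(records, "AHA")
mcmanus_result = high_risk_cholesterol(records, "McManus")
```

The same data yields a different at-risk list depending only on whose definition is named, which mirrors the one-word difference between the two queries above.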
  134. 134. Flexibility Transparency Reproducibility Shareability Comparability Simplicity Automation
  135. 135. Personalization (I’m going to return to this point several times)
  136. 136. Story #3: in silico Science Reproduce a peer-reviewed scientific publication by semantically modelling the problem
  137. 137. The Publication Discovering Protein Partners of a Human Tumor Suppressor Protein
  138. 138. Original Study Simplified Using what is known about protein interactions in fly & yeast predict new interactions with this Human Tumor Suppressor
  139. 139. Semantic Model of the Experiment OWL
  140. 140. Semantic Model of the Experiment Note that every word in this diagram is, in reality, a URL (it’s a Semantic Web model) i.e. It refers to the expertise of other researchers, distributed around the world on the Web
  141. 141. Set-up the Experimental Conditions In a local data-file provide the protein we are interested in and the two species we wish to use in our comparison taxon:9606 a i:OrganismOfInterest . # human uniprot:Q9UK53 a i:ProteinOfInterest . # ING1 taxon:4932 a i:ModelOrganism1 . # yeast taxon:7227 a i:ModelOrganism2 . # fly
  142. 142. Run the Experiment SELECT ?protein FROM <file:/local/workflow.input.n3> WHERE { ?protein a i:ProbableInteractor . }
  143. 143. Run the Experiment SELECT ?protein FROM <file:/local/workflow.input.n3> WHERE { ?protein a i:ProbableInteractor . } This is the URL that leads our computer to the Semantic model of the problem
  144. 144. SHARE examines the semantic model of Probable Interactors Retrieves third-party expertise from the Web Discusses with SADI what analytical tools are necessary Chooses the right tools for the problem Solves the problem!
  145. 145. SHARE derives (and executes) the following analysis automatically
  146. 146. SHARE is aware of the context of the specific question being asked
  147. 147. There are five very cool things about what you just saw...
  148. 148. There are five very cool things about what you just saw... 1. SHARE was able to create a workflow based on a semantic model
  149. 149. There are five very cool things about what you just saw... 2. SHARE was able to create a COMPUTATIONAL workflow based on a BIOLOGICAL model
  150. 150. There are five very cool things about what you just saw... 2. (This is important because we want this system to be used by clinicians and biologists who don’t speak computerese!)
  151. 151. There are five very cool things about what you just saw... 3. The workflow it created, and services selected, differed depending on the context of the question taxon:4932 a i:ModelOrganism1 . # yeast taxon:7227 a i:ModelOrganism2 . # fly
  152. 152. There are five very cool things about what you just saw... 3. The workflow it created, and services chosen, differed depending on the context of the question. The machine was contextually “aware of” BOTH the biological model AND the data it was analysing taxon:4932 a i:ModelOrganism1 . # yeast taxon:7227 a i:ModelOrganism2 . # fly (...remember this... It will be important later!)
  153. 153. There are five very cool things about what you just saw... 4. The ontological model was abstract (and shareable!), but the workflow generated from that model was explicit and concrete
  154. 154. There are five very cool things about what you just saw... 4. The ontological model was abstract (and shareable!), but the workflow generated from that model was explicit and concrete
  155. 155. There are five very cool things about what you just saw... 4. The ontological model was abstract (and shareable!), but the workflow generated from that model was explicit and concrete This matters because…
  156. 156. Remember Trend #1 “the most common errors are simple, the most simple errors are common” At least partially because the analytical methodology was inappropriate and/or not sufficiently described
  157. 157. Remember Trend #1 “the most common errors are simple, the most simple errors are common” At least partially because the analytical methodology was inappropriate and/or not sufficiently described Here, the methodology leading to a result is explicit and automatically constructed from an abstract template so this is (at least in part) a Solved Problem
  158. 158. There are five very cool things about what you just saw... 5. The choice of tools was guided by the knowledge of worldwide domain-experts encoded in globally-distributed ontologies (e.g. Expert high-throughput statisticians, etc...)
  159. 159. There are five very cool things about what you just saw... 5. The choice of tools was guided by the knowledge of worldwide domain-experts encoded in globally-distributed ontologies (e.g. Expert high-throughput statisticians, etc...) And this matters because…
  160. 160. Remember Trend #2 Even small, moderately-funded laboratories can now afford to produce more data than they can manage or interpret These labs will likely never be able to afford a qualified data scientist
  161. 161. Remember Trend #2 Even small, moderately-funded laboratories can now afford to produce more data than they can manage or interpret These labs will likely never be able to afford a qualified data scientist But if the expert knowledge of data scientists is encoded in ontologies, and can be discovered in a contextually-aware manner… then this is a SOLVED PROBLEM
  162. 162. Story #4: Personalized Health Info Can we make the Health information on the Web more “personal”?
  163. 163. Remember when I said... The machine was contextually “aware of” BOTH the biological model AND the data it was analysing
  164. 164. This “dual-awareness” provides some very interesting opportunities for personalizing a patient’s Health Research activity
  165. 165. PROBLEM: Patients are self-educating, both about their personal medical situation (e.g. getting themselves sequenced) and by surfing the Web, getting dubious advice from sites of dubious authority, and joining social-health groups to exchange (often anecdotal) medical “advice” with other patients
  166. 166. PROBLEM: Patients are self-educating The information on any given site may or may not be relevant to THAT patient Information on the Web is, by nature, not personalized
  167. 167. PROBLEM: Clinicians often have patients (especially chronically-ill patients) on a “trajectory” of treatment Medicine is complicated! e.g. the treatment trajectory of the patient can be multi-step, and a specific sign/symptom might be perfectly normal at a particular phase in their “flow” of treatment
  168. 168. PROBLEM SUMMARY Patients are reading non-personalized medical text of dubious quality and relevance. Clinicians have no way to intervene in this self-education process by explaining to patients how the information they read relates to their personal “health trajectory”
  169. 169. Now you might see why this is so relevant! The machine was contextually “aware of” BOTH the biological model AND the data it was analysing
  170. 170. This is an early prototype of a Patient-driven Personalized Medicine Web interface
  171. 171. Basically, it is a set of SHARE queries Attached to a local database of patient information Running behind a Web bookmarklet
  172. 172. The queries text-mine a Web page then compare the concepts in the page to the patient’s personal data using a SHARE query
  173. 173. The queries text-mine a Web page then compare the concepts in the page to the patient’s personal data using a SHARE query (that could contain ontologies... ...ontologies designed by their clinician!!)
  174. 174. Matching based on official name, compound name, brand name, trade name, or “common name” 
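Matching across official, brand, trade, and common names can be pictured as a synonym lookup. The sketch below is a simplification under stated assumptions: it uses plain single-word string matching rather than the text-mining and SHARE query described in the talk, and the function name, data layout, and sample page text are hypothetical.

```python
import re


def find_mentions(page_text, medications):
    """Return official names of medications mentioned in the text under any known name.

    `medications` maps an official name to a set of alternative names
    (brand, trade, compound, or common names). Single-word names only.
    """
    words = set(re.findall(r"[a-z0-9]+", page_text.lower()))
    hits = []
    for official, aliases in medications.items():
        names = {official.lower()} | {alias.lower() for alias in aliases}
        if names & words:
            hits.append(official)
    return sorted(hits)


# Illustrative patient medication list, not real patient data.
patient_medications = {
    "acetaminophen": {"paracetamol", "Tylenol"},
    "ibuprofen": {"Advil"},
}
page = "Many readers ask whether Tylenol is safe to take every day."

mentions = find_mentions(page, patient_medications)
```

A page that mentions only the brand name still resolves to the official name in the patient's record, which is the behaviour the slide describes.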
  175. 175. Still needs some work... ??!?!?
  176. 176. Link out to PubMed Why the alert?
  177. 177. The SADI+SHARE workflow and reasoning was personalized to YOUR medical data
  178. 178. In future iterations, we will enable the workflow to be further customized through “personalized” OWL Classes (e.g. Provided by your Clinician!!)
  179. 179. These OWL Classes might include information about the current trajectory of your treatment for a chronic disease, for example, such that what you read on the Web is placed in the context of your expert Clinical care...
  180. 180. Frankly, I think it’s quite cool that patients are creating and running “personal health-research” workflows at the touch of a button!
  181. 181. Almost the end… Three brief final points....
  182. 182. Publication Discourse Interpretation Hypothesis Experiment ? ?
  183. 183. The Semantic Model represents a possible solution to a problem
  184. 184. The Semantic Model represents a possible solution to a problem By my definition, that is a hypothesis
  185. 185. The Semantic Model represents a possible solution to a problem That hypothesis is tested by automatically converting it into a workflow;
  186. 186. The Semantic Model represents a possible solution to a problem That hypothesis is tested by automatically converting it into a workflow; the workflow, and the results of the workflow are intimately tied to the hypothesis
  187. 187. The Semantic Model represents a possible solution to a problem i.e. You (or anyone!) can determine exactly which aspect of the hypothesis led to which output data element, why, and how
  188. 188. The Semantic Model represents a possible solution to a problem “Exquisite Provenance” a perfect record not only of what was done, when, and how but also WHY
  189. 189. And this is important because...
  190. 190. “Exquisite Provenance” is required for the output data and knowledge to be published as...
  191. 191. Richly annotated, citable, and queryable snippets of scientific knowledge encoded in Linked Data/OWL i.e. a way to publish data and knowledge on the Semantic Web
  192. 192. Publication Discourse Interpretation Hypothesis Experiment
  193. 193. A “modest” vision for pure in silico Science
  194. 194. Last point… perhaps this is not yet obvious…
  195. 195. SADI services consume Linked Data on the Web
  196. 196. SADI services consume Linked Data on the Web The ontologies provided to SHARE are written in OWL, and are therefore inherently part of the Web
  197. 197. SADI services consume Linked Data on the Web The ontologies provided to SHARE are written in OWL, and are therefore inherently part of the Web SADI services create novel semantic links between existing data-points on the Web, or between existing data and new data
  198. 198. SADI services consume Linked Data on the Web The ontologies provided to SHARE are written in OWL, and are therefore inherently part of the Web SADI services create novel semantic links between existing data-points on the Web, or between existing data and new data The output of the automatically-generated workflow is therefore Linked Data and is therefore inherently part of the Web
  199. 199. SADI services consume Linked Data on the Web The ontologies provided to SHARE are written in OWL, and are therefore inherently part of the Web SADI services create novel semantic links between existing data-points on the Web, or between existing data and new data The output of the automatically-generated workflow is therefore Linked Data and is therefore inherently part of the Web The concluding NanoPublications are a combination of Linked Data and OWL, and are published directly to the Web
  200. 200. The Life Science “Singularity” We Are Here! The Semantic Web is a cradle-to-grave biomedical research platform that can, and will, dramatically improve how biomedical research is done
  201. 201. The important people Luke McCarthy (SADI/SHARE) Benjamin Vandervalk (SHARE) Dr. Soroush Samadian (clinical experiments) Ian Wood (Experiment-replication experiment)
  202. 202. Microsoft Research
