Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

The PHT (https://www.youtube.com/watch?v=mktAtHmy-FM) is an exciting initiative to facilitate the analysis of FAIR patient data while protecting patients' privacy and security. One aspect of the PHT is that the "train" must act autonomously on data once it enters a patient's personal health data locker. Here, we propose the use of SADI and SHARE to autonomously negotiate between the needs of the researcher and the patient data available. SADI and SHARE were designed in ~2008 specifically to operate on FAIR data, and thus we propose they are purpose-built for this problem.


Spanish Ministerio de Economía y Competitividad grant number TIN2014-55993-R

Published in: Healthcare

  1. 1. Selecting Appropriate Data for the Personal Health Train. Mark D. Wilkinson (markw@illuminae.com), BBVA-UPM Industry Chair on Biotechnology, Isaac Peral/Marie Curie Distinguished Researcher, Universidad Politécnica de Madrid
  2. 2. Which “bit” of the train/track am I interested in?
  3. 3. Which “bit” of the train/track am I interested in? In this frame of the PHT video, the train is being “scanned”
  4. 4. Which “bit” of the train/track am I interested in? In this frame of the PHT video, the train is being “scanned”. Meta descriptors of “questions” (analyses, data gathering, etc.).
  5. 5. Which “bit” of the train/track am I interested in? In this frame of the PHT video, the train is being “scanned”. Meta descriptor of data holdings inside the “locker”.
  6. 6. Which “bit” of the train/track am I interested in? Matching of question against data via metadata comparison. Putative Match!
  7. 7. Which “bit” of the train/track am I interested in? Accomplished by the FAIR Data Point(s) and indexes of these. Putative Match!
  9. 9. Which “bit” of the train/track am I interested in? Importantly, this happens “in the open” (may involve humans!)
  10. 10. Which “bit” of the train/track am I interested in? Also very interesting issues around informed consent…
  11. 11. A match of the question metadata against public “station” metadata tells the train to enter that station to see if there are any relevant data points. What happens inside the station, however, is a “Black Box”.
  12. 12. Now we are inside the station, i.e. a data repository or “locker”. All decisions from here onwards must be fully autonomous! No peeking!
  13. 13. How can this be?? Because a metadata match is not the same as a data match! What is actually in the matched Locker will be unpredictable
  14. 14. Analytical algorithms/Q’s may have specific requirements (data type, format) that don’t match the data content in this locker
  15. 15. The desired data may not exist at all (e.g. inclusion/exclusion criteria such as a specific type of clinical measurement, in the context of a specific drug)
  16. 16. Metadata cannot describe everything about the data (otherwise, it would be the data)
  17. 17. We require: Intelligent, autonomous matching of FAIR Data against analytical tools/workflows both semantically, and syntactically
  18. 18. We require: Automatic data reformatting, where necessary
  19. 19. We require: Automatic detection of “fillable gaps” in the data (and filling those gaps)
  20. 20. We require: Automatic staging of data for analysis
  21. 21. We require: Automatic execution of analysis (“analysis” may be a single algorithm or a workflow)
  22. 22. We require: Automatic collection of results, and all provenance metadata from the analysis
  23. 23. We require: Automatic purging of any identifiable/private data remaining in the output dataset
  24. 24. We require: No human intervention at any point! This is happening in a “black box”
  25. 25. Between 2006-2008 my laboratory at St. Paul’s Hospital, Vancouver created technologies to address exactly this problem in the context of FAIR Data (…but before FAIR was a “thing” ;-) )
  26. 26. Semantic Automated Discovery and Integration (SADI): a design pattern for analytical tools that utilize FAIR Data
  27. 27. Semantic Health And Research Environment (SHARE): a multi-faceted “engine” that automatically assembles FAIR Data and uses it to execute appropriate SADI tools to answer research questions
  28. 28. Original Purpose: Facilitate interoperability between globally distributed Web Services
  29. 29. Re-Purpose: Facilitate interoperability between incoming PHT analyses and Locker data
  30. 30. SADI: Defines a design pattern for the interface to any analytical tool that consumes FAIR Data. Includes support for NanoPublication of the output from analyses (i.e. SADI natively outputs FAIR data also).
  31. 31. SHARE: Query interpretation; semantic reasoning over data; analytical tool selection (SADI); workflow synthesis; data reformatting; data/service matchmaking; workflow execution; [provenance capture]; output formatting
  32. 32. Typical Analytical Tool (BMI Calculator). Input: Height: 187, Weight: 89. Output: 25.5
  33. 33. Analytical Tool With SADI (BMI Calculator). Input: Patient_09, height 187, weight 89. Output: Patient_09, height 187, weight 89, BMI 25.5, plus Provenance
  35. 35. SADI Tools are described by metadata that contain OWL models of their Input and Output data, which must be FAIR
  36. 36. SADI Tools are described by metadata that contain OWL models of their Input and Output data, which must be FAIR. Data/Tool matching can be done by exact match or by ontological reasoning.
  38. 38. To understand SHARE, it is best to see it in action
  39. 39. These are 100% real, working examples of SHARE doing the kinds of analyses that we expect the PHT to do…
  40. 40. Start simply… Exact Match Discovery + Analysis. For each SNP in each patient, where the SNP results in an altered protein product, we want to know the pathways that are affected in that patient: SELECT ?gene ?pathway WHERE { uniprot:XXXXXX pred:isEncodedBy ?gene . ?gene ont:isParticipantIn ?pathway . }
  41. 41. The patient who owns this locker is recorded as having a SNP variant that affects protein P47989 (UniProt). What pathways are affected by this SNP? SELECT ?gene ?pathway WHERE { uniprot:P47989 pred:isEncodedBy ?gene . ?gene ont:isParticipantIn ?pathway . } The PHT is now inside an individual locker
  42. 42. Give that query to SHARE
  43. 43. Tools carried in the PHT “car” (or in some circumstances, even external to the PHT) are now matched against the data in the Locker, assembled into an analytical workflow, and the workflow is executed SELECT ?gene ?pathway WHERE { uniprot:P47989 pred:isEncodedBy ?gene . ?gene ont:isParticipantIn ?pathway . }
  44. 44. First: a tool is discovered that takes UniProt identifiers and maps them to their respective genes Second: the appropriate data is selected from the data source (locker) and that tool is executed. Third: the output from that tool is evaluated to ensure it is correct input to the tool that determines the pathways that a gene participates in Fourth: that tool is executed, and the output is collected and formatted… SELECT ?gene ?pathway WHERE { uniprot:P47989 pred:isEncodedBy ?gene . ?gene ont:isParticipantIn ?pathway . }
  45. 45. SELECT ?gene ?pathway WHERE { uniprot:P47989 pred:isEncodedBy ?gene . ?gene ont:isParticipantIn ?pathway . }
  46. 46. That was a simple example. The PHT will encounter much more complex cases.
  47. 47. Detect if the patient who owns this locker is rejecting their kidney transplant. If so, then collect their latest Blood Urea Nitrogen and Creatinine levels: SELECT ?patient ?bun ?creat FROM <patient:locker> WHERE { ?patient rdf:type patient:LikelyRejecter . ?patient l:latestBUN ?bun . ?patient l:latestCreatinine ?creat . }
  49. 49. Likely Rejecter: A patient who has creatinine levels that are increasing over time (Mark D. Wilkinson’s definition)
  50. 50. Likely Rejecter: FAIR does not equal “Predictable”!! The information requested by a researcher is not always going to be recorded in a patient’s Personal Health Locker, or even in a hospital clinical database; at least, not always in the way they want it…
  51. 51. Likely Rejecter: The PHT is going to have to deal with a wide range of scenarios, including data that has not been annotated in the manner required to answer the question. We’re in the Black Box, so we can’t ask for human assistance. The system must decide autonomously!
  52. 52. Likely Rejecter: In this case, we will assume the “worst-case” scenario: the patient’s clinical information contains only a time-series of blood creatinine measurements. No guidance whatsoever! Only raw, uninterpreted data. …but there is sufficient info. to solve the problem!
  53. 53. My definition of a Likely Rejecter is encoded in a machine-readable document written in OWL (the Web Ontology Language). Basically: “the regression line over creatinine measurements should have an increasing slope”
  54. 54. SELECT ?patient ?bun ?creat FROM <patient:locker> WHERE { ?patient rdf:type patient:LikelyRejecter . ?patient l:latestBUN ?bun . ?patient l:latestCreatinine ?creat . }
  55. 55. SHARE examines the query and determines that it is looking for “Rejecters”
  56. 56. SHARE examines the query and determines that it is looking for “Rejecters” It checks if the “Rejecter” property is in the patient’s locker, and finds that it is not.
  57. 57. SHARE examines the query and determines that it is looking for “Rejecters” It checks if the “Rejecter” property is in the patient’s locker, and finds that it is not. It examines the definition of “Rejecter” and matches each property (slope, intercept, etc.) with a SADI Tool. These are pipelined into a workflow
  58. 58. SHARE examines the query and determines that it is looking for “Rejecters” It checks if the “Rejecter” property is in the patient’s locker, and finds that it is not. It examines the definition of “Rejecter” and matches each property (slope, intercept, etc.) with a SADI Tool. These are pipelined into a workflow Finally, it determines what data is available, and where that data can be piped into the workflow (semantic matching)
  59. 59. SHARE decides that it needs to do a Linear Regression analysis on the blood creatinine measurements. It finds a linear regression tool (SADI), repackages the data, and executes the analysis.
  60. 60. A screenshot of SHARE solving the Likely Rejecter query
  61. 61. How SHARE interprets the data varies throughout the execution of the analysis
  62. 62. Example? The Blood Creatinine measurements were not dictated to be only Blood Creatinine measurements
  63. 63. Example? FAIR Data has the ‘qualities/properties’ that allow one analytical tool to interpret the values as Blood Creatinine measurements (e.g. to determine which patients are rejecting)
  64. 64. Example? But the data also has the ‘qualities/properties’ that allow another analytical tool to interpret them as simple X/Y coordinate data (e.g. the Linear Regression calculation tool)
  65. 65. Because of the “I” in FAIR, FAIR Data is amenable to autonomous Interpretation, Reinterpretation, Reformatting, and (Re-)Integration
  66. 66. Because SADI Tools are defined in terms of the FAIR Data they operate on, and because the PHT will carry a limited number of such tools (selected by the researcher for their specific task), we can rely on the PHT’s SHARE to undertake rapid, efficient, autonomous matchmaking between the patient data and the appropriate tools/workflows inside the black box of the patient locker
  67. 67. And this gives us…
  68. 68. http://www.flickr.com/people/faernworks/
  69. 69. One more example. Here, we address a problem that we know the PHT is going to encounter.
  70. 70. A legacy clinical dataset (from the 1970s) used in our SHARE R&D studies:

      ID   HEIGHT  WEIGHT  SBP   CHOL  HDL  BMI GR  SBP GR  CHOL GR  HDL GR
      pt1  1.82    177     128   227   55   0       0       1        0
      pt2  179     196     13.4  5.9   1.7  1       0       1        0

      Height in m and cm, Chol in mmol/l and mg/dl ...and other delicious weirdness
  71. 71. GOAL: autonomous detection and resolution of conflicts in the recorded measurement units between disparate clinical datasets
  72. 72. Rich data structures like this one can be “Projected” from existing FAIR Data sources like the PH Locker. These become input to…
  73. 73. Unified SADI Tool for automated Unit conversion of any type:
      • Send it a dataset with mixed units
      • (optional) tell it the harmonized unit you want back
      • Returns you a dataset with harmonized units
      Automatic semantic detection of the “nature” of the incoming unit type (e.g. “unit of pressure”). Automatic conversion based on dimensionality and/or offset & multiplier.
  74. 74. The researcher asking the question will define the clinical measurements of interest to them, including measurement units and inclusion/exclusion criteria. Now we’re being specific: MUST be in kpascal and must be >= 18.7. measure:HighRiskSystolicBloodPressure: measure:SystolicBloodPressure and sio:hasMeasurement some (sio:Measurement and (“sio:has unit” value om:kilopascal) and (sio:hasValue some double[>= "18.7"^^double]))
  75. 75. SHARE query: SELECT ?record ?convertedvalue ?convertedunit FROM <patient:locker> WHERE { ?record rdf:type measure:HighSystolicBloodPressure . ?record sio:hasMeasurement ?measurement. ?measurement sio:hasValue ?convertedvalue. ?record cardio:ExpertClassification ?riskgrade . }

      RecordID  Start Val  Start Unit  End Val  End Unit
      cm_hg1    15         cmHg        19.998   KiloPascal
      cm_hg2    14.6       cmHg        19.465   KiloPascal
      mm_hg1    14.8       mmHg        19.731   KiloPascal
      mm_hg2    146        mmHg        19.465   KiloPascal

      Because HighSystolicBloodPressure was defined in kpascal, SHARE automatically told SADI to convert everything into kpascal
  76. 76. Final Note #1: Reproducibility & Scholarly Rigor. Different things can/will happen inside of different lockers, even in the context of the same question. But these are black boxes! SADI services natively output NanoPublications, therefore we have a detailed record of provenance associated with EACH AND EVERY data point. We can peek inside the black box!
  77. 77. Final Note #2: Deployment. How do we get SHARE, the relevant SADI services, and the workflows into the locker?
  79. 79. We are not alone…
  81. 81. Accurate, autonomous matchmaking between data and tools/workflows is tricky …even if the data is FAIR!
  82. 82. SADI and SHARE were designed specifically to solve this problem!
  83. 83. Specific Acknowledgements to: Dr. Mikel Egaña Aranguren (SADI + Galaxy + Docker) Dr. Soroush Samadian (clinical measurement unit conversion) Luke McCarthy and Ben Vandervalk (SADI + SHARE)
  84. 84. Microsoft Research
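
The two worked examples in the slides (the “Likely Rejecter” definition on slides 49-59 and the kilopascal threshold on slides 73-75) can be illustrated with a minimal Python sketch. This is not SADI or SHARE code, and none of the names below come from the slides: `regression_slope`, `is_likely_rejecter`, `to_kilopascal`, and `is_high_risk_sbp` are hypothetical helpers, and the mmHg-to-kPa factor is the standard definition rather than anything stated in the deck. The sketch only shows the arithmetic that the discovered tools would need to perform: an ordinary least-squares slope over a creatinine time series, and conversion of mixed pressure units before the 18.7 kPa cut-off is applied.

```python
from typing import Sequence

# Standard definition (an assumption here, not taken from the slides):
# 1 mmHg = 133.322387415 Pa.
KPA_PER_MMHG = 0.133322387415

# Hypothetical lookup of multipliers into kilopascal for the units
# that appear in the slides' blood pressure table.
_TO_KPA = {"mmHg": KPA_PER_MMHG, "cmHg": 10 * KPA_PER_MMHG, "kPa": 1.0}


def regression_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys regressed on xs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def is_likely_rejecter(days: Sequence[float], creatinine: Sequence[float]) -> bool:
    """Slide 53: the regression line over creatinine has an increasing slope."""
    return regression_slope(days, creatinine) > 0


def to_kilopascal(value: float, unit: str) -> float:
    """Convert a pressure reading in mmHg, cmHg, or kPa to kilopascal."""
    if unit not in _TO_KPA:
        raise ValueError(f"unsupported pressure unit: {unit}")
    return value * _TO_KPA[unit]


def is_high_risk_sbp(value: float, unit: str, threshold_kpa: float = 18.7) -> bool:
    """Slide 74: the measurement, expressed in kPa, must be >= 18.7."""
    return to_kilopascal(value, unit) >= threshold_kpa


if __name__ == "__main__":
    # Illustrative creatinine series (made-up values, not patient data).
    print(is_likely_rejecter([0, 30, 60, 90], [1.1, 1.3, 1.6, 2.0]))
    # Two rows from the slide 75 table, rounded as on the slide.
    print(round(to_kilopascal(15, "cmHg"), 3))   # 19.998
    print(round(to_kilopascal(146, "mmHg"), 3))  # 19.465
```

In SHARE these steps would be discovered and chained from the OWL class definitions rather than hard-coded; the sketch fixes the pipeline by hand purely to make the calculations concrete.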
