
Clinical Data Models - The Hyve - Bio IT World April 2019

Population genetics and genomics is an emerging topic for the application of machine learning methods in healthcare and biomedical sciences. Currently, several large genomics initiatives, such as Genomics England, UK Biobank, the All of Us Project, and Europe's 1 Million Genomes Initiative, are in the process of making both clinical and genomics data from large numbers of patients available to benefit biomedical research. However, a key challenge in these initiatives is standardizing the clinical and outcomes data in such a way that machine learning methods can be trained effectively to discover useful medical and scientific insights. In this talk, we will look at what data is available at scale and review some examples of the application of common data and evidence models such as OMOP, FHIR and GA4GH to achieve this, based on projects which The Hyve has executed with some of these initiatives to harmonize their clinical, genomics, imaging and wearables data and make it FAIR.

Published in: Healthcare

  1. Tackling the Clinical Data Challenges When Analyzing a Million Genomes
     April 18, 2019
     Kees van Bochove – CEO The Hyve
     @keesvanbochove
  2. Overview
     ● Introductions
     ● Clinical Data Challenges?
     ● Health Data Networks
     ● Clinical Data Models
     ● Personal Health Train
     ● Q&A
  3. We advance biology and medical sciences by building and serving thriving open source communities
     ● Core values: Share, Reuse, Specialize
     ● Office locations: Utrecht, Netherlands; Cambridge, US
     ● Fast-growing: started in 2012, now 40+ people
     ● Customers: Pharma & Life Sciences; Healthcare; Government & non-profit
  4. Teams
     ● Research Data Management: FAIR / Data Governance consultancy; Fairspace (meta)data management
     ● Cancer Genomics: cancer data warehouse (cBioPortal); knowledge base (Open Targets)
     ● Data Warehousing: data warehouses (tranSMART, i2b2); cohort selection (Glowing Bear); request portals (Podium)
     ● Real World Data: real world evidence (OMOP/OHDSI); wearables platform (RADAR-BASE); data catalogues (CKAN, DataVerse)
  6. Relevance of FAIR Data in Pharma
     “The first thing we’ve learned is the importance of having outstanding data to actually base your ML on. (...) I think people underestimate how little clean data there is out there, and how hard it is to clean and link the data.” - Vas Narasimhan, CEO Novartis
     now-explains-why-its-so-hard
  7. FAIR Workshop at The Hyve in Utrecht, 2018
  8. Findable: F1. persistent identifier; F2. metadata; F3. metadata - data link; F4. registered or indexed
     Accessible: A1. standardized protocol; A1.1. open, free and universally implementable; A1.2. authentication and authorization; A2. metadata stay accessible
     Interoperable: I1. language for knowledge representation; I2. vocabularies that follow FAIR principles; I3. qualified references to other (meta)data
     Reusable: R1. attributes; R1.1. license; R1.2. provenance; R1.3. community standards
     Standards shown: OMOP, FHIR, i2b2, CDISC etc.; RDF; DCAT; VoID; FAIRmetrics; PROV-O; CC
  10. Researcher; Healthcare Professional; Patient / Citizen; Health Data
  11. Network architectures
  12. Health Data Research Infrastructures
  14. Two perspectives:
      ● Healthcare: HL7 FHIR, RIM, SMART on FHIR, DCMs, openEHR etc.
      ● Research & Trials: i2b2/tranSMART, OMOP, HPO, ICD, SNOMED-CT, LOINC, …
      Standards shown: FHIR, IHE, CDA, HL7, i2b2, OMOP, SMART, CWL/WDL, RADAR, OCI, GA4GH, openEHR, DICOM, SNOMED, ICD, LOINC, CDISC, DCAT
      From: Architecture of a Biomedical Informatics Research Data Management Pipeline, Bauer, .. Sax et al. Stud Health Technol Inform. 2016;228:262-6.
  15. Bringing people & communities together
  16. Deep-dive into Health Data Networks
      C95pl11zdAs
      About the policy background of health data networks, patient consent, GDPR, wearables & a lot more!
  18. A small detour to our beginnings
      ▶ Objective Reality
      ▶ Subjective Reality
      ▶ Intersubjective Reality
      “Ever since the Cognitive Revolution, Sapiens have been living in a dual reality. On the one hand, the objective reality of rivers, trees and lions; and on the other hand, the imagined reality of gods, nations and corporations. As time went by, the imagined reality became ever more powerful, so that today the very survival of rivers, trees and lions depends on the grace of imagined entities such as the United States and Google.”
  19. Data Models 101
      ▶ Problem space models
      ▶ Semantics of the model are restricted to those that characterize the “problem domain” as described by domain experts
      ▶ Domain Information Models (e.g. BRIDG): Basic, pre-clinical, clinical, and translational research and associated regulatory artifacts, i.e. the data, organization, resources, rules, and processes involved in the formal assessment of the utility, impact, or other pharmacological, physiological, or psychological effects of a drug, procedure, process, subject characteristic, biologic, cosmetic, food or device on a human, animal, or other subject or substance plus all associated regulatory artifacts required for or derived from this effort, including data specifically associated with postmarket surveillance and adverse event reporting.
  20. BRIDG Domain Model
  21. Data Models 101
      ▶ Solution space models
      ▶ CDISC SDTM provides a standard for organizing and formatting data to streamline processes in collection, management, analysis and reporting
      ▶ i2b2: model patient-centric clinical and biological data for the purpose of translational research ‘from bench to bedside’
      ▶ OMOP: model data from healthcare databases for the purpose of observational research including studying the effects of medical products
      ▶ Models should have a clearly bounded domain of interest
  22. CDISC Standards in Clinical Research
  23. CDISC
      ▶ (Underlying) standards evolve over time
      ▶ SDTM is bound by its regulatory submission context
      ▶ Not meant / suited for analysis (cf. ADaM)
      From Tim Williams (UCB), PhUSE 2017 paper
  24. Common Data Models Comparison
      Increased standardization ← → Increased flexibility
      OMOP
      ▶ Scope: Observational Data
      ▶ Standardized Vocabularies
      ▶ Person Centric Model
      ▶ Pre-defined domains: Condition, Drug, Procedure, Measurement, Observation...
      i2b2/tranSMART
      ▶ Scope: Translational Data
      ▶ Flexible Concept Trees
      ▶ Observation Centric Model
      ▶ Pre-defined dimensions: Patient, Study, Visit, Concept, Modifier etc.
      RDF
      ▶ Scope: Not Limited
      ▶ ‘Knowledge’ Graph
      ▶ Flexible Model
      ▶ Building on Linked Open Data standards
  25. OMOP CDM v5
      ▶ Observational data
      ▶ Fields defined per domain
      ▶ Standardized Vocabularies
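
To make the person-centric, per-domain layout concrete, here is a minimal sketch using SQLite. It assumes a heavily trimmed subset of the OMOP CDM v5 tables (`person`, `condition_occurrence`, `concept`) with only a few of their columns; the rows and concept IDs are illustrative and should be checked against the actual Standardized Vocabularies before any real use.

```python
import sqlite3

# In-memory database holding a simplified subset of OMOP CDM v5 tables.
# Real CDM tables have many more columns; this is an assumption-laden sketch.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concept (
        concept_id    INTEGER PRIMARY KEY,
        concept_name  TEXT NOT NULL,
        domain_id     TEXT NOT NULL,
        vocabulary_id TEXT NOT NULL
    );
    CREATE TABLE person (
        person_id         INTEGER PRIMARY KEY,
        gender_concept_id INTEGER NOT NULL,
        year_of_birth     INTEGER NOT NULL
    );
    CREATE TABLE condition_occurrence (
        condition_occurrence_id INTEGER PRIMARY KEY,
        person_id               INTEGER NOT NULL REFERENCES person (person_id),
        condition_concept_id    INTEGER NOT NULL REFERENCES concept (concept_id),
        condition_start_date    TEXT NOT NULL
    );
    """
)

# Illustrative rows only; verify concept IDs against the real vocabulary.
conn.execute(
    "INSERT INTO concept VALUES (201826, 'Type 2 diabetes mellitus', 'Condition', 'SNOMED')"
)
conn.execute("INSERT INTO person VALUES (1, 8532, 1970)")
conn.execute("INSERT INTO condition_occurrence VALUES (10, 1, 201826, '2018-03-01')")

# Every domain table points back to a person and to a standard concept.
rows = conn.execute(
    """
    SELECT p.person_id, c.concept_name, co.condition_start_date
    FROM condition_occurrence AS co
    JOIN person  AS p ON p.person_id  = co.person_id
    JOIN concept AS c ON c.concept_id = co.condition_concept_id
    ORDER BY p.person_id
    """
).fetchall()
print(rows)
```

The point of the sketch is the shape of the query: the clinical fact lives in a domain table, and its meaning is resolved through the shared `concept` table rather than through source-specific codes.
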
  26. OMOP Standardized Vocabularies
  27. i2b2/tranSMART Data Model
      ▶ One observation domain
      ▶ Study-specific tree of concepts
      ▶ Supports:
        ▶ absolute and relative time series
        ▶ samples and replicates
        ▶ cross-study concepts and ontologies
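
The observation-centric idea can be sketched in a few lines of Python: every fact is one observation tagged with a node in a study-specific concept tree, and cohort selection amounts to picking a subtree. The class and function names below are hypothetical, and paths use `/` for readability, whereas i2b2 itself uses backslash-delimited concept paths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One fact: a patient, a node in the concept tree, and a value."""
    patient_num: int
    concept_path: str
    value: float


# Hypothetical observations from two studies sharing one observation domain.
observations = [
    Observation(1, "/Study A/Labs/HbA1c/", 6.9),
    Observation(1, "/Study A/Vitals/Heart rate/", 72.0),
    Observation(2, "/Study A/Labs/HbA1c/", 5.4),
    Observation(2, "/Study B/Labs/HbA1c/", 5.6),
]


def select_subtree(facts, node_path):
    """Return all observations at or below the given tree node."""
    return [f for f in facts if f.concept_path.startswith(node_path)]


def patients_in(facts):
    """Return the sorted, de-duplicated patient numbers in a set of facts."""
    return sorted({f.patient_num for f in facts})


labs_a = select_subtree(observations, "/Study A/Labs/")
print(len(labs_a), patients_in(labs_a))
```

Because the tree is just data, each study can bring its own hierarchy without a schema change, which is the flexibility the comparison slide attributes to this model.
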
  28. PhUSE
      ● Representing clinical trials as RDF
      ● Graph representation more natural than tabulation
  29. FHIR
      ● Fast Healthcare Interoperability Resources, “HL7 REST API”
      ● Exchange of healthcare data elements such as Patient, Practitioner, Procedure, Medication
      ● FHIR Profiles describe usage
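
As a rough illustration of what "resources over REST" means, the sketch below builds a minimal Patient resource as a Python dictionary and the URL for a FHIR read interaction (`GET [base]/[type]/[id]`). The server base URL, the resource ID and the helper names are assumptions made for the example, and no network call is made.

```python
import json


def read_url(base, resource_type, resource_id):
    """Build the URL for a FHIR 'read' interaction: [base]/[type]/[id]."""
    return f"{base.rstrip('/')}/{resource_type}/{resource_id}"


# A minimal Patient resource; real resources usually carry more elements,
# and a FHIR Profile would constrain which ones are required.
patient = {
    "resourceType": "Patient",
    "id": "example-1",
    "name": [{"family": "Doe", "given": ["Jane"]}],
    "gender": "female",
    "birthDate": "1970-01-01",
}


def display_name(resource):
    """Join the given names and family name of the first name entry."""
    name = resource["name"][0]
    return " ".join(name.get("given", []) + [name["family"]])


# Hypothetical server base URL, used only to show the URL pattern.
url = read_url("https://fhir.example.org/baseR4/", "Patient", patient["id"])
print(url)
print(display_name(patient))
print(json.dumps(patient, indent=2))
```

Each of the data elements named on the slide (Patient, Practitioner, Procedure, Medication) is exchanged in this style: a typed JSON document addressed by resource type and ID.
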
  30. Common Data Model Definition
      ▶ A CDM is “a mechanism by which raw data are standardised to a common structure, format and terminology independently from any particular study in order to allow a combined analysis across several databases/datasets.”
      ▶ Standardisation of structure and content allows the use of standardised applications, tools and methods across the data to answer a wide range of questions.
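
The definition above can be illustrated with a toy harmonization step. In this sketch, two hypothetical source datasets with different field names and local codes are mapped to one common structure and terminology, after which a single analysis runs across both. The field names and mapping tables are invented for the example and are not real vocabulary mappings.

```python
from collections import Counter

# Hypothetical local-to-common terminology maps (assumptions for illustration).
SEX_MAP_A = {"M": "male", "F": "female"}
SEX_MAP_B = {1: "male", 2: "female"}
CONDITION_MAP_A = {"E11": "type_2_diabetes", "I10": "hypertension"}
CONDITION_MAP_B = {"T2DM": "type_2_diabetes", "HTN": "hypertension"}

# Two raw sources with different structure and coding.
source_a = [
    {"pid": "A-1", "sex": "M", "dx": "E11"},
    {"pid": "A-2", "sex": "F", "dx": "I10"},
]
source_b = [
    {"patient": 7, "gender": 2, "diagnosis": "T2DM"},
]


def from_source_a(row):
    """Map one source A row to the common structure and terminology."""
    return {
        "person_id": row["pid"],
        "sex": SEX_MAP_A[row["sex"]],
        "condition": CONDITION_MAP_A[row["dx"]],
    }


def from_source_b(row):
    """Map one source B row to the common structure and terminology."""
    return {
        "person_id": f"B-{row['patient']}",
        "sex": SEX_MAP_B[row["gender"]],
        "condition": CONDITION_MAP_B[row["diagnosis"]],
    }


# After mapping, one analysis works across both datasets.
common = [from_source_a(r) for r in source_a] + [from_source_b(r) for r in source_b]
condition_counts = Counter(r["condition"] for r in common)
print(condition_counts)
```

The mapping functions are written once per source, independently of any particular study, which is what lets standardised tools run unchanged across databases.
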
  31. USE CASES
  32. MI: Architecture for health research IT
      ● Information Model: FHIR Profiles
      ● Data transport & persistence: FHIR for clinical data, GA4GH / genomics standards for genomics data
      ● Dedicated & reproducible data warehouses for analysis
      FHIR + GA4GH; Data Warehouses & Marts
      Collaboration with O. Kohlbacher, University of Tübingen
  33. EHDEN: using the OMOP CDM
      Data Catalogue; Mapping Tools; Federated Network
      - White pages of EHRs, registries, etc.
      - Store metadata about data provenance, governance, population characteristics etc.
      - Increase FAIRness and citability (e.g. DOI)
      - Training materials
      - Data (mapping) quality assessment & ETL validation tooling
      - European CDM vocabulary extensions
      - Federated study execution
      - Security and access control
      - Dashboards for study results
      - Remote research environment
      - EHDEN Portal
      - Interoperability & FAIR
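
Federated study execution can be pictured as follows: the same analysis runs locally at each data partner, and only aggregate results leave the site. The sketch below is an in-process stand-in with hypothetical site names, records and function names; it is not EHDEN's actual tooling, and it leaves out the security and access control that a real network needs.

```python
def local_summary(records, min_age):
    """Run inside a site: count eligible persons and outcome events.

    Only these two aggregate numbers are returned, never the records.
    """
    eligible = [r for r in records if r["age"] >= min_age]
    return {
        "n": len(eligible),
        "events": sum(1 for r in eligible if r["outcome"]),
    }


def combine(summaries):
    """Run at the coordinator: pool the per-site aggregates."""
    return {
        "n": sum(s["n"] for s in summaries),
        "events": sum(s["events"] for s in summaries),
    }


# Hypothetical per-site data, already mapped to a common structure.
sites = {
    "site_a": [
        {"age": 64, "outcome": True},
        {"age": 41, "outcome": False},
        {"age": 70, "outcome": False},
    ],
    "site_b": [
        {"age": 58, "outcome": True},
        {"age": 66, "outcome": True},
    ],
}

summaries = [local_summary(records, min_age=50) for records in sites.values()]
pooled = combine(summaries)
print(pooled)
```

This only works because every site exposes the same structure and terminology, which is why the common data model and the mapping tools come before the federated network.
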
  34. Genomics England Research Environment; NHS Trusts; Airlock; Research Community
  35. Deep-dive into Open Science with OHDSI
      X5yuoJoL6xs
      About a 5-day ‘study-a-thon’, in which about 40 scientists tried to predict the outcome of a long-running RCT with observational data.
  37. Personal Health Train
  38. Deep-dive into Personal Health Train
      jcxZiqkqMgc
      About the technical architecture, the Personal Health Train, creating the stations, securing the data and federated workflows!