Tackling the Clinical Data Challenges
When Analyzing a Million Genomes
April 18, 2019
Kees van Bochove – CEO The Hyve
Clinical Data Models - The Hyve - Bio IT World April 2019

Population genetics and genomics is an emerging topic for the application of machine learning methods in healthcare and biomedical sciences. Currently, several large genomics initiatives, such as Genomics England, UK Biobank, the All of Us Project, and Europe's 1 Million Genomes Initiative, are in the process of making both clinical and genomics data from large numbers of patients available to benefit biomedical research. However, a key challenge in these initiatives is standardizing the clinical and outcomes data in such a way that machine learning methods can be effectively trained to discover useful medical and scientific insights. In this talk, we will look at what data is available at scale and review some examples of how common data and evidence models such as OMOP, FHIR and GA4GH are applied to achieve this, based on projects that The Hyve has executed with some of these initiatives to harmonize their clinical, genomics, imaging and wearables data and make it FAIR.

  1. Tackling the Clinical Data Challenges When Analyzing a Million Genomes. April 18, 2019. Kees van Bochove, CEO The Hyve, @keesvanbochove
  2. Overview ● Introductions ● Clinical Data Challenges? ● Health Data Networks ● Clinical Data Models ● Personal Health Train ● Q&A
  3. Core values: ● Share ● Reuse ● Specialize. Office Locations: ● Utrecht, Netherlands ● Cambridge, US. Fast-growing: ● Started in 2012 ● Now 40+ people. Customers: ● Pharma & Life Sciences ● Healthcare ● Government & non-profit. We advance biology and medical sciences by building and serving thriving open source communities.
  4. Teams. Research Data Management: ● FAIR / Data Governance consultancy ● Fairspace (meta)data management. Cancer Genomics: ● Cancer data warehouse: cBioPortal ● Knowledge base: Open Targets. Data Warehousing: ● Data warehouses: tranSMART, i2b2 ● Cohort selection: Glowing Bear ● Request Portals: Podium. Real World Data: ● Real world evidence: OMOP/OHDSI ● Wearables platform: RADAR-BASE ● Data catalogues: CKAN, DataVerse
  5. INTRODUCTION
  6. Relevance of FAIR Data in Pharma “The first thing we’ve learned is the importance of having outstanding data to actually base your ML on. (...) I think people underestimate how little clean data there is out there, and how hard it is to clean and link the data.” - Vas Narasimhan, CEO Novartis https://www.forbes.com/sites/davidshaywitz/2019/01/16/novartis-ceo-who-wanted-to-bring-tech-into-pharma-now-explains-why-its-so-hard
  7. FAIR Workshop at The Hyve in Utrecht, 2018 http://blog.thehyve.nl/blog/highlights-from-pistoia-alliances-fair-workshop https://www.sciencedirect.com/science/article/pii/S1359644618303039
  8. FAIR principles (http://www.nature.com/articles/sdata201618). Findable: F1. persistent identifier; F2. metadata; F3. metadata - data link; F4. registered or indexed. Accessible: A1. standardized protocol; A1.1. open, free and universally implementable; A1.2. authentication and authorization; A2. metadata stay accessible. Interoperable: I1. language for knowledge representation; I2. vocabularies that follow FAIR principles; I3. qualified references to other (meta)data. Reusable: R1. attributes; R1.1. license; R1.2. provenance; R1.3. community standards. Standards and tools named on the slide: OMOP, FHIR, i2b2, CDISC etc.; RDF; DCAT; VoID; FAIRmetrics; PROV-O; CC
  9. HEALTH DATA NETWORKS
  10. Researcher ● Healthcare Professional ● Patient / Citizen ● Health Data
  11. Network architectures
  12. Health Data Research Infrastructures
  13.
  14. From: Architecture of a Biomedical Informatics Research Data Management Pipeline, Bauer, .. Sax et al. Stud Health Technol Inform. 2016;228:262-6. Standards shown: FHIR, IHE, CDA, HL7, i2b2, OMOP, SMART, CWL/WDL, RADAR, OCI, GA4GH, openEHR, DICOM, SNOMED, ICD, LOINC, CDISC, DCAT. Two perspectives: Healthcare: HL7 FHIR, RIM, SMART on FHIR, DCMs, OpenEHR etc. Research & Trials: i2b2/tranSMART, OMOP, HPO, ICD, SNOMED-CT, LOINC, ...
  15. Bringing people & communities together http://blog.thehyve.nl/blog/pan-european-health-data-networks-meeting
  16. Deep-dive into Health Data Networks https://youtu.be/C95pl11zdAs About the policy background of health data networks, patient consent, GDPR, wearables & a lot more!
  17. CLINICAL DATA MODELS
  18. A small detour to our beginnings ▶ Objective Reality ▶ Subjective Reality ▶ Intersubjective Reality “Ever since the Cognitive Revolution, Sapiens have been living in a dual reality. On the one hand, the objective reality of rivers, trees and lions; and on the other hand, the imagined reality of gods, nations and corporations. As time went by, the imagined reality became ever more powerful, so that today the very survival of rivers, trees and lions depends on the grace of imagined entities such as the United States and Google.”
  19. Data Models 101 ▶ Problem space models ▶ Semantics of the model are restricted to those that characterize the “problem domain” as described by domain experts ▶ Domain Information Models (e.g. BRIDG): Basic, pre-clinical, clinical, and translational research and associated regulatory artifacts, i.e. the data, organization, resources, rules, and processes involved in the formal assessment of the utility, impact, or other pharmacological, physiological, or psychological effects of a drug, procedure, process, subject characteristic, biologic, cosmetic, food or device on a human, animal, or other subject or substance plus all associated regulatory artifacts required for or derived from this effort, including data specifically associated with postmarket surveillance and adverse event reporting.
  20. BRIDG Domain Model
  21. Data Models 101 ▶ Solution space models ▶ CDISC SDTM provides a standard for organizing and formatting data to streamline processes in collection, management, analysis and reporting ▶ i2b2: model patient-centric clinical and biological data for the purpose of translational research ‘from bench to bedside’ ▶ OMOP: model data from healthcare databases for the purpose of observational research including studying the effects of medical products ▶ Models should have a clearly bounded domain of interest
  22. CDISC Standards in Clinical Research
  23. CDISC ▶ (Underlying) standards evolve over time ▶ SDTM is bound by its regulatory submission context ▶ Not meant / suited for analysis (cf. ADaM) From Tim Williams (UCB), PhUSE 2017 paper
  24. Common Data Models Comparison. Axis labels: Increased standardization, Increased flexibility. OMOP: ▶ Scope: Observational Data ▶ Standardized Vocabularies ▶ Person Centric Model ▶ Pre-defined domains: Condition, Drug, Procedure, Measurement, Observation... i2b2/tranSMART: ▶ Scope: Translational Data ▶ Flexible Concept Trees ▶ Observation Centric Model ▶ Pre-defined dimensions: Patient, Study, Visit, Concept, Modifier etc. RDF: ▶ Scope: Not Limited ▶ ‘Knowledge’ Graph ▶ Flexible Model ▶ Building on Linked Open Data standards
  25. OMOP CDM v5 ▶ Observational data ▶ Fields defined per domain ▶ Standardized Vocabularies
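
To make the "person centric model" with "fields defined per domain" more concrete, here is a minimal sketch in Python using an in-memory SQLite database. It declares a heavily reduced subset of two OMOP CDM v5 tables (`person` and `condition_occurrence`); the real tables carry many more columns. The concept IDs and the helper `persons_with_condition` are placeholders invented for illustration, not values from the OMOP Standardized Vocabularies.

```python
import sqlite3

# Placeholder concept IDs, NOT real OMOP vocabulary identifiers.
GENDER_A, GENDER_B = 9001, 9002
EXAMPLE_CONDITION = 1001
OTHER_CONDITION = 1002

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id          INTEGER PRIMARY KEY,
    gender_concept_id  INTEGER NOT NULL,
    year_of_birth      INTEGER NOT NULL
);
-- One table per clinical domain; each row points back to a person
-- and encodes its meaning as a concept ID from a shared vocabulary.
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id               INTEGER NOT NULL REFERENCES person(person_id),
    condition_concept_id    INTEGER NOT NULL,
    condition_start_date    TEXT NOT NULL
);
""")

conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, GENDER_A, 1960), (2, GENDER_B, 1975), (3, GENDER_B, 1988)],
)
conn.executemany(
    "INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)",
    [
        (10, 1, EXAMPLE_CONDITION, "2015-03-01"),
        (11, 2, OTHER_CONDITION, "2017-06-12"),
        (12, 3, EXAMPLE_CONDITION, "2018-11-30"),
        (13, 3, EXAMPLE_CONDITION, "2019-01-15"),
    ],
)


def persons_with_condition(connection, concept_id):
    """Return the sorted IDs of persons with at least one matching condition."""
    rows = connection.execute(
        """
        SELECT DISTINCT p.person_id
        FROM person p
        JOIN condition_occurrence c ON c.person_id = p.person_id
        WHERE c.condition_concept_id = ?
        ORDER BY p.person_id
        """,
        (concept_id,),
    ).fetchall()
    return [row[0] for row in rows]


print(persons_with_condition(conn, EXAMPLE_CONDITION))
```

Because every site encodes conditions with the same concept IDs and column names, the same query could in principle run unchanged against any database that follows the model.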
  26. OMOP Standardized Vocabularies
  27. i2b2/tranSMART Data Model ▶ One observation domain ▶ Study-specific tree of concepts ▶ Supports: ▶ absolute and relative time series ▶ samples and replicates ▶ cross-study concepts and ontologies
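
As a rough illustration of an observation-centric model with a study-specific concept tree, the sketch below keeps every measurement in one flat list of observations and selects them by their position in the tree. This is a simplification for explanation only: the `Observation` class, the `/`-separated concept paths and the `under` helper are assumptions made here, not the actual i2b2 or tranSMART schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Observation:
    patient: str
    concept_path: str                   # node in the study-specific concept tree
    value: float
    relative_day: Optional[int] = None  # day relative to a baseline, if known


observations = [
    Observation("P1", "/Study A/Vitals/Heart rate", 72.0, relative_day=0),
    Observation("P1", "/Study A/Vitals/Heart rate", 80.0, relative_day=7),
    Observation("P1", "/Study A/Labs/Glucose", 5.4, relative_day=0),
    Observation("P2", "/Study A/Vitals/Heart rate", 65.0, relative_day=0),
]


def under(rows: List[Observation], prefix: str) -> List[Observation]:
    """Select every observation whose concept sits below the given tree node."""
    return [row for row in rows if row.concept_path.startswith(prefix)]


vitals = under(observations, "/Study A/Vitals/")
heart_rate_p1 = [
    (o.relative_day, o.value)
    for o in vitals
    if o.patient == "P1" and o.concept_path.endswith("/Heart rate")
]
print(len(vitals), heart_rate_p1)
```

Adding a new kind of measurement only requires a new node in the tree, not a new table, which is one way to read the flexibility the slide attributes to this model.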
  28. PhUSE ● Representing clinical trials as RDF ● Graph representation more natural than tabulation
  29. FHIR ● Fast Healthcare Interoperability Resources, “HL7 REST API” ● Exchange of healthcare data elements such as Patient, Practitioner, Procedure, Medication ● FHIR Profiles describe usage
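
The FHIR slide can be illustrated with a minimal sketch of a `Patient` resource as JSON, parsed with Python's standard library. The field names (`resourceType`, `id`, `name`, `gender`, `birthDate`) follow the FHIR Patient resource, and the URL pattern follows the FHIR "read" interaction (`[base]/[type]/[id]`). The server base URL, the patient data and the helper functions are hypothetical.

```python
import json

# Hypothetical server base URL; substitute a real FHIR endpoint.
FHIR_BASE = "https://fhir.example.org/base"

patient_json = """
{
  "resourceType": "Patient",
  "id": "example",
  "name": [{"family": "Doe", "given": ["Jane"]}],
  "gender": "female",
  "birthDate": "1980-04-18"
}
"""


def read_url(base: str, resource: dict) -> str:
    """Build the URL a client would GET to read this resource."""
    return f"{base.rstrip('/')}/{resource['resourceType']}/{resource['id']}"


def display_name(resource: dict) -> str:
    """Join the given and family names of the first recorded name."""
    first = resource["name"][0]
    return " ".join(first.get("given", []) + [first["family"]])


patient = json.loads(patient_json)
print(read_url(FHIR_BASE, patient))
print(display_name(patient), patient["gender"], patient["birthDate"])
```

A FHIR Profile would further constrain such a resource, for example by making certain elements mandatory; that layer is not shown here.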
  30. Common Data Model Definition ▶ CDM is “a mechanism by which raw data are standardised to a common structure, format and terminology independently from any particular study in order to allow a combined analysis across several databases/datasets.” ▶ Standardisation of structure and content allows the use of standardised applications, tools and methods across the data to answer a wide range of questions.
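
The definition above can be sketched in a few lines: two source datasets with different structures and code systems are mapped to one structure and one terminology, after which a single analysis runs across both. All dataset contents, the local codes of source A, the target term names and the function names are invented for illustration; source B uses ICD-10-style codes (E11, I10).

```python
from collections import Counter

# Source A: list of dicts with site-specific local codes (hypothetical).
source_a = [
    {"pid": "a-1", "dx": "DM2"},
    {"pid": "a-2", "dx": "HT"},
]
# Source B: tuples of (patient, ICD-10-style code).
source_b = [
    ("b-7", "E11"),
    ("b-8", "E11"),
    ("b-9", "I10"),
]

# Terminology mappings to a shared vocabulary (term names are assumptions).
MAP_A = {"DM2": "type_2_diabetes", "HT": "hypertension"}
MAP_B = {"E11": "type_2_diabetes", "I10": "hypertension"}


def from_a(rows):
    """Standardise source A rows to the common structure and terminology."""
    return [
        {"source": "A", "person_id": r["pid"], "condition": MAP_A[r["dx"]]}
        for r in rows
    ]


def from_b(rows):
    """Standardise source B rows to the common structure and terminology."""
    return [
        {"source": "B", "person_id": pid, "condition": MAP_B[code]}
        for pid, code in rows
    ]


# Once both sources share structure and terms, one analysis covers both.
common = from_a(source_a) + from_b(source_b)
condition_counts = Counter(row["condition"] for row in common)
print(dict(condition_counts))
```

The mapping step is done once per source and independently of any particular study, which is the point the quoted definition makes.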
  31. USE CASES
  32. MI: Architecture for health research IT ● Information Model: FHIR Profiles ● Data transport & persistence: FHIR for clinical data, GA4GH / genomics standards for genomics data ● Dedicated & reproducible data warehouses for analysis. FHIR + GA4GH, Data Warehouses & Marts. Collaboration with O. Kohlbacher, University of Tübingen
  33. EHDEN: using the OMOP CDM. Data Catalogue, Mapping Tools, Federated Network. - White pages of EHRs, registries, etc. - Store metadata about data provenance, governance, population characteristics etc. - Increase FAIRness and citability (e.g. DOI) - Training materials - Data (mapping) quality assessment & ETL validation tooling - European CDM vocabulary extensions - Federated study execution - Security and access control - Dashboards for study results - Remote research environment - EHDEN Portal - Interoperability & FAIR
  34. Genomics England Research Environment ● NHS Trusts ● Airlock ● Research Community
  35. Deep-dive into Open Science with OHDSI https://youtu.be/X5yuoJoL6xs About a 5 day ‘study-a-thon’, in which about 40 scientists tried to predict the outcome of a long running RCT with observational data.
  36. PERSONAL HEALTH TRAIN
  37. Personal Health Train https://www.dtls.nl/fair-data/personal-health-train/
  38. Deep-dive into Personal Health Train https://youtu.be/jcxZiqkqMgc About the technical architecture, the Personal Health Train, creating the stations, securing the data and federated workflows!
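
The "stations" and "federated workflows" mentioned for the Personal Health Train can be pictured with a toy sketch: the analysis (the "train") travels to each data holder (a "station"), runs locally, and only aggregate results leave. This is an illustration of the general idea under simplifying assumptions, not the actual Personal Health Train architecture; the `Station` class, the records and both functions are hypothetical, and no security layer is modelled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Station:
    name: str
    _records: List[dict]  # row-level data that never leaves the station

    def run(self, train: Callable[[List[dict]], Dict[str, float]]) -> Dict[str, float]:
        """Execute the visiting analysis locally and return only its output."""
        return train(self._records)


def count_and_sum_age(records: List[dict]) -> Dict[str, float]:
    """The 'train': computes aggregates only, no row-level output."""
    return {"n": len(records), "age_sum": sum(r["age"] for r in records)}


def federated_mean_age(stations: List[Station]) -> float:
    """Combine per-station aggregates into a pooled mean."""
    results = [station.run(count_and_sum_age) for station in stations]
    total_n = sum(r["n"] for r in results)
    return sum(r["age_sum"] for r in results) / total_n


stations = [
    Station("hospital_1", [{"age": 40}, {"age": 60}]),
    Station("registry_2", [{"age": 50}]),
]
print(federated_mean_age(stations))
```

The coordinator only ever sees counts and sums per station, so the pooled mean is obtained without centralising any individual record.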