Linking Structured and Unstructured Phenotypes through the OMOP Common Data Model - Duke - AMIA CRI 2015


Presentation on integrating structured and unstructured data using the OMOP CDM on the OHDSI platform (www.ohdsi.org) and Regenstrief's NLP tools.


Linking Structured and Unstructured
Clinical Phenotypes through the
OMOP Common Data Model
Jon Duke, Charity Hilton, Chris
Beesley, Jonathan Cummins

Conflicts of Interest
• None of the authors report any relevant
conflicts of interest

Structured Data (Traditional Analytics)
• ICD-9, CPTs, Meds, Labs
Unstructured Data (Natural Language Processing)
• Endoscopy, Pathology, Op Reports, Radiology, Echos, Provider Notes

Integrated Phenotyping
Combining structured and unstructured data is
optimal for many cohort definitions
Severe CHF: BNP level + NYHA Class

Combining Structured and
Unstructured Data is Challenging
• If unstructured data are available, they often reside
in a different data environment with different
retrieval mechanisms
– e.g., HDFS / Lucene vs. RDBMS
• Mapping unstructured text → structured concepts
is valuable but may result in data loss
(e.g., no UMLS concept for ‘NYHA 3’)

Our Goal
• Seamless integrated cohort generation using
unstructured and structured data sources
• Allow efficient exploration of data in either
environment on cohorts derived from either
source

Structured Data:
OMOP CDM v5
www.ohdsi.org

Unstructured Data:
HDFS / Solr / UIMA Environment

The Magic Sauce
Synchronized Synthetic IDs and Date Perturbations
• Solr Patient List Core: Person_id
• CDM Cohort Table: Person_id, cohort_start, cohort_end
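The slides do not say how the synthetic IDs and date perturbations are generated. One way to keep them synchronized across both environments is to derive them deterministically from the real identifier. The following is a minimal sketch under that assumption; `SECRET_KEY`, `MAX_SHIFT_DAYS`, and all function names are hypothetical and not taken from the presentation.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical values: a real deployment would keep the key in a secure store.
SECRET_KEY = b"example-secret"
MAX_SHIFT_DAYS = 30


def _digest(real_id: str, purpose: str) -> bytes:
    """Keyed hash of the real identifier, separated by purpose."""
    message = f"{purpose}:{real_id}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).digest()


def synthetic_id(real_id: str) -> int:
    """Same real ID always yields the same synthetic Person_id."""
    return int.from_bytes(_digest(real_id, "id")[:6], "big")


def date_offset(real_id: str) -> timedelta:
    """Per-patient shift in the range [-MAX_SHIFT_DAYS, +MAX_SHIFT_DAYS]."""
    n = int.from_bytes(_digest(real_id, "shift")[:4], "big")
    return timedelta(days=n % (2 * MAX_SHIFT_DAYS + 1) - MAX_SHIFT_DAYS)


def shift_date(real_id: str, d: date) -> date:
    """Apply the patient's offset; intervals within a patient are preserved."""
    return d + date_offset(real_id)
```

If the CDM load and the Solr indexing job both call `synthetic_id` and `shift_date`, a given patient gets the same Person_id and the same shifted dates in both stores, so cohort rows and indexed documents line up without a lookup table.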

Cohort Shows up in CDM
www.github.com/OHDSI/Heracles

Phase One
• We’ve defined a CDM cohort using text-based
patient identification
• We used a simple search algorithm, but could
extend to NLP pipelines as well (e.g., based on
negation, context, FHx, value extraction)
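As a rough illustration of Phase One, the sketch below collapses text-search hits into one cohort row per patient and writes them to a table with the OMOP CDM v5 cohort columns. The hit format, the `note_date` field, and the choice of earliest and latest matching note as cohort start and end are assumptions for illustration. SQLite stands in for the CDM database, and the hits are hard-coded rather than fetched from Solr.

```python
import sqlite3
from datetime import date


def hits_to_cohort_rows(hits, cohort_definition_id):
    """Collapse per-document hits into one cohort row per person."""
    spans = {}
    for hit in hits:
        pid, d = hit["person_id"], hit["note_date"]
        first, last = spans.get(pid, (d, d))
        spans[pid] = (min(first, d), max(last, d))
    return [
        (cohort_definition_id, pid, first.isoformat(), last.isoformat())
        for pid, (first, last) in sorted(spans.items())
    ]


def insert_cohort(conn, rows):
    """Create the cohort table if needed and append the rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cohort ("
        "cohort_definition_id INTEGER, subject_id INTEGER, "
        "cohort_start_date TEXT, cohort_end_date TEXT)"
    )
    conn.executemany("INSERT INTO cohort VALUES (?, ?, ?, ?)", rows)
    conn.commit()


# Illustrative hits only; IDs and dates are made up and assumed already
# synthetic and date-shifted.
example_hits = [
    {"person_id": 205, "note_date": date(2014, 3, 2)},
    {"person_id": 101, "note_date": date(2014, 1, 15)},
    {"person_id": 205, "note_date": date(2013, 11, 20)},
]
conn = sqlite3.connect(":memory:")
insert_cohort(conn, hits_to_cohort_rows(example_hits, cohort_definition_id=1))
```

Once the rows are in the cohort table, the cohort can be explored with the same CDM tools as any cohort built from structured data.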

Create in Cohort CDM
Send to NLP
www.ohdsi.org/web/circe

Pulls Subject IDs from Cohort
Adds to NLP Patient List

Phase Two
• We use the CDM Cohort creation tool to build
a cohort based on structured data
• Any CDM cohort can be consumed and
converted into an NLP patient list
• We have thus enabled a JOIN between the
structured and unstructured criteria
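The reverse direction can be sketched in the same way: read the subject IDs of a CDM cohort and turn them into a restriction on the text search, which is what gives the effect of a JOIN. This is only a sketch. The field name `person_id`, the function names, and the sample rows are assumptions, SQLite stands in for the CDM database, and the filter string is built but not sent to a Solr server.

```python
import sqlite3


def cohort_subject_ids(conn, cohort_definition_id):
    """Return the distinct subject IDs of one CDM cohort, sorted."""
    cur = conn.execute(
        "SELECT DISTINCT subject_id FROM cohort "
        "WHERE cohort_definition_id = ? ORDER BY subject_id",
        (cohort_definition_id,),
    )
    return [row[0] for row in cur]


def solr_filter_query(subject_ids, field="person_id"):
    """Build a filter query limiting a text search to the given patients."""
    if not subject_ids:
        raise ValueError("cohort is empty; nothing to search")
    return f"{field}:(" + " OR ".join(str(i) for i in subject_ids) + ")"


# Illustrative cohort rows; IDs and dates are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cohort (cohort_definition_id INTEGER, subject_id INTEGER, "
    "cohort_start_date TEXT, cohort_end_date TEXT)"
)
conn.executemany(
    "INSERT INTO cohort VALUES (?, ?, ?, ?)",
    [
        (1, 205, "2013-11-20", "2014-03-02"),
        (1, 101, "2014-01-15", "2014-01-15"),
        (2, 999, "2012-05-01", "2012-06-01"),
    ],
)
fq = solr_filter_query(cohort_subject_ids(conn, 1))
# fq == "person_id:(101 OR 205)"
```

Passing `fq` as a filter alongside a free-text query restricts the note search to members of the structured cohort. Very large ID lists may exceed Solr's boolean clause limit, in which case a stored patient list, as the slides describe, is the more practical design.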

Conclusion
• Use consistent synthetic identifiers and date-shifting
between CDM and Lucene indices
• Take advantage of APIs from Solr and OHDSI
for cohort insertion
• Provide UI hooks for a consistent and
integrated user experience
• Enable your environment to conduct rich
integrated phenotyping and data exploration


1. Linking Structured and Unstructured Clinical Phenotypes through the OMOP Common Data Model Jon Duke, Charity Hilton, Chris Beesley, Jonathan Cummins

2. Conflicts of Interest • None of the authors report any relevant conflicts of interest

3. Structured Data (Traditional Analytics): ICD-9, CPTs, Meds, Labs. Unstructured Data (Natural Language Processing): Endoscopy, Pathology, Op Reports, Radiology, Echos, Provider Notes

5. Integrated Phenotyping Combining structured and unstructured data is optimal for many cohort definitions Severe CHF: BNP level + NYHA Class

7. Combining Structured and Unstructured Data is Challenging • If unstructured data are available, they often reside in a different data environment with different retrieval mechanisms – e.g., HDFS / Lucene vs. RDBMS • Mapping unstructured text → structured concepts is valuable but may result in data loss (e.g., no UMLS concept for ‘NYHA 3’)

8. Our Goal • Seamless integrated cohort generation using unstructured and structured data sources • Allow efficient exploration of data in either environment on cohorts derived from either source

9. Structured Data: OMOP CDM v5 www.ohdsi.org

10. Unstructured Data: HDFS / Solr / UIMA Environment

11. The Magic Sauce • Synchronized Synthetic IDs and Date Perturbations • Solr Patient List Core: Person_id • CDM Cohort Table: Person_id, cohort_start, cohort_end

12. Create in NLP Send to CDM Synthetic IDs

13. Create in NLP Send to CDM

14. Cohort Shows up in CDM www.github.com/OHDSI/Heracles

15. Can Leverage CDM Tools to Explore

16. Can Leverage CDM Tools to Explore

17. Phase One • We’ve defined a CDM cohort using text-based patient identification • We used a simple search algorithm, but could extend to NLP pipelines as well (e.g., based on negation, context, FHx, value extraction)

18. Create in Cohort CDM Send to NLP www.ohdsi.org/web/circe

19. Shows up in CDM

20. Shows up in CDM

21. Meanwhile, over in NLP…

22. Import Cohort

23. Pulls Subject IDs from Cohort Adds to NLP Patient List

24. Available for NLP Search

25. Available for NLP Search

26. Available for NLP Search

27. Phase Two • We use the CDM Cohort creation tool to build a cohort based on structured data • Any CDM cohort can be consumed and converted into an NLP patient list • We have thus enabled a JOIN between the structured and unstructured criteria

28. Conclusion • Use consistent synthetic identifiers and date-shifting between CDM and Lucene indices • Take advantage of APIs from Solr and OHDSI for cohort insertion • Provide UI hooks for a consistent and integrated user experience • Enable your environment to conduct rich integrated phenotyping and data exploration

29. Questions? jonduke@regenstrief.org
