Linking Structured and Unstructured Phenotypes through the OMOP Common Data Model - Duke - AMIA CRI 2015
1. Linking Structured and Unstructured Clinical Phenotypes through the OMOP Common Data Model
Jon Duke, Charity Hilton, Chris Beesley, Jonathan Cummins
7. Combining Structured and Unstructured Data Is Challenging
• If unstructured data are available, they often reside in a different data environment with different retrieval mechanisms
  – e.g., HDFS / Lucene vs. RDBMS
• Mapping unstructured text to structured concepts is valuable but may result in data loss (e.g., there is no UMLS concept for ‘NYHA 3’)
8. Our Goal
• Seamless, integrated cohort generation using both unstructured and structured data sources
• Allow efficient exploration of data in either environment on cohorts derived from either source
17. Phase One
• We’ve defined a CDM cohort using text-based patient identification
• We used a simple search algorithm but could extend this to NLP pipelines as well (e.g., handling negation, context, family history [FHx], and value extraction)
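A minimal sketch of text-based patient identification feeding a CDM cohort, assuming notes are available as `(person_id, note_date, text)` tuples. The note data, the `text_cohort` and `is_negated` helpers, and the start/end-date rule are hypothetical; the negation check is a deliberately crude stand-in for the NLP pipelines mentioned above. Output rows follow the columns of the OMOP CDM `cohort` table.

```python
import re
from datetime import date

# Hypothetical notes: (person_id, note_date, note_text).
# In the setup described above these would come from a Lucene/Solr index.
NOTES = [
    (101, date(2014, 3, 2), "Patient with NYHA 3 heart failure, on furosemide."),
    (102, date(2014, 5, 9), "Denies symptoms consistent with NYHA 3 heart failure."),
    (103, date(2014, 6, 1), "Dyspnea on exertion; NYHA 3 symptoms reported."),
]

# Crude negation cues; a real pipeline (e.g., NegEx-style rules) would be far richer.
NEGATION_CUES = re.compile(r"\b(no|denies|without|negative for)\b", re.IGNORECASE)


def is_negated(sentence: str, match_start: int) -> bool:
    """Treat a hit as negated if a negation cue appears earlier in the same sentence."""
    return bool(NEGATION_CUES.search(sentence[:match_start]))


def text_cohort(notes, pattern, cohort_definition_id):
    """Return OMOP-style cohort rows for patients whose notes match `pattern`.

    Start date = first matching note, end date = last matching note
    (an assumption made for this sketch).
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = {}  # subject_id -> dates of notes with a non-negated match
    for person_id, note_date, text in notes:
        # Naive sentence split on '.' or ';' followed by whitespace.
        for sentence in re.split(r"(?<=[.;])\s+", text):
            m = regex.search(sentence)
            if m and not is_negated(sentence, m.start()):
                hits.setdefault(person_id, []).append(note_date)
                break  # one hit per note is enough
    return [
        {
            "cohort_definition_id": cohort_definition_id,
            "subject_id": pid,
            "cohort_start_date": min(dates),
            "cohort_end_date": max(dates),
        }
        for pid, dates in sorted(hits.items())
    ]


rows = text_cohort(NOTES, r"NYHA\s*3", cohort_definition_id=1)
print([r["subject_id"] for r in rows])  # → [101, 103]
```

Patient 102 is excluded because the mention follows "Denies", which illustrates why even a simple search benefits from negation handling.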
27. Phase Two
• We use the CDM Cohort creation tool to build a cohort based on structured data
• Any CDM cohort can be consumed and converted into an NLP patient list
• We have thus enabled a JOIN between the structured and unstructured criteria
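One way to picture converting a CDM cohort into an NLP patient list is as a Solr filter query over the cohort’s `subject_id` values, followed by a set intersection that plays the role of the JOIN. This is a sketch under assumptions: the field names `person_id` and `note_text`, the helper functions, and the example ids are hypothetical; only the standard Solr `/select` parameters `q`, `fq`, `fl`, and `rows` are used, and no request is actually sent.

```python
from urllib.parse import urlencode


def cohort_to_solr_params(subject_ids, text_query, id_field="person_id", rows=100):
    """Build Solr /select parameters restricting a text query to a CDM cohort.

    `id_field` is a hypothetical field name: it assumes each indexed note
    carries the same (synthetic) person identifier used in the CDM.
    """
    ids = " OR ".join(str(i) for i in sorted(set(subject_ids)))
    return {
        "q": text_query,              # the unstructured (text) criterion
        "fq": f"{id_field}:({ids})",  # filter query: only notes of cohort members
        "fl": id_field,               # return just the patient identifier
        "rows": rows,
    }


def join_cohorts(structured_ids, text_hit_ids):
    """AND-combine structured and unstructured criteria: patients meeting both."""
    return sorted(set(structured_ids) & set(text_hit_ids))


structured = [101, 102, 104]  # hypothetical subject_ids from a CDM cohort
params = cohort_to_solr_params(structured, 'note_text:"NYHA 3"')
query_string = urlencode(params)  # would be appended to .../solr/<core>/select?
text_hits = [101, 103]            # hypothetical ids returned by the text search
print(join_cohorts(structured, text_hits))  # → [101]
```

For very large cohorts, a long `OR` list in `fq` may become unwieldy; Solr’s `terms` query parser is one alternative worth evaluating.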
28. Conclusion
• Use consistent synthetic identifiers and date-shifting between CDM and Lucene indices
• Take advantage of APIs from Solr and OHDSI for cohort insertion
• Provide UI hooks for a consistent and integrated user experience
• Enable your environment to conduct rich integrated phenotyping and data exploration
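The slides do not say how synthetic identifiers and date shifts were generated. A minimal sketch, assuming a keyed hash (HMAC-SHA256, a swapped-in technique) of the source MRN: the same patient always receives the same synthetic id and the same day offset, whether the record is loaded into the CDM or indexed in Lucene, so cohorts and intervals line up across both environments. `SECRET_KEY`, `MAX_SHIFT_DAYS`, and all helper names are hypothetical.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical secret held by the de-identification step; never stored with the data.
SECRET_KEY = b"replace-with-a-managed-secret"
MAX_SHIFT_DAYS = 365  # assumed shift window of +/- 365 days


def _digest(mrn: str) -> bytes:
    """Keyed hash (HMAC-SHA256) of the source identifier."""
    return hmac.new(SECRET_KEY, mrn.encode("utf-8"), hashlib.sha256).digest()


def synthetic_id(mrn: str) -> int:
    """Deterministic synthetic person_id derived from the source MRN."""
    return int.from_bytes(_digest(mrn)[:6], "big")


def shift_days(mrn: str) -> int:
    """Deterministic per-patient offset in [-MAX_SHIFT_DAYS, MAX_SHIFT_DAYS]."""
    raw = int.from_bytes(_digest(mrn)[6:10], "big")
    return raw % (2 * MAX_SHIFT_DAYS + 1) - MAX_SHIFT_DAYS


def deidentify(mrn: str, event_date: date):
    """Apply the same id and shift whether the record goes to the CDM or to Lucene."""
    return synthetic_id(mrn), event_date + timedelta(days=shift_days(mrn))


cdm_row = deidentify("MRN-0042", date(2014, 3, 2))   # structured visit
note_row = deidentify("MRN-0042", date(2014, 3, 9))  # clinical note a week later
assert cdm_row[0] == note_row[0]             # same synthetic person in both stores
assert (note_row[1] - cdm_row[1]).days == 7  # intervals between events preserved
```

Because the offset is constant per patient, temporal criteria (e.g., "note within 30 days of diagnosis") evaluate identically on shifted data in either environment.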