Presented at 2014 AMIA Joint Summits, April 9, 2014, San Francisco, CA
BACKGROUND. Pancreatic cancer is one of the most common causes of cancer-related death in the United States; it is difficult to detect early and typically has a very poor prognosis. We present a novel method of large-scale clinical hypothesis generation based on a phenome-wide association study performed on Electronic Health Record (EHR) data in a pancreatic cancer cohort.
METHODS. The study population consisted of 1,154 patients diagnosed with malignant neoplasm of pancreas and seen at the Froedtert & The Medical College of Wisconsin academic medical center between 2004 and 2013. We evaluated death of a patient as the primary clinical outcome and tested its association with the phenome, which consisted of over 2.5 million structured clinical observations extracted from the EHR, including labs, medications, phenotypes, diseases, and procedures. The individual observations were encoded in the EHR using 6,617 unique ICD-9, CPT-4, LOINC, and RxNorm codes. We remapped this initial code set into UMLS concepts and then hierarchically expanded it to support generalization, yielding the set of 10,164 clinical concepts that formed the final phenome. We then tested all pairwise associations between each of the 10,164 concepts and death as the primary outcome.
RESULTS. After correcting for multiple testing and folding back (generalizing) child concepts where appropriate, we found 231 concepts to be significantly associated with death in the study population.
CONCLUSIONS. With the abundance of structured EHR data, phenome-wide association studies combined with knowledge engineering can be a viable method of rapid hypothesis generation.
2. Conflict of interest disclosure
Tomasz Adamusiak has no real or apparent
conflicts of interest to report
3. Learning Objectives
• Recognize the value of structured clinical
information
• Identify computational and terminology
challenges in big data analytics
17. Expansion in UMLS across MU sources
[Figure: the concept "Diabetes mellitus without mention of complication, type II or unspecified type, not stated as uncontrolled" expanded across ICD-9, ICD-10, SNOMED CT, and NDF-RT. Roots: "Situation with explicit context", "Metabolic diseases".]
18. 6° of terminological Kevin Bacon
Acute myocardial infarction
Myocardial ischemia
Vascular Diseases
Disorder of soft tissue
Collagen Diseases
Connective Tissue Diseases
Epidermal and dermal conditions
Skin and subcutaneous tissue disorders
Dermatologic disorders
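The chain of concepts on this slide suggests how a single coded observation can be generalized by walking up a concept hierarchy. The following is a minimal sketch of that kind of hierarchical expansion, not the authors' implementation. It assumes a hand-written child-to-parent map; the parent links, the patient identifiers, and the function names `ancestors` and `expand` are all hypothetical and do not reproduce actual UMLS relations.

```python
from collections import defaultdict

# Toy child -> parents map. These links are assumptions for illustration only,
# not relations extracted from the UMLS.
PARENTS = {
    "Acute myocardial infarction": ["Myocardial ischemia"],
    "Myocardial ischemia": ["Vascular Diseases"],
    "Vascular Diseases": [],
}


def ancestors(concept, parents):
    """Return every concept reachable by following parent links upward."""
    seen = set()
    stack = list(parents.get(concept, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


def expand(observations, parents):
    """Map each concept, including inferred ancestors, to the patients carrying it."""
    patients_by_concept = defaultdict(set)
    for patient_id, concept in observations:
        patients_by_concept[concept].add(patient_id)
        for ancestor in ancestors(concept, parents):
            patients_by_concept[ancestor].add(patient_id)
    return dict(patients_by_concept)


# Hypothetical observations: (patient id, coded concept).
observations = [
    ("p1", "Acute myocardial infarction"),
    ("p2", "Myocardial ischemia"),
]
expanded = expand(observations, PARENTS)
```

In this toy example both patients end up counted under "Vascular Diseases", even though neither was coded with it directly, which is the generalization that expansion is meant to support.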
19. UMLS is ideal for integration of
heterogeneous clinical data
• Translational potential (OMIM, GO, NCIt)
• Single entry point to MU terminologies
• Cross-walk between MU terms
• Terminology-agnostic
• Text-mining
20. Extracting genetic information out of
EHR is a major challenge
Encounter due to genetic counseling, by outcome:

Outcome     Yes    No
Deceased      2   813
Alive         3   336
Background
reference
Methods:
• Chi-squared test
• Bonferroni correction
• RR estimate of effect size
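The three methods listed above can be sketched on the genetic counseling counts from slide 20 (2 and 813 deceased, 3 and 336 alive). This is an illustrative sketch only, not the authors' code. It assumes a Pearson chi-squared test without continuity correction, a significance level of 0.05 (not stated in the slides), and a Bonferroni divisor of 10,164 tests taken from the abstract; the function names are hypothetical. With expected counts this small, an exact test may be more appropriate in practice.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) and p-value
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom, the chi-squared survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def relative_risk(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Risk of the event among the exposed divided by risk among the unexposed."""
    return (exposed_events / exposed_total) / (unexposed_events / unexposed_total)


# Counts from the slide: rows are outcome, columns are genetic counseling Yes/No.
deceased_yes, deceased_no = 2, 813
alive_yes, alive_no = 3, 336

stat, p_value = chi_squared_2x2(deceased_yes, deceased_no, alive_yes, alive_no)

# Relative risk of death for patients with a genetic counseling encounter.
rr = relative_risk(
    deceased_yes, deceased_yes + alive_yes,
    deceased_no, deceased_no + alive_no,
)

ALPHA = 0.05      # assumed significance level
N_TESTS = 10164   # number of concepts tested, per the abstract
bonferroni_threshold = ALPHA / N_TESTS
significant = p_value < bonferroni_threshold
```

Under these assumptions the genetic counseling concept does not pass the corrected threshold, which is consistent with the slide's point that only five patients in the cohort carry it.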
22. CORRELATION DOES NOT IMPLY
CAUSATION
Private traits and attributes are predictable from digital records of
human behavior. Kosinski M, Stillwell D, Graepel T. PMID: 23479631
By Jono Winn (Flickr) [CC-BY-2.0], via Wikimedia Commons
23. Future work: cohort profiles
1. Malignant neoplasm of pancreas (C0346647)
2. Digestive System Neoplasms (C0012243)
3. Glucose test, blood by glucose monitoring device(s) cleared by the FDA
specifically for home use (C0373627)
4. Hepatic function panel This panel must include the following: Albumin (82040)
Bilirubin, total (82247) Bilirubin, direct (82248) Phosphatase, alkaline (84075)
Protein, total (84155) Transferase, alanine amino (ALT) (SGPT) (84460)
Transferase, aspartate amino (AST) (SGOT) (C0812554)
5. Basic metabolic panel (Calcium, total) This panel must include the following:
Calcium, total (82310) Carbon dioxide (bicarbonate) (82374) Chloride (82435)
Creatinine (82565) Glucose (82947) Potassium (84132) Sodium (84295) Urea
nitrogen (BUN) (84520) (C0519823)
6. Regular Insulin, Human 100 UNT/ML Injectable Solution (C0977794)
7. heparin sodium, porcine 10 UNT/ML Injectable Solution (C0977415)
8. Pancreatic Diseases (C0030286)
9. Dexamethasone 4 MG/ML Injectable Solution (C0976136)
10. Sodium Chloride 0.154 MEQ/ML Injectable Solution (C0980221)
24. Limitations
• Gaps in data
– Out of network
– Provider-related
– Terminology-related
25. Thank you
Co-authors:
Mary Shimoyama, PhD
tomasz@mcw.edu
@7omasz
Results
http://dx.doi.org/10.6084/m9.figshare.816958
For more background information
Adamusiak T, Shimoyama N, Shimoyama M. Next-generation
phenotyping using the Unified Medical Language System
(UMLS). JMIR Med Inform.
doi:10.2196/medinform.3172
Acknowledgements
We thank Stacy Zacher, Glenn Bushee, and Bradley
Taylor for their help.
This project was funded in part by the Advancing a
Healthier Wisconsin endowment at the Medical
College of Wisconsin and the National Center for
Research Resources and the National Center for
Advancing Translational Sciences, National Institutes
of Health, through grant UL1 RR031973.