
Exploiting Semantic Structure for Mapping User-specified Form Terms to SNOMED CT Concepts



Published in: Technology, Education, Business

  1. Exploiting Semantic Structure for Mapping User-specified Form Terms to SNOMED CT Concepts. Ritu Khare (1, 2), Yuan An (1), Jiexun Li (1), Il-Yeol Song (1), Xiaohua Hu (1). (1) The iSchool at Drexel; (2) College of Medicine. Drexel University, Philadelphia, PA, USA
  2. Presentation Order: 1. Motivation 2. Problems 3. Solutions 4. Evaluation 5. Final Remarks
  3. General Motivation: Database integration and interoperability. Semantic heterogeneity across clinical data sources (Halevy, 2005; Henry et al., 1993; Hernandez et al., 2005; Wright et al., 1999). Example term variants: MRN, Med Rec #, Medical Record Number; Blood Pressure, Diastolic, Systolic, BP; Physical Status, Constitutional, Vital Signs. Recommendation: Controlled medical vocabularies should be involved in the design artifacts of healthcare systems (Jean et al., 2007; Sugumaran and Storey, 2002).
  4. Specific Motivation: [Figure: Clinical Encounter Form; Electronic Health Records (EHR)]. The terms on clinical forms are mapped to, or annotated by, a standard terminology. Domain experts may perform the annotation manually, which is costly and tedious. Research Objective: Design an automatic tool for mapping form terms to standard terminologies.
  5. Outline: 1. Motivation 2. Problem 3. Solutions 4. Evaluation 5. Final Remarks
  6. The Mapping Problem: [Figure: Clinical Encounter Form; SNOMED CT]. SNOMED CT: The Systematized Nomenclature of Medicine - Clinical Terms (Intl. Health Terminology Stds. Dev. Org). Most comprehensive clinical vocabulary (SNOMED CT User Guide, 2009). More than 360,000 logically defined clinical concepts (Hina et al., 2010; Stenzhorn et al., 2009). Example mappings (Form Term → SNOMED CT Concept): Patient → 11615400: Patient (person); MRN → 398225001: Medical record number (observable entity).
  7. SNOMED CT Concepts: Example 1: concept id: 0231832; Fully-specified name: Respiratory Rate (Observable Entity); Preferred Term: Respiratory Rate; Synonym: Respiration Frequency. Example 2: concept id: 362508001; Fully-specified name: Both eyes, entire (Body Structure); Preferred Term: Both eyes, entire; Synonym: OU - Both eyes. Semantic Categories: Attribute, Body Structure, Disorder, Finding, Observable Entity, Occupation, Person, Physical Object, Procedure, Racial Group, Situation, …
  8. Existing Mapping Services: SNOMED CT Browsers (Rogers and Bodenreider, 2008). General Mapping; Category-Specific Mapping.
  9. Challenges: Mapping Form Terms to SNOMED CT Concepts. Diversity Challenge: different clinicians, different terms (e.g., MRN, Med. Rec. #; Vital signs, Constitutional, Physical status). Context Challenge: same form term, different concepts.
  10. Outline: 1. Motivation 2. Problem 3. Solution 4. Evaluation 5. Final Remarks
  11. Premises: The first, i.e., the most string-similar, result retrieved by the SNOMED CT category-specific mapping is usually the desired concept. The key is to identify the semantic category appropriate for a given term. How can the SNOMED CT semantic category appropriate for a given form term be determined automatically?
  12. (1) The term context can be derived from the SEMANTIC STRUCTURE of the form. The FORM TREE accurately captures the semantic intentions of the designer. Inspired by hierarchical modeling of forms (Dragut et al., 2009; Wu et al., 2009).
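The form tree idea on slide 12 can be pictured with a small data structure. The sketch below is only an illustration: the `FormNode` class, the `node_type` labels ("form", "section", "field", "value"), and the exact tree shape are assumptions and do not come from the slides. It shows how a term's context could be read off as the chain of its ancestors.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class FormNode:
    """One term on a form. The node_type labels are hypothetical."""
    term: str
    node_type: str
    children: List["FormNode"] = field(default_factory=list)
    parent: Optional["FormNode"] = field(default=None, repr=False)

    def add(self, child: "FormNode") -> "FormNode":
        """Attach a child node and return it so calls can be chained."""
        child.parent = self
        self.children.append(child)
        return child


def context_path(node: FormNode) -> List[str]:
    """Return the ancestor terms of a node, from the root down to its parent."""
    path = []
    current = node.parent
    while current is not None:
        path.append(current.term)
        current = current.parent
    return list(reversed(path))


# A toy tree loosely following the example terms on the slides.
root = FormNode("root", "form")
patient = root.add(FormNode("Patient", "section"))
gender = patient.add(FormNode("Gender", "field"))
male = gender.add(FormNode("M", "value"))

print(context_path(male))
```

Here the context of the value "M" is the path root, Patient, Gender, which is the kind of structural information a plain string match on "M" cannot see.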
  13. (2) The implicit relationship between the term context (i.e., the semantic structure) and the desired semantic category can be formally captured in a STATISTICAL MODEL. Naïve Bayes Classifier: based on the Bayes theorem (Han and Kamber, 2006). Class Labels (SNOMED CT semantic categories): attribute, body structure, disorder, … Data Attributes (local structure): Node type; Parent node type; Child node type; Parent Semantic Category; Grandparent Semantic Category. [Figure: example form tree with nodes root, Patient, Examination, Name, Gender, Respiratory, M, F, nl, perc., annotated with categories such as Procedure, Person, Observable Entity, Finding, Qualifier Value]
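Slide 13 names a Naïve Bayes classifier over local-structure attributes. The following is a minimal sketch of a categorical Naïve Bayes with Laplace smoothing, not the authors' Java implementation. The class name, the smoothing constant, the three-attribute encoding, and the toy training rows are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes for categorical attributes with Laplace smoothing (illustrative)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant, an assumed default

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        # value_counts[(attribute_index, label)][value] = number of occurrences
        self.value_counts = defaultdict(Counter)
        self.vocab = [set() for _ in rows[0]]
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.value_counts[(i, label)][value] += 1
                self.vocab[i].add(value)
        return self

    def predict_proba(self, row):
        """Return category membership probabilities for one node."""
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # class prior
            for i, value in enumerate(row):
                seen = self.value_counts[(i, label)][value]
                # one extra slot so unseen values keep a nonzero probability
                denom = count + self.alpha * (len(self.vocab[i]) + 1)
                score += math.log((seen + self.alpha) / denom)
            log_scores[label] = score
        peak = max(log_scores.values())
        unnormalized = {k: math.exp(v - peak) for k, v in log_scores.items()}
        norm = sum(unnormalized.values())
        return {k: v / norm for k, v in unnormalized.items()}


# Toy rows: (node type, parent node type, parent semantic category). Made-up data.
rows = [
    ("section", "form", "none"),
    ("section", "form", "none"),
    ("field", "section", "person"),
    ("field", "section", "person"),
    ("field", "section", "procedure"),
    ("value", "field", "observable entity"),
    ("value", "field", "observable entity"),
]
labels = [
    "person", "procedure",
    "observable entity", "observable entity", "observable entity",
    "qualifier value", "qualifier value",
]

model = CategoricalNaiveBayes().fit(rows, labels)
probs = model.predict_proba(("value", "field", "observable entity"))
best = max(probs, key=probs.get)
print(best)
```

The class-conditional independence assumption is what lets the score be a sum of per-attribute log terms; the Limitations slide lists testing that assumption as open work.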
  14. Overall Mapping Approach: Form Term → Form Structure Analyzer (Form Tree) → Node Attributes → Classification Model (built from Training Data) → Category Membership Probabilities → Semantic Category Picker → Category → SNOMED CT Category-Specific Mapping → SNOMED CT Concept. [Figure: example form tree with nodes root, Patient, Examination, Name, Gender, Respiratory and categories Procedure, Person, Observable Entity]. Novelty: Hybrid Approach (leverages semantic structure as well as term linguistics).
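The last two stages of the pipeline on slide 14 (category picker, then category-specific mapping) can be sketched as follows. This is a toy stand-in, not the SNOMED CT API used in the talk: the in-memory `CATEGORY_INDEX`, the `difflib` similarity measure, the 0.6 threshold, and the fallback to the next most probable category are assumptions. The concept ids for Patient and Medical record number are copied as printed on the slides; the two "Discharge" entries are invented placeholders.

```python
from difflib import SequenceMatcher

# Toy per-category index. EXAMPLE-1 and EXAMPLE-2 are placeholders, not real ids.
CATEGORY_INDEX = {
    "person": [("11615400", "Patient")],
    "observable entity": [("398225001", "Medical record number")],
    "finding": [("EXAMPLE-1", "Discharge")],
    "procedure": [("EXAMPLE-2", "Discharge")],
}


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def category_specific_search(term, category, index=CATEGORY_INDEX):
    """Rank the concepts of one category by string similarity to the term."""
    candidates = index.get(category, [])
    return sorted(candidates, key=lambda c: similarity(term, c[1]), reverse=True)


def map_term(term, category_probs, min_similarity=0.6, index=CATEGORY_INDEX):
    """Try categories from most to least probable; return the first good hit."""
    ordered = sorted(category_probs, key=category_probs.get, reverse=True)
    for category in ordered:
        ranked = category_specific_search(term, category, index)
        if ranked and similarity(term, ranked[0][1]) >= min_similarity:
            return ranked[0]
    return None


as_finding = map_term("discharge", {"finding": 0.7, "procedure": 0.3})
as_procedure = map_term("discharge", {"finding": 0.3, "procedure": 0.7})
print(as_finding, as_procedure)
```

The same term resolves to different concepts depending on the category probabilities, which is the context challenge in miniature. An abbreviation such as "MRN" still finds nothing here, which is the diversity challenge that the term processing step addresses.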
  15. Outline: 1. Motivation 2. Problem 3. Solution 4. Evaluation 5. Final Remarks
  16. Data: Datasets (forms, total terms): (1) Walk-in clinic encounter forms (3 forms), 161 terms; (2) Nursing patient admission forms (6 forms), 261 terms; (3) Labor & delivery DB data-entry forms (7 forms), 294 terms; (4) Adult visit encounter forms (5 forms), 388 terms; (5) Child visit encounter forms (5 forms), 397 terms. Total: 26 forms, 1501 terms. Manual (Gold) Annotations: 954 (63.55%) terms, e.g., Patient → 11615400: Patient (person); MRN → 398225001: Medical record number (observable entity); … Some Unmapped Terms: no scleral icterus; chronic back pain; Follow up with PCP; Sent to ER.
  17. Implementation (Java) and Settings: Form Design Interface; Gold Annotations; API provided by Dataline Software Limited. Pipeline: Form Term → Form Structure Analyzer (Form Tree) → Node Attributes → Classification Model (built from Training Data) → Category Membership Probabilities → Semantic Category Picker → Category → SNOMED CT Category-Specific Mapping → SNOMED CT Concept. Cross-validation (leave one out) for each dataset.
  18. Experiment Design: Goal: To study whether semantic structure can improve mapping performance. Measures: Precision = # correct annotations / # annotations; Recall = # correct annotations / # gold annotations. Baseline (linguistics only): Form Term → SNOMED CT General Mapping → SNOMED CT Concept. Hybrid (linguistics + semantic structure): Form Term → Form Structure Analyzer → Node Attributes → Classification Model → Category Membership Probabilities → Semantic Category Picker → Category → SNOMED CT Category-Specific Mapping → SNOMED CT Concept. Hybrid++: the Hybrid pipeline with a Category Picker + candidate set expansion.
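The two measures defined on slide 18 translate directly into code. The sketch below assumes annotations are stored as dictionaries from form term to concept id, which is a representation chosen here for illustration; the "WRONG-ID" value is a made-up incorrect prediction.

```python
def precision_recall(predicted, gold):
    """Precision = correct / produced annotations; recall = correct / gold annotations."""
    correct = sum(1 for term, concept in predicted.items() if gold.get(term) == concept)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Concept ids as printed on the slides; the predictions themselves are invented.
gold = {
    "Patient": "11615400",
    "MRN": "398225001",
    "OU": "362508001",
    "Respiratory Rate": "0231832",
}
predicted = {
    "Patient": "11615400",
    "MRN": "398225001",
    "OU": "WRONG-ID",
}

precision, recall = precision_recall(predicted, gold)
print(precision, recall)
```

In this toy case two of three produced annotations are correct, and two of four gold annotations are recovered, so a tool that abstains on hard terms can score well on precision while recall stays low.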
  19. Results: Mapping duration per form = 1-11 s. Baseline: Precision 0.63, Recall 0.45. Recall is low: the SNOMED CT API uses exact string matching and could not handle the variation of terms, i.e., the diversity challenge. Baseline to Hybrid: precision up by 18%. Hybrid to Hybrid++: precision up by 16%, recall up by 23%. Hybrid++: Precision 0.86, Recall 0.55.
  20. More Results: Term processing component: remove special characters (-, #, /, etc.); acronym expansion dictionary, e.g., T (Temperature), BTL (Bilateral Tubal Ligation), VTE (Venous Thromboembolism). Precision improved only slightly (3-5%). Recall improved substantially (25%). Final Precision = 0.89, Recall = 0.76.
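The term processing component on slide 20 has two steps: stripping special characters and expanding acronyms from a dictionary. A minimal sketch, assuming a whitespace tokenizer and case-sensitive lookup (both are choices made here, not details from the slides); the three dictionary entries are the ones listed on the slide.

```python
import re

# Acronym entries taken from the slide; a real dictionary would be much larger.
ACRONYMS = {
    "T": "Temperature",
    "BTL": "Bilateral Tubal Ligation",
    "VTE": "Venous Thromboembolism",
}


def preprocess(term):
    """Replace special characters with spaces, then expand known acronyms."""
    cleaned = re.sub(r"[^A-Za-z0-9\s]", " ", term)
    tokens = cleaned.split()
    expanded = [ACRONYMS.get(token, token) for token in tokens]
    return " ".join(expanded)


print(preprocess("VTE/BTL"))
print(preprocess("Med. Rec. #"))
```

Lookup is kept case-sensitive so that a lowercase "t" inside ordinary text is not expanded to "Temperature"; whether the original tool made the same choice is not stated.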
  21. Implications: Impact of Semantic Structure: on overall mapping performance; more correct predictions (context challenge). Impact of Linguistics: mainly on recall; reaches more relevant terms (diversity challenge). Overall: promising performance, even with limited training data. Recall is low because of the simplicity of the linguistic techniques; it can be further improved using more sophisticated techniques.
  22. Outline: 1. Motivation 2. Problem 3. Solution 4. Evaluation 5. Final Remarks
  23. Contributions: PROBLEM: a NEW problem of standardizing the terms on clinical encounter forms using SNOMED CT. Existing works (Henry et al., 1993; Barrows Jr. et al., 1994; Patrick et al., 2007): standardization of clinical notes (diagnosis, medication information, patient complaints, etc.). SOLUTION: a context-based method that leverages the SEMANTIC STRUCTURE of forms along with term linguistics. Existing works: linguistic techniques (synonyms, morphemes, lexical variants).
  24. Contributions (continued): EVALUATION: 26 healthcare forms containing 950+ mappable terms specified by multiple clinicians. Improvement over existing services: 23% precision, 38% recall. Promising performance: precision 0.89, recall 0.76. FINDINGS: Linguistics helps overcome the diversity challenge and improves recall. Semantic structure helps overcome the context challenge and improves precision and recall. Design synergistic hybrid approaches to address all mapping challenges and achieve superior performance.
  25. Limitations: TECHNIQUE: Post-coordinated mapping; Handle missing and inapplicable values in training data. STUDY: Domain expert annotator; Classification attributes; Compare/combine with other UMLS terminology. TECHNICAL EVALUATION: Compare with other models (Bayesian networks, k, Neural Networks, Classification Association Rules); Test the validity of assumptions (class conditional independence; correctness of the most linguistically matching concept).
  26. Future Directions: Fully explore SNOMED CT: defining relationships. Customize for form categories: Encounter, Regular Visit, … Larger knowledge base for training datasets. In larger frameworks, does annotation help improve: Data/Database Integration? Data Quality? Patient Diagnosis? User Interventions? Work In Progress: Integrate with flexible Electronic Health Record system (IHI 2010); integration of new forms in EHR; improve database integration process.
  27. Thank you