Exploiting Semantic Structure for Mapping User-specified Form Terms to SNOMED CT Concepts


Published in: Technology, Education, Business
  • 25 min presentation – 5 min question answer. Make 20 slides only. Read reviewers comments. Breakdown – 2, 4, 5, 5, 4
  • (In other words, we could say that existing systems are certainly not designed with future integration in mind.)
  • Who designed the forms? Why not other domains – which other domains? Possible. Have some idea. Mark the concepts – post coordinated or partial mapping.
  • Draw all the figures properly in MS 2010 ppt.
  • Why does recall decrease when the number of correct predictions decreases on applying the hybrid method? Sometimes the linguistic approach returns a more accurate result. A larger improvement in recall and precision means the forms had terms for which multiple senses exist in SNOMED CT.
  • Our experience of tagging 52 data-entry forms suggests that the training samples can be constructed quickly and easily, as compared to the construction of an exhaustive set of rules or heuristics. To further test the performance of the mapping framework in a heterogeneous environment,

    1. Exploiting Semantic Structure for Mapping User-specified Form Terms to SNOMED CT Concepts
       Ritu Khare [1,2], Yuan An [1], Jiexun Li [1], Il-Yeol Song [1], Xiaohua Hu [1]
       [1] The iSchool at Drexel, [2] College of Medicine, Drexel University, Philadelphia, PA, USA
    2. Presentation Order: 1. Motivation; 2. Problems; 3. Solutions; 4. Evaluation; 5. Final Remarks
    3. General Motivation
       • Database Integration and Interoperability
       • Semantic Heterogeneity across clinical data sources (Halevy, 2005; Henry et al., 1993; Hernandez et al., 2005; Wright et al., 1999)
       • Figure (term variants across sources): MRN, Med Rec #, Medical Record Number; Blood Pressure, Diastolic, Systolic, BP; Physical Status, Constitutional, Vital Signs
       • Recommendation: Controlled Medical Vocabularies should be involved in the design artifacts of the healthcare systems (Jean et al., 2007; Sugumaran and Storey, 2002).
    4. Specific Motivation
       • Figure labels: Clinical Encounter Form; Electronic Health Records (EHR)
       • The terms on the clinical forms are mapped to, or annotated by, a standard terminology.
       • Domain experts may manually perform the annotation, which is costly and tedious.
       • Research Objective: Design an automatic tool for mapping form terms to standard terminologies.
    5. Outline: 1. Motivation; 2. Problem; 3. Solutions; 4. Evaluation; 5. Final Remarks
    6. The Mapping Problem
       • Figure labels: Clinical Encounter Form; SNOMED CT
       • SNOMED CT: The Systematized Nomenclature of Medicine - Clinical Terms (Intl. Health Terminology Stds. Dev. Org)
       • Most comprehensive clinical vocabulary (SNOMED CT User Guide, 2009).
       • >360,000 logically-defined clinical concepts (Hina et al., 2010; Stenzhorn et al., 2009).

       Form Term    SNOMED CT Concept
       Patient      11615400: Patient (person)
       MRN          398225001: Medical record number (observable entity)
    7. SNOMED CT Concepts
       • Example concept, concept id: 0231832
           Fully-specified-name: Respiratory Rate (Observable Entity)
           Preferred Term: Respiratory Rate
           Synonym: Respiration Frequency
       • Example concept, concept id: 362508001
           Fully-specified-name: Both eyes, entire (Body Structure)
           Preferred Term: Both eyes, entire
           Synonym: OU - Both eyes
       • Semantic Categories: Attribute, Body Structure, Disorder, Finding, Observable Entity, Occupation, Person, Physical Object, Procedure, Racial Group, Situation, …
    8. Existing Mapping Services
       • SNOMED CT Browsers (Rogers and Bodenreider, 2008)
       • General Mapping
       • Category Specific Mapping
    9. Challenges: Mapping Form Terms to SNOMED CT Concepts
       • Diversity Challenge: Different clinicians, different terms.
           MRN, Med. Rec.#
           Vital signs, Constitutional, Physical status
       • Context Challenge: Same Form Term, Different Concepts.
    10. Outline: 1. Motivation; 2. Problem; 3. Solution; 4. Evaluation; 5. Final Remarks
    11. Premises
        • The first, i.e., the most string-similar, result retrieved by the SNOMED CT category-specific mapping is usually the desired concept.
        • The key is to identify the semantic category appropriate for a given term.
        • How to automatically determine the SNOMED CT Semantic Category appropriate for a given form term?
    12. (1) The term context can be derived from the SEMANTIC STRUCTURE of the form.
        • The FORM TREE accurately captures the semantic intentions of the designer.
        • Inspired by hierarchical modeling of forms (Dragut et al., 2009; Wu et al., 2009)
    13. (2) The implicit relationship between the term context (i.e., the semantic structure) and the desired semantic category can be formally captured into a STATISTICAL MODEL.
        • Naïve Bayes Classifier: based on the Bayes theorem (Han and Kamber, 2006).
        • Class Labels (SNOMED CT semantic categories): attribute, body structure, disorder, …
        • Data Attributes (local structure): Node type; Parent node type; Child node type; Parent Semantic Category; Grandparent Semantic Category
        • Figure (example form tree annotated with semantic categories): node labels root, Patient, Examination, Name, Gender, Respiratory, M, F, nl, perc.; category labels Person, Procedure, Observable Entity, Finding, Qualifier Value
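A minimal sketch of how a Naïve Bayes classifier over the five structural attributes listed on this slide could be trained and applied. Python is used here for brevity, although the slides report a Java implementation. The class name, the Laplace smoothing, the node-type vocabulary ("section", "field", "value"), and the toy training rows are all illustrative assumptions, not the authors' code or data.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes for categorical attributes with Laplace smoothing (an assumption)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, rows, labels):
        self.n = len(labels)
        self.class_counts = Counter(labels)
        self.value_counts = defaultdict(Counter)  # (attribute index, label) -> value counts
        self.values = defaultdict(set)            # attribute index -> values seen in training
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.value_counts[(i, label)][value] += 1
                self.values[i].add(value)
        return self

    def log_scores(self, row):
        """Return log P(label) + sum_i log P(attribute_i | label) for every label."""
        scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / self.n)
            for i, value in enumerate(row):
                seen = self.value_counts[(i, label)][value]
                # One extra slot in the denominator reserves mass for unseen values.
                denom = count + self.alpha * (len(self.values[i]) + 1)
                score += math.log((seen + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, row):
        scores = self.log_scores(row)
        return max(scores, key=scores.get)


# Hypothetical training rows, one per form-tree node:
# (node type, parent node type, child node type,
#  parent semantic category, grandparent semantic category)
TRAIN_ROWS = [
    ("field", "section", "value", "person", "none"),
    ("field", "section", "value", "procedure", "none"),
    ("value", "field", "none", "observable entity", "person"),
    ("value", "field", "none", "observable entity", "person"),
    ("value", "field", "none", "observable entity", "procedure"),
    ("value", "field", "none", "observable entity", "procedure"),
]
TRAIN_LABELS = [
    "observable entity", "observable entity",
    "qualifier value", "qualifier value",
    "finding", "finding",
]

model = CategoricalNaiveBayes().fit(TRAIN_ROWS, TRAIN_LABELS)
predicted = model.predict(("value", "field", "none", "observable entity", "person"))
```

In this toy data the grandparent category is what separates "qualifier value" from "finding", which is the kind of context signal the slide argues a purely linguistic match cannot see.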
    14. Overall Mapping Approach
        • Pipeline: Form Term → Form Structure Analyzer → Form Tree → Node Attributes → Classification Model (built from Training Data) → Category Membership Probabilities → Category Picker → Semantic Category → SNOMED CT Category Specific Mapping → SNOMED CT Concept
        • Figure (example form tree): node labels root, Patient, Examination, Name, Gender, Respiratory; category labels Person, Procedure, Observable Entity
        • Novelty: Hybrid Approach (leverages semantic structure as well as term linguistics)
    15. Outline: 1. Motivation; 2. Problem; 3. Solution; 4. Evaluation; 5. Final Remarks
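The "Form Structure Analyzer → Node Attributes" step of the pipeline can be sketched as a small tree walk. This is an illustrative sketch only: the `FormNode` class, the node types, the choice of the first child as the "child node type", and the categories attached to the example nodes are assumptions, not details given in the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class FormNode:
    """One node of a form tree (hypothetical representation)."""
    label: str
    node_type: str                      # e.g. "section", "field", "value" (assumed vocabulary)
    category: Optional[str] = None      # semantic category, if already known
    parent: Optional["FormNode"] = field(default=None, repr=False)
    children: List["FormNode"] = field(default_factory=list)

    def add(self, child: "FormNode") -> "FormNode":
        child.parent = self
        self.children.append(child)
        return child


def node_attributes(node: FormNode) -> dict:
    """Collect the local-structure attributes used as classifier input."""
    parent = node.parent
    grandparent = parent.parent if parent else None
    return {
        "node_type": node.node_type,
        "parent_node_type": parent.node_type if parent else "none",
        # Assumption: the first child stands in for the "child node type".
        "child_node_type": node.children[0].node_type if node.children else "none",
        "parent_category": (parent.category if parent and parent.category else "none"),
        "grandparent_category": (
            grandparent.category if grandparent and grandparent.category else "none"
        ),
    }


# Example tree: root -> Patient -> Gender -> M (categories are assumed for illustration)
root = FormNode("root", "root")
patient = root.add(FormNode("Patient", "section", category="person"))
gender = patient.add(FormNode("Gender", "field", category="observable entity"))
m = gender.add(FormNode("M", "value"))

attrs = node_attributes(m)
```

Because categories of ancestors feed into the attributes of descendants, a tree would presumably be annotated top-down, so that each node's parent and grandparent categories are available when it is classified.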
    16. Data
        • Manual (gold) annotations: 954 (63.55%) terms

        Dataset  Forms                                           Total Terms
        1        Walk in clinic encounter forms (3 forms)        161
        2        Nursing patient admission forms (6 forms)       261
        3        Labor & delivery DB data-entry forms (7 forms)  294
        4        Adult visit encounter forms (5 forms)           388
        5        Child visit encounter forms (5 forms)           397
        Total    26 forms                                        1501

        Term     Concept ID
        Patient  11615400: Patient (person)
        MRN      398225001: Medical record number (observable entity)
        …        …

        • Some unmapped terms: no scleral icterus; chronic back pain; Follow up with PCP; Sent to ER
    17. Implementation (JAVA) and Settings
        • Figure labels: Form Design Interface; Gold Annotations; API, provided by the Dataline Software Limited
        • Pipeline: Form Term → Form Structure Analyzer → Form Tree → Node Attributes → Classification Model (built from Training Data) → Category Membership Probabilities → Category Picker → Semantic Category → SNOMED CT Category Specific Mapping → SNOMED CT Concept
        • Cross Validation (leave 1 out) for each dataset
    18. Experiment Design
        • Goal: To study whether semantic structure can improve mapping performance.
        • Measures:
            Precision = # correct annotations / # annotations
            Recall = # correct annotations / # gold annotations
        • Baseline (linguistics only): Form Term → SNOMED CT General Mapping → SNOMED CT Concept
        • Hybrid (linguistics + semantic structure): Form Term → Form Structure Analyzer → Node Attributes → Classification Model → Category Membership Probabilities → Category Picker → Semantic Category → SNOMED CT Category Specific Mapping → SNOMED CT Concept
        • Hybrid++: same pipeline as Hybrid, with Category Picker + candidate set expansion
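The two measures defined on this slide can be computed as in the sketch below. The function name, the dictionary representation (form term to concept id), and every concept id other than the MRN id shown earlier in the slides are hypothetical placeholders chosen for illustration.

```python
def precision_recall(predicted, gold):
    """Precision = correct / all annotations made; recall = correct / gold annotations.

    `predicted` and `gold` map form terms to concept ids. Terms for which the
    mapper returned nothing are simply absent from `predicted`.
    """
    correct = sum(1 for term, concept in predicted.items() if gold.get(term) == concept)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Placeholder ids ("hyp-...") are hypothetical; only the MRN id comes from the slides.
gold = {
    "MRN": "398225001",
    "Gender": "hyp-001",
    "Respiratory": "hyp-002",
    "Temperature": "hyp-003",
}
predicted = {
    "MRN": "398225001",        # correct
    "Gender": "hyp-001",       # correct
    "Respiratory": "hyp-999",  # wrong concept
    # "Temperature" was not mapped at all
}

precision, recall = precision_recall(predicted, gold)
```

Here two of three annotations are correct (precision 2/3) and two of four gold annotations are recovered (recall 0.5), which shows how an unmapped term lowers recall without affecting precision.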
    19. Results
        • Mapping duration per form = 1-11 s
        • Baseline: Precision: 0.63, Recall: 0.45
            Recall is low because the SNOMED CT API uses exact string matching and could not handle the variation of terms, i.e., the diversity challenge.
        • Baseline to Hybrid: Precision up by 18%.
        • Hybrid to Hybrid++: Precision up by 16%, Recall up by 23%.
        • Hybrid++: Precision: 0.86, Recall: 0.55
    20. More Results
        • Term processing component:
            remove special characters: -, #, /, etc.
            acronym expansion dictionary: T (Temperature), BTL (Bilateral Tubal Ligation), VTE (Venous Thromboembolism)
        • Precision only slightly improved: 3-5%
        • Recall improved substantially: 25%
        • Final Precision = 0.89, Recall = 0.76
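The term processing component described on this slide can be sketched as two small steps: strip special characters, then expand acronyms from a dictionary. The function name, the exact character set (the slide lists "-, #, /, etc."), and the whole-token, case-sensitive matching rule are assumptions; the three dictionary entries are the examples given on the slide, with BTL expanded as Bilateral Tubal Ligation.

```python
import re

# Acronym examples taken from the slide; a real dictionary would be much larger.
ACRONYMS = {
    "T": "Temperature",
    "BTL": "Bilateral Tubal Ligation",
    "VTE": "Venous Thromboembolism",
}


def normalize_term(term, acronyms=ACRONYMS):
    """Remove special characters, collapse whitespace, and expand known acronyms."""
    cleaned = re.sub(r"[-#/]", " ", term)           # assumed character set
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    # Whole-token, case-sensitive lookup so "T" inside a longer word is untouched.
    tokens = [acronyms.get(token, token) for token in cleaned.split(" ")]
    return " ".join(tokens)


examples = [normalize_term(t) for t in ("T", "VTE/BTL", "Med Rec #")]
```

Normalizing before lookup matters here because, per the results slide, the SNOMED CT API relies on exact string matching, so small surface variations would otherwise return no candidate at all.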
    21. Implications
        • Impact of Semantic Structure: overall mapping performance; a larger number of correct predictions (context challenge)
        • Impact of Linguistics: mainly on recall; reaches a larger number of relevant terms (diversity challenge)
        • Overall: promising performance, even with limited training data. Recall is low because of the simplicity of the linguistic techniques and can be further improved using more sophisticated techniques.
    22. Outline: 1. Motivation; 2. Problem; 3. Solution; 4. Evaluation; 5. Final Remarks
    23. Contributions
        • PROBLEM: NEW problem of standardizing the terms on clinical encounter forms using SNOMED CT.
            Existing works (Henry et al., 1993; Barrows Jr. et al., 1994; Patrick et al., 2007): standardization of clinical notes (diagnosis, medication information, patient complaints, etc.)
        • SOLUTION: Context-based method that leverages the SEMANTIC STRUCTURE of forms along with term linguistics.
            Existing works: linguistic techniques (synonyms, morphemes, lexical variants)
    24. Contributions
        • EVALUATION: 26 healthcare forms containing 950+ mappable terms specified by multiple clinicians.
            Improvement over existing services: 23% precision, 38% recall
            Promising performance: precision: 0.89, recall: 0.76
        • FINDINGS:
            Linguistics helps overcome the diversity challenge and improves recall.
            Semantic structure helps overcome the context challenge and improves precision and recall.
            Design synergistic hybrid approaches to address all mapping challenges and achieve superior performance.
    25. Limitations
        • TECHNIQUE
            Post-coordinated mapping
            Handle Missing and Inapplicable Values in Training data
        • STUDY
            Domain Expert Annotator
            Compare/Combine with other UMLS terminology
        • TECHNICAL EVALUATION
            Compare with other models: Bayesian networks, k Neural Networks, Classification Association Rules
            Test the validity of assumptions: Class conditional independence; Correctness of most linguistic matching concept; Classification Attributes
    26. Future Directions
        • Fully explore SNOMED CT: defining relationships
        • Customize for Form Categories: Encounter, Regular Visit, …
        • Larger Knowledge Base for Training Datasets
        • In larger frameworks, does annotation help improve: Data/Database Integration? Data Quality? Patient Diagnosis? User Interventions?
        • Work In Progress: Integrate with flexible Electronic Health Record system (IHI 2010). Integration of new forms in EHR; improve database integration process.
    27. Thank you