Exploiting Semantic Structure for Mapping Clinician-specified Form Terms to SNOMED CT Concepts
Ritu Khare1,3, Yuan An3, Jiexun Li3, Il-Yeol Song3, Xiaohua Hu3, Michele Follen1,2
College of Medicine: Center for Women's Health Research (1) and Obstetrics and Gynecology (2); College of Information Science and Technology (3)

Motivation, Problem, and Challenges

The elements of clinical databases are usually named after the clinical terms used in various design artifacts. These terms are instinctively supplied by the users, and hence different users often use different terms to describe the same clinical concept. This term diversity makes future database integration and analysis a huge challenge. The task addressed here is form term mapping (standardization): mapping the terms in clinical forms to SNOMED CT concepts.

Diversity Challenge (Well Addressed): Different clinicians specify different form terms to denote the same clinical concept, e.g., MRN or Med.Rec.#; VitalSigns, Constitutional, or Physical status.

Context Challenge (Less Explored): The same form term, when used in different contexts, may map to different SNOMED CT concepts, e.g., the term Respiratory in Fig. 1 and Fig. 2.

How can we leverage the semantic structure of clinical forms to map the form terms into standard SNOMED CT concepts?

Fig. 1. A Sample Clinician Designed Form. [Figure: a Patient History Form with a PATIENT section (Name, Gender: M/F, DOB, MRN) and a HISTORY section (Chief Complaints; Review of Systems: Eyes, ENMT, Respiratory).]

Preliminaries: SNOMED CT and Semantic Form Trees

The Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) is a widely used medical terminology. It comprises 360,000 clinical CONCEPTS belonging to various SEMANTIC CATEGORIES. Each concept is represented using a CONCEPT ID and a FULLY SPECIFIED NAME. A simple search for the term Eyes across the UMLS SNOMED CT browser leads to the following top results:

Concept Id    Fully-specified Name    Semantic Category
63342001      Sunsetting eyes         Finding
371110006     Immature eyes           Disorder
362508001     Both eyes, entire       Body Structure

Fig. 2. A clinical form and its equivalent Semantic Form Tree. Each node in the tree is tagged with SNOMED CT semantic categories. [Figure: a Patient Examination Form with a PATIENT section (Name, Gender: M/F) and an EXAMINATION section (Respiratory: symm. expan., nl perc.), shown beside its tree, whose nodes include Symmetric chest expansion and Normal Percussion. Category tags appearing in the figure include Person, Procedure, Observable Entity, Qualifier Value, and Finding.]

Structure-based SNOMED CT Mapping Framework

Key Ideas
Exploit the local semantic structure of the form tree to determine the term context and the candidate SNOMED CT semantic categories.
Select a winner semantic category, and map the term to the linguistically matching concept within the determined semantic category.

Fig. 3. Overall Mapping Framework: (1) The form tree structure is analyzed to derive the form context, (2) The classification model (Naïve Bayes) ranks the SNOMED CT semantic categories suitable for the form context, (3) A category is picked, (4) The most linguistically matching concept in this category is selected as the winner concept.

Empirical Study with Clinician-designed Forms

About the Data
The data includes 26 forms collected from 5 healthcare institutions. The forms contain over 1500 terms, out of which 954 (63%) are mappable to SNOMED CT concepts.

About the Methods
BASELINE: Linguistic comparison.
HYBRID: Linguistic as well as structural (contextual) comparison (see Fig. 3).
HYBRID++: Linguistic as well as advanced structural comparison.

Results and Contributions

Fig. 4. Mapping Results for 3 Methods. [Figure: bar charts of Mapping Precision and Mapping Recall for Baseline, Hybrid, and Hybrid++ on Set1 through Set5. The plotted precision values range from 0.51 to 0.92 and the recall values from 0.31 to 0.74; the assignment of individual values to sets and methods is not recoverable from the extracted text.]

Fig. 5. Change in Results with the term processing, advanced linguistic technique. [Figure: Precision, Recall, Precision with Term Processing, and Recall with Term Processing for Baseline, Hybrid, and Hybrid++, on an axis from 40 to 90.]

Findings (R = recall, P = precision)
Improvement due to structure (Fig. 4): Hybrid over Baseline: 18% (P); 2% (R). Hybrid++ over Hybrid: 16% (P); 23% (R).
Improvement due to linguistics (Fig. 5): 2-3% (P); >30% (R).

Implications
Structural knowledge has the ability to address the context challenge and improve the overall mapping performance.
Linguistic techniques can improve the recall and address the diversity challenge to a large extent.

Conclusion
It is desirable to develop hybrid approaches that can address both challenges and lead to superior performance.

Future Work
Leverage other relationships of SNOMED CT and test with other vocabularies from the UMLS.
Test within larger frameworks of health information systems.
Apply other classification techniques and employ sophisticated linguistic techniques.

Acknowledgements
National Cancer Institute (National Biomedical Imaging Branch): Grant #P01-CA-82710-09
National Science Foundation Grants: NSF CCF 0905291, NSF CCF 1049864, and NSFC 90920005
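
Illustrative sketch of the Fig. 3 steps

The four steps in the Fig. 3 caption can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation. The concept table reuses the three Eyes results quoted above. Everything else is an assumption made for illustration: the training examples, the choice of context features (the categories of a node's parent and sibling), the token-overlap matcher standing in for the linguistic comparison, and the rule of picking the top-ranked category that has at least one candidate concept.

```python
import math
from collections import Counter, defaultdict

# Toy concept table: the three "Eyes" search results quoted on the poster.
CONCEPTS = [
    ("63342001", "Sunsetting eyes", "Finding"),
    ("371110006", "Immature eyes", "Disorder"),
    ("362508001", "Both eyes, entire", "Body Structure"),
]

# Hypothetical training data: (context features, semantic category).
# The feature design (parent and sibling categories) is an assumption.
TRAINING = [
    (["parent=Procedure", "sibling=Body Structure"], "Body Structure"),
    (["parent=Procedure", "sibling=Body Structure"], "Body Structure"),
    (["parent=Observable Entity", "sibling=Finding"], "Finding"),
    (["parent=Finding", "sibling=Disorder"], "Disorder"),
]


def train_naive_bayes(examples):
    """Collect category counts and per-category feature counts."""
    class_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocabulary = set()
    for features, category in examples:
        class_counts[category] += 1
        for feature in features:
            feature_counts[category][feature] += 1
            vocabulary.add(feature)
    return class_counts, feature_counts, vocabulary


def rank_categories(model, features):
    """Sort categories by Laplace-smoothed log posterior, best first."""
    class_counts, feature_counts, vocabulary = model
    total = sum(class_counts.values())
    scores = {}
    for category, count in class_counts.items():
        score = math.log(count / total)
        denom = sum(feature_counts[category].values()) + len(vocabulary)
        for feature in features:
            score += math.log((feature_counts[category][feature] + 1) / denom)
        scores[category] = score
    return sorted(scores, key=scores.get, reverse=True)


def token_overlap(term, name):
    """Jaccard overlap of lowercase word tokens (a stand-in matcher)."""
    a = set(term.lower().replace(",", " ").split())
    b = set(name.lower().replace(",", " ").split())
    return len(a & b) / len(a | b) if a | b else 0.0


def map_term(term, context_features, model, concepts=CONCEPTS):
    """Pick the best-ranked category with candidates, then the closest concept."""
    for category in rank_categories(model, context_features):
        candidates = [c for c in concepts if c[2] == category]
        if candidates:
            return max(candidates, key=lambda c: token_overlap(term, c[1]))
    return None


model = train_naive_bayes(TRAINING)
winner = map_term("Eyes", ["parent=Procedure", "sibling=Body Structure"], model)
print(winner)
```

With this toy data, the same term Eyes maps to a different concept when the context features change, which is the behavior the context challenge calls for. A real system would replace the toy table with a SNOMED CT lookup and the hand-made features with ones derived from the semantic form tree.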