The document presents a method for automatically mapping terms from clinical encounter forms to concepts in SNOMED CT. It exploits the semantic structure of forms by analyzing the context and relationships between terms. A naive Bayes classifier is trained on semantic attributes derived from the form structure to determine the appropriate SNOMED CT semantic category for each term. Evaluation on 26 forms showed the hybrid approach of combining linguistic techniques and semantic structure outperformed a baseline, achieving a precision of 0.89 and recall of 0.76.
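The classification step described above can be illustrated with a minimal naive Bayes sketch. The feature names, training rows, and SNOMED CT categories below are hypothetical stand-ins, not the paper's actual attributes or data:

```python
import math
from collections import Counter, defaultdict

# Hypothetical training rows: each form term is described by structural
# features (section header, widget type) and labeled with a SNOMED CT
# semantic category. These rows are illustrative only.
train = [
    ({"section": "vitals", "widget": "numeric"}, "observable entity"),
    ({"section": "vitals", "widget": "numeric"}, "observable entity"),
    ({"section": "history", "widget": "checkbox"}, "disorder"),
    ({"section": "history", "widget": "checkbox"}, "disorder"),
    ({"section": "meds", "widget": "text"}, "product"),
]

def train_nb(rows):
    """Count class priors and per-class feature-value frequencies."""
    priors = Counter(label for _, label in rows)
    likelihoods = defaultdict(Counter)  # (label, feature) -> value counts
    for feats, label in rows:
        for f, v in feats.items():
            likelihoods[(label, f)][v] += 1
    return priors, likelihoods

def predict(priors, likelihoods, feats, alpha=1.0):
    """Return the category maximizing log P(c) + sum log P(f=v | c),
    with Laplace smoothing for unseen feature values."""
    total = sum(priors.values())
    best, best_score = None, float("-inf")
    for label, n in priors.items():
        score = math.log(n / total)
        for f, v in feats.items():
            counts = likelihoods[(label, f)]
            score += math.log((counts[v] + alpha) /
                              (sum(counts.values()) + alpha * (len(counts) + 1)))
        if score > best_score:
            best, best_score = label, score
    return best

priors, likelihoods = train_nb(train)
print(predict(priors, likelihoods, {"section": "history", "widget": "checkbox"}))
# prints "disorder"
```

The paper's actual classifier would be trained on its semantic attributes derived from form structure; this sketch only shows the naive Bayes mechanics.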
This document proposes a framework to map clinician-specified form terms to standardized SNOMED CT concepts by leveraging the semantic structure of clinical forms. It presents a hybrid approach that uses both linguistic matching and structural context to address challenges from term diversity and context. An empirical study on 26 real-world forms shows the hybrid method improves mapping precision by up to 18% and recall by up to 30% compared to baselines. The work demonstrates how semantic form structures can help address context and improve mapping between clinical terms and standardized concepts.
Modeling XCS in class imbalances: Population sizing and parameter settings (kknsastry)
This paper analyzes the scalability of the population size required in XCS to maintain niches that are infrequently activated. Facetwise models have been developed to predict the effect of the imbalance ratio—the ratio between the number of instances of the majority class and the minority class that are presented to XCS—on population initialization and on the creation and deletion of classifiers of the minority class. While theoretical models show that, ideally, XCS scales linearly with the imbalance ratio, XCS with the standard configuration scales exponentially.
The causes potentially responsible for this deviation from the ideal scalability are also investigated. Specifically, the inheritance procedure of classifiers' parameters, mutation, and subsumption are analyzed, and improvements to XCS's mechanisms are proposed to effectively and efficiently handle imbalanced problems. Once the recommendations are incorporated into XCS, empirical results show that the population size in XCS indeed scales linearly with the imbalance ratio.
Substructural surrogates for learning decomposable classification problems: i... (kknsastry)
This paper presents a learning methodology based on a substructural classification model to solve decomposable classification problems. The proposed method consists of three important components: (1) a structural model that represents salient interactions between attributes for given data, (2) a surrogate model that provides a functional approximation of the output as a function of the attributes, and (3) a classification model that predicts the class for new inputs. The structural model is used to infer the functional form of the surrogate, and its coefficients are estimated using linear regression methods. The classification model uses a maximally accurate, least complex surrogate to predict the output for given inputs. The structural model that yields an optimal classification model is found using an iterative greedy search heuristic. Results show that the proposed method successfully detects the interacting variables in hierarchical problems, groups them into linkage groups, and builds maximally accurate classification models. The initial results on non-trivial hierarchical test problems indicate that the proposed method holds promise and have also shed light on several improvements to enhance its capabilities.
The document describes a research informatics software system called Labmatrix. It allows users to track clinical and research data in a hierarchical and relational manner. The system facilitates secure collaborative data entry and easy ad-hoc data retrieval. It can capture detailed patient and specimen data, generate barcodes, track specimen storage locations and lineage. Users can update multiple records simultaneously and generate sample records using predefined workflows. Data can be imported from various sources and normalized. The system includes graphical tools for exploring and querying the data without requiring SQL or programming skills.
This document summarizes research on developing a computational model to identify pathological mutations in Fabry disease. The researchers built a neural network model using 7 properties of mutations including sequence, structure, and multiple sequence alignment data. They trained and tested the model on a dataset of 313 pathological and 59 neutral mutations, achieving a sensitivity of 0.85 and specificity of 0.92. The model outperformed general tools like PolyPhen-2 in predicting mutations for Fabry disease. Future work will focus on expanding the approach to additional genes and disease phenotypes.
This document describes a study aimed at creating a gold standard dataset of drug indications extracted from FDA drug labels. The researchers developed a semi-automatic method using natural language processing and expert annotation to identify drug-disease treatment relationships from 100 randomly selected drug labels. Two expert annotators worked independently to accept or reject automatically identified candidate indications, with their common judgments considered the gold standard. Results showed the experts achieved near-perfect joint precision and an average F1-measure of 0.95. Through iterative error analysis and guideline updates, agreement between the annotators improved from 76.2% initially to 93.9%, demonstrating the viability of the semi-automatic method for creating a structured, specific gold standard of drug indications from DailyMed drug labels.
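For reference, the F1-measure reported above is the harmonic mean of precision and recall; a quick sketch with made-up counts (not the study's actual tallies):

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true-positive, false-positive,
    and false-negative counts."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2 * p * r / (p + r)

# Hypothetical counts for accepted candidate indications.
p, r, f1 = precision_recall_f1(tp=90, fp=5, fn=10)
print(round(p, 3), round(r, 3), round(f1, 3))
```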
Crowdsourcing via Amazon Mechanical Turk was used to collect annotations for 5 natural language tasks: affect recognition, word similarity, recognizing textual entailment, event temporal ordering, and word sense disambiguation. The study found that while individual non-experts were less reliable than experts, aggregating responses from multiple non-experts could produce annotations comparable to expert ones. Specifically, annotations from 4 non-experts on average produced reliability similar to that of 1 expert for affect recognition, and classifiers trained on crowdsourced data performed comparably to those trained on expert annotations.
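The aggregation idea can be sketched as a simple majority vote over several non-expert labels per item. The items and labels below are illustrative, not the study's data:

```python
from collections import Counter

def aggregate(annotations):
    """Majority vote over per-item label lists; ties resolve to the
    label encountered first in the list."""
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in annotations.items()}

# Illustrative: four non-expert labels per headline for an
# affect-recognition task.
votes = {
    "headline_1": ["joy", "joy", "surprise", "joy"],
    "headline_2": ["anger", "fear", "anger", "anger"],
}
print(aggregate(votes))
# prints {'headline_1': 'joy', 'headline_2': 'anger'}
```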
This document describes a study on matching conceptual models by comparing entities across different models. The study represented 20 conceptual models as structured tables with information on each entity's name, attributes, and relationships. It then generated a dataset comparing each pair of entities across models based on name, attribute, and relationship similarity metrics. Binary logistic regression was used to analyze how well each metric predicted if entities actually matched, finding that only name similarity was a significant predictor. The study aims to improve the similarity functions and classification approach to better match conceptual models.
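A name-similarity metric of the kind used as a feature above can be sketched as token-set Jaccard similarity. The entity names below are made up, and the study's exact similarity function may differ:

```python
def name_similarity(a, b):
    """Jaccard similarity between the lowercased token sets of two
    entity names: 1.0 means identical token sets, 0.0 means disjoint."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

print(name_similarity("Customer Order", "Order"))           # 0.5
print(name_similarity("Customer Order", "Purchase Order"))  # 1/3
```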
The document introduces 3 database research projects at the Center for Women's Health Research:
1. Clinical Form Encoding seeks to standardize clinical terms on forms using clinical coding schemes like SNOMED CT.
2. EMR Error Detection aims to improve electronic medical record systems by more robustly integrating clinical guidelines to reduce data entry errors.
3. A Query and Data Extraction Tool is being developed to help researchers and providers more easily access and analyze the large amounts of patient data stored in databases.
This document outlines a marketing plan to promote the use of MSG in small amounts for street food vendors in Indonesia called "Abang Nasgor". It proposes using free stools labeled "ask for less msg" and encouraging customers to recommend vendors using less MSG. New MSG packaging will suggest appropriate amounts per portion. Celebrities will visit the most recommended vendors using less MSG to increase awareness. The plan aims to change negative perceptions of MSG and show that smaller amounts do not cause health issues.
The document discusses search interface understanding (SIU), which involves representing, parsing, segmenting, and evaluating search interfaces on the deep web. SIU is challenging because search interfaces are designed autonomously without standard structures. The document outlines the SIU process and key challenges, such as interfaces having no defined boundaries for segmenting semantically related components. Techniques for SIU include rules, heuristics, and machine learning.
This document summarizes a study on using Hidden Markov Models (HMMs) for search interface segmentation. The researchers applied a two-layered HMM approach, with the first layer tagging interface components with semantic labels and the second layer segmenting the interface. Their experiments showed domain-specific HMMs performed best on interfaces from the same domain, while cross-domain HMMs captured patterns across domains. The study contributed an effective probabilistic approach to interface segmentation and found appropriate training data is key to accurate segmentation across domains.
The document summarizes the results of a survey conducted by DHP Research about blogs and how to improve their blog. Key findings include:
- Peer reviewed papers, white papers, and case studies are important for readers to learn about issues affecting their day-to-day work.
- Blogs should be informative, interesting, accessible, and innovative. Readers are less interested in controversial content or in how many blogs are posted per week.
- Over 85% would read a 250 word blog, but only 14% would definitely read a 1000 word blog.
- Readers want blogs to discuss regulatory use of PROs, empowering patients, research methodologies, and quantitative data analysis methods.
- Based on the results,
8 things you should not do when selecting a PREM (Keith Meadows)
This document provides 8 tips for what not to do when selecting a patient-reported experience measure (PREM). It advises against using a PREM just because a colleague used it, or because it was developed by a colleague, without evidence of its reliability and validity. It also warns against using PREMs developed without patient input, for different patient groups, or without understanding how the information will be used. The document stresses the importance of properly developing and validating PREMs to ensure the right information is collected.
The Diabetes Health Profile - Development and applications (Keith Meadows)
This document summarizes information about diabetes in the UK and an assessment tool called the Diabetes Health Profile (DHP-18). It notes that as of 2011 there were 2.9 million diagnosed cases of diabetes in the UK, with 90% being type 2 diabetes. The DHP-18 is an 18-item questionnaire that measures psychological distress, barriers to activity, and disinhibited eating in people with diabetes. It has been psychometrically validated. The document discusses interpreting DHP-18 scores and the minimum important differences for each domain.
White paper: 5 things you need to know about patient reported outcome (PRO) ... (Keith Meadows)
1) The document discusses key factors to consider when selecting a patient-reported outcome measure (PROM) for use in a study.
2) It is important to select a PROM that is reliable, valid, and measures the specific outcomes of interest related to the disease and treatment.
3) The document provides guidance on the differences between generic and disease-specific PROMs and when each type may be most appropriate. It also discusses important psychometric properties like reliability and validity that should be considered.
The document discusses the benefits of exercise for mental health. Regular physical activity can help reduce anxiety and depression and improve mood and cognitive function. Exercise causes chemical changes in the brain that may help protect against mental illness and improve symptoms.
Oper Semangat: a campaign to gain Indonesian football supporters' optimist sp... (Faldi Dwi Wahyudi)
The document discusses a campaign to boost support and morale for Indonesia's national football team, which has struggled in recent years. Fans uploaded short videos expressing their enthusiastic support for individual players to address negativity from poor performance. Over 100 videos were compiled into a single video to show widespread continued support and be played at an upcoming match. The football association praised the effort and its goal of demonstrating that many fans still care about and never give up on the national team.
Our story of understanding of what it's like living with diabetes (Keith Meadows)
The Diabetes Health Profile (DHP) is a standardized measure of the psychological and behavioral impact of diabetes. It was initially developed in 1986 through patient interviews and literature reviews to create the 32-item DHP-1 conceptual framework. Between 1996-2000, this was refined into the 18-item DHP-18 framework through analysis of over 2000 patients. The DHP has since been used widely to measure diabetes outcomes and identify groups most at risk of psychological distress, with ongoing development and licensing through DHP Research and Isis Outcomes.
This document provides tips for selecting a patient reported outcome measure (PROM) for research studies. It recommends:
1. Formulating a clear hypothesis about what concept is being measured to identify the appropriate PROM.
2. Ensuring the PROM's content and items are relevant to the target patient population and condition.
3. Considering the PROM's acceptability, including length, time to complete, and easy-to-understand language and format.
4. Choosing a PROM that has been scientifically developed and validated to reliably measure the intended concept.
5. Understanding how to correctly analyze and interpret the PROM data and scores.
This document discusses using hidden Markov models to automatically discover the structure of clinical forms and annotate them with medical terminology. It presents a two-layer hidden Markov model approach to first assign tags like category and field to form elements, and then group related elements to identify form segments. The method was tested on 52 clinical forms and achieved over 95% accuracy in extracting the underlying structure of the forms in the form of trees. The ability to automatically understand form structure and annotate forms could enable more flexible design of electronic health records.
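The first-layer tagging step resembles standard Viterbi decoding over an HMM. A minimal sketch follows; the two-tag state set, observation alphabet, and probabilities are invented for illustration, not the paper's trained parameters:

```python
import math

# Hypothetical two-tag HMM over form elements: each element is either a
# section "category" or a data-entry "field".
states = ["category", "field"]
start = {"category": 0.7, "field": 0.3}
trans = {"category": {"category": 0.2, "field": 0.8},
         "field": {"category": 0.3, "field": 0.7}}
emit = {"category": {"header_text": 0.8, "input_box": 0.2},
        "field": {"header_text": 0.1, "input_box": 0.9}}

def viterbi(obs):
    """Most likely tag sequence for a sequence of observed element types."""
    V = [{s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}]
    back = []
    for o in obs[1:]:
        row, ptr = {}, {}
        for s in states:
            prev, score = max(
                ((p, V[-1][p] + math.log(trans[p][s])) for p in states),
                key=lambda x: x[1])
            row[s] = score + math.log(emit[s][o])
            ptr[s] = prev
        V.append(row)
        back.append(ptr)
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi(["header_text", "input_box", "input_box"]))
# prints ['category', 'field', 'field']
```

The paper's second layer would then group contiguous tagged elements into segments; this sketch covers only the tagging layer.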
This document outlines a challenge to increase appreciation of local Indonesian products. It proposes convincing people that supporting local products shows nationalism. Currently, local products have an image of being cheap and low quality. The solution is to give people a unique way to appreciate products by determining their own price. Paying the chosen price would help fund better materials and design research to improve quality. Higher prices would contribute more to quality improvements. Links on price tags would direct people to videos of local craftspeople. Products would be displayed at supermarket cashiers.
Mike Thelwall is a professor known for his research in the field of webometrics. He received his PhD in mathematics and leads the Statistical Cybermetrics Research Group. Webometrics involves the quantitative analysis of web phenomena such as link analysis, search engine evaluation, and web citation analysis. Thelwall's research has explored using webometrics to study the dissemination of scholarly research and evaluate universities. He has emphasized the need for conceptual frameworks and methodologies to interpret webometrics results and address challenges like the size and changing nature of the web.
Clinicians rely on health information technologies (HITs) for clinical data collection, but current HITs are inflexible and inconsistent with clinicians' needs. The researchers propose a flexible electronic health record (fEHR) system to allow clinicians to easily modify the system based on their changing data collection needs. The fEHR uses a form-based interface for clinicians to design forms, generates a corresponding form tree structure, and designs a high-quality database from the tree. A user study with 5 nurses found they could effectively replicate needs in the system and their efficiency and understanding improved over two rounds of tasks of increasing complexity. The researchers conclude the fEHR has potential to reduce HIT problems and that the database design
This document summarizes a study on the remote mentoring program called MAGIC (Get More Active Girls in Computing). MAGIC aims to increase female participation in STEM fields through one-on-one remote mentoring matches between young girls and women professionals in technology careers. The study analyzed data from MAGIC's first 5 years, finding that remote mentoring increased STEM skills, self-confidence, and career awareness for many mentees. However, challenges included maintaining mentor and mentee commitment over time. The study concludes that remote mentoring shows promise for improving gender diversity in STEM, but more data is needed to better understand impacts and how to address challenges.
A brief introduction to SNOMED CT - the ontology-based medical terminology. It covers the basic definitions, the differences between SNOMED CT and ICD-9, post-coordination use cases, and some general information.
This is not an extensive guide for SNOMED CT adoption in a system
This document discusses the origins and development of the LOINC Clinical Document Ontology (CDO), which provides a standardized terminology for clinical document names. It describes how the CDO was created based on empirical analysis of over 2000 local document names. The CDO uses a multi-axial model with domains like subject matter, role, setting, type of service, and kind of document. Iterative evaluations found the expanded CDO better mapped local names than the original. Ongoing work involves adding new content and harmonizing with other clinical terminologies.
This document summarizes a study on the remote mentoring program called MAGIC (Get More Active Girls in Computing). MAGIC aims to increase female participation in STEM fields through one-on-one remote mentoring matches between young girls and women professionals in technology careers. The study analyzed data from MAGIC's first 5 years, finding that remote mentoring increased STEM skills, self-confidence, and career awareness for many mentees. However, challenges included maintaining mentor and mentee commitment over time. The study concludes that remote mentoring shows promise for improving gender diversity in STEM, but more data is needed to better understand impacts and how to address challenges.
A brief introduction to SNOMED CT - the ontology based medical terminology. This covers the basic definitions, the difference between SNOMED CT and ICD9, Post co-ordination use-cases and some general information.
This is not an extensive guide for SNOMED CT adoption in a system
This document discusses the origins and development of the LOINC Clinical Document Ontology (CDO), which provides a standardized terminology for clinical document names. It describes how the CDO was created based on empirical analysis of over 2000 local document names. The CDO uses a multi-axial model with domains like subject matter, role, setting, type of service, and kind of document. Iterative evaluations found the expanded CDO better mapped local names than the original. Ongoing work involves adding new content and harmonizing with other clinical terminologies.
Pasi Leino :: Using XML standards for system integration (george.james)
The document discusses HL7, an international standard for exchanging healthcare information. Some key points:
- HL7 is a non-profit organization that has been developing standards since 1987 to enable interoperability between healthcare systems.
- The standards cover different levels of interoperability from process to semantic to operational data exchange.
- Common HL7 standards include messages (HL7 v2.x, v3), clinical documents (CDA), terminology (vocabulary bindings), and application programming interfaces.
- HL7 is widely adopted, with a 2000 study finding 80% of large US hospitals using it. Finland has also adopted HL7 standards nationally.
Anne Casey RN MSc FRCN
Editor, Paediatric Nursing
Royal College of Nursing Adviser on Information Standards
Clinical Domain Lead, NHS Information Standards Board for Health and Social Care
(15/10/08, SNOMED Workshop)
An overview of the i2b2 clinical research platform, and the implications of connecting Indivo to i2b2 as a source of patient-reported outcomes. Presented at the 2012 Indivo X Users' Conference.
By Shawn Murphy MD, Ph.D., Partners Healthcare.
SNOMED CT is a clinical terminology used for coding, retrieving, and analyzing health care data. It consists of codes, terms, and relationships that can precisely record and represent clinical information across health care. SNOMED CT concepts are organized into hierarchies and linked through relationships. It aims to enable automated clinical decision support and research by structuring information in a semantically meaningful way.
The document discusses artificial intelligence and pattern recognition. It introduces various pattern recognition concepts including defining a pattern, examples of patterns in different domains, and approaches to pattern recognition. It also provides an example of using discriminative methods to classify fish into salmon and sea bass using optical sensing and extracted features.
This study used representational similarity analysis (RSA) to assess categorical representations in the right fusiform face area (FFA) and parahippocampal place area (PPA) using fMRI. Participants completed a multi-category localizer task and conceptual classification task while brain images were collected. RSA showed that the right FFA robustly differentiated faces from other categories and also distinguished animals. The right PPA robustly differentiated scenes from other categories and also distinguished landmarks. These results demonstrate that RSA can reveal categorical representations in visual cortex.
Visual Analytics for Healthcare - Panel at AMIA 2012 in Chicago (Adam Perer)
AMIA 2012 Panel on Visual Analytics for Healthcare
Organizer:
Adam Perer, PhD
Research Scientist
IBM T.J. Watson Research Center, Hawthorne, NY
Panelists:
Ben Shneiderman, PhD
Professor, Computer Science
University of Maryland, College Park, MD
Yuval Shahar, PhD
Professor, Head of the Medical Informatics Research Center
Ben Gurion University, Beer Sheva, Israel
Jeffrey Heer, PhD
Assistant Professor, Computer Science
Stanford University, Stanford, CA
David Gotz, PhD
Research Scientist
IBM T.J. Watson Research Center, Hawthorne, NY
Abstract
With the proliferation of medical information technology, users at all levels of the healthcare system have access to more data than ever before [6]. This data can be of tremendous value but is often difficult to access and interpret. For example, clinicians are often faced with the challenging task of analyzing large amounts of unstructured, multi-modal, and longitudinal data to effectively diagnose and monitor the progression of a patient's disease [4,5]. Similarly, patients are confronted with the difficult task of understanding the trends and correlations within data related to their own health. At the institutional level, healthcare organizations are faced with the desire to use data to improve overall operational efficiency and performance, while continuing to maintain the quality of patient care and safety.
Recent advances in visualization and visual analytics have the potential to help each of the user groups listed above do more with the often overwhelming amount of data available to them [1,3,7,8]. However, to be successful, visualization designers and clinicians must work together closely to ensure that the right technologies are used to help address the meaningful problems. Unfortunately, despite the continuous use of scientific visualization and visual analytics in medical applications, the lack of communication between engineers and physicians has meant that only basic visualization and analytics techniques are currently employed in clinical practice [2,9].
The goal of this panel is to present state-of-the-art visualization applications for healthcare and engage the leading physicians and clinical researchers at AMIA to discuss the areas in healthcare where additional visualization techniques are most needed.
This document summarizes a presentation on the clinical document ontology (CDO) developed by LOINC. It describes the origins and development of having a standardized vocabulary for clinical document names, including empirical analysis of local document names. The presentation reviews the multi-axial model used by LOINC for document names, provides examples, and discusses ongoing evaluation and expansion efforts through collaboration. Future directions include further harmonization of CDO terms and analyzing document content.
A crucial aspect of search applications is the possibility to identify named entities in free-form text and provide functionality for entity-based, complex queries towards the indexed data. By enriching each entity with semantically relevant information acquired from outside the text, one can create the foundation for an advanced search application. Thus, given a document about Denmark, where neither of the words Copenhagen, country, nor capital are mentioned, it should be possible to retrieve the document by querying for Copenhagen or European country.
In this paper, we report how we have tackled this problem. We will, however, concentrate only on the two tasks which are central to the solution, namely named entity recognition (NER) and enrichment of the discovered entities by relying on linked data from knowledge bases such as YAGO2 and DBpedia. We remain agnostic to all other details of the search application, which can be implemented in a relatively straightforward way by using, e.g., Apache Solr.
The work deals only with Swedish and is restricted to two domains: news articles and medical texts. As a byproduct, our method achieves state-of-the-art results for Swedish NER and to our knowledge there are no previously published works on employing linked data for Swedish for the two domains at hand.
This document discusses ontologies, including what they are, why they are used, and how to create them. Ontologies allow concepts and terms to be standardized and mapped to each other, which facilitates tasks like search, coding, and information retrieval in domains like healthcare. Large medical ontologies like SNOMED-CT help resolve ambiguities and support activities like clinical decision support. Well-designed ontologies are important for representing real-world knowledge in computer systems and managing concepts in health informatics applications.
The document discusses the origins and ongoing development of a document ontology within LOINC and HL7. It describes how the Clinical Document Ontology (CDO) provides consistent semantics for clinical document names to enable interoperability. The CDO uses a multi-axial model with domains like subject matter, role, setting, type of service, and kind of document. Iterative evaluations have helped expand and refine the CDO. Future work includes further harmonization and expanding the model to new document types.
This document discusses ontologies and user needs in publishing. It begins with introductions and definitions of key terms like vocabularies, taxonomies, and ontologies. It then covers semantic markup in publishing and ties these concepts to user needs by focusing on use cases. The document also briefly discusses the semantic web and implications for search by enabling more precise searching through semantic tagging and metadata. Overall, it emphasizes the importance of use cases in driving strategies around ontologies and semantic enrichment of content in publishing.
Clinical Clarity versus Terminological Order - The Readiness of SNOMED CT Concept Descriptors for Primary Care (henryhezhe2003)
"Clinical Clarity versus Terminological Order - The Readiness of SNOMED CT Concept Descriptors for Primary Care" presented in CIKM 2012 Workshop MIXHS 12 (the Second International Workshop on Managing Interoperability and Complexity in Health Systems)
SLE 2012 Keynote: Cognitive and Social Challenges of Ontology Use in the Biom... (Margaret-Anne Storey)
ABSTRACT: Ontologies can provide a conceptualization of a domain leading to a common vocabulary for communities of researchers and important standards to facilitate computation, software interoperability and data reuse. Most successful ontologies, especially those that have been developed by diverse communities over long periods of time, are typically large and complex. To address this complexity, ontology authoring and browsing tools must provide cognitive support to improve comprehension of the many concepts and relationships in ontologies. Also, ontology tools must support collaboration as the heart of ontology design and use is centered on community consensus.
In this talk, I will describe how standardized ontologies are developed and used in the biomedical and clinical domains to aid in scientific and medical discoveries. Specifically, I will present how the US National Center for Biomedical Ontology has designed the BioPortal ontology library (and associated technologies) to promote the use of standardized ontologies and tools. I will review how BioPortal and other ontology tools use established and novel visualization and collaboration approaches to improve ontology authoring and data curation activities. I will also discuss an ambitious project by the World Health Organization that leverages the use of social media to broaden participation in the development of the next version of the International Classification of Diseases. To conclude, I will discuss the challenges and opportunities that arise from using ontologies to bridge communities that manage and curate important information resources.
The document proposes a platform that matches patients from online health communities to relevant medical research projects, by developing rich semantic profiles of both patients and projects. It analyzes patient conversations to extract medical conditions, medications, and demographics to create patient profiles. It also analyzes research project descriptions to create profiles. These profiles are then matched using semantic similarity algorithms to find relevant patients for projects. The platform was prototyped and shown to accurately match patients to projects with similar medical conditions.
This presentation discusses molecular similarity searching methods for drug discovery. It begins with an introduction to cheminformatics and the principle that structurally similar molecules tend to have similar biological properties. The document then covers molecular representations, methods for calculating similarity coefficients between molecules, and a probabilistic model for similarity searching. It proposes a contribution called the Molecular Dynamic Clustering method that uses molecular dynamics simulations and classification algorithms to better assess molecular similarity.
Exploiting Semantic Structure for Mapping User-specified Form Terms to SNOMED CT Concepts
1. Exploiting Semantic Structure for Mapping User-specified Form Terms to SNOMED CT Concepts
Ritu Khare (1,2), Yuan An (1), Jiexun Li (1), Il-Yeol Song (1), Xiaohua Hu (1)
(1) The iSchool at Drexel, (2) College of Medicine
Drexel University, Philadelphia, PA, USA
2. Presentation Order
1. Motivation
2. Problems
3. Solutions
4. Evaluation
5. Final Remarks
3. General Motivation
Database Integration and Interoperability
Semantic heterogeneity across clinical data sources (Halevy, 2005; Henry et al., 1993; Hernandez et al., 2005; Wright et al., 1999)
[Figure: the same concepts named differently across sources, e.g. MRN / Med Rec # / Medical Record Number; Diastolic and Systolic Blood Pressure / BP; Physical Status / Constitutional / Vital Signs]
Recommendation: Controlled medical vocabularies should be involved in the design artifacts of healthcare systems. (Jean et al., 2007; Sugumaran and Storey, 2002)
4. Specific Motivation
Clinical Encounter Form -> Electronic Health Record (EHR)
The terms on clinical forms are mapped to, or annotated by, a standard terminology.
Domain experts may perform the annotation manually, but this is costly and tedious.
Research Objective: Design an automatic tool for mapping form terms to standard terminologies.
5. 1. Motivation
2. Problem
3. Solutions
4. Evaluation
5. Final Remarks
6. The Mapping Problem
Clinical Encounter Form -> SNOMED CT
SNOMED CT: the Systematized Nomenclature of Medicine - Clinical Terms (International Health Terminology Standards Development Organisation).
The most comprehensive clinical vocabulary (SNOMED CT User Guide, 2009), with >360,000 logically defined clinical concepts (Hina et al., 2010; Stenzhorn et al., 2009).

Form Term | SNOMED CT Concept
Patient | 11615400: Patient (person)
MRN | 398225001: Medical record number (observable entity)
8. Existing Mapping Services
SNOMED CT browsers (Rogers and Bodenreider, 2008):
General Mapping
Category-Specific Mapping
9. Challenges: Mapping Form Terms to SNOMED CT Concepts
Diversity Challenge: different clinicians use different terms (e.g., MRN / Med. Rec. #; Vital Signs / Constitutional / Physical Status).
Context Challenge: the same form term can map to different concepts.
10. 1. Motivation
2. Problem
3. Solution
4. Evaluation
5. Final Remarks
11. Premises
1. The key is to identify the SNOMED CT semantic category appropriate for a given term.
2. The first result (i.e., the most string-similar one) retrieved by the category-specific mapping is usually the desired concept.
Question: How can the SNOMED CT semantic category appropriate for a given form term be determined automatically?
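The second premise, that the most string-similar retrieved concept is usually the desired one, can be sketched as a simple ranking step. This is an illustrative sketch using Python's difflib, not the actual SNOMED CT API matcher, and the candidate concept names are hypothetical examples.

```python
import difflib

def rank_candidates(form_term, candidate_names):
    """Rank candidate concept names by string similarity to the form term."""
    scored = [(difflib.SequenceMatcher(None, form_term.lower(), name.lower()).ratio(), name)
              for name in candidate_names]
    return [name for score, name in sorted(scored, reverse=True)]

# Hypothetical candidates returned by a category-specific lookup:
candidates = ["Medical record number (observable entity)",
              "Medication record (record artifact)",
              "Patient (person)"]
print(rank_candidates("MRN medical record number", candidates)[0])
# -> Medical record number (observable entity)
```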
12. Premise 1: The term context can be derived from the SEMANTIC STRUCTURE of the form.
The FORM TREE accurately captures the semantic intentions of the designer.
Inspired by hierarchical modeling of forms (Dragut et al., 2009; Wu et al., 2009).
13. Premise 2: The implicit relationship between the term context (i.e., the semantic structure) and the desired semantic category can be formally captured in a STATISTICAL MODEL.
Model: a Naïve Bayes Classifier, based on the Bayes theorem (Han and Kamber, 2006).
Class labels: SNOMED CT semantic categories (attribute, body structure, disorder, ...).
Data attributes (local structure): node type, parent node type, child node type, parent semantic category, grandparent semantic category.
[Figure: example form tree (e.g., root -> Patient, Examination; Patient -> Name, Gender; Gender -> M, F; Examination -> Respiratory) with nodes annotated by SNOMED CT categories such as Person, Procedure, Observable Entity, Finding, and Qualifier Value.]
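The classification idea can be illustrated with a minimal hand-rolled Naïve Bayes over categorical structural attributes with Laplace smoothing. The attribute values and training examples below are invented for illustration; they are not the paper's dataset.

```python
import math
from collections import Counter, defaultdict

# Each training example: (structural attributes, SNOMED CT semantic category).
# Attribute values are illustrative stand-ins for the paper's form-tree features.
train = [
    ({"node": "field", "parent": "category", "parent_cat": "person"}, "observable entity"),
    ({"node": "field", "parent": "category", "parent_cat": "procedure"}, "observable entity"),
    ({"node": "value", "parent": "field", "parent_cat": "observable entity"}, "qualifier value"),
    ({"node": "value", "parent": "field", "parent_cat": "observable entity"}, "qualifier value"),
    ({"node": "category", "parent": "root", "parent_cat": "none"}, "person"),
]

def fit(examples):
    class_counts = Counter(cls for _, cls in examples)
    attr_counts = defaultdict(Counter)   # (class, attr) -> Counter of values
    values = defaultdict(set)            # attr -> set of observed values
    for attrs, cls in examples:
        for a, v in attrs.items():
            attr_counts[(cls, a)][v] += 1
            values[a].add(v)
    return class_counts, attr_counts, values

def predict(model, attrs):
    class_counts, attr_counts, values = model
    total = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for cls, n in class_counts.items():
        lp = math.log(n / total)  # log prior
        for a, v in attrs.items():
            # Laplace-smoothed conditional probability of this attribute value
            lp += math.log((attr_counts[(cls, a)][v] + 1) / (n + len(values[a]) + 1))
        if lp > best_lp:
            best, best_lp = cls, lp
    return best

model = fit(train)
print(predict(model, {"node": "value", "parent": "field", "parent_cat": "observable entity"}))
# -> qualifier value
```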
14. Overall Mapping Approach
Pipeline: Form Term -> Structure Analyzer (builds the Form Tree and extracts node attributes) -> Classification Model (trained on training data) -> category membership probabilities -> Semantic Category Picker -> SNOMED CT Category-Specific Mapping -> SNOMED CT Concept.
[Figure: example form tree with nodes such as root, Patient, Examination, Name, Gender, and Respiratory, annotated with categories like Person and Observable Entity.]
Novelty: a Hybrid Approach that leverages the semantic structure as well as the term linguistics.
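The overall flow can be sketched as a chain of components. Every component here is a stub standing in for the paper's modules (structure analyzer, trained classifier, SNOMED CT category-specific lookup); only the MRN concept code is taken from the slides' own example, and all other names are hypothetical.

```python
# A minimal sketch of the hybrid pipeline; all components are stubs.

def analyze_structure(term, form_tree):
    """Derive structural attributes for a term from its form-tree node (stub)."""
    node = form_tree[term]
    return {"node": node["type"], "parent": node["parent_type"]}

def category_probabilities(attrs):
    """Stand-in for the trained classifier's category membership probabilities."""
    if attrs["parent"] == "root":
        return {"person": 0.7, "observable entity": 0.3}
    return {"observable entity": 0.8, "qualifier value": 0.2}

def category_specific_mapping(term, category, index):
    """Look up the term in a (hypothetical) category-specific concept index."""
    return index.get((term.lower(), category))

form_tree = {"MRN": {"type": "field", "parent_type": "category"}}
index = {("mrn", "observable entity"): "398225001: Medical record number (observable entity)"}

def map_term(term):
    attrs = analyze_structure(term, form_tree)
    probs = category_probabilities(attrs)
    # Pick the most probable category, then map within that category only.
    category = max(probs, key=probs.get)
    return category_specific_mapping(term, category, index)

print(map_term("MRN"))  # -> 398225001: Medical record number (observable entity)
```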
15. 1. Motivation
2. Problem
3. Solution
4. Evaluation
5. Final Remarks
16. Data
26 forms with 1501 terms in total; 954 terms (63.55%) have manual (gold) annotations.

Dataset | Forms | Total Terms
1. Walk-in clinic encounter forms | 3 | 161
2. Nursing patient admission forms | 6 | 261
3. Labor & delivery DB data-entry forms | 7 | 294
4. Adult visit encounter forms | 5 | 388
5. Child visit encounter forms | 5 | 397

Example gold annotations:
Term | Concept ID
Patient | 11615400: Patient (person)
MRN | 398225001: Medical record number (observable entity)

Some unmapped terms: no scleral icterus; chronic back pain; Follow up with PCP; Sent to ER.
17. Implementation (Java) and Settings
Form design interface: API provided by Dataline Software Limited.
[Figure: the mapping pipeline of slide 14 (Form Tree -> Form Structure Analyzer -> Node Category Attributes -> Classification Model -> Category Membership Probabilities -> Semantic Category Picker -> SNOMED CT Category -> Specific Concept Mapping), with Gold Annotations supplying the Training Data.]
Cross validation: leave-one-out, for each dataset.
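The leave-one-out setting can be sketched generically: for each labeled term, a model is trained on the remaining examples of the dataset and evaluated on the held-out one. The trainer interface and the toy exact-match model in the test are hypothetical, not the authors' code.

```java
import java.util.*;
import java.util.function.BiFunction;
import java.util.function.Function;

/**
 * Generic leave-one-out evaluation: train on all-but-one example, test on
 * the held-out one, and report the fraction predicted correctly.
 */
public class LeaveOneOut {
    public static <X, Y> double accuracy(List<X> xs, List<Y> ys,
            BiFunction<List<X>, List<Y>, Function<X, Y>> trainer) {
        int correct = 0;
        for (int i = 0; i < xs.size(); i++) {
            List<X> trainX = new ArrayList<>(xs);
            List<Y> trainY = new ArrayList<>(ys);
            trainX.remove(i);          // hold out example i
            trainY.remove(i);
            Function<X, Y> model = trainer.apply(trainX, trainY);
            if (model.apply(xs.get(i)).equals(ys.get(i))) correct++;
        }
        return (double) correct / xs.size();
    }
}
```

With only 26 forms, leave-one-out makes the most of the limited gold annotations, at the cost of training one model per example.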
18. Experiment Design
Goal: to study whether semantic structure can improve mapping performance.
Measures:
Precision = # correct annotations / # annotations
Recall = # correct annotations / # gold annotations
Baseline (linguistics only): Form Term -> SNOMED CT General Mapping -> SNOMED CT Concept.
Hybrid (linguistics + semantic structure): Form Structure Analyzer -> Node Category Attributes -> Classification Model -> Category Membership Probabilities -> Semantic Category Picker -> SNOMED CT Category -> Specific Concept Mapping.
Hybrid++: the Hybrid pipeline with candidate-set expansion in the Semantic Category Picker.
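The two measures defined above can be computed directly from the produced and gold annotation sets. The map-based representation (term to concept ID) is an assumption for illustration; the concept IDs in the test are made up.

```java
import java.util.*;

/**
 * Evaluation measures from the experiment design:
 *   precision = # correct annotations / # annotations produced
 *   recall    = # correct annotations / # gold annotations
 */
public class MappingMeasures {
    public static double[] precisionRecall(Map<String, String> predicted,
                                           Map<String, String> gold) {
        int correct = 0;
        for (Map.Entry<String, String> e : predicted.entrySet())
            if (e.getValue().equals(gold.get(e.getKey()))) correct++;
        double precision = predicted.isEmpty() ? 0.0 : (double) correct / predicted.size();
        double recall = gold.isEmpty() ? 0.0 : (double) correct / gold.size();
        return new double[]{precision, recall};
    }
}
```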
19. Results
Mapping duration: 1-11 s per form.
Baseline: Precision 0.63, Recall 0.45. Recall is low because the SNOMED CT API uses exact string matching and could not handle the variation of terms, i.e., the diversity challenge.
Baseline to Hybrid: Precision improves by 18%.
Hybrid to Hybrid++: Precision improves by 16%, Recall by 23%.
Hybrid++: Precision 0.86, Recall 0.55.
20. More Results
Term processing component:
removes special characters (-, #, /, etc.)
acronym expansion via dictionary: T (Temperature), BTL (Bilateral Tubal Ligation), VTE (Venous Thromboembolism)
Precision only slightly improved (3-5%); Recall improved majorly (25%).
Final: Precision = 0.89, Recall = 0.76.
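The term-processing component above can be sketched as a small normalizer. The dictionary entries come from the slide's examples (with "Litigation" corrected to "Ligation"); the exact cleaning rules are an assumption, not the authors' implementation.

```java
import java.util.*;

/**
 * Sketch of term preprocessing before SNOMED CT matching:
 * strip special characters (-, #, /, ...) and expand acronyms
 * via a dictionary lookup on each token.
 */
public class TermProcessor {
    // Illustrative dictionary built from the slide's examples.
    private static final Map<String, String> ACRONYMS = Map.of(
        "T", "Temperature",
        "BTL", "Bilateral Tubal Ligation",
        "VTE", "Venous Thromboembolism");

    public static String normalize(String term) {
        // Replace non-alphanumeric characters with spaces, collapse whitespace.
        String cleaned = term.replaceAll("[^A-Za-z0-9 ]", " ")
                             .trim()
                             .replaceAll("\\s+", " ");
        StringBuilder out = new StringBuilder();
        for (String tok : cleaned.split(" ")) {
            if (out.length() > 0) out.append(' ');
            out.append(ACRONYMS.getOrDefault(tok, tok));
        }
        return out.toString();
    }
}
```

For example, `normalize("VTE")` yields "Venous Thromboembolism", which exact-string matching against SNOMED CT can then find; this is why recall improves much more than precision.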
21. Implications
Impact of semantic structure: improves overall mapping performance; more correct predictions (context challenge).
Impact of linguistics: mainly on recall; reaches more relevant terms (diversity challenge).
Overall: promising performance, even with limited training data. Recall is low because of the simplicity of the linguistic techniques and can be further improved using more sophisticated techniques.
22. 1. Motivation
2. Problem
3. Solution
4. Evaluation
5. Final Remarks
23. Contributions
PROBLEM: NEW problem of standardizing the terms on clinical encounter forms using SNOMED CT.
Existing works (Henry et al. 1993, Barrows Jr. et al. 1994, Patrick et al. 2007) target standardization of clinical notes: diagnoses, medication information, patient complaints, etc.
SOLUTION: context-based method that leverages the SEMANTIC STRUCTURE of forms along with term linguistics.
Existing works rely on linguistic techniques alone (synonyms, morphemes, lexical variants).
24. Contributions
EVALUATION: 26 healthcare forms containing 950+ mappable terms specified by multiple clinicians.
Improvement over existing services: 23% in precision, 38% in recall.
Promising performance: precision 0.89, recall 0.76.
FINDINGS:
Linguistics helps overcome the diversity challenge and improves recall.
Semantic structure helps overcome the context challenge and improves precision and recall.
Designing synergistic hybrid approaches addresses all mapping challenges and achieves superior performance.
25. Limitations
TECHNIQUE: post-coordinated mapping; handling missing and inapplicable values in training data.
TECHNICAL EVALUATION: compare with other models: Bayesian networks, k-NN, neural networks, classification association rules.
STUDY: domain-expert annotators; test the validity of assumptions (class-conditional independence, correctness of the best linguistic-matching concept, choice of classification attributes); compare/combine with other UMLS terminologies.
26. Future Directions
Fully explore SNOMED CT: defining relationships.
Customize for form categories: encounter, regular visit, ...
Larger knowledge base for training datasets.
In larger frameworks, does annotation help improve: data/database integration? data quality? patient diagnosis? user interventions?
Work in progress: integrate with a flexible Electronic Health Record system (IHI 2010); integration of new forms in EHR; improve the database integration process.
(In other words, we could say that existing systems are certainly not designed with future integration in mind.)
Who designed the forms? Why not other domains, and which other domains? Possible; we have some ideas. Mark the concepts: post-coordinated or partial mapping.
Why does recall decrease? The number of correct predictions can decrease on applying the hybrid method; sometimes the linguistic approach returns the more accurate result. Larger improvements in recall and precision mean the forms contained terms with multiple senses in SNOMED CT.
Our experience of tagging 52 data-entry forms suggests that the training samples can be constructed quickly and easily, as compared to the construction of an exhaustive set of rules or heuristics. To further test the performance of the mapping framework in a heterogeneous environment, ...