With the increasing automation of health care information processing, it has become crucial to extract meaningful information from textual notes in electronic medical records. One of the key challenges is to extract and normalize entity mentions. State-of-the-art approaches have focused on the recognition of entities that are explicitly mentioned in a sentence. However, clinical documents often contain phrases that indicate entities without naming them. We term these implicit entity mentions and introduce the problem of implicit entity recognition (IER) in clinical documents. We propose a solution to IER that leverages entity definitions from a knowledge base to create entity models, projects sentences onto the entity models, and identifies implicit entity mentions by evaluating the semantic similarity between sentences and entity models. An evaluation with 857 sentences selected for 8 different entities shows that our algorithm outperforms the most closely related unsupervised solution. The similarity value calculated by our algorithm also proved to be an effective feature in a supervised learning setting, helping it improve over the baselines and achieve F1 scores of 0.81 and 0.73 for different classes of implicit mentions. Our gold-standard annotations are made available to encourage further research in the area of IER.
Implicit Entity Recognition in Clinical Documents
1. Implicit Entity Recognition in Clinical Documents
Sujan Perera1, Pablo Mendes2, Amit Sheth1, Krishnaprasad
Thirunarayan1, Adarsh Alex1, Christopher Heid3, Greg Mott3
1Kno.e.sis Center, Wright State University, 2IBM Research, San Jose,
3Boonshoft School of Medicine, Wright State University
2. “Bob Smith is a 61-year-old man referred by Dr. Davis for outpatient cardiac
catheterization because of a positive exercise tolerance test. Recently, he
started to have left shoulder twinges and tingling in his hands. A stress test
done on 2013-06-02 revealed that the patient exercised for 6 1/2 minutes,
stopped due to fatigue. However, Mr. Smith is comfortably breathing in room
air. He also showed accumulation of fluid in his extremities. He does not have
any chest pain.”
Example
3. Named Entity Recognition (on the same example): Person (“Bob Smith”),
Person (“Dr. Davis”)
4. Entity Linking: C0018795, C0015672, C0008031
5. Co-reference Resolution
6. Negation Detection
7.–8. Implicit Entity Recognition (on the same example):
“comfortably breathing in room air” → Shortness of Breath (NEG)
“accumulation of fluid in his extremities” → Edema
9. More Examples
Sentence → Entity
“Rounded calcific density in right upper quadrant likely representing
a gallstone within the neck of the gallbladder.” → Cholecystitis
“His tip of the appendix is inflamed.” → Appendicitis
“The respirations were unlabored and there were no use of
accessory muscles.” → Shortness of breath (NEG)
“She was walking outside on her driveway and suddenly fell
unconcious, with no prodrome, or symptoms preceding the event.” → Syncope
“This is important to prevent shortness of breath and lower
extremity swelling from fluid accumulation.” → Edema
10. Implicit Entity Recognition
Implicit Entity Recognition (IER) is the task of
determining whether a sentence has a reference
to an entity, even though it does not mention
that entity by its name.
11. Automation of clinical documents
• New healthcare policies require automation.
• State-of-the-art approaches focus on explicit mentions.
• An overall understanding of the patient's record requires:
• Explicit and implicit facts
• Domain knowledge
• Some conditions are frequently mentioned implicitly:
• 40% of shortness of breath mentions
• 35% of edema mentions
• Applications: CAC, CDI, Readmission Prediction, Assisting Professionals
12. What is involved in solving the problem?
“At the time of discharge she was breathing comfortably with a respiratory
rate of 12 to 15 breaths per minute.” → Shortness of breath (NEG)
“Rounded calcific density in right upper quadrant likely representing a
gallstone within the neck of the gallbladder.” → Cholecystitis (POS)
• Language understanding.
The term ‘comfortable’ is an antonym of ‘uncomfortable’.
• Domain knowledge.
‘Gallstones blocking the tube leading out of your gallbladder
cause cholecystitis.’
13. Our Solution
Pipeline: Entity Representative Term Selection → Entity Model Creation →
Candidate Sentence Selection → Candidate Sentence Pruning →
Similarity Calculation → Annotations
14. ERT Selection
• The knowledge base consists of definitions of the entities.
• Entity Representative Terms may indicate the implicit mentions of
the entities.
• The representative power of a term for an entity is calculated by its
TF-IDF value:

  r_t^e = freq(t, Q_e) × log(|E| / |E_t|)

where r_t^e is the representative power of the term t for the entity e,
freq(t, Q_e) is the frequency of the term t in the definitions of e, |E| is
the total number of entities, and |E_t| is the number of entities defined
using the term t.
ERT → Entity:
breathing → shortness of breath
fluid → edema
gallstone → cholecystitis
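The TF-IDF scoring above can be sketched as follows; the toy definitions and the whitespace tokenization are illustrative assumptions, not the actual knowledge base:

```python
import math
from collections import Counter

# Sketch of ERT scoring: the representative power of term t for entity e
# is freq(t, Qe) * log(|E| / |Et|). The definitions below are toy
# stand-ins for the knowledge base, not the paper's actual data.
def representative_power(definitions):
    """definitions: entity -> list of definition tokens."""
    num_entities = len(definitions)
    entity_freq = Counter()  # Et: number of entities whose definitions use t
    for tokens in definitions.values():
        entity_freq.update(set(tokens))
    return {
        entity: {
            t: freq * math.log(num_entities / entity_freq[t])
            for t, freq in Counter(tokens).items()
        }
        for entity, tokens in definitions.items()
    }

defs = {
    "shortness of breath": "uncomfortable sensation of difficulty breathing".split(),
    "edema": "accumulation of excess fluid in tissue".split(),
    "cholecystitis": "inflammation of the gallbladder caused by a gallstone".split(),
}
scores = representative_power(defs)
# 'of' appears in every definition, so its IDF (and score) is zero, while
# definition-specific terms such as 'breathing' or 'fluid' score highest.
```

A term like 'breathing' then becomes the ERT for shortness of breath because it is frequent in that entity's definitions but rare across the others.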
15. Entity Model
• Entity Indicator.
• Entity Indicator consists of the terms that describe features of
the entity in the definition.
• E.g., ‘A disorder characterized by an uncomfortable sensation of
difficulty breathing’ – {uncomfortable, sensation, difficulty,
breathing}.
• Entity Model – a collection of entity indicators.
(Diagram: an entity model containing entity indicators 1–3)
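A minimal sketch of turning a definition sentence into an entity indicator; the stopword list is an illustrative assumption standing in for the paper's selection of feature-describing terms:

```python
# Sketch: each entity indicator is the set of feature-describing terms in
# one definition sentence; the entity model is the collection of
# indicators. The stopword list is an illustrative assumption.
STOPWORDS = {"a", "an", "the", "of", "by", "disorder", "characterized"}

def entity_indicator(sentence):
    return {w.strip(".,").lower() for w in sentence.split()} - STOPWORDS

def entity_model(definition_sentences):
    return [entity_indicator(s) for s in definition_sentences]

model = entity_model([
    "A disorder characterized by an uncomfortable sensation of difficulty breathing",
])
# model[0] == {'uncomfortable', 'sensation', 'difficulty', 'breathing'},
# matching the slide's example indicator.
```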
16. Candidate Sentence Selection & Pruning
• Candidate sentences – sentences containing an ERT.
• Candidate sentences are pruned to remove noise: nouns, verbs,
adjectives, and adverbs within a fixed window around the ERT are kept.
“His propofol was increased and he was allowed to wake up a second time
later on the evening of surgery and was ultimately weaned from mechanical
ventilation and successfully extubated at about 09:30 that evening.”
→ pruning →
{weaned, mechanical, ventilation, successfully, extubated}
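The pruning step can be sketched like this. The paper keeps nouns, verbs, adjectives, and adverbs via POS tagging; here a small stopword list approximates that, and the window size and the choice of 'ventilation' as the ERT are assumptions:

```python
# Sketch of candidate-sentence pruning: keep content words within a fixed
# window around the ERT. The stopword filter is a toy stand-in for the
# paper's POS-based selection, and window=3 is an assumed size.
STOP = {"he", "was", "ultimately", "from", "and", "at", "about", "that"}

def prune(tokens, ert, window=3):
    """Keep non-stopword tokens within `window` positions of the ERT."""
    i = tokens.index(ert)
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return [w for w in tokens[lo:hi] if w not in STOP]

tokens = ("he was ultimately weaned from mechanical ventilation "
          "and successfully extubated").split()
print(prune(tokens, "ventilation"))
# → ['weaned', 'mechanical', 'ventilation', 'successfully', 'extubated']
```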
17. Similarity Calculation
• The similarity between the entity model and each pruned candidate
sentence is calculated to annotate the sentence.
• The syntactic diversity of words and negated mentions need special
attention.
• Multiple similarity measures are therefore combined: for words t1 and
t2, each measure in M is applied, where
M = {WUP, LCH, LIN, JCN, Word2Vec, Levenshtein}.
18. Similarity Calculation
• The similarity between the entity model and the pruned sentence is
calculated by weighting the maximum similarity of each word in the
entity model by its representative power.
e – entity indicator
s – pruned sentence
α(te, s) – determines whether the term t in e is an antonym of any
term in s
f(te, s) – calculates the similarity of a term in e with the terms in
the sentence
sim(e, s) – measures the similarity between the entity indicator and
the pruned sentence
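The weighted similarity with antonym handling can be sketched as follows. The toy `word_sim` (standing in for the combination of WUP, LCH, LIN, JCN, Word2Vec, and Levenshtein), the antonym list, and the normalization are all illustrative assumptions:

```python
# Sketch: score an entity indicator against a pruned sentence by weighting
# each indicator term's best match by its representative power, flipping
# the sign when the sentence contains an antonym of the term (negation
# handling). word_sim, ANTONYMS, and the normalization are assumptions.
ANTONYMS = {("uncomfortable", "comfortable")}
SIMILAR = {("uncomfortable", "comfortable"): 0.8}

def word_sim(t1, t2):  # f(t, t'): toy stand-in for the combined measures
    if t1 == t2:
        return 1.0
    return SIMILAR.get((t1, t2), SIMILAR.get((t2, t1), 0.0))

def alpha(t, sentence):  # α(t, s): -1 if s contains an antonym of t
    return -1.0 if any((t, w) in ANTONYMS or (w, t) in ANTONYMS
                       for w in sentence) else 1.0

def indicator_similarity(indicator, rpower, sentence):  # sim(e, s)
    total = sum(rpower[t] * alpha(t, sentence) *
                max(word_sim(t, w) for w in sentence)
                for t in indicator)
    return total / sum(rpower[t] for t in indicator)  # assumed normalization

indicator = {"uncomfortable", "sensation", "difficulty", "breathing"}
rpower = {t: 1.0 for t in indicator}  # uniform weights for illustration
positive = indicator_similarity(indicator, rpower, ["difficulty", "breathing"])
negated = indicator_similarity(indicator, rpower, ["breathing", "comfortable"])
# The negated mention ("breathing comfortable") scores much lower than the
# positive mention ("difficulty breathing") because of the antonym flip.
```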
21. Dataset
• Used the dataset from SemEval-2014 Task 7.
• 857 sentences selected for 8 entities.
• The entities were selected based on their frequency of appearance
and feedback from domain experts.
• Annotated by three domain experts.
• Annotation agreement: 0.58.
23. Evaluation
• Baselines
• MCS algorithm (Mihalcea 2006)
• SVM (trained on n-grams)
• Evaluation metrics
• Precision and recall for the positive class
• Precision and recall for the negative class
• 70% training / 30% testing split
• Thresholds for our algorithm and MCS were selected based on
annotation performance on the training dataset
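The threshold selection described above can be sketched as a grid search for the similarity cutoff that maximizes F1 on the training annotations; the grid and the toy scores/labels are illustrative assumptions:

```python
# Sketch: pick the similarity threshold maximizing F1 on training data.
# The grid and the toy scores/labels are illustrative assumptions.
def f1_at(threshold, scores, labels):
    pred = [s >= threshold for s in scores]
    tp = sum(p and l for p, l in zip(pred, labels))
    fp = sum(p and not l for p, l in zip(pred, labels))
    fn = sum(not p and l for p, l in zip(pred, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

def best_threshold(scores, labels, grid=None):
    grid = grid or [i / 20 for i in range(21)]  # cutoffs 0.0, 0.05, ..., 1.0
    return max(grid, key=lambda th: f1_at(th, scores, labels))

train_scores = [0.9, 0.7, 0.4, 0.2]   # similarity values on training data
train_labels = [True, True, False, False]
th = best_threshold(train_scores, train_labels)
```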
24. Annotation Performance
Method PP PR PF1 NP NR NF1
Our 0.66 0.87 0.75 0.73 0.73 0.73
MCS 0.50 0.93 0.65 0.31 0.76 0.44
SVM 0.73 0.82 0.77 0.66 0.67 0.67
(PP/PR/PF1: precision/recall/F1 on positive mentions; NP/NR/NF1: on negative mentions)
• Our algorithm outperforms the baselines in the negative category.
• The SVM leverages supervision to beat our algorithm in the positive
category.
25. Annotation Performance
Method PP PR PF1 NP NR NF1
SVM 0.73 0.82 0.77 0.66 0.67 0.67
SVM+MCS 0.73 0.82 0.77 0.66 0.66 0.66
SVM+Our 0.77 0.85 0.81 0.72 0.75 0.73
• The similarity value from our algorithm was added as a feature to the SVM.
• This shows that our similarity value can be used as an effective
feature with a supervised approach.
26. Annotation Performance with
varying training dataset size
(Plots: annotation performance with varying training set size, for positive
assertions and negative assertions)
27. Limitations
• The approach misses implicit mentions of entities that contain no ERT,
e.g., implicit mentions of shortness of breath without the term
‘breathing’:
• “The patient had low oxygen saturation”
• “The patient was gasping for air”
• “Patient was air hunger”
• 113 instances vs 8990 instances
28. Conclusion
• Introduced the problem of implicit entity recognition in clinical
documents.
• Developed an unsupervised approach and showed that it outperforms a
supervised approach on negative mentions.
• Showed that a supervised approach can use our similarity value as a
feature to reduce labeling cost and improve performance.
29. Thank You
Sujan Perera, Pablo Mendes, Amit Sheth, Krishnaprasad Thirunarayan, Adarsh Alex, Christopher
Heid, Greg Mott, ‘Implicit Entity Recognition in Clinical Documents’, in Proceedings of the Fourth
Joint Conference on Lexical and Computational Semantics (*SEM), 2015.
http://knoesis.org/researchers/sujan/