NLP support for clinical tasks and decisions

Dina Demner-Fushman
CORIA-TALN
Thursday, May 17, 2018

National Library of Medicine
Purpose and establishment
In order to assist the advancement of medical and related sciences
and to aid the dissemination and exchange of scientific and other
information important to the progress of medicine and to the public
health, there is established the National Library of Medicine
https://www.gpo.gov/fdsys/pkg/USCODE-2011-title42/html/USCODE-2011-title42-chap6A-subchapIII-partD.htm

o When making health-related decisions, asking questions
is a natural and preferred approach to satisfying
information needs
o Corroboration: NLM will focus on understanding how
searches are initiated, how information is used, and how
questions are posed and answered.
A Platform for Biomedical Discovery and Data-Powered Health
National Library of Medicine
Strategic Plan 2017–2027
Report of the NLM Board of Regents

What is the best plan of care for this patient?
What are the causes of abdominal pain or cramps?
What was the pre-op echocardiogram result?

Patient-specific information in clinical narrative
Questions asked by clinicians and patients
Answers in clinical narrative
Questions asked by consumers
Answers in consumer-oriented sources
Answers in scientific literature

Summarization (“bottom-line” advice)
Linking research and clinical information
GUI
API
Clinical question answering
Linking patient records and literature (CDS)
Identifying gaps in clinical research
Consumer Health Question Answering
Repository for Informed Decision Making

o Central to clinical NLP tasks
o MetaMap Lite: A new implementation of the ontology-based (UMLS) Named Entity Recognition tool
o “Impressive...it looks like MetaMap Lite is around 20 times faster *and* has
better performance! I understand there are caveats [in the evaluation]
(e.g., focus on disorders and only using default options for regular
MetaMap), but this is good news.”
Demner-Fushman D, Rogers WJ, Aronson AR. MetaMap Lite: an evaluation of a
new Java implementation of MetaMap. J Am Med Inform Assoc. 2017;0(0):1-5.
Collection / Tool   MetaMap              cTAKES (DL)          DNorm                MetaMap Lite
                    P     R     F-1      P     R     F-1      P     R     F-1      P     R     F-1
NCBI disease        60.3  68.3  64.1     47.0  53.8  47.4     74.1  67.6  70.7     73.1  71.9  72.5
ShARe (entities)    59.5  48.1  53.2     46.3  46.2  46.2     N/A   N/A   N/A      74.2  42.1  53.8
i2b2 2010           38.1  35.7  36.8     31.9  34.1  32.9     N/A   N/A   N/A      47.0  31.9  38.0
LHC clinical        58.8  77.2  66.8     42.6  59.9  49.8     71.5  58.2  64.2     69.4  74.9  70.0
LHC biological      46.8  75.6  57.8     47.1  60.6  53.0     67.7  62.8  65.2     67.5  77.9  72.4

Roberts, K. & Demner-Fushman, D. (2016).
Annotating Logical Forms for EHR
Questions. Proceedings of the Language
Resources and Evaluation Conference
(LREC).
Roberts K, Demner-Fushman D. Toward a
Natural Language Interface for EHR
Questions. AMIA Joint Summit 2015

o Ask: “What was my last A1c?” OR look for it:

o Traditional QA systems search over unstructured data
o Not compatible with EHRs: free text + structured data
o Each EHR organizes unstructured/structured data differently
→ Structured query
“What was my last A1c?”
Latest Test: A1C

① Can EHR questions be converted to logical forms?
② What logical operations are necessary to represent EHR
questions?
③ Can human annotators achieve sufficient agreement?
④ Will a logical form method scale to the diversity of potential
EHR questions?

o From Li (2012): Structured database (17 questions) and
specific note (432 questions)
o Sample 100 questions to maximize representativeness
of question categories:
o Temporal: admission, discharge, PMH, visit, status, plan,
time range
o Concept: problem, treatment, test
o Answer type: boolean, count, trend, medical unit, time,
person/org, other

o Take advantage of existing NLP components:
o Concept recognition/normalization
“Was she hypertensive on admission?”
Was patient UMLSFinding(C057121) on admission?
“Do I have diabetes?”
Does patient have UMLSDisease(C0011849)?
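The concept normalization step above can be sketched as a lookup that replaces a recognized mention with a typed concept identifier. This is a minimal illustration only: the one-entry `LEXICON` and the function `normalize_concepts` are hypothetical stand-ins, and a real system would obtain concepts from a UMLS-based recognizer. The CUI for diabetes is the one shown on the slide.

```python
import re

# Hypothetical mini-lexicon (assumption); a real system would call a
# UMLS-based concept recognizer instead of a hand-written dictionary.
LEXICON = {
    "diabetes": ("UMLSDisease", "C0011849"),  # CUI as given on the slide
}


def normalize_concepts(question: str) -> str:
    """Replace each known concept mention with SemanticType(CUI)."""
    out = question
    for mention, (sem_type, cui) in LEXICON.items():
        pattern = re.compile(rf"\b{re.escape(mention)}\b", re.IGNORECASE)
        out = pattern.sub(f"{sem_type}({cui})", out)
    return out


print(normalize_concepts("Do I have diabetes?"))
```

The sketch covers only the concept substitution; rewriting "Do I" as "Does patient" is a separate normalization step that is not shown here.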

o First-Order Logic (FOL) + Lambda Calculus (λ)
o Atomic objects C0004057
o Boolean predicates has_treatment(x, y, z)
o Functions max(…)
o λ-expressions λx.condition(x)
o λx.has_treatment(x, C0004057, visit)
o All events of the patient taking aspirin in this hospital visit
o “What was the volume of her urine last night?”
o δ(λx.has_function(x, C0232856, visit) ^ time_within(x, “last night”))
o “When was the patient first discharged from the ward?”
o time(earliest(λx.has_event(x, discharge, visit) ^ at_location(x, “the ward”)))
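One way to read these logical forms is as predicates evaluated over a patient's event records. The sketch below is an assumption-laden toy, not the authors' implementation: the `Event` record layout, the `RECORD` data, and the helpers `select` and `time_of` are invented for illustration, while the predicate and function names (`has_treatment`, `has_event`, `at_location`, `earliest`) mirror those on the slide.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Event:
    kind: str        # "treatment" or "event" (hypothetical record layout)
    concept: str     # CUI or event name
    scope: str       # e.g. "visit"
    location: str
    when: datetime


# Invented toy record for one hospital visit.
RECORD = [
    Event("treatment", "C0004057", "visit", "the ward", datetime(2018, 5, 1, 8, 0)),
    Event("event", "discharge", "visit", "ICU", datetime(2018, 5, 1, 12, 0)),
    Event("event", "discharge", "visit", "the ward", datetime(2018, 5, 2, 9, 0)),
    Event("event", "discharge", "visit", "the ward", datetime(2018, 5, 10, 9, 0)),
]


def has_treatment(x, concept, scope):
    return x.kind == "treatment" and x.concept == concept and x.scope == scope


def has_event(x, name, scope):
    return x.kind == "event" and x.concept == name and x.scope == scope


def at_location(x, place):
    return x.location == place


def select(condition):
    """Evaluate a lambda expression: all record events satisfying the condition."""
    return [e for e in RECORD if condition(e)]


def earliest(condition):
    return min(select(condition), key=lambda e: e.when)


def time_of(event):
    # Plays the role of time(...) in the logical form.
    return event.when


# λx.has_treatment(x, C0004057, visit)
aspirin_events = select(lambda x: has_treatment(x, "C0004057", "visit"))

# time(earliest(λx.has_event(x, discharge, visit) ^ at_location(x, "the ward")))
first_discharge = time_of(
    earliest(lambda x: has_event(x, "discharge", "visit") and at_location(x, "the ward"))
)
print(len(aspirin_events), first_discharge)
```

Because the logical form is independent of how any one EHR stores its data, only the predicate implementations would need to change per system under this reading.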

o Q 1-25: double-annotated w/o instruction to determine
initial set of logical elements (LE)
o Q 26-50: double-annotated: 81% agreement on LEs,
40% agreement on complete logical form
o Q 51-100: double-annotated: 85% agreement on LEs,
50% agreement on complete logical form
③ Can human annotators achieve sufficient
agreement?

o All 100 questions could be structured as a logical form
o From 100 questions
o 113 non-CUI objects (7 unique)
o 104 CUI objects (84 unique)
o 136 predicates (21 unique)
o 226 functions (10 unique)
o 110 lambda expressions
① Can EHR questions be converted to logical forms?
② What logical operations are necessary to represent
EHR questions?

o Long tail in frequency distribution of LEs
o 12 unique LEs make up 86% of non-CUI LEs
o has_treatment predicate used 32 times, but
14 predicates used only once
o Future work: further annotation, integration of
semantic parser for automatic question understanding
④ Will a logical form method scale to the diversity of
potential EHR questions? (mostly)

Roberts, K. & Patra, B. A Semantic Parsing Method for Mapping Clinical Questions to Logical
Forms. AMIA 2017

Deardorff A, Masterton K, Roberts K, Kilicoglu H,
Demner-Fushman D. A protocol-driven approach to
automatically finding authoritative answers to
consumer health questions in online resources.
Journal of the Association for Information Science
and Technology. 2017 July;68(7):1724–1736
Ben Abacha A, Demner-Fushman D. Recognizing
Question Entailment for Medical Question
Answering. AMIA 2016
Demner-Fushman D, Kilicoglu H, Roberts K,
Masterton K, Deardorff A. Consumer Health Question
Answering to Automatically Support NLM Customer
Services September, 2015, Technical Report to the
LHNCBC Board of Scientific Counselors
YASSINE MRABET
mrabety@mail.nih.gov
ASMA BEN ABACHA
asma.benabacha@nih.gov

o Variety of styles
o Mostly informal language
o Ungrammatical sentences
o Inconsistent capitalization & punctuation
o Abbreviations
o Misspellings
o Extraneous information interspersed among questions
o Abundance of anaphora and ellipses
o Unclear information needs
o Collection annotated with 15 question types focused on diseases and drugs:
https://ceb.nlm.nih.gov/ridem/infobot_docs/CHQA-NER-Corpus_1.0.zip
Kilicoglu H, Ben Abacha A, Mrabet Y, Shooshan SE, Rodriguez L, Masterton K, Demner-Fushman D. Semantic annotation
of consumer health questions. BMC Bioinformatics. 2018 Feb 6;19(1):34. doi: 10.1186/s12859-018-2045-1.

[Figure: question answering pipeline. Components shown: Spell check; Question classification (Not a question, Answerable, Request, Short question); Question Analysis (Question Decomposition, Focus Recognition, Question Type Identification); Recognizing Question Entailment; FAQ Answer(s); Query generation; Document retrieval; Answer Extraction; Answer Generation]

o Misspellings can hinder automatic question
understanding
My mom is 82 years old suffering from anixity and
depression for the last 10 years was dianosed early on set
deminita 3 years ago. Do yall have a office in Greensboro
NC? Can you recommend someone. she has seretona
syndrome and nonething helps her.
o Error types:
o Not a real word: deminita → dementia
o Misuse of a real word: bowl movement → bowel movement
o Merge: for along time → for a long time
o Split: early on set → early onset
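The merge and split error types can be illustrated with a small candidate generator. This is a sketch under stated assumptions: `VOCAB` is a hypothetical toy dictionary and both functions are invented for illustration. It only proposes corrections; choosing among them would still require a ranking step.

```python
# Hypothetical toy vocabulary (assumption); a real system would use a large dictionary.
VOCAB = {"for", "a", "long", "along", "time", "early", "on", "set", "onset"}


def merge_candidates(tokens):
    """Repair split errors: join two adjacent tokens if the result is a known word."""
    out = []
    for i in range(len(tokens) - 1):
        joined = tokens[i] + tokens[i + 1]
        if joined in VOCAB:
            out.append(tokens[:i] + [joined] + tokens[i + 2:])
    return out


def split_candidates(tokens):
    """Repair merge errors: split one token into two known words."""
    out = []
    for i, tok in enumerate(tokens):
        for k in range(1, len(tok)):
            left, right = tok[:k], tok[k:]
            if left in VOCAB and right in VOCAB:
                out.append(tokens[:i] + [left, right] + tokens[i + 1:])
    return out


print(merge_candidates(["early", "on", "set"]))    # candidate: early onset
print(split_candidates(["for", "along", "time"]))  # candidate: for a long time
```

Note that "along" and "on set" are made of real words, so a dictionary check alone cannot flag them; the candidates have to be scored against the surrounding context.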

Detector → Candidates → Ranker → Corrector
[Figure: Word2Vec network. Input Layer (Context): T1 ... Tw → Word2Vec Input Matrix (w x n) → Hidden Layer (Word Embedding): H1 ... Hn → Word2Vec Output Matrix (n x w) → Output Layer (Target Word): C1 ... Cw → SoftMax → Probability Score (Target Word): P1 ... Pw]
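The Word2Vec diagram maps context words through a w x n input matrix to an n-dimensional hidden layer, then through an n x w output matrix and a softmax to a probability for each target word. The following is a minimal sketch of that forward pass, assuming the context vectors are averaged (an assumption, the slide does not say how they are combined). The three-word vocabulary and the hand-set weights in `W_IN` and `W_OUT` are toy values, not trained embeddings.

```python
import math

VOCAB = ["bowel", "bowl", "movement"]        # toy vocabulary, w = 3

# Hand-set toy weights (assumption), embedding size n = 2.
W_IN = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # w x n: one row per vocabulary word
W_OUT = [[2.0, 0.0, 1.0], [0.0, 2.0, 0.0]]   # n x w: one column per target word


def softmax(scores):
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_probabilities(context_ids, w_in, w_out):
    """Return P(target word | context) for every word in the vocabulary."""
    n = len(w_in[0])
    # Hidden layer: average of the input-matrix rows of the context words.
    hidden = [sum(w_in[i][d] for i in context_ids) / len(context_ids) for d in range(n)]
    w = len(w_out[0])
    scores = [sum(hidden[d] * w_out[d][j] for d in range(n)) for j in range(w)]
    return softmax(scores)


probs = context_probabilities([VOCAB.index("movement")], W_IN, W_OUT)
for word, p in zip(VOCAB, probs):
    print(f"{word}: {p:.3f}")
```

With these toy weights the context "movement" scores "bowel" above "bowl", which is the kind of signal a ranker could use to prefer one correction candidate over another.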

Non-word:
Method       Precision   Recall    F1       Time
Baseline *   66.91%      71.32%    0.6904   < 1 hr.
CSpell       82.90%      78.29%    0.8053   < 1 min.
Real-word included:
Method       Precision   Recall    F1       Time
Baseline     72.01%      53.63%    0.6147   < 1 hr.
CSpell       82.80%      64.94%    0.7279   < 5 min.
Tested on 471 consumer health questions
* An Ensemble Method for Spelling Correction in Consumer Health Questions. Kilicoglu H, Fiszman M,
Roberts K, Demner-Fushman D. AMIA 2015
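As a quick consistency check, F1 is the harmonic mean of precision and recall, so the F1 column can be recomputed from the other two. The helper `f1` below is only an illustration; recomputing from the rounded percentages agrees with the reported values to about three decimal places, and small differences in the fourth decimal are to be expected from rounding.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (method, precision, recall, reported F1), copied from the tables above
rows = [
    ("Non-word baseline", 0.6691, 0.7132, 0.6904),
    ("Non-word CSpell", 0.8290, 0.7829, 0.8053),
    ("Real-word baseline", 0.7201, 0.5363, 0.6147),
    ("Real-word CSpell", 0.8280, 0.6494, 0.7279),
]
for name, p, r, reported in rows:
    print(f"{name}: computed {f1(p, r):.4f}, reported {reported:.4f}")
```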

[Figure: system architecture. Labels: Question; Question Topic (Focus) Recognition; Question Type Recognition; SVM 1; BiLSTM; Frames; MetaMapLite; SVM 2; Triggers; Resolution; Similar Question Retrieval; Candidate Questions; Recognizing Question Entailment; Associated Answers; Answer Retrieval; Answer Generation]

o The focus is the primary entity or event of interest
o At least one per question, but occasionally multiple when
the consumer is interested in the interactions,
associations and comparisons
o UMLS entities → SVM → boundary adjustment → focus
o 56% (73% inexact) F1
o KODA → SVM
o 66% F1 (P ~ 70%, R ~ 62%)
o BiLSTM
o 59% (78% inexact) F1
Mrabet Y, Kilicoglu H, Roberts K, Demner-Fushman D.
Combining Open-domain and Biomedical Knowledge for
Topic Recognition in Consumer Health Questions. AMIA
2016 Annual Symposium, Chicago, IL, November 12-16,
2016.
Roberts K, Masterton K, Kilicoglu H, Fiszman M, Demner-
Fushman D. Annotating Question Decomposition on
Complex Medical Questions. LREC 2014.

      MedlinePlus   GARD        Long CH Questions
F1    65% (84%)     94% (96%)   59% (78%)

                           MedlinePlus   GARD
Accuracy (Question Type)   70.9%         85.8% (vs 82.6% SVM)


How to use existing question & answer pairs to answer new questions?
Websites
Ben Abacha A & Demner-Fushman
D. Recognizing Question Entailment
for Medical Question Answering.
AMIA 2016

o Proposed definition: Question A entails Question B if
every answer to B is also an exact or partial answer to A.
A1 → B1 (An exact answer)
• A1 (CHQ): Hi I have retinitis pigmentosa for 3years. Im
suffering from this disease. Please introduce me any way to
treat mg eyes such as stem cell ....I am 25 years old and I have
only central vision. Please help me. Thank you
• B1 (FAQ): Are there treatments for RP?
A2 → B2 (A partial answer)
• A2 (CHQ): Can sepsis be prevented? Can someone get this
from a hospital?
• B2 (FAQ): Who gets sepsis?

o RQE Data (4k pairs of entailment questions) constructed
automatically from clinical questions (Ely & Osheroff, 2000).
o Data available on GitHub:
o Compared Machine Learning (ML) & Deep Learning (DL)
methods trained on open-domain and medical collections
of textual entailment and question similarity/entailment
(e.g. SNLI, Multi-NLI, cQA-SemEval, Quora).
o Logistic Regression trained on medical RQE data achieved
the best performance (75% Accuracy) on test data of
consumer health questions & NIH FAQs.

[Figure: question answering pipeline. Question → Similar Question Retrieval (QR): Search Engine + MetaMapLite over a Question Index → Top-K Question Candidates → Recognizing Question Entailment (RQE): Logistic Regression + RQE Data → Top-N Entailed Questions → Question-Answer Selection from the Question-Answer Collection → Answers]
Collection of 47k QA pairs will be available
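The two-stage design (retrieve similar questions, then keep only those recognized as entailed) can be sketched as follows. Everything here is a simplified stand-in: token-overlap retrieval replaces the search engine, `toy_entails` is a hypothetical rule standing in for the trained Logistic Regression classifier, and the answer strings in `COLLECTION` are placeholders.

```python
import re


def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_top_k(question, collection, k):
    """Stage 1 (QR): rank stored questions by Jaccard token overlap."""
    q = tokens(question)
    scored = []
    for faq_question, ans in collection:
        t = tokens(faq_question)
        union = q | t
        score = len(q & t) / len(union) if union else 0.0
        scored.append((score, faq_question, ans))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def answer_question(question, collection, entails, k=3, n=1):
    """Stage 2 (RQE): keep answers of the top-N candidates judged entailed."""
    candidates = retrieve_top_k(question, collection, k)
    entailed = [ans for _, faq_q, ans in candidates if entails(question, faq_q)]
    return entailed[:n]


def toy_entails(chq, faq):
    # Hypothetical rule (assumption): the questions share a token longer than 4 characters.
    return any(len(t) > 4 for t in tokens(chq) & tokens(faq))


COLLECTION = [  # placeholder answers, not real content
    ("Who gets sepsis?", "ANSWER_SEPSIS_RISK"),
    ("Are there treatments for RP?", "ANSWER_RP_TREATMENT"),
    ("What is asthma?", "ANSWER_ASTHMA"),
]

chq = "Can sepsis be prevented? Can someone get this from a hospital?"
print(answer_question(chq, COLLECTION, toy_entails))
```

Keeping the entailment check as a separate, pluggable function mirrors the diagram: retrieval narrows the collection cheaply, and the classifier makes the final decision on a short candidate list.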

o Organization of a medical QA task @ TREC LiveQA 2017
o New benchmark for medical QA:
o Variety of consumer health questions, with reference
answers and annotations (Question Foci, Types & Keywords)
Ben Abacha A, Agichtein E, Pinter Y, Demner-Fushman D. Overview of
the Medical QA Task @ TREC 2017 LiveQA Track.
Data available on GitHub:

Example: Annotated question and associated reference answers

Interface used to evaluate 3k QA pairs

Results on TREC’17 LiveQA medical test questions
MEASURES         QR System   QR+RQE System   LiveQA’17 Best Results   LiveQA’17 Median Results
AvgScore (0-3)   0.711       0.827           0.637                    0.431
Success@2+       0.442       0.461           0.392                    0.245
Precision@2+     0.46        0.475           0.404                    0.331
MAP@10           0.282       0.311           --                       --
o The best LiveQA team combined deep neural networks to retrieve similar
answered questions from the web.
o Relevance of this approach vs. classical QA methods.
o Using QR+RQE and QA collection led to a 29.8% increase over the best official
score at LiveQA’17.
o Efficiency of recognizing question entailment and restricting answer sources
to trusted medical resources.
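The 29.8% figure follows from the AvgScore row: it is the relative gain of the QR+RQE system (0.827) over the best official LiveQA'17 result (0.637). A short worked check, with `relative_increase` as an illustrative helper name:

```python
def relative_increase(baseline: float, new: float) -> float:
    """Relative gain of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100


best_official = 0.637   # LiveQA'17 best AvgScore
qr_rqe = 0.827          # QR+RQE system AvgScore
print(f"{relative_increase(best_official, qr_rqe):.1f}%")
```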

New Research Topic: QA over images
Example:
• Question: What does transverse ct image demonstrate?
• Answer: focal defect in inflamed appendiceal wall and
periappendiceal inflammatory stranding.

o Our first Deep Learning VQA models achieved good results: Second best WBSS.
o To advance research in VQA, we built a first
manually annotated medical VQA collection
Demner-Fushman D, Mork JG, Rogers W,
Shooshan SE, Rodriguez LM, Aronson AR.
Finding medication doses in the literature.
Submitted to AMIA 2018
Rodriguez LM, Demner-Fushman D.
Uncovering Knowledge Gaps in the Scientific
Literature on Maternal Morbidity and
Mortality using EHR Data. Submitted to AMIA
2018

o Doses determine medication safety and effectiveness
o Dose extraction from clinical text is extensively studied (i2b2)
o No studies on complete prescription information extraction
from the literature: Medication name, Dosage, Route of
administration, Frequency of administration, Duration of
administration, Reason for giving medication
o Questions:
o Will the approaches developed for clinical text work?
o Which sections of scientific papers provide dose information?
o Is sequence-to-sequence learning with neural networks a viable
approach to extraction of dose information?

o 694 documents fully annotated with drug
doses/strengths, forms, routes of administration,
frequencies and durations of administration, and the
reasons for administration

o MedEx
o DoseRegEx: numbers preceded or followed by units of
measure
o DoseRegEx + Chemical filter
o Long Short-Term Memory (LSTM) neural network with a
conditional random field (CRF) layer using character
embeddings.
https://guillaumegenthial.github.io/sequence-tagging-with-tensorflow.html
Xu H, Stenner SP, Doan S, Johnson KB, Waitman LR, Denny JC. MedEx: a medication information
extraction system for clinical narratives. J Am Med Inform Assoc. 2010 Jan-Feb;17(1):19-24.
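The DoseRegEx baseline is described only as matching numbers preceded or followed by units of measure. A minimal sketch of that idea is below; the unit list in `UNITS` and the function `find_doses` are assumptions made for illustration, not the pattern used in the study.

```python
import re

# Hypothetical unit list (assumption); longer units come first in the alternation.
UNITS = r"(?:mg/kg|mcg|mg|ml|iu|units?|g)"
NUMBER = r"\d+(?:\.\d+)?"

# A number followed by a unit, or a unit followed by a number.
DOSE = re.compile(rf"\b(?:{NUMBER}\s*{UNITS}|{UNITS}\s*{NUMBER})\b", re.IGNORECASE)


def find_doses(text: str) -> list:
    """Return every substring that looks like a dose or strength."""
    return [m.group(0) for m in DOSE.finditer(text)]


sentence = "Patients received 500 mg of amoxicillin and 2.5 mg/kg prednisolone daily."
print(find_doses(sentence))
```

A pattern like this also matches quantities that are not drug doses (for example reagent amounts), which is presumably what the chemical filter variant is meant to reduce.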

o Publicly available collection of scientific articles annotated with drug
doses/strengths, forms, routes of administration, frequencies and
durations of administration and the reasons for administration.
o Drop in performance when switching from clinical text.
o Dose information is predominantly reported in the full text, but
about 45% of the articles provide dose information in the titles and
abstracts as well.

Retrieved Terms Associated with Pregnancy ICU Admissions
Year 2013 2014 2015 2016 2017 2018
Abdominal Compartment Syndrome 3 12 7 4 5
Duodenal Ulcer 1 1 1 1
Massive Transfusion 1 4 3 4 3
Pseudocyst of Pancreas 1
Vertebral Artery Aneurysm 1 4 2 2

Manual gold standard
Clinical Categories                     #     %
Hemorrhage or Anemia                    359   19.93%
Sepsis or Infection                     262   14.55%
Cardiovascular Disease or Disorder      108   6.00%
Hypertension/Pre-eclampsia/Eclampsia    106   5.89%
Asthma, Premature Delivery, Malignancy, Cardiomyopathy

[Figure labels: MIMIC III Discharge summaries; MetaMap; E-Utils; E-Utils]

THANK YOU!
ddemner@mail.nih.gov