Dina Demner-Fushman
CORIA-TALN
Thursday, May 17, 2018
Disclaimer
2
National Library of Medicine
Purpose and establishment
In order to assist the advancement of medical and related sciences
and to aid the dissemination and exchange of scientific and other
information important to the progress of medicine and to the public
health, there is established the National Library of Medicine
https://www.gpo.gov/fdsys/pkg/USCODE-2011-title42/html/USCODE-2011-title42-chap6A-subchapIII-partD.htm
3
3
4
o When making health-related decisions, asking questions
is a natural and preferred approach to satisfying
information needs
o Corroboration: NLM will focus on understanding how
searches are initiated, how information is used, and how
questions are posed and answered.
A Platform for Biomedical Discovery and Data-Powered Health
National Library of Medicine
Strategic Plan 2017–2027
Report of the NLM Board of Regents
5
What is the best plan of care for this patient?
What are the causes of abdominal pain or cramps?
What was the pre-op echocardiogram result?
6
Patient-specific
information in clinical
narrative
Questions asked by
clinicians and patients
Answers in clinical
narrative
Questions asked by
consumers
Answers in consumer-
oriented sources
Answers in scientific
literature
7
Summarization (“bottom-
line” advice)
Linking research and clinical
information
GUI
API
Clinical question answering
Linking patient
records and
literature (CDS)
Identifying gaps in clinical research
Consumer Health Question
Answering
Repository for
Informed Decision
Making
8
o Central to clinical NLP tasks
o MetaMap Lite: A new implementation of the ontology-based (UMLS)
Named Entity Recognition tool
o “Impressive...it looks like MetaMap Lite is around 20 times faster *and* has
better performance! I understand there are caveats [in the evaluation]
(e.g., focus on disorders and only using default options for regular
MetaMap), but this is good news.”
Demner-Fushman D, Rogers WJ, Aronson AR. MetaMap Lite: an evaluation of a
new Java implementation of MetaMap. J Am Med Inform Assoc. 2017;0(0):1-5.
Collection / Tool    MetaMap             cTAKES (DL)         DNorm               MetaMap Lite
                     P     R     F-1     P     R     F-1     P     R     F-1     P     R     F-1
NCBI disease         60.3  68.3  64.1    47.0  53.8  47.4    74.1  67.6  70.7    73.1  71.9  72.5
ShARe (entities)     59.5  48.1  53.2    46.3  46.2  46.2    N/A   N/A   N/A     74.2  42.1  53.8
i2b2 2010            38.1  35.7  36.8    31.9  34.1  32.9    N/A   N/A   N/A     47.0  31.9  38.0
LHC clinical         58.8  77.2  66.8    42.6  59.9  49.8    71.5  58.2  64.2    69.4  74.9  70.0
LHC biological       46.8  75.6  57.8    47.1  60.6  53.0    67.7  62.8  65.2    67.5  77.9  72.4
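As a reading aid for the P / R / F-1 columns above, here is a minimal sketch of the usual F-1 definition (the harmonic mean of precision and recall). The function name is an illustrative assumption. The slide does not say how scores were averaged, so rows that do not reproduce exactly under this formula may have been averaged differently.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same scale in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# NCBI disease row: MetaMap Lite (P=73.1, R=71.9) and DNorm (P=74.1, R=67.6)
metamap_lite_f1 = round(f1_score(73.1, 71.9), 1)
dnorm_f1 = round(f1_score(74.1, 67.6), 1)
print(metamap_lite_f1, dnorm_f1)
```

Both values agree with the F-1 cells reported for those two systems on the NCBI disease collection.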
9
10
Roberts, K. & Demner-Fushman, D. (2016).
Annotating Logical Forms for EHR
Questions. Proceedings of the Language
Resources and Evaluation Conference
(LREC).
Roberts K, Demner-Fushman D. Toward a
Natural Language Interface for EHR
Questions. AMIA Joint Summit 2015
11
o Ask: “What was my last A1c?” OR look for it:
12
o Traditional QA systems search over unstructured data
o Not compatible with EHRs: free text + structured data
o Each EHR organizes unstructured/structured data differently
→ Structured query
“What was my last A1c?”
Latest Test: A1C
13
① Can EHR questions be converted to logical forms?
② What logical operations are necessary to represent EHR
questions?
③ Can human annotators achieve sufficient agreement?
④ Will a logical form method scale to the diversity of potential
EHR questions?
14
o From Li (2012): Structured database (17 questions) and
specific note (432 questions)
o Sample 100 questions to maximize representativeness
of question categories:
o Temporal: admission, discharge, PMH, visit, status, plan,
time range
o Concept: problem, treatment, test
o Answer type: boolean, count, trend, medical unit, time,
person/org, other
15
o Take advantage of existing NLP components:
o Concept recognition/normalization
“Was she hypertensive on admission?”
Was patient UMLSFinding(C057121) on admission?
“Do I have diabetes?”
Does patient have UMLSDisease(C0011849)?
16
o First-Order Logic (FOL) + Lambda Calculus (λ)
o Atomic objects C0004057
o Boolean predicates has_treatment(x, y, z)
o Functions max(…)
o λ-expressions λx.condition(x)
o λx.has_treatment(x, C0004057, visit)
o All events of the patient taking aspirin in this hospital visit
o “What was the volume of her urine last night?”
o δ(λx.has_function(x, C0232856, visit) ^ time_within(x, “last night”))
o “When was the patient first discharged from the ward?”
o time(earliest(λx.has_event(x, discharge, visit) ^ at_location(x, “the ward”)))
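The logical forms above can be read as small programs over a patient record. The following is a toy sketch, not the authors' implementation: the `Event` record layout, the `select` helper (which plays the role of evaluating a λ-expression) and the sample data are all assumptions made for illustration. Only the predicate and function names (`has_treatment`, `has_event`, `at_location`, `earliest`, `time`) and the aspirin CUI come from the slide.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List


@dataclass(frozen=True)
class Event:
    kind: str       # toy category: "treatment" or "event"
    concept: str    # UMLS CUI, or a plain label such as "discharge"
    scope: str      # temporal scope, e.g. "visit"
    location: str
    when: datetime


# Boolean predicates, mirroring has_treatment(x, y, z) on the slide
def has_treatment(x: Event, concept: str, scope: str) -> bool:
    return x.kind == "treatment" and x.concept == concept and x.scope == scope


def has_event(x: Event, concept: str, scope: str) -> bool:
    return x.kind == "event" and x.concept == concept and x.scope == scope


def at_location(x: Event, place: str) -> bool:
    return x.location == place


# Evaluating λx.condition(x) over a record yields the matching events
def select(record: List[Event], condition: Callable[[Event], bool]) -> List[Event]:
    return [x for x in record if condition(x)]


# Functions applied to the result of a λ-expression
def earliest(events: List[Event]) -> Event:
    return min(events, key=lambda e: e.when)


def time(event: Event) -> datetime:
    return event.when


# Hypothetical record for one hospital visit
RECORD = [
    Event("treatment", "C0004057", "visit", "the ward", datetime(2018, 5, 1, 8, 0)),
    Event("treatment", "C0004057", "visit", "the ward", datetime(2018, 5, 1, 20, 0)),
    Event("event", "discharge", "visit", "ICU", datetime(2018, 5, 1, 12, 0)),
    Event("event", "discharge", "visit", "the ward", datetime(2018, 5, 2, 9, 0)),
    Event("event", "discharge", "visit", "the ward", datetime(2018, 5, 4, 15, 0)),
]

# λx.has_treatment(x, C0004057, visit)
aspirin_events = select(RECORD, lambda x: has_treatment(x, "C0004057", "visit"))

# time(earliest(λx.has_event(x, discharge, visit) ^ at_location(x, "the ward")))
first_ward_discharge = time(earliest(select(
    RECORD,
    lambda x: has_event(x, "discharge", "visit") and at_location(x, "the ward"),
)))
```

In a real system the predicates would be mapped onto whatever structured and unstructured stores a given EHR exposes, which is the reason for keeping the logical form independent of the storage layout.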
17
o Q 1-25: double-annotated w/o instruction to determine
initial set of logical elements (LE)
o Q 26-50: double-annotated: 81% agreement on LEs,
40% agreement on complete logical form
o Q 51-100: double annotated: 85% agreement on LEs,
50% agreement on complete logical form
③ Can human annotators achieve sufficient
agreement?
18
o All 100 questions could be structured as a logical form
o From 100 questions
o 113 non-CUI objects (7 unique)
o 104 CUI objects (84 unique)
o 136 predicates (21 unique)
o 226 functions (10 unique)
o 110 lambda expressions
① Can EHR questions be converted to logical forms?
② What logical operations are necessary to represent
EHR questions?
19
o Long tail in frequency distribution of LEs
o 12 unique LEs make up 86% of non-CUI LEs
o has_treatment predicate used 32 times, but
14 predicates used only once
o Future work: further annotation, integration of
semantic parser for automatic question understanding
④ Will a logical form method scale to the diversity of
potential EHR questions? (mostly)
20
Roberts, K. & Patra, B. A Semantic Parsing Method for Mapping Clinical Questions to Logical
Forms. AMIA 2017
21
Deardorff A, Masterton K, Roberts K, Kilicoglu H,
Demner-Fushman D. A protocol-driven approach to
automatically finding authoritative answers to
consumer health questions in online resources.
Journal of the Association for Information Science
and Technology. 2017 July;68(7):1724–1736
Ben Abacha A, Demner-Fushman D. Recognizing
Question Entailment for Medical Question
Answering. AMIA 2016
Demner-Fushman D, Kilicoglu H, Roberts K,
Masterton K, Deardorff A. Consumer Health Question
Answering to Automatically Support NLM Customer
Services September, 2015, Technical Report to the
LHNCBC Board of Scientific Counselors
Yassine Mrabet
mrabety@mail.nih.gov
Asma Ben Abacha
asma.benabacha@nih.gov
22
o Variety of styles
o Mostly informal language
o Ungrammatical sentences
o Inconsistent capitalization & punctuation
o Abbreviations
o Misspellings
o Extraneous information interspersed among questions
o Abundance of anaphora and ellipses
o Unclear information needs
o Collection annotated with 15 question types focused on diseases and drugs:
https://ceb.nlm.nih.gov/ridem/infobot_docs/CHQA-NER-Corpus_1.0.zip
Kilicoglu H, Ben Abacha A, Mrabet Y, Shooshan SE, Rodriguez L, Masterton K, Demner-Fushman D. Semantic annotation
of consumer health questions. BMC Bioinformatics. 2018 Feb 6;19(1):34. doi: 10.1186/s12859-018-2045-1.
23
[Figure: question answering system architecture. Components shown: Spell check; Question classification (Not a question, Answerable, Request, Short question); Question Analysis (Question Decomposition, Focus Recognition, Question Type Identification); Recognizing Question Entailment; FAQ Answer(s); Query generation; Document retrieval; Answer Extraction; Answer Generation]
24
o Misspellings can hinder automatic question
understanding
My mom is 82 years old suffering from anixity and
depression for the last 10 years was dianosed early on set
deminita 3 years ago. Do yall have a office in Greensboro
NC? Can you recommend someone. she has seretona
syndrome and nonething helps her.
o Error types:
o Not a real word: deminita → dementia
o Misuse of a real word: bowl movement → bowel movement
o Merge: for along time → for a long time
o Split: early on set → early onset
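The four error types above call for different candidate generators. Below is a minimal sketch, assuming a tiny hand-made vocabulary; the function names and the vocabulary are hypothetical and this is not the correction tool described in the talk. It covers non-word detection plus candidates for split and merge errors. Real-word errors such as "bowl movement" pass the dictionary check and need context to detect.

```python
from typing import List, Set, Tuple

# Toy vocabulary (assumption); a real system would use a large lexicon
VOCAB: Set[str] = {
    "for", "a", "long", "time", "along", "early", "on", "set", "onset",
    "bowl", "bowel", "movement", "dementia",
}


def is_non_word(token: str) -> bool:
    """A token absent from the vocabulary is a non-word error candidate."""
    return token.lower() not in VOCAB


def merge_candidates(tokens: List[str]) -> List[Tuple[int, str]]:
    """Repair split errors: join adjacent tokens that form a known word."""
    found = []
    for i in range(len(tokens) - 1):
        joined = (tokens[i] + tokens[i + 1]).lower()
        if joined in VOCAB:
            found.append((i, joined))
    return found


def split_candidates(token: str) -> List[Tuple[str, str]]:
    """Repair merge errors: cut a token into two known words."""
    t = token.lower()
    return [
        (t[:i], t[i:])
        for i in range(1, len(t))
        if t[:i] in VOCAB and t[i:] in VOCAB
    ]


print(is_non_word("deminita"))
print(merge_candidates(["early", "on", "set"]))
print(split_candidates("along"))
```

Generating candidates is only the first half of the problem: "along" is itself a valid word, so whether to apply the split has to be decided by a ranker that looks at the surrounding words.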
25
[Figure: spelling correction pipeline: Detector → Candidates → Ranker → Corrector. Ranker diagram (Word2Vec): Input Layer (Context), T1 … Tw → Word2Vec Input Matrix (w x n) → Hidden Layer (Word Embedding), H1 … Hn → Word2Vec Output Matrix (n x w) → Output Layer (Target Word), C1 … Cw → SoftMax → Probability Score (Target Word), P1 … Pw]
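The ranker diagram above follows the usual Word2Vec continuous-bag-of-words layout: context words are mapped through an input matrix to a hidden vector, the output matrix scores every target word, and a softmax turns scores into probabilities. A minimal pure-Python sketch of that scoring step follows. The two-dimensional vectors, the tiny vocabularies and the function names are invented for illustration and are not taken from the talk.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical input matrix rows (context word -> embedding), n = 2
W_IN: Dict[str, List[float]] = {
    "movement": [0.0, 1.0],
    "early": [1.0, 0.0],
}
# Hypothetical output matrix columns (target word -> weights)
W_OUT: Dict[str, List[float]] = {
    "dementia": [2.0, 0.0],
    "bowel": [0.0, 2.0],
    "bowl": [0.0, 0.5],
}


def hidden_layer(context: List[str]) -> List[float]:
    """Average the embeddings of the context words."""
    vectors = [W_IN[w] for w in context]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Numerically stable softmax over a word -> score mapping."""
    top = max(scores.values())
    exps = {w: math.exp(s - top) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


def rank_candidates(context: List[str], candidates: List[str]) -> List[Tuple[str, float]]:
    """Score every target word given the context, then rank the candidates."""
    h = hidden_layer(context)
    scores = {w: sum(a * b for a, b in zip(h, vec)) for w, vec in W_OUT.items()}
    probs = softmax(scores)
    return sorted(((c, probs[c]) for c in candidates), key=lambda cp: cp[1], reverse=True)


# Context "movement": which candidate fits best in "___ movement"?
ranked = rank_candidates(["movement"], ["bowl", "bowel", "dementia"])
print(ranked[0][0])
```

With these toy weights the context "movement" favours "bowel" over "bowl", which is the kind of context-sensitive choice a real-word correction needs.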
26
Non-word:
Method       Precision   Recall   F1       Time
Baseline *   66.91%      71.32%   0.6904   < 1 hr.
CSpell       82.90%      78.29%   0.8053   < 1 min.

Real-word included:
Method       Precision   Recall   F1       Time
Baseline     72.01%      53.63%   0.6147   < 1 hr.
CSpell       82.80%      64.94%   0.7279   < 5 min.

Tested on 471 consumer health questions
* Kilicoglu H, Fiszman M, Roberts K, Demner-Fushman D. An Ensemble Method for Spelling Correction in
Consumer Health Questions. AMIA 2015
27
[Figure: question answering pipeline. Components shown: Question Topic (Focus) Recognition; Question Type Recognition; SVM 1; SVM 2; BiLSTM; MetaMapLite; Frames; Triggers; Resolution; Similar Question Retrieval; Candidate Questions; Recognizing Question Entailment; Associated Answers; Answer Retrieval; Answer Generation]
28
Question
o The focus is the primary entity or event of interest
o At least one per question, but occasionally multiple when
the consumer is interested in the interactions,
associations and comparisons
o UMLS entities → SVM → boundary adjustment → focus
o 56% (73% inexact) F1
o KODA → SVM
o 66% F1 (P ~ 70%, R ~ 62%)
o BiLSTM
o 59% (78% inexact) F1
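The paired figures above (for example 56% exact against 73% inexact) reflect two ways of matching a predicted focus span to the annotated one. A small sketch of the difference, under the assumption that an exact match requires identical boundaries and an inexact match only requires overlap; the slide does not spell out the matching criteria, and the spans below are made up.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def overlaps(a: Span, b: Span) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def span_prf(gold: List[Span], pred: List[Span], exact: bool = True) -> Tuple[float, float, float]:
    """Precision, recall and F1 under exact or overlap-based matching."""
    match = (lambda g, p: g == p) if exact else overlaps
    matched_pred = sum(any(match(g, p) for g in gold) for p in pred)
    matched_gold = sum(any(match(g, p) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical annotations: the second prediction starts two characters early
gold_spans = [(10, 30), (50, 58)]
pred_spans = [(10, 30), (48, 58)]

exact_scores = span_prf(gold_spans, pred_spans, exact=True)
inexact_scores = span_prf(gold_spans, pred_spans, exact=False)
print(exact_scores, inexact_scores)
```

A boundary error of a couple of characters costs a full match under the exact criterion but not under the inexact one, which is why the inexact figures are consistently higher.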
Mrabet Y, Kilicoglu H, Roberts K, Demner-Fushman D.
Combining Open-domain and Biomedical Knowledge for
Topic Recognition in Consumer Health Questions. AMIA
2016 Annual Symposium, Chicago, IL, November 12-16,
2016.
Roberts K, Masterton K, Kilicoglu H, Fiszman M, Demner-
Fushman D. Annotating Question Decomposition on
Complex Medical Questions. LREC 2014.
29
Collection            F1
MedlinePlus           65% (84%)
GARD                  94% (96%)
Long CH Questions     59% (78%)
30
Collection            Accuracy (Question Type)
MedlinePlus           70.9%
GARD                  85.8% (vs 82.6% SVM)
31
[Figure repeated: question answering pipeline, from Question Topic (Focus) Recognition and Question Type Recognition through Similar Question Retrieval and Recognizing Question Entailment to Answer Generation]
32
Question
How to use existing question & answer pairs to answer new questions?
Websites
Ben Abacha A & Demner-Fushman
D. Recognizing Question Entailment
for Medical Question Answering.
AMIA 2016
33
— Proposed definition: Question A entails Question B if
every answer to B is also an exact or partial answer to A.
A1 → B1 (An exact answer)
• A1 (CHQ): Hi I have retinitis pigmentosa for 3years. Im
suffering from this disease. Please introduce me any way to
treat mg eyes such as stem cell ....I am 25 years old and I have
only central vision. Please help me. Thank you
• B1 (FAQ): Are there treatments for RP?
A2 → B2 (A partial answer)
• A2 (CHQ): Can sepsis be prevented? Can someone get this
from a hospital?
• B2 (FAQ): Who gets sepsis?
34
o RQE data (4k pairs of entailment questions) constructed
automatically from clinical questions (Ely & Osheroff, 2000).
  o Data available on GitHub:
o Compared Machine Learning (ML) and Deep Learning (DL)
methods trained on open-domain and medical collections
of textual entailment and question similarity/entailment
(e.g. SNLI, Multi-NLI, cQA-SemEval, Quora).
o Logistic Regression trained on medical RQE data achieved
the best performance (75% accuracy) on test data of
consumer health questions and NIH FAQs.
35
[Figure: RQE-based question answering. Question → Similar Question Retrieval (QR; Search Engine + MetaMapLite over a Question Index) → Top-K Question Candidates → Recognizing Question Entailment (RQE; Logistic Regression + RQE Data) → Top-N Entailed Questions → Question-Answer Selection (Question-Answer Collection) → Answers. Collection of 47k QA pairs will be available.]
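The retrieve-then-filter flow in this figure can be sketched in a few lines. Everything below is a stand-in chosen for illustration: token-overlap (Jaccard) scoring replaces the search engine, `entails_stub` replaces the trained entailment classifier, and the answers are placeholder strings. Two of the FAQ questions are taken from the examples earlier in the talk; the third and the incoming question are invented.

```python
import re
from typing import Callable, List, Set, Tuple

# Toy question-answer collection; answers are placeholders
QA_COLLECTION: List[Tuple[str, str]] = [
    ("Are there treatments for RP?", "answer about RP treatments"),
    ("Who gets sepsis?", "answer about who gets sepsis"),
    ("What causes abdominal pain?", "answer about abdominal pain"),
]


def tokens(text: str) -> Set[str]:
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve_top_k(question: str, k: int) -> List[Tuple[str, str]]:
    """Similar question retrieval (stand-in for the search engine)."""
    ranked = sorted(QA_COLLECTION, key=lambda qa: jaccard(question, qa[0]), reverse=True)
    return ranked[:k]


def entails_stub(question: str, candidate: str, threshold: float = 0.3) -> bool:
    """Placeholder for an RQE classifier; here just an overlap threshold."""
    return jaccard(question, candidate) >= threshold


def answer(question: str, k: int = 2,
           rqe: Callable[[str, str], bool] = entails_stub) -> List[str]:
    """Keep answers only for retrieved questions judged to be entailed."""
    return [a for q, a in retrieve_top_k(question, k) if rqe(question, q)]


answers = answer("Can sepsis be prevented and who gets it?")
print(answers)
```

The design point the sketch tries to show is the separation of concerns: retrieval is tuned for recall over a large collection, while the entailment step decides which of the few retrieved questions are close enough for their answers to be reused.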
36
o Organization of a medical QA task @ TREC LiveQA 2017
o New benchmark for medical QA:
  o Variety of consumer health questions, with reference
  answers and annotations (Question Foci, Types & Keywords)
Ben Abacha A, Agichtein E, Pinter Y, Demner-Fushman D. Overview of
the Medical QA Task @ TREC 2017 LiveQA Track.
Data available on GitHub:
37
Example: Annotated question and associated reference answers
38
Interface used
to evaluate 3k
QA pairs
39
Results on TREC’17 LiveQA medical test questions
Measures         QR System   QR+RQE System   LiveQA’17 Best Results   LiveQA’17 Median Results
AvgScore (0-3)   0.711       0.827           0.637                    0.431
Success@2+       0.442       0.461           0.392                    0.245
Precision@2+     0.46        0.475           0.404                    0.331
MAP@10           0.282       0.311           --                       --
o The best LiveQA team combined deep neural networks to retrieve similar
answered questions from the web.
  o Relevance of this approach vs. classical QA methods.
o Using QR+RQE and QA collection led to a 29.8% increase over the best official
score at LiveQA’17.
  o Efficiency of recognizing question entailment and restricting answer sources
  to trusted medical resources.
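The 29.8% figure can be checked against the AvgScore row of the results table, assuming it is the relative increase of the QR+RQE score over the best official LiveQA'17 score. The variable names below are illustrative.

```python
# AvgScore (0-3) values from the results table
qr_rqe_avg_score = 0.827
best_official_avg_score = 0.637

# Relative increase of QR+RQE over the best official score, in percent
relative_increase = 100 * (qr_rqe_avg_score - best_official_avg_score) / best_official_avg_score
relative_increase_rounded = round(relative_increase, 1)
print(relative_increase_rounded)
```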
40
41
42
New Research Topic:
• Question: What does transverse
ct image demonstrate?
• Answer: focal defect in inflamed
appendiceal wall and
periappendiceal inflammatory
stranding.
Example:
QA
over images
43
o Our first Deep Learning VQA models achieved
good results: second-best WBSS.
o To advance research in VQA, we built a first
manually annotated medical VQA collection
44
Demner-Fushman D, Mork JG, Rogers W,
Shooshan SE, Rodriguez LM, Aronson AR.
Finding medication doses in the literature.
Submitted to AMIA 2018
Rodriguez LM, Demner-Fushman D.
Uncovering Knowledge Gaps in the Scientific
Literature on Maternal Morbidity and
Mortality using EHR Data. Submitted to AMIA
2018
45
o Doses determine medication safety and effectiveness
o Dose extraction from clinical text is extensively studied (i2b2)
o No studies on complete prescription information extraction
from the literature: Medication name, Dosage, Route of
administration, Frequency of administration, Duration of
administration, Reason for giving medication
o Questions:
o Will the approaches developed for clinical text work?
o Which sections of scientific papers provide dose information?
o Is sequence-to-sequence learning with neural networks a viable
approach to extraction of dose information?
46
o 694 documents fully annotated with drug
doses/strengths, forms, routes of administration,
frequencies and durations of administration, and the
reasons for administration
47
o MedEx
o DoseRegEx: numbers preceded or followed by units of
measure
o DoseRegEx + Chemical filter
o Long Short-Term Memory (LSTM) neural network with a
conditional random field (CRF) layer using character
embeddings.
https://guillaumegenthial.github.io/sequence-tagging-with-tensorflow.html
Xu H, Stenner SP, Doan S, Johnson KB, Waitman LR, Denny JC. MedEx: a medication information
extraction system for clinical narratives. J Am Med Inform Assoc. 2010 Jan-Feb;17(1):19-24.
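The DoseRegEx idea above (numbers preceded or followed by units of measure) can be illustrated with a short regular expression. This is a sketch under assumptions, not the pattern used in the study: the unit list is a small hypothetical sample and the example sentence is invented.

```python
import re
from typing import List

# Hypothetical unit list; longer alternatives first so "mg/kg" wins over "mg"
UNIT = r"(?:mg/kg|mg|mcg|ml|g|units?)"
NUMBER = r"\d+(?:\.\d+)?"

# A number followed by a unit, or a unit followed by a number
DOSE_PATTERN = re.compile(
    rf"\b(?:{NUMBER}\s*{UNIT}|{UNIT}\s+{NUMBER})\b",
    re.IGNORECASE,
)


def find_doses(text: str) -> List[str]:
    """Return every dose-like mention found in the text."""
    return DOSE_PATTERN.findall(text)


doses = find_doses("Patients received aspirin 81 mg daily and 0.5 mg/kg of prednisone.")
print(doses)
```

A pattern like this also fires on measurements that are not doses (for example laboratory values reported in mg), which is presumably the motivation for the chemical filter variant listed above.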
48
o Publicly available collection of scientific articles annotated with drug
doses/strengths, forms, routes of administration, frequencies and
durations of administration and the reasons for administration.
o Drop in performance when switching from clinical text.
o Dose information is predominantly reported in the full text, but
about 45% of the articles provide dose information in the titles and
abstracts as well.
49
Retrieved Terms Associated with Pregnancy ICU Admissions (counts by year, 2013 to 2018)
Abdominal Compartment Syndrome: 3, 12, 7, 4, 5
Duodenal Ulcer: 1, 1, 1, 1
Massive Transfusion: 1, 4, 3, 4, 3
Pseudocyst of Pancreas: 1
Vertebral Artery Aneurysm: 1, 4, 2, 2

Manual gold standard
Clinical Categories                      #     %
Hemorrhage or Anemia                     359   19.93%
Sepsis or Infection                      262   14.55%
Cardiovascular Disease or Disorder       108   6.00%
Hypertension/Pre-eclampsia/Eclampsia     106   5.89%
Asthma, Premature Delivery, Malignancy, Cardiomyopathy

[Figure labels: E-Utils; MetaMap]
50
MIMIC III
Discharge
summaries
THANK YOU!
ddemner@mail.nih.gov
51

More Related Content

Similar to NLP support for clinical tasks and decisions

2009 12 07 - LOINC Introduction and Overview
2009 12 07 - LOINC Introduction and Overview2009 12 07 - LOINC Introduction and Overview
2009 12 07 - LOINC Introduction and Overviewdvreeman
 
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
David Talby
 
Feedbackdriven radiologyreportretrieval ichi2015-v2
Feedbackdriven radiologyreportretrieval ichi2015-v2Feedbackdriven radiologyreportretrieval ichi2015-v2
Feedbackdriven radiologyreportretrieval ichi2015-v2
Artificial Intelligence Institute at UofSC
 
SCOPE Summit - Applying the OMOP data model & OHDSI software to national Euro...
SCOPE Summit - Applying the OMOP data model & OHDSI software to national Euro...SCOPE Summit - Applying the OMOP data model & OHDSI software to national Euro...
SCOPE Summit - Applying the OMOP data model & OHDSI software to national Euro...
Kees van Bochove
 
Managing Health and Disease Using Omics and Big Data
Managing Health and Disease Using Omics and Big DataManaging Health and Disease Using Omics and Big Data
Managing Health and Disease Using Omics and Big Data
Laura Berry
 
2011 11 16 - Vreeman - Corralling Creativity with Standards
2011 11 16 - Vreeman - Corralling Creativity with Standards2011 11 16 - Vreeman - Corralling Creativity with Standards
2011 11 16 - Vreeman - Corralling Creativity with Standardsdvreeman
 
Big Data and the Promise and Pitfalls when Applied to Disease Prevention and ...
Big Data and the Promise and Pitfalls when Applied to Disease Prevention and ...Big Data and the Promise and Pitfalls when Applied to Disease Prevention and ...
Big Data and the Promise and Pitfalls when Applied to Disease Prevention and ...
Philip Bourne
 
Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...
Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...
Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...
Andrew Su
 
Semantic Web Technologies as a Framework for Clinical Informatics
Semantic Web Technologies as a Framework for Clinical InformaticsSemantic Web Technologies as a Framework for Clinical Informatics
Semantic Web Technologies as a Framework for Clinical InformaticsChimezie Ogbuji
 
Biomarkers brain regions
Biomarkers brain regionsBiomarkers brain regions
Biomarkers brain regions
Ann-Marie Roche
 
Next generation electronic medical records and search a test implementation i...
Next generation electronic medical records and search a test implementation i...Next generation electronic medical records and search a test implementation i...
Next generation electronic medical records and search a test implementation i...
lucenerevolution
 
Analytics leads to improved quality and performance
Analytics leads to improved quality and performanceAnalytics leads to improved quality and performance
Analytics leads to improved quality and performance
Health Informatics New Zealand
 
2010 06 07 - LOINC Introduction
2010 06 07 - LOINC Introduction2010 06 07 - LOINC Introduction
2010 06 07 - LOINC Introductiondvreeman
 
Deep learning for biomedical discovery and data mining II
Deep learning for biomedical discovery and data mining IIDeep learning for biomedical discovery and data mining II
Deep learning for biomedical discovery and data mining II
Deakin University
 
Mel Reichman on Pool Shark’s Cues for More Efficient Drug Discovery
Mel Reichman on Pool Shark’s Cues for More Efficient Drug DiscoveryMel Reichman on Pool Shark’s Cues for More Efficient Drug Discovery
Mel Reichman on Pool Shark’s Cues for More Efficient Drug Discovery
Jean-Claude Bradley
 
2019 Triangle Machine Learning Day - Integration of Sepsis Watch, a Deep Lear...
2019 Triangle Machine Learning Day - Integration of Sepsis Watch, a Deep Lear...2019 Triangle Machine Learning Day - Integration of Sepsis Watch, a Deep Lear...
2019 Triangle Machine Learning Day - Integration of Sepsis Watch, a Deep Lear...
The Statistical and Applied Mathematical Sciences Institute
 
How to Improve the Accuracy of the Initial Evaluation, Using a System Develop...
How to Improve the Accuracy of the Initial Evaluation, Using a System Develop...How to Improve the Accuracy of the Initial Evaluation, Using a System Develop...
How to Improve the Accuracy of the Initial Evaluation, Using a System Develop...
Crimsonpublishers-Rehabilitation
 
The Personalized Health Risk Profile: A New Tool for Safety and Occupational ...
The Personalized Health Risk Profile: A New Tool for Safety and Occupational ...The Personalized Health Risk Profile: A New Tool for Safety and Occupational ...
The Personalized Health Risk Profile: A New Tool for Safety and Occupational ...
Richard Hartman, Ph.D.
 
Knowledge Discovery And Data Mining Of Free Text Final
Knowledge Discovery And Data Mining Of Free Text FinalKnowledge Discovery And Data Mining Of Free Text Final
Knowledge Discovery And Data Mining Of Free Text Finalkdjamies
 
nuevos criterios de sepsis
nuevos criterios de sepsisnuevos criterios de sepsis
nuevos criterios de sepsis
Veronica Dubay
 

Similar to NLP support for clinical tasks and decisions (20)

2009 12 07 - LOINC Introduction and Overview
2009 12 07 - LOINC Introduction and Overview2009 12 07 - LOINC Introduction and Overview
2009 12 07 - LOINC Introduction and Overview
 
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
 
Feedbackdriven radiologyreportretrieval ichi2015-v2
Feedbackdriven radiologyreportretrieval ichi2015-v2Feedbackdriven radiologyreportretrieval ichi2015-v2
Feedbackdriven radiologyreportretrieval ichi2015-v2
 
SCOPE Summit - Applying the OMOP data model & OHDSI software to national Euro...
SCOPE Summit - Applying the OMOP data model & OHDSI software to national Euro...SCOPE Summit - Applying the OMOP data model & OHDSI software to national Euro...
SCOPE Summit - Applying the OMOP data model & OHDSI software to national Euro...
 
Managing Health and Disease Using Omics and Big Data
Managing Health and Disease Using Omics and Big DataManaging Health and Disease Using Omics and Big Data
Managing Health and Disease Using Omics and Big Data
 
2011 11 16 - Vreeman - Corralling Creativity with Standards
2011 11 16 - Vreeman - Corralling Creativity with Standards2011 11 16 - Vreeman - Corralling Creativity with Standards
2011 11 16 - Vreeman - Corralling Creativity with Standards
 
Big Data and the Promise and Pitfalls when Applied to Disease Prevention and ...
Big Data and the Promise and Pitfalls when Applied to Disease Prevention and ...Big Data and the Promise and Pitfalls when Applied to Disease Prevention and ...
Big Data and the Promise and Pitfalls when Applied to Disease Prevention and ...
 
Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...
Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...
Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...
 
Semantic Web Technologies as a Framework for Clinical Informatics
Semantic Web Technologies as a Framework for Clinical InformaticsSemantic Web Technologies as a Framework for Clinical Informatics
Semantic Web Technologies as a Framework for Clinical Informatics
 
Biomarkers brain regions
Biomarkers brain regionsBiomarkers brain regions
Biomarkers brain regions
 
Next generation electronic medical records and search a test implementation i...
Next generation electronic medical records and search a test implementation i...Next generation electronic medical records and search a test implementation i...
Next generation electronic medical records and search a test implementation i...
 
Analytics leads to improved quality and performance
Analytics leads to improved quality and performanceAnalytics leads to improved quality and performance
Analytics leads to improved quality and performance
 
2010 06 07 - LOINC Introduction
2010 06 07 - LOINC Introduction2010 06 07 - LOINC Introduction
2010 06 07 - LOINC Introduction
 
Deep learning for biomedical discovery and data mining II
Deep learning for biomedical discovery and data mining IIDeep learning for biomedical discovery and data mining II
Deep learning for biomedical discovery and data mining II
 
Mel Reichman on Pool Shark’s Cues for More Efficient Drug Discovery
Mel Reichman on Pool Shark’s Cues for More Efficient Drug DiscoveryMel Reichman on Pool Shark’s Cues for More Efficient Drug Discovery
Mel Reichman on Pool Shark’s Cues for More Efficient Drug Discovery
 
2019 Triangle Machine Learning Day - Integration of Sepsis Watch, a Deep Lear...
2019 Triangle Machine Learning Day - Integration of Sepsis Watch, a Deep Lear...2019 Triangle Machine Learning Day - Integration of Sepsis Watch, a Deep Lear...
2019 Triangle Machine Learning Day - Integration of Sepsis Watch, a Deep Lear...
 
How to Improve the Accuracy of the Initial Evaluation, Using a System Develop...
How to Improve the Accuracy of the Initial Evaluation, Using a System Develop...How to Improve the Accuracy of the Initial Evaluation, Using a System Develop...
How to Improve the Accuracy of the Initial Evaluation, Using a System Develop...
 
The Personalized Health Risk Profile: A New Tool for Safety and Occupational ...
The Personalized Health Risk Profile: A New Tool for Safety and Occupational ...The Personalized Health Risk Profile: A New Tool for Safety and Occupational ...
The Personalized Health Risk Profile: A New Tool for Safety and Occupational ...
 
Knowledge Discovery And Data Mining Of Free Text Final
Knowledge Discovery And Data Mining Of Free Text FinalKnowledge Discovery And Data Mining Of Free Text Final
Knowledge Discovery And Data Mining Of Free Text Final
 
nuevos criterios de sepsis
nuevos criterios de sepsisnuevos criterios de sepsis
nuevos criterios de sepsis
 

More from CORIA-TALN 2018

Slides de fin de conférence CORIA-TALN 2018
Slides de fin de conférence CORIA-TALN 2018Slides de fin de conférence CORIA-TALN 2018
Slides de fin de conférence CORIA-TALN 2018
CORIA-TALN 2018
 
Portée de la négation : détection par apprentissage supervisé en français et ...
Portée de la négation : détection par apprentissage supervisé en français et ...Portée de la négation : détection par apprentissage supervisé en français et ...
Portée de la négation : détection par apprentissage supervisé en français et ...
CORIA-TALN 2018
 
Construction d'un corpus multilingue annoté en relations de traduction
Construction d'un corpus multilingue annoté en relations de traductionConstruction d'un corpus multilingue annoté en relations de traduction
Construction d'un corpus multilingue annoté en relations de traduction
CORIA-TALN 2018
 
Analyse des noms agentifs dans des espaces vectoriels distributionnels
Analyse des noms agentifs dans des espaces vectoriels distributionnelsAnalyse des noms agentifs dans des espaces vectoriels distributionnels
Analyse des noms agentifs dans des espaces vectoriels distributionnels
CORIA-TALN 2018
 
Combinaison d'informations de sous-mots et de modèles de langue pour la Reche...
Combinaison d'informations de sous-mots et de modèles de langue pour la Reche...Combinaison d'informations de sous-mots et de modèles de langue pour la Reche...
Combinaison d'informations de sous-mots et de modèles de langue pour la Reche...
CORIA-TALN 2018
 
Décodeur neuronal pour la transcription de documents manuscrits anciens
Décodeur neuronal pour la transcription de documents manuscrits anciensDécodeur neuronal pour la transcription de documents manuscrits anciens
Décodeur neuronal pour la transcription de documents manuscrits anciens
CORIA-TALN 2018
 
Session plénière
Session plénièreSession plénière
Session plénière
CORIA-TALN 2018
 
De l’usage réel des emojis à une prédiction de leurs catégories
De l’usage réel des emojis à une prédiction de leurs catégoriesDe l’usage réel des emojis à une prédiction de leurs catégories
De l’usage réel des emojis à une prédiction de leurs catégories
CORIA-TALN 2018
 
Welcome ! Bienvenue ! Degemer mat !
Welcome ! Bienvenue ! Degemer mat !Welcome ! Bienvenue ! Degemer mat !
Welcome ! Bienvenue ! Degemer mat !
CORIA-TALN 2018
 

More from CORIA-TALN 2018 (9)

Slides de fin de conférence CORIA-TALN 2018
Slides de fin de conférence CORIA-TALN 2018Slides de fin de conférence CORIA-TALN 2018
Slides de fin de conférence CORIA-TALN 2018
 
Portée de la négation : détection par apprentissage supervisé en français et ...
Portée de la négation : détection par apprentissage supervisé en français et ...Portée de la négation : détection par apprentissage supervisé en français et ...
Portée de la négation : détection par apprentissage supervisé en français et ...
 
Construction d'un corpus multilingue annoté en relations de traduction
Construction d'un corpus multilingue annoté en relations de traductionConstruction d'un corpus multilingue annoté en relations de traduction
Construction d'un corpus multilingue annoté en relations de traduction
 
Analyse des noms agentifs dans des espaces vectoriels distributionnels
Analyse des noms agentifs dans des espaces vectoriels distributionnelsAnalyse des noms agentifs dans des espaces vectoriels distributionnels
Analyse des noms agentifs dans des espaces vectoriels distributionnels
 
Combinaison d'informations de sous-mots et de modèles de langue pour la Reche...
Combinaison d'informations de sous-mots et de modèles de langue pour la Reche...Combinaison d'informations de sous-mots et de modèles de langue pour la Reche...
Combinaison d'informations de sous-mots et de modèles de langue pour la Reche...
 
Décodeur neuronal pour la transcription de documents manuscrits anciens
Décodeur neuronal pour la transcription de documents manuscrits anciensDécodeur neuronal pour la transcription de documents manuscrits anciens
Décodeur neuronal pour la transcription de documents manuscrits anciens
 
Session plénière
Session plénièreSession plénière
Session plénière
 
De l’usage réel des emojis à une prédiction de leurs catégories
De l’usage réel des emojis à une prédiction de leurs catégoriesDe l’usage réel des emojis à une prédiction de leurs catégories
De l’usage réel des emojis à une prédiction de leurs catégories
 
Welcome ! Bienvenue ! Degemer mat !
Welcome ! Bienvenue ! Degemer mat !Welcome ! Bienvenue ! Degemer mat !
Welcome ! Bienvenue ! Degemer mat !
 


NLP support for clinical tasks and decisions

  • 3. National Library of Medicine: Purpose and establishment
      In order to assist the advancement of medical and related sciences and to aid the dissemination and exchange of scientific and other information important to the progress of medicine and to the public health, there is established the National Library of Medicine.
      https://www.gpo.gov/fdsys/pkg/USCODE-2011-title42/html/USCODE-2011-title42-chap6A-subchapIII-partD.htm
  • 5. o When making health-related decisions, asking questions is a natural and preferred approach to satisfying information needs
      o Corroboration: NLM will focus on understanding how searches are initiated, how information is used, and how questions are posed and answered.
      A Platform for Biomedical Discovery and Data-Powered Health. National Library of Medicine Strategic Plan 2017–2027, Report of the NLM Board of Regents
  • 6. What is the best plan of care for this patient?
      What are the causes of abdominal pain or cramps?
      What was the pre-op echocardiogram result?
  • 7. Patient-specific information in clinical narrative; Questions asked by clinicians and patients; Answers in clinical narrative; Questions asked by consumers; Answers in consumer-oriented sources; Answers in scientific literature
  • 8. Summarization (“bottom-line” advice); Linking research and clinical information; GUI; API; Clinical question answering; Linking patient records and literature (CDS); Identifying gaps in clinical research; Consumer Health Question Answering; Repository for Informed Decision Making
  • 9. o Central to clinical NLP tasks
      o MetaMap Lite: A new implementation of the ontology-based (UMLS) Named Entity Recognition tool
      o “Impressive...it looks like MetaMap Lite is around 20 times faster *and* has better performance! I understand there are caveats [in the evaluation] (e.g., focus on disorders and only using default options for regular MetaMap), but this is good news.”
      Demner-Fushman D, Rogers WJ, Aronson AR. MetaMap Lite: an evaluation of a new Java implementation of MetaMap. J Am Med Inform Assoc. 2017;0(0):1-5.
      Collection         MetaMap             cTAKES (DL)         DNorm               MetaMap Lite
                         P     R     F-1     P     R     F-1     P     R     F-1     P     R     F-1
      NCBI disease       60.3  68.3  64.1    47.0  53.8  47.4    74.1  67.6  70.7    73.1  71.9  72.5
      ShARe (entities)   59.5  48.1  53.2    46.3  46.2  46.2    N/A   N/A   N/A     74.2  42.1  53.8
      i2b2 2010          38.1  35.7  36.8    31.9  34.1  32.9    N/A   N/A   N/A     47.0  31.9  38.0
      LHC clinical       58.8  77.2  66.8    42.6  59.9  49.8    71.5  58.2  64.2    69.4  74.9  70.0
      LHC biological     46.8  75.6  57.8    47.1  60.6  53.0    67.7  62.8  65.2    67.5  77.9  72.4
  • 11. Roberts, K. & Demner-Fushman, D. (2016). Annotating Logical Forms for EHR Questions. Proceedings of the Language Resources and Evaluation Conference (LREC).
      Roberts K, Demner-Fushman D. Toward a Natural Language Interface for EHR Questions. AMIA Joint Summit 2015
  • 12. o Ask: “What was my last A1c?” OR look for it:
  • 13. o Traditional QA systems search over unstructured data
      o Not compatible with EHRs: free text + structured data
      o Each EHR organizes unstructured/structured data differently → Structured query
      “What was my last A1c?”
      Latest Test: A1C
  • 14. ① Can EHR questions be converted to logical forms?
      ② What logical operations are necessary to represent EHR questions?
      ③ Can human annotators achieve sufficient agreement?
      ④ Will a logical form method scale to the diversity of potential EHR questions?
  • 15. o From Li (2012): Structured database (17 questions) and specific note (432 questions)
      o Sample 100 questions to maximize representativeness of question categories:
        o Temporal: admission, discharge, PMH, visit, status, plan, time range
        o Concept: problem, treatment, test
        o Answer type: boolean, count, trend, medical unit, time, person/org, other
  • 16. o Take advantage of existing NLP components:
        o Concept recognition/normalization
      “Was she hypertensive on admission?”
      Was patient UMLSFinding(C057121) on admission?
      “Do I have diabetes?”
      Does patient have UMLSDisease(C0011849)?
  • 17. o First-Order Logic (FOL) + Lambda Calculus (λ)
        o Atomic objects: C0004057
        o Boolean predicates: has_treatment(x, y, z)
        o Functions: max(…)
        o λ-expressions: λx.condition(x)
      o λx.has_treatment(x, C0004057, visit)
        o All events of the patient taking aspirin in this hospital visit
      o “What was the volume of her urine last night?”
        o δ(λx.has_function(x, C0232856, visit) ∧ time_within(x, “last night”))
      o “When was the patient first discharged from the ward?”
        o time(earliest(λx.has_event(x, discharge, visit) ∧ at_location(x, “the ward”)))
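
The logical forms on slide 17 can be read as filters and functions over a patient's events. The following is a minimal Python sketch under stated assumptions: the `Event` class, its fields, the helper names `conj`, `select`, and `time_of`, and the toy record are all hypothetical illustrations, not the authors' implementation or any EHR API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Event:
    """One patient event; every field name here is a hypothetical choice."""
    kind: str        # "treatment" or "event"
    concept: str     # a UMLS CUI (e.g. C0004057) or a non-CUI object
    scope: str       # temporal scope such as "visit"
    location: str
    when: datetime


Predicate = Callable[[Event], bool]


def has_treatment(concept: str, scope: str) -> Predicate:
    return lambda x: x.kind == "treatment" and x.concept == concept and x.scope == scope


def has_event(concept: str, scope: str) -> Predicate:
    return lambda x: x.kind == "event" and x.concept == concept and x.scope == scope


def at_location(place: str) -> Predicate:
    return lambda x: x.location == place


def conj(p: Predicate, q: Predicate) -> Predicate:
    """Logical AND of two predicates (the conjunction in the logical forms)."""
    return lambda x: p(x) and q(x)


def select(events: Iterable[Event], condition: Predicate) -> List[Event]:
    """Plays the role of a lambda expression: all events satisfying the condition."""
    return [x for x in events if condition(x)]


def earliest(events: List[Event]) -> Event:
    return min(events, key=lambda x: x.when)


def time_of(event: Event) -> datetime:
    return event.when


# Toy record, invented for illustration only.
RECORD = [
    Event("treatment", "C0004057", "visit", "the ward", datetime(2018, 5, 1, 8, 0)),
    Event("event", "discharge", "visit", "ICU", datetime(2018, 5, 1, 12, 0)),
    Event("event", "discharge", "visit", "the ward", datetime(2018, 5, 2, 9, 0)),
    Event("event", "discharge", "visit", "the ward", datetime(2018, 5, 4, 9, 0)),
]

# Corresponds to: λx.has_treatment(x, C0004057, visit)
aspirin_events = select(RECORD, has_treatment("C0004057", "visit"))

# Corresponds to: time(earliest(λx.has_event(x, discharge, visit) ∧ at_location(x, "the ward")))
first_ward_discharge = time_of(
    earliest(select(RECORD, conj(has_event("discharge", "visit"), at_location("the ward"))))
)
```

Representing each predicate as a closure keeps the Python expression close to the shape of the logical form, so a semantic parser's output could in principle be mapped onto such calls one element at a time.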
  • 18. ③ Can human annotators achieve sufficient agreement?
      o Q 1-25: double-annotated w/o instruction to determine initial set of logical elements (LE)
      o Q 26-50: double-annotated: 81% agreement on LEs, 40% agreement on complete logical form
      o Q 51-100: double-annotated: 85% agreement on LEs, 50% agreement on complete logical form
  • 19. ① Can EHR questions be converted to logical forms?
      o All 100 questions could be structured as a logical form
      ② What logical operations are necessary to represent EHR questions?
      o From 100 questions:
        o 113 non-CUI objects (7 unique)
        o 104 CUI objects (84 unique)
        o 136 predicates (21 unique)
        o 226 functions (10 unique)
        o 110 lambda expressions
  • 20. ④ Will a logical form method scale to the diversity of potential EHR questions? (mostly)
      o Long tail in frequency distribution of LEs
      o 12 unique LEs make up 86% of non-CUI LEs
      o has_treatment predicate used 32 times, but 14 predicates used only once
      o Future work: further annotation, integration of semantic parser for automatic question understanding
  • 21. Roberts, K. & Patra, B. A Semantic Parsing Method for Mapping Clinical Questions to Logical Forms. AMIA 2017
  • 22. Deardorff A, Masterton K, Roberts K, Kilicoglu H, Demner-Fushman D. A protocol-driven approach to automatically finding authoritative answers to consumer health questions in online resources. Journal of the Association for Information Science and Technology. 2017 July;68(7):1724–1736
      Ben Abacha A, Demner-Fushman D. Recognizing Question Entailment for Medical Question Answering. AMIA 2016
      Demner-Fushman D, Kilicoglu H, Roberts K, Masterton K, Deardorff A. Consumer Health Question Answering to Automatically Support NLM Customer Services. September 2015, Technical Report to the LHNCBC Board of Scientific Counselors
      Yassine Mrabet (mrabety@mail.nih.gov)
      Asma Ben Abacha (asma.benabacha@nih.gov)
  • 23. o Variety of styles
      o Mostly informal language
        o Ungrammatical sentences
        o Inconsistent capitalization & punctuation
        o Abbreviations
        o Misspellings
      o Extraneous information interspersed among questions
      o Abundance of anaphora and ellipses
      o Unclear information needs
      o Collection annotated with 15 question types focused on diseases and drugs: https://ceb.nlm.nih.gov/ridem/infobot_docs/CHQA-NER-Corpus_1.0.zip
      Kilicoglu H, Ben Abacha A, Mrabet Y, Shooshan SE, Rodriguez L, Masterton K, Demner-Fushman D. Semantic annotation of consumer health questions. BMC Bioinformatics. 2018 Feb 6;19(1):34. doi: 10.1186/s12859-018-2045-1.
  • 24. Diagram labels: Recognizing Question Entailment; Question Analysis (Question Decomposition, Focus Recognition, Question Type Identification); FAQ; Answer(s); Question classification (Not a question, Answerable, Request, Short question); Answer Extraction; Answer Generation; Document retrieval; Query generation; Spell check
  • 25. o Misspellings can hinder automatic question understanding
      My mom is 82 years old suffering from anixity and depression for the last 10 years was dianosed early on set deminita 3 years ago. Do yall have a office in Greensboro NC? Can you recommend someone. she has seretona syndrome and nonething helps her.
      o Error types:
        o Not a real word: deminita → dementia
        o Misuse of a real word: bowl movement → bowel movement
        o Merge: for along time → for a long time
        o Split: early on set → early onset
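
The error types on slide 25 suggest three kinds of candidate generation: near matches for non-words, splitting a token to undo a merge, and joining adjacent tokens to undo a split. The sketch below illustrates this with Python's standard `difflib`; the tiny vocabulary and all function names are assumptions made for the example and do not describe how CSpell is implemented.

```python
import difflib
from typing import List, Tuple

# Toy vocabulary, assumed for illustration; a real system would use a large lexicon.
VOCAB = {"early", "onset", "on", "set", "dementia", "a", "long", "along",
         "time", "for", "bowel", "bowl", "movement"}


def nonword_candidates(token: str, n: int = 3) -> List[str]:
    """Non-word errors: propose vocabulary words that look similar to the token."""
    if token in VOCAB:
        return []
    return difflib.get_close_matches(token, sorted(VOCAB), n=n, cutoff=0.6)


def split_candidates(token: str) -> List[Tuple[str, str]]:
    """Merge errors: propose ways to split one token into two known words."""
    return [(token[:i], token[i:]) for i in range(1, len(token))
            if token[:i] in VOCAB and token[i:] in VOCAB]


def join_candidates(tokens: List[str]) -> List[Tuple[int, str]]:
    """Split errors: propose joining adjacent tokens that form a known word."""
    return [(i, tokens[i] + tokens[i + 1]) for i in range(len(tokens) - 1)
            if tokens[i] + tokens[i + 1] in VOCAB]


nonword = nonword_candidates("deminita")
merged = split_candidates("along")
split = join_candidates(["early", "on", "set"])
```

Because "along" and "on set" are themselves made of real words, candidate generation alone cannot decide whether a correction is needed; that decision requires context, which is where a ranking step comes in.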
  • 26. Diagram: Detector; Candidates; Ranker; Corrector. Word2Vec network: Input Layer (Context); w x n Word2Vec Input Matrix; Hidden Layer (Word Embedding); n x w Word2Vec Output Matrix; Output Layer (Target Word); SoftMax; Probability Score (Target Word). Node labels: H1 … Hn, T1 … Tw, C1 … Cw, P1 … Pw
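
The network on slide 26 (context input layer, w x n input matrix, hidden embedding layer, n x w output matrix, softmax over target words) matches the shape of a Word2Vec continuous bag-of-words (CBOW) model. Assuming that reading, the forward pass and its use for ranking correction candidates can be sketched in plain Python. The vocabulary, the random weights, and the function names are all hypothetical; with untrained weights the ranking itself carries no meaning.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def cbow_probabilities(context_ids: List[int], w_in: Matrix, w_out: Matrix) -> List[float]:
    """Forward pass: average context embeddings, project to the vocabulary, softmax."""
    n = len(w_out)       # embedding size
    w = len(w_out[0])    # vocabulary size
    # Hidden layer: mean of the context words' rows in the w x n input matrix.
    hidden = [sum(w_in[i][d] for i in context_ids) / len(context_ids) for d in range(n)]
    # Output layer: hidden vector times the n x w output matrix.
    scores = [sum(hidden[d] * w_out[d][j] for d in range(n)) for j in range(w)]
    # Numerically stable softmax.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_candidates(candidates: List[str], context: List[str], vocab: List[str],
                    w_in: Matrix, w_out: Matrix) -> List[str]:
    """Order correction candidates by their probability as the target word."""
    index = {word: i for i, word in enumerate(vocab)}
    probs = cbow_probabilities([index[c] for c in context], w_in, w_out)
    return sorted(candidates, key=lambda c: probs[index[c]], reverse=True)


VOCAB = ["early", "onset", "dementia", "diagnosed", "with", "demand"]
rng = random.Random(0)
W_IN = random_matrix(len(VOCAB), 4, rng)    # w x n, untrained
W_OUT = random_matrix(4, len(VOCAB), rng)   # n x w, untrained
probs = cbow_probabilities([0, 1], W_IN, W_OUT)
ranking = rank_candidates(["dementia", "demand"], ["early", "onset"], VOCAB, W_IN, W_OUT)
```

In practice the two matrices would come from a model trained on a relevant corpus, and the context would be the words surrounding the suspected misspelling.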
  • 27. Tested on 471 consumer health questions
      Non-word:
      Method       Precision   Recall    F1       Time
      Baseline *   66.91%      71.32%    0.6904   < 1 hr.
      CSpell       82.90%      78.29%    0.8053   < 1 min.
      Real-word included:
      Method       Precision   Recall    F1       Time
      Baseline     72.01%      53.63%    0.6147   < 1 hr.
      CSpell       82.80%      64.94%    0.7279   < 5 min.
      * An Ensemble Method for Spelling Correction in Consumer Health Questions. Kilicoglu H, Fiszman M, Roberts K, Demner-Fushman D. AMIA 2015
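
The F1 column on slide 27 is consistent with the usual harmonic mean of precision and recall, F1 = 2PR / (P + R). A short check in Python, using only the numbers reported on the slide (the function name is an assumption):

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1) from slide 27
REPORTED = {
    "non-word baseline": (66.91, 71.32, 0.6904),
    "non-word CSpell": (82.90, 78.29, 0.8053),
    "real-word baseline": (72.01, 53.63, 0.6147),
    "real-word CSpell": (82.80, 64.94, 0.7279),
}

# Difference between the recomputed and the reported F1, in percentage points.
differences = {
    name: abs(f1(p, r) - reported * 100)
    for name, (p, r, reported) in REPORTED.items()
}
```

All four recomputed values agree with the slide to within 0.01 percentage points.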
  • 28. Diagram labels: Answer Retrieval; SVM 1; BiLSTM; Frames; MetaMapLite; SVM 2; Triggers; Resolution; Question Topic (Focus) Recognition; Question Type Recognition; Similar Question Retrieval; Candidate Questions; Recognizing Question Entailment; Associated Answers; Answer Generation; Question
  • 29. o The focus is the primary entity or event of interest
      o At least one per question, but occasionally multiple when the consumer is interested in the interactions, associations and comparisons
      o UMLS entities → SVM → boundary adjustment → focus
        o 56% (73% inexact) F1
      o KODA → SVM
        o 66% F1 (P ~ 70%, R ~ 62%)
      o BiLSTM
        o 59% (78% inexact) F1
      Mrabet Y, Kilicoglu H, Roberts K, Demner-Fushman D. Combining Open-domain and Biomedical Knowledge for Topic Recognition in Consumer Health Questions. AMIA 2016 Annual Symposium, Chicago, IL, November 12-16, 2016.
      Roberts K, Masterton K, Kilicoglu H, Fiszman M, Demner-Fushman D. Annotating Question Decomposition on Complex Medical Questions. LREC 2014.
  • 30. Collection           F1
      MedlinePlus          65% (84%)
      GARD                 94% (96%)
      Long CH Questions    59% (78%)
  • 32. Diagram labels: Answer Retrieval; SVM 1; BiLSTM; Frames; MetaMapLite; SVM 2; Triggers; Resolution; Question Topic (Focus) Recognition; Question Type Recognition; Similar Question Retrieval; Candidate Questions; Recognizing Question Entailment; Associated Answers; Answer Generation; Question
  • 33. How to use existing question & answer pairs to answer new questions? Websites
      Ben Abacha A & Demner-Fushman D. Recognizing Question Entailment for Medical Question Answering. AMIA 2016
  • 34. o Proposed definition: Question A entails Question B if every answer to B is also an exact or partial answer to A.
      A1 → B1 (An exact answer)
        • A1 (CHQ): Hi I have retinitis pigmentosa for 3years. Im suffering from this disease. Please introduce me any way to treat mg eyes such as stem cell ....I am 25 years old and I have only central vision. Please help me. Thank you
        • B1 (FAQ): Are there treatments for RP?
      A2 → B2 (A partial answer)
        • A2 (CHQ): Can sepsis be prevented? Can someone get this from a hospital?
        • B2 (FAQ): Who gets sepsis?
  • 35. o RQE Data (4k pairs of entailment questions) constructed automatically from clinical questions (Ely & Osheroff, 2000).
        - Data available on GitHub:
      o Compared Machine Learning (ML) & Deep Learning (DL) methods trained on open-domain and medical collections of textual entailment and question similarity/entailment (e.g. SNLI, Multi-NLI, cQA-SemEval, Quora).
      o Logistic Regression trained on medical RQE data achieved the best performance (75% Accuracy) on test data of consumer health questions & NIH FAQs.
  • 36. Diagram labels: Recognizing Question Entailment (RQE); Similar Question Retrieval (QR); Question-Answer Selection; Top-K Question Candidates; Question; Question Index; Question-Answer Collection; Top-N Entailed Questions; Search Engine + MetaMapLite; Answers; Logistic Regression + RQE Data. Collection of 47k QA pairs will be available
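
Slides 34 to 36 describe a two-stage approach: retrieve the top-K similar questions from a question-answer collection, then keep the answers of those questions that the new question entails. The Python sketch below shows only that control flow. Token-overlap retrieval stands in for the search engine with MetaMapLite, and a simple coverage threshold stands in for the logistic regression classifier; both stand-ins, the function names, and the placeholder answers are assumptions, and the two questions are taken from the slide 34 examples.

```python
import re
from typing import Callable, List, Set, Tuple

QAPair = Tuple[str, str]   # (answered question, its answer)


def tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, collection: List[QAPair], k: int) -> List[QAPair]:
    """Stage 1 stand-in: rank answered questions by Jaccard token overlap."""
    q = tokens(question)
    scored = []
    for faq, answer in collection:
        f = tokens(faq)
        union = q | f
        score = len(q & f) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, faq, answer))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(faq, answer) for _, faq, answer in scored[:k]]


def overlap_entails(question: str, faq: str, threshold: float = 0.3) -> bool:
    """Stage 2 stand-in: a trained entailment classifier would replace this rule."""
    f = tokens(faq)
    return bool(f) and len(tokens(question) & f) / len(f) >= threshold


def answer(question: str, collection: List[QAPair],
           entails: Callable[[str, str], bool], k: int = 5) -> List[str]:
    """Return the answers of retrieved questions that the new question entails."""
    return [ans for faq, ans in retrieve(question, collection, k) if entails(question, faq)]


COLLECTION = [
    ("Who gets sepsis?", "<answer to: Who gets sepsis?>"),
    ("Are there treatments for RP?", "<answer to: Are there treatments for RP?>"),
]
found = answer("Can sepsis be prevented? Can someone get this from a hospital?",
               COLLECTION, overlap_entails)
```

Passing the entailment test in as a function keeps the retrieval stage independent of the classifier, so a learned model could be swapped in without touching the rest of the sketch.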
  • 37. o Organization of a medical QA task @ TREC LiveQA 2017
      o New benchmark for medical QA:
        - Variety of consumer health questions, with reference answers and annotations (Question Foci, Types & Keywords)
      Ben Abacha A., Agichtein E., Pinter Y. & Demner-Fushman D. Overview of the Medical QA Task @ TREC 2017 LiveQA Track.
      Data available on GitHub:
  • 38. Example: Annotated question and associated reference answers
  • 39. Interface used to evaluate 3k QA pairs
  • 40. Results on TREC’17 LiveQA medical test questions
      Measures          QR System   QR+RQE System   LiveQA’17 Best Results   LiveQA’17 Median Results
      AvgScore (0-3)    0.711       0.827           0.637                    0.431
      Success@2+        0.442       0.461           0.392                    0.245
      Precision@2+      0.46        0.475           0.404                    0.331
      MAP@10            0.282       0.311           --                       --
      o The best LiveQA team combined deep neural networks to retrieve similar answered questions from the web.
        - Relevance of this approach vs. classical QA methods.
      o Using QR+RQE and QA collection led to a 29.8% increase over the best official score at LiveQA’17.
        - Efficiency of recognizing question entailment and restricting answer sources to trusted medical resources.
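
The 29.8% figure on slide 40 follows from the AvgScore row: it is the relative gain of the QR+RQE system (0.827) over the best official LiveQA’17 score (0.637). A small Python check, with the function name chosen for this example:

```python
def relative_increase(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100


# AvgScore (0-3) values from slide 40
BEST_OFFICIAL = 0.637
qr_rqe_gain = relative_increase(0.827, BEST_OFFICIAL)   # QR+RQE System
qr_gain = relative_increase(0.711, BEST_OFFICIAL)       # QR System alone
```

By the same calculation, question retrieval alone (0.711) is roughly 11.6% above the best official score, so the entailment step accounts for a large part of the reported gain.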
  • 43. New Research Topic: QA over images
      Example:
        • Question: What does transverse ct image demonstrate?
        • Answer: focal defect in inflamed appendiceal wall and periappendiceal inflammatory stranding.
  • 44. o Our first Deep Learning VQA models achieved good results: Second best WBSS.
      o To advance research in VQA, we built a first manually annotated medical VQA collection
  • 45. Demner-Fushman D, Mork JG, Rogers W, Shooshan SE, Rodriguez LM, Aronson AR. Finding medication doses in the literature. Submitted to AMIA 2018
      Rodriguez LM, Demner-Fushman D. Uncovering Knowledge Gaps in the Scientific Literature on Maternal Morbidity and Mortality using EHR Data. Submitted to AMIA 2018
  • 46. o Doses determine medication safety and effectiveness
      o Dose extraction from clinical text is extensively studied (i2b2)
      o No studies on complete prescription information extraction from the literature: Medication name, Dosage, Route of administration, Frequency of administration, Duration of administration, Reason for giving medication
      o Questions:
        o Will the approaches developed for clinical text work?
        o Which sections of scientific papers provide dose information?
        o Is sequence-to-sequence learning with neural networks a viable approach to extraction of dose information?
  • 47. o 694 documents fully annotated with drug doses/strengths, forms, routes of administration, frequencies and durations of administration, and the reasons for administration
  • 48. o MedEx
      o DoseRegEx: numbers preceded or followed by units of measure
      o DoseRegEx + Chemical filter
      o Long Short-Term Memory (LSTM) neural network with a conditional random field (CRF) layer using character embeddings. https://guillaumegenthial.github.io/sequence-tagging-with-tensorflow.html
      Xu H, Stenner SP, Doan S, Johnson KB, Waitman LR, Denny JC. MedEx: a medication information extraction system for clinical narratives. J Am Med Inform Assoc. 2010 Jan-Feb;17(1):19-24.
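
Slide 48 describes DoseRegEx only as "numbers preceded or followed by units of measure". A minimal Python sketch of that idea follows; the unit list, the exact pattern, and the example sentence are assumptions made for illustration and are not the pattern used in the study.

```python
import re
from typing import List

# Hypothetical unit list; longer alternatives come first so "mg/kg" wins over "mg".
UNITS = r"(?:mg/kg|mcg|mg|ml|g|kg|iu|units?)"
NUMBER = r"\d+(?:\.\d+)?"

# A number followed by a unit, or a unit followed by a number.
DOSE = re.compile(
    rf"\b{NUMBER}\s*{UNITS}\b|\b{UNITS}\s*{NUMBER}\b",
    re.IGNORECASE,
)


def find_doses(text: str) -> List[str]:
    """Return every dose-like span found in the text, left to right."""
    return [match.group(0) for match in DOSE.finditer(text)]


# Made-up sentence for illustration.
doses = find_doses("Patients received aspirin 81 mg daily and heparin 5000 units.")
```

A pattern like this also fires on measurements that are not doses, such as body weight in kg, which is one reason a follow-up step such as the chemical filter listed on the slide can help.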
  • 49. o Publicly available collection of scientific articles annotated with drug doses/strengths, forms, routes of administration, frequencies and durations of administration and the reasons for administration.
      o Drop in performance when switching from clinical text.
      o Dose information is predominantly reported in the full text, but about 45% of the articles provide dose information in the titles and abstracts as well.
  • 50. Diagram labels: MIMIC III Discharge summaries; MetaMap; E-Utils
      Retrieved Terms Associated with Pregnancy ICU Admissions
      Year columns: 2013, 2014, 2015, 2016, 2017, 2018 (non-empty cells, in column order)
      Abdominal Compartment Syndrome    3, 12, 7, 4, 5
      Duodenal Ulcer                    1, 1, 1, 1
      Massive Transfusion               1, 4, 3, 4, 3
      Pseudocyst of Pancreas            1
      Vertebral Artery Aneurysm         1, 4, 2, 2
      Manual Gold standard
      Clinical Categories                       #      %
      Hemorrhage or Anemia                      359    19.93%
      Sepsis or Infection                       262    14.55%
      Cardiovascular Disease or Disorder        108    6.00%
      Hypertension/Pre-eclampsia/Eclampsia      106    5.89%
      Also listed: Asthma, Premature Delivery, Malignancy, Cardiomyopathy