Insights from the Organization of International Challenges on Artificial Intelligence in Medical Question Answering (SciNLP 2020) | Dr. Asma Ben Abacha

Insights from the Organization of International Challenges on Artificial Intelligence in Medical Question Answering
Dr. Asma Ben Abacha
AKBC 2020 SciNLP Workshop, Invited Talk, June 25, 2020
@AsmaBenAbacha

DISCLAIMER
The views and opinions expressed do not necessarily state or reflect those of the U.S. Government, and they may not be used for advertising or product endorsement purposes.

Plan
I) Targeted Tasks and Created Datasets
Challenges on NLP & QA
1. Recognizing Question Entailment (MEDIQA 2019)
2. Medical Question Answering (TREC 2017 & MEDIQA 2019)
Challenges on NLP & Computer Vision
1. Visual Question Answering (VQA-Med 2019 & 2020)
2. Visual Question Generation (VQA-Med 2020)
II) Discussion on Evaluation Methods and Shared Tasks/Challenges

EVALUATION OF MEDICAL QUESTION ANSWERING (QA) SYSTEMS
Until recent years, only one relevant medical dataset was available: the Corpus for Evidence Based Medicine Summarization.
D. Mollá, M.E. Santiago-Martinez (2011) https://sourceforge.net/projects/ebmsumcorpus/

ORGANIZED CHALLENGES & SHARED TASKS IN THE MEDICAL DOMAIN
• “VQA-Med: Overview of the Medical Visual Question Answering Task at ImageCLEF 2019”. Ben Abacha et al. CLEF 2019 & 2020.
• “Overview of the MEDIQA 2019 Shared Task on Textual Inference, Question Entailment and Question Answering”. Ben Abacha, Shivade & Demner-Fushman. ACL-BioNLP 2019.
• “Overview of the Medical Question Answering Task at TREC 2017 LiveQA”. Ben Abacha, Agichtein, Pinter, Demner-Fushman. TREC 2017.

Targeted Tasks & Created Datasets

1) MEDICAL QA TRACK @ TREC LIVEQA 2017
Question 1:
• Subject: ingredients in Kapvay. Message: Is there any sufites sulfates sulfa in Kapvay? I am allergic.
Question 2:
• Subject: abetalipoproteimemia. Message: hi, I would like to know if there is any support for those
suffering with abetalipoproteinemia? I am not diagnosed but have had many test that indicate I am
suffering with this, keen to learn how to get it diagnosed and how to manage, many thanks.
Overview of the Medical QA Task @ TREC 2017 LiveQA Track. Asma Ben Abacha,
Eugene Agichtein, Yuval Pinter & Dina Demner-Fushman. TREC 2017.

2) MEDIQA @ ACL-BioNLP 2019
1) Recognizing Entailment/Inference in the Medical Domain
2) Entailment/Inference for Question Answering (QA)
• 2006: Methods for Using Textual Entailment in Open-Domain Question Answering. Sanda Harabagiu & Andrew Hickl
• 2016: Recognizing Question Entailment for Medical Question Answering. Asma Ben Abacha & Dina Demner-Fushman
[Timeline figure: Dagan et al. (2005); Bowman et al. (2015); Williams et al. (2018); Shivade et al. (2015); Ben Abacha et al. (2015); Adler et al. (2012); Romanov & Shivade (2018); Ben Abacha & Demner-Fushman (2016/2019). Thousands of papers in open domain.]

THE RQE-BASED QA SYSTEM
Metrics         RQE-based QA System   LiveQA-Med Best   LiveQA-Med Median
Average Score   0.827                 0.637             0.431
MAP@10          0.311                 --                --
MRR@10          0.333                 --                --
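The MAP@10 and MRR@10 rows are standard ranking metrics computed over the top 10 retrieved answers per question. The following is a minimal sketch, assuming each question's retrieved answers are reduced to a ranked list of binary relevance judgments. The official evaluation may use graded judgments and a different normalization for average precision, and the function names here are illustrative, not taken from the original evaluation scripts.

```python
from typing import List


def reciprocal_rank_at_k(relevance: List[bool], k: int = 10) -> float:
    """1 / rank of the first relevant answer in the top k, or 0.0 if none."""
    for rank, is_relevant in enumerate(relevance[:k], start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def average_precision_at_k(relevance: List[bool], k: int = 10) -> float:
    """Mean of precision@rank over the relevant answers found in the top k.

    Assumption: normalized by the number of relevant answers retrieved,
    not by the total number of relevant answers in the collection.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance[:k], start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean(scores: List[float]) -> float:
    """Average a per-question metric over all questions."""
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical judgments for two questions (True = relevant answer).
runs = [[False, True, False, True], [True, False]]
mrr_at_10 = mean([reciprocal_rank_at_k(r) for r in runs])
map_at_10 = mean([average_precision_at_k(r) for r in runs])
print(f"MRR@10={mrr_at_10:.3f} MAP@10={map_at_10:.3f}")
```

MRR rewards only the position of the first relevant answer, while MAP also credits additional relevant answers further down the list, so the two can diverge when a system retrieves several useful answers.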

“Overview of the MEDIQA 2019 Shared Task on Textual Inference, Question Entailment and Question Answering”. Asma
Ben Abacha, Chaitanya Shivade & Dina Demner-Fushman. ACL-BioNLP 2019.
Three tasks: NLI, RQE & QA
72 participating teams
20 published papers

MEDIQA 2019:
• Confirmed the added value of using textual inference and question entailment in QA.
• Highlighted the strength of multi-task learning, transfer learning, and data augmentation methods.
• Showed the power of new architectures and models such as MT-DNN in medical QA.
MEDIQA – Post Challenge Round: Submission open

3) VQA-Med @ ImageCLEF 2019
o Four categories of questions: Modality, Plane, Abnormality & Organ
o Training, validation, and test sets created automatically:
• Training set: 3,200 radiology images and 12,792 question-answer pairs.
“VQA-Med: Overview of the Medical Visual Question Answering Task at ImageCLEF 2019”. Ben
Abacha, Hasan, Datla, Liu, Demner-Fushman & Müller. CLEF 2019.

VQA-Med @ ImageCLEF 2019
• The best team achieved 0.624 accuracy and 0.644 BLEU score.
• Best methods: transfer learning, multi-task learning, ensemble methods, and hybrid approaches combining classification models and answer generation methods.
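The two reported scores, accuracy and BLEU, can be illustrated with a minimal sketch. It assumes accuracy is exact string match after lowercasing and whitespace normalization, and it swaps full BLEU for a simplified unigram variant (clipped unigram precision with a brevity penalty). The official VQA-Med evaluation may preprocess and score answers differently, and all function names below are hypothetical.

```python
import math
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace (an assumed normalization)."""
    return text.lower().split()


def exact_match_accuracy(predictions: List[str], references: List[str]) -> float:
    """Fraction of answers whose normalized tokens equal the reference."""
    if not references:
        return 0.0
    correct = sum(
        tokenize(p) == tokenize(r) for p, r in zip(predictions, references)
    )
    return correct / len(references)


def bleu1(prediction: str, reference: str) -> float:
    """Simplified BLEU: clipped unigram precision times a brevity penalty."""
    pred, ref = tokenize(prediction), tokenize(reference)
    if not pred or not ref:
        return 0.0
    # Counter intersection clips each word count to its count in the reference.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    precision = overlap / len(pred)
    # Penalize predictions shorter than the reference.
    brevity = 1.0 if len(pred) >= len(ref) else math.exp(1 - len(ref) / len(pred))
    return brevity * precision


# Hypothetical answers: the second is partly right but not an exact match.
predictions = ["Axial", "ct scan of chest"]
references = ["axial", "ct scan"]
print(exact_match_accuracy(predictions, references))
print([bleu1(p, r) for p, r in zip(predictions, references)])
```

Exact match gives no credit to the second answer, while the overlap-based score gives partial credit, which is one reason the two metrics can rank systems differently.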

4) VQA-MED @ IMAGECLEF 2020
- Two Tasks:
1. Visual Question Answering (VQA)
2. Visual Question Generation (VQG)
- Datasets:
• VQA training set: 4,000 images and 4,000 QA pairs.
• VQG training set: 780 images and 2,156 questions.
- 11 teams submitted runs (the highest participation in
ImageCLEF 2020).
Overview of the VQA-Med Task at ImageCLEF 2020: Visual Question Answering and Generation in the Medical Domain.
Asma Ben Abacha, Sadid A. Hasan, Vivek V. Datla, Dina Demner-Fushman & Henning Müller. CLEF 2020.

VQA-MED @ IMAGECLEF 2020
• State-of-the-art models in NLP and Computer Vision were applied, along with new approaches focusing on abnormality questions and images.
• Exciting results, given the potential applications of the VQA and VQG tasks in the medical domain.

Evaluation Metrics
VQA: Manual evaluation of the accuracy of automatic answers vs. BLEU score
[Figure: Closed-ended Questions]
A dataset of clinically generated visual questions and answers about radiology images. Jason J. Lau, Soumya Gayen, Asma Ben Abacha & Dina Demner-Fushman. Scientific Data, Nature, 2018.

Manual vs. Automatic Evaluation of Summarization Methods
• A major issue remains the less reliable ranking provided by existing evaluation measures for text generation tasks.
“On the Summarization of Consumer Health Questions”. Ben Abacha & Demner-Fushman. ACL 2019.
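One way to quantify how far an automatic metric's ranking of systems departs from the manual ranking is a rank correlation. The sketch below computes Kendall's tau-a between two score lists. The system scores are made-up placeholders, not results from the cited paper, and the function name is hypothetical.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_a(scores_a: Sequence[float], scores_b: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    +1.0 means both metrics rank the systems identically, -1.0 means the
    rankings are fully reversed. Tied pairs count as neither.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both score lists must cover the same systems")
    n = len(scores_a)
    if n < 2:
        return 0.0
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Placeholder scores for four hypothetical systems (NOT real results).
manual_scores = [0.9, 0.7, 0.5, 0.3]      # e.g. human judgments
automatic_scores = [0.4, 0.6, 0.5, 0.2]   # e.g. an n-gram overlap metric
print(round(kendall_tau_a(manual_scores, automatic_scores), 3))
```

A value well below 1.0 on real data would indicate that the automatic measure orders systems differently from human judges.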

CHALLENGES & SHARED TASKS/EVALUATIONS
• Several Benefits:
  o Discovering the range of potentially hidden obstacles and putting systems’ performance in context.
  o Creating useful gold standard corpora, training data, pilot studies, and use cases (with the collaboration of medical experts in our case).
  o Building strong research communities.
• Potential future research investigations:
  o Efficient/suitable evaluation methods and metrics
  o Important, newly identified subtasks that need to be resolved first
  o “Better” training and testing datasets
  o Need for more multi-disciplinary efforts and more support and participation in workshops and conferences such as TREC and CLEF.

Thank you for your attention!
References
1. Asma Ben Abacha & Pierre Zweigenbaum. MEANS: a Medical Question-Answering System Combining NLP Techniques and Semantic Web Technologies. Information Processing & Management Journal, Elsevier, 2015.
2. Asma Ben Abacha & Dina Demner-Fushman. A Question-Entailment Approach to Question Answering. BMC Bioinformatics 2019.
3. Asma Ben Abacha, Chaitanya Shivade & Dina Demner-Fushman. Overview of the MEDIQA 2019 Shared Task on Textual Inference, Question Entailment and Question Answering. ACL-BioNLP 2019.
4. Asma Ben Abacha & Dina Demner-Fushman. On the Summarization of Consumer Health Questions. ACL 2019.
5. Asma Ben Abacha & Dina Demner-Fushman. On the Role of Question Summarization and Information Source Restriction in Consumer Health Question Answering. AMIA Informatics Summit 2019.
6. Asma Ben Abacha, Eugene Agichtein, Yuval Pinter & Dina Demner-Fushman. Overview of the Medical QA Task @ TREC 2017 LiveQA Track. TREC 2017.
7. Asma Ben Abacha & Dina Demner-Fushman. Recognizing Question Entailment for Medical Question Answering. AMIA 2016.
8. Asma Ben Abacha, Yassine Mrabet, Mark Sharp, Travis Goodwin, Sonya E. Shooshan & Dina Demner-Fushman. Bridging the Gap between Consumers’ Medication Questions and Trusted Answers. MEDINFO 2019.
9. Jason J. Lau, Soumya Gayen, Asma Ben Abacha & Dina Demner-Fushman. A dataset of clinically generated visual questions and answers about radiology images. Scientific Data, Nature, 2018.
10. Asma Ben Abacha, Sadid A. Hasan, Vivek V. Datla, Joey Liu, Dina Demner-Fushman & Henning Müller. VQA-Med: Overview of the Medical Visual Question Answering Task at ImageCLEF 2019. CLEF 2019.
11. Asma Ben Abacha, Sadid A. Hasan, Vivek V. Datla, Dina Demner-Fushman & Henning Müller. Overview of the VQA-Med Task at ImageCLEF 2020: Visual Question Answering and Generation in the Medical Domain. CLEF 2020.
12. Mourad Sarrouti, Asma Ben Abacha & Dina Demner-Fushman. Visual Question Generation from Radiology Images. ACL-ALVR 2020.
asma.benabacha@nih.gov
asma.benabacha@gmail.com
@AsmaBenAbacha