ACL Workshop on BioNLP 2019
Surf at MEDIQA 2019: Improving Performance of Natural
Language Inference in the Clinical Domain by Adopting
Pre-trained Language Model
Jiin Nam1 Seunghyun Yoon2 Kyomin Jung2
1Samsung Research, Seoul, Korea 2Seoul National University, Seoul, Korea
Summary of our work
1. MEDIQA-NLI shared task: NLI task in the biomedical domain
MedNLI
• Consists of tuples <P, H, Y> where P and H are a clinical sentence pair (premise and hypothesis, respectively), and Y indicates whether the hypothesis can be inferred from the premise (“entailment”, “contradiction”, or “neutral”).
• 14,049 pairs in total (training: 11,232, development: 1,395, test: 1,422).
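The tuple structure and split sizes above can be sketched as follows (a minimal illustration; the field and split names are my own, and the example pair is taken from the qualitative analysis later in this poster):

```python
from typing import NamedTuple

class MedNLIExample(NamedTuple):
    premise: str     # P: clinical sentence
    hypothesis: str  # H: sentence to be inferred from P
    label: str       # Y: "entailment", "contradiction", or "neutral"

# Split sizes reported for MedNLI.
SPLITS = {"train": 11_232, "dev": 1_395, "test": 1_422}

ex = MedNLIExample(
    premise="He denies any fever, diarrhea, chest pain, cough, "
            "URI symptoms, or dysuria.",
    hypothesis="The patient does not have infectious symptoms.",
    label="entailment",
)
```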
2. Our final result
Our best result, 90.6% accuracy, ranked 5th on the task leaderboard; it was obtained by applying the list-wise approach to the best point-wise result (transfer learning on BioBERT).
3. Our main contributions
• Overcome the shortage of training data, a common problem in the clinical domain, by adopting
– pre-trained language models (BioBERT, PubMed-ELMo).
– transfer learning.
• Show the independent strengths of the proposed approaches both quantitatively and qualitatively.
Point-wise approaches
1. BERT vs. Compare Aggregate
Fundamental differences between the two models
• BERT: sub-word level embeddings + Transformer model
• CompAggr: word-level embeddings + Compare&Aggregate model
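The two input formats differ: BERT packs both sentences into a single sequence, [CLS] premise [SEP] hypothesis [SEP], with segment ids separating the two. A minimal sketch of this packing (whitespace tokens stand in for BERT's actual WordPiece sub-word units):

```python
def pack_pair(premise: str, hypothesis: str):
    """Pack a premise/hypothesis pair into one BERT-style sequence.

    Whitespace splitting is a stand-in for the real sub-word tokenizer.
    """
    p_toks = premise.lower().split()
    h_toks = hypothesis.lower().split()
    tokens = ["[CLS]"] + p_toks + ["[SEP]"] + h_toks + ["[SEP]"]
    # Segment ids distinguish premise (0) from hypothesis (1).
    segment_ids = [0] * (len(p_toks) + 2) + [1] * (len(h_toks) + 1)
    return tokens, segment_ids
```

The final [CLS] representation is what the classifier head consumes for the three-way NLI decision.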
Figure 1: Left: Overview of the BERT model. Right: Overview of the CompAggr model.
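The CompAggr flow, attention over premise embeddings, element-wise comparison (⊙) against hypothesis embeddings, then aggregation, can be sketched in simplified form (dot-product attention; a mean pool stands in for the model's CNN aggregator):

```python
import numpy as np

def compare_aggregate(E_p: np.ndarray, E_h: np.ndarray) -> np.ndarray:
    """Simplified Compare&Aggregate sketch.

    E_p: (n, d) premise word embeddings
    E_h: (m, d) hypothesis word embeddings
    """
    scores = E_h @ E_p.T                              # (m, n) attention scores
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)     # softmax over premise
    A = weights @ E_p                                 # (m, d) attended premise
    C = A * E_h                                       # element-wise comparison ⊙
    return C.mean(axis=0)                             # (d,) aggregated vector
```

The aggregated vector then feeds a classifier over the three NLI labels.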
Examine premise and hypothesis pairs from the exclusive portions of the test set, 7% (97 examples) and 13% (188 examples), that have 1) high probability and 2) the “entailment” label.
• BioBERT: top 10 pairs by probability.
• CompAggr: 6 pairs with probability higher than 0.80.
Figure 2: Venn diagram for the test results of the CompAggr and BioBERT models (both correct: 66%; one model only: 7% and 13%; neither: 14%).
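The Venn-style regions can be computed from the two models' predictions as follows (a generic sketch; label strings are placeholders):

```python
def agreement_breakdown(gold, pred_a, pred_b):
    """Fractions of test examples correct by both models, only one,
    or neither -- the four regions of a two-model Venn diagram."""
    n = len(gold)
    both = only_a = only_b = neither = 0
    for y, a, b in zip(gold, pred_a, pred_b):
        correct_a, correct_b = a == y, b == y
        if correct_a and correct_b:
            both += 1
        elif correct_a:
            only_a += 1
        elif correct_b:
            only_b += 1
        else:
            neither += 1
    counts = {"both": both, "only_a": only_a,
              "only_b": only_b, "neither": neither}
    return {k: v / n for k, v in counts.items()}
```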
Different strengths
• CompAggr: captures the relationship between two sentences even when there is no word overlap.
– No overlapping words in any of CompAggr’s 6 pairs:
∗ (Premise) He denies any fever, diarrhea, chest pain, cough, URI symptoms, or dysuria.
∗ (Hypothesis) The patient does not have infectious symptoms.
– Overlapping words in 7 of BioBERT’s 10 pairs.
• BioBERT: predicts labels with higher confidence.
– The average probability of the correct results is 0.87 for BioBERT and 0.82 for CompAggr.
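A rough token-overlap check of the kind used in this analysis can be sketched as follows (lowercasing and punctuation stripping only; the poster does not specify its exact matching criterion, e.g. whether stopwords or stems were handled):

```python
import string

def content_overlap(premise: str, hypothesis: str) -> set:
    """Lexical overlap between two sentences after lowercasing
    and stripping punctuation (a rough proxy only)."""
    strip = str.maketrans("", "", string.punctuation)
    p = set(premise.lower().translate(strip).split())
    h = set(hypothesis.lower().translate(strip).split())
    return p & h
```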
Table 1: Left: MedNLI results of BioBERT trained on three different combinations of the PMC and PubMed datasets. Right: model performance of four different approaches. BioBERT (transferred): best results of the transfer learning experiments; BioBERT (expanded): best results of MedNLI with abbreviation expansion on BioBERT.
2. Transfer learning
NLI tasks in the general domain (MNLI, SNLI): conduct transfer learning on five different combinations of MedNLI, SNLI, and MNLI.
Table 2: The results of transfer learning.
Result analysis: Overall, positive transfer occurs on MedNLI.
• The same task in different domains shares overlapping knowledge.
• The domain-specific language representations from BioBERT are maintained while fine-tuning on general-domain tasks.
• BioBERT captures different features, such as medical terms, and generates different representations than BERT does.
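The transfer-learning procedure, fine-tuning on one or more general-domain NLI datasets before MedNLI, can be sketched generically (the `train_fn` callback and the particular ordering shown are hypothetical; the poster does not list the five combinations):

```python
def sequential_finetune(model, dataset_order, train_fn):
    """Fine-tune a model on a sequence of datasets in order.

    dataset_order: e.g. ["MNLI", "MedNLI"] to fine-tune on MNLI
    first, then transfer to MedNLI.
    train_fn(model, name): fine-tunes on the named dataset and
    returns the updated model.
    """
    for name in dataset_order:
        model = train_fn(model, name)
    return model
```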
3. Abbreviation expansion
Replace abbreviations with the corresponding expanded forms from the public medical abbreviation list in Taber’s Online¹, as below.
• (Original) Patient may required CABG.
• (Expanded) Patient may required coronary artery bypass graft.
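A minimal expansion step can be sketched as follows (the two-entry table is illustrative only; the actual list comes from Taber’s Online):

```python
import re

# Illustrative entries; the real table is Taber's Online abbreviation list.
ABBREVIATIONS = {
    "CABG": "coronary artery bypass graft",
    "URI": "upper respiratory infection",
}

def expand_abbreviations(text: str) -> str:
    """Replace whole-word abbreviation matches with expanded forms."""
    for abbr, full in ABBREVIATIONS.items():
        text = re.sub(rf"\b{re.escape(abbr)}\b", full, text)
    return text
```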
The inconsistency of the experiment results
Table 3: Experiment results of abbreviation expansion. MedNLI (expanded): MedNLI with abbreviation expansion.
Possible interpretation: abbreviation expansion changes the conditional probability distribution P(Y|X), as in the pair below.
• (Premise) He denied headache or nausea or vomiting.
• (Hypothesis) He is afebrile.
• (Label) “neutral” to “entailment”
List-wise approach
1. Get the results of the point-wise approach (which classifies each pair of data independently).
2. Re-organize the dataset into lists, each containing one sentence pair per class.
3. Classify the three sentence pairs in a list into the “entailment”, “contradiction”, and “neutral” classes exclusively.
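Steps 2-3 can be sketched as an exclusive assignment that maximizes the summed point-wise probabilities over the 3! label permutations (an assumption about the exact list-wise scoring, which the poster does not fully specify):

```python
from itertools import permutations

LABELS = ("entailment", "contradiction", "neutral")

def listwise_assign(prob_rows):
    """Exclusively assign the three labels to a list of three pairs.

    prob_rows: list of 3 dicts mapping label -> point-wise probability.
    Returns one label per pair, chosen so each label is used exactly
    once and the summed probability is maximal.
    """
    best, best_score = None, float("-inf")
    for perm in permutations(LABELS):
        score = sum(row[lab] for row, lab in zip(prob_rows, perm))
        if score > best_score:
            best, best_score = perm, score
    return list(best)
```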
¹ https://www.tabers.com/tabersonline/view/Tabers-Dictionary/767492/all/Medical Abbreviations