Presenter: Sewon Min (undergraduate, Seoul National University)
Date: August 2017
Sewon Min (민세원) is a student at Seoul National University, majoring in computer science. She did her research at the University of Washington with Minjoon Seo, Hannaneh Hajishirzi, and Ali Farhadi. Her main interest is natural language understanding with a focus on question answering.
Abstract:
To achieve human-level understanding of natural language, it is crucial to analyze carefully what machines currently can and cannot do, and then to consider how to expand their abilities toward the human level. In this talk, I will first describe the current state of machine question answering by analyzing SQuAD, a recently well-studied dataset. Next, focusing on its limitations, I will introduce several promising directions for the next step. Lastly, I will present my work on transfer learning in question answering as one such approach.
2. Sewon Min
- Interested in Natural language understanding
with a focus on question answering
- Background
- Undergraduate in SNU (~2018)
- Research Experience in UW (2016~2017)
- Publication
- Minjoon Seo, Sewon Min, Ali Farhadi, Hannaneh Hajishirzi, “Neural Speed Reading”.
2017. (Under review)
- Sewon Min, Minjoon Seo, Hannaneh Hajishirzi. “Question Answering through
Transfer Learning from Large Fine-grained Supervision Data”. ACL. 2017.
- Minjoon Seo, Sewon Min, Ali Farhadi, Hannaneh Hajishirzi. “Query-reduction
Networks”. ICLR. 2017.
10. SQuAD
Southern California, often abbreviated SoCal, is a geographic
and cultural region that generally comprises California's
southernmost 10 counties. (…)
What is Southern California often abbreviated as?
Stanford Question Answering Dataset (2016)
12. SQuAD
Stanford Question Answering Dataset (2016)
Models
Match-LSTM (SMU), BiDAF (UW+AI2), DCN (Salesforce),
R-Net (Microsoft), AoA Reader (HIT + iFLYTEK) and many others
Performance
- System: EM 55 F1 68 → EM 78 F1 85
- Human: EM 82 F1 91
More information: https://rajpurkar.github.io/SQuAD-explorer/
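The EM and F1 numbers above follow SQuAD's evaluation protocol: EM checks for an exact string match after light normalization, and F1 measures token overlap between the prediction and the gold answer. A minimal sketch of these two metrics (the normalization shown is an assumption modeled on the official script, not a copy of it):

```python
import re
from collections import Counter

def normalize(s):
    """Lowercase, strip punctuation, and drop articles before tokenizing."""
    s = s.lower()
    s = re.sub(r'[^\w\s]', ' ', s)
    s = re.sub(r'\b(a|an|the)\b', ' ', s)
    return s.split()

def exact_match(pred, gold):
    """EM: 1.0 iff the normalized token sequences are identical."""
    return float(normalize(pred) == normalize(gold))

def f1(pred, gold):
    """F1: harmonic mean of token-overlap precision and recall."""
    p, g = normalize(pred), normalize(gold)
    common = Counter(p) & Counter(g)   # multiset intersection of tokens
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)
```

For instance, `exact_match("SoCal", "socal")` is 1.0, while `f1("Southern California", "California")` is about 0.67: partial credit that EM would not give.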
13. SQuAD
Why popular?
1. Domain: Context-based, Wikipedia, Real questions
2. Task: Span-based answer
- Closer to real QA than Cloze-style
- Easier to evaluate than Free-form
3. Proper difficulty
14. Who made airbus
WikiQA (context sentence classification):
Airbus SAS is an aircraft manufacturing subsidiary of EADS, a European aerospace
company. Airbus began as a union of aircraft companies …
CNN/Daily Mail (cloze style):
____________ says he understands why @entity0 won’t play at his tournament
... “@entity0 called me personally to let me know that he wouldn’t be playing here at
@entity23,” @entity3 said on his @entity21 events website...
MS Marco (free-form):
What energy is used in photosynthesis?
Photosynthesis is a process used by plants and other organisms to convert light
energy, normally from the Sun, into chemical energy (…)
[light energy] [energy of light] [solar energy] [Light energy is used in photosynthesis]
16. SQuAD
Southern California, often abbreviated SoCal, is a geographic
and cultural region that generally comprises California's
southernmost 10 counties. (…)
What is Southern California often abbreviated as?
What does SoCal stand for?
Demo (BiDAF Model): https://allenai.github.io/bi-att-flow/demo/
19. Contents
Current state in Question Answering
Expansion from current state
Question Answering through transfer learning
(my work)
20. How to expand the task?
1. Small-scale context → Large-scale context
2. Requiring lexical information → Requiring complex reasoning
3. Span-based answer → Free form answer
21. Large-scale context
Longer context: WikiReading, NewsQA
Multiple context: MSMarco, TriviaQA
Open-domain: SearchQA, DrQA
Why challenging?
Cost (Time & Memory)
more information != better performance
No effective and efficient model yet!
Models with hierarchical structure
22. Large-scale context → More data
We have Large amount of data (such as Web data)
Approaches
1. Combination of information retrieval & question answering
2. Unsupervised learning
3. Transfer learning
23. Complex Reasoning
James the Turtle was always getting in trouble. (…) One day, James thought
he would go into town and see what kind of trouble he could get into. He
went to the grocery store and pulled all the pudding off the shelves and ate
two jars. Then he walked to the fast food restaurant and ordered 15 bags of
fries. He didn't pay, and instead headed home. (…)
Where did James go after he went to the grocery store?
A) His deck
B) His freezer
C) A fast food restaurant
D) His room
MCTest
24. Complex Reasoning
MCTest (7 years old)
Science Questions Dataset (Elementary school)
RACE (Middle & High school)
Very difficult, not so popular
Deep learning models have limitations
25. Free-form Answer
MS Marco
1. Annotating a gold answer is difficult
What energy is used in photosynthesis?
Photosynthesis is a process used by plants and other organisms to convert light
energy, normally from the Sun, into chemical energy (…)
[light energy] [energy of light] [solar energy] [Light energy is used in photosynthesis]
27. Free-form Answer
2. Evaluation is difficult
- We want answers that need not appear verbatim in the context.
- We prefer a full sentence to a single word.
- However, such answers are hard to evaluate: current metrics are incomplete (bag-of-words based).
What is the capital city of South Korea?
Gold: The capital city of South Korea is Seoul.
“Seoul.” → 1/8
“The capital city of South Korea is Tokyo.” → 7/8
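The 1/8 and 7/8 scores above come from bag-of-words token overlap. A minimal sketch of such a metric (unigram recall against the gold answer, with naive tokenization) shows how a fluent but wrong answer can outscore a correct short one:

```python
def tokens(s):
    """Naive tokenization: lowercase, drop periods, split on whitespace."""
    return s.lower().replace('.', '').split()

def unigram_recall(pred, gold):
    """Fraction of gold-answer tokens that also appear in the prediction."""
    g = tokens(gold)
    p = set(tokens(pred))
    return sum(1 for t in g if t in p) / len(g)

gold = "The capital city of South Korea is Seoul."
short_correct = unigram_recall("Seoul.", gold)                                  # 1/8
long_wrong = unigram_recall("The capital city of South Korea is Tokyo.", gold)  # 7/8
```

The wrong-capital answer scores seven times higher than the correct one-word answer, which is exactly the incompleteness of bag-of-words metrics the slide points out.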
29. Free-form Answer
3. Designing a generation model is difficult
WikiReading: a property instead of a question
- instance of, gender, country, date of birth, given name, …
Best model’s performance (F1)
- Given name: 88.7
- Date of opening: 30.1
Example (property: country):
Folkart Towers are twin skyscrapers in the Bayrakli district of the Turkish city of Izmir.
Reaching a structural height of 200 m (656 ft) above ground level, (…)
30. Contents
Current state in Question Answering
Expansion from current state
Question Answering through transfer learning
(my work)
31. Transfer learning in QA
“Question Answering through Transfer Learning from Large Fine-
grained Supervision Data”
Background
- Transfer learning is not yet popular in NLP.
- Some previous works report that transfer learning does not help
when the target task differs from the source task.
Our contribution
- Coarser, sentence-level QA can benefit from transferring a model
trained on large, span-level QA data.
32. Transfer learning in QA
Source: SQuAD (span-level QA, Wikipedia domain)
Targets:
- WikiQA (sentence-level QA, Wikipedia domain)
- SemEval-2016 (sentence-level QA, community QA)
- SICK (RTE)
33. Transfer learning in QA
WikiQA
Q Who made airbus
C1 Airbus SAS is an aircraft manufacturing subsidiary of EADS, a European aerospace company.
C2 Airbus began as a union of aircraft companies.
C3 Aerospace companies allowed the establishment of a joint-stock company, owned by EADS.
A C1(Yes), C2(No), C3(No)
SemEval2016-task3A
Q I saw an ad, data entry jobs online. It required we give a fee and they promise fixed amount
every month. Is this a scam?
C1 well probably is so i be more careful if i were u. Why you looking for online jobs
C2 SCAM!!!!!!!!!!!!!!!!!!!!!!
C3 Bcoz i got a baby and iam nt intrested to sent him in a day care. thats y iam (...)
A C1(Good), C2(Good), C3(Bad)
34. Transfer learning in QA
BiDAF (source): Context + Query → Embedding layer → Attention layer →
Modelling layer → Output layer 1 (Start) / Output layer 2 (End)
BiDAF-T (target): Context + Query → Embedding layer → Attention layer →
Modelling layer → Pooling + classification → Class
BiDAF outputs the start and end positions of the answer span.
BiDAF-T outputs a classification result.
The layers below the output are transferred from BiDAF to BiDAF-T.
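A minimal NumPy sketch of this transfer setup (layer shapes and function names here are illustrative stand-ins, not the actual BiDAF implementation): the lower layers are shared, the span model scores start/end positions, and the classification model swaps those heads for pooling plus a sigmoid classifier.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, h = 6, 4, 5  # context length, input dim, hidden dim (toy sizes)

def shared_encoder(x, W):
    """Stand-in for BiDAF's lower layers (embedding/attention/modelling), shared in transfer."""
    return np.tanh(x @ W)                      # (T, d) -> (T, h)

def span_heads(m, w_start, w_end):
    """BiDAF: two output layers score every position as answer start / end."""
    return m @ w_start, m @ w_end              # two (T,) logit vectors

def classification_head(m, w_cls):
    """BiDAF-T: max-pool the modelling output over time, then one sigmoid score."""
    pooled = m.max(axis=0)                     # (h,)
    return 1 / (1 + np.exp(-pooled @ w_cls))   # scalar in (0, 1)

# "Pretraining" on SQuAD learns W (and the span heads); transfer keeps W
# and trains only the fresh classification weights w_cls on the target task.
W = rng.normal(size=(d, h))
x = rng.normal(size=(T, d))
m = shared_encoder(x, W)
start_logits, end_logits = span_heads(m, rng.normal(size=h), rng.normal(size=h))
score = classification_head(m, rng.normal(size=h))
```

The design choice mirrored here is that only the output layer changes between tasks, so everything the lower layers learned from span supervision carries over.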
35. Transfer learning in QA
Our results and previous SOTA:

            WikiQA   SemEval2016-task3A
None         62.96    76.4
SQ-T         75.22    47.23
SQ           75.19    57.8
SQ-T (f)     76.44    76.3
SQ (f)       79.9     78.37
SQ* (f)      83.2     80.2
prev rank1   74.33    79.19
prev rank2   74.17    77.66

On WikiQA we achieve a new SOTA with a large gap.
36. Transfer learning in QA
SICK. Our results with and without additional SNLI pretraining:

            w/o SNLI   with SNLI
None         77.96      83.2
SQuAD-T      81.49      85
SQuAD        82.86      86.63
SQuAD*       84.38      88.22

Previous SOTA: rank1 86.2, rank2 84.57
37. Transfer learning in QA
Transfer learning should work better when the source is similar to the target. (??)
- span-level (SQuAD) → sentence-level (WikiQA etc.)
- sentence-level (SQuAD-T) → sentence-level (WikiQA etc.)
38. Transfer learning in QA
(Same WikiQA and SemEval2016-task3A results as slide 35.) Transfer from span-level SQuAD, SQ (f) and SQ* (f), outperforms transfer from sentence-level SQuAD-T, SQ-T (f), even though SQuAD-T is more similar to the target tasks.
40. Transfer learning in QA
- We achieve SOTA on well-studied QA datasets by simple
transfer learning
- Span-level supervision helps the model learn lexical information better
“Learned in Translation: Contextualized Word Vectors”
- Salesforce, 2017.08
- transfer learning from Translation to Sentiment analysis / classification / RTE /
QA
- SOTA in SST-5 & SNLI