Question answering is one of the most challenging tasks in the quest towards AI. Given unstructured text, we focus on building NLP systems for question answering, addressing two important real-world problems in QA:
(1) Given two questions, can we identify whether they are duplicates? We use the Quora question pairs dataset and, using TensorFlow, build a model which can detect duplicate questions.
(2) Given a passage, can machines read the passage and answer questions associated with it? This is the task of reading comprehension, which humans do easily. We use the Stanford Question Answering Dataset (SQuAD) and, with TensorFlow, build a model for the reading comprehension task.
4. Why is QA important for AI?
• QA has many applications
- Search, Dialogue, Information Extraction, Summarization, etc.
• Some popular demonstrations
- IBM Watson (Jeopardy), Siri
• Important for Artificial General Intelligence (AGI)
• QA is an AI-complete problem
• Tons of interesting sub-problems and things left to do
WIKIPEDIA – “QA systems automatically answer questions posed by
humans in a natural language”
5. Two Tasks to Ponder in QA
• Duplicate Question Detection (DQD) task
- Given a huge community question answering (CQA) archive, can we use NLP techniques to identify duplicate questions?
• Machine Reading Comprehension (MRC) Task
- Given a passage of text, answer questions related to information contained in
the text
7. Why is DQD important?
• CQA forums, e.g., StackExchange, Quora
- rich repositories of crowd-sourced wisdom
- expert knowledge available 24x7
• Users typically have similar informational needs
• Many of the new questions arising in these forums have already been asked and answered
• Identifying duplicate questions is essential for reusing existing answers
DQD – “Given a huge community question answering archive, can we use NLP techniques to identify duplicate questions?”
8. Why is DQD difficult?
• Two questions with identical information needs can have widely different surface forms
Q1a: Does drinking coffee increase blood pressure?
Q1b: Is there a link between caffeine intake and hypertension?
• Questions with high coarse-grained semantic similarity need not be duplicates
Q2a: ‘Can you suggest some good multi-cuisine restaurants in Paris?’
Q2b: ‘Can you recommend good multi-cuisine restaurants in London?’
9. NLP tasks Related to DQD
• Textual Entailment - does Text T entail Hypothesis H?
- T: If you help the needy, God will help you!
- H: Giving money to a poor man has good consequences
• Semantic Text Similarity Detection
• Paraphrase Identification - Whether two sentences convey similar
meaning
- Ganges is the longest river in India;
- Ganges river is 400 miles long, longest in India
But none of these solve the DQD problem completely!
10. Dataset for DQD
• Quora CQA forum Dataset - 400,000 potential question duplicate pairs
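As a hedged illustration (the file and column names below are assumptions about the commonly distributed TSV release), loading the pairs might look like:

#load the Quora question pairs (file/column names assumed)
import pandas as pd
df = pd.read_csv("quora_duplicate_questions.tsv", sep="\t")
q1_text = df["question1"].astype(str)
q2_text = df["question2"].astype(str)
is_dup = df["is_duplicate"].values  #1 = duplicate pair, 0 = not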
11. A neural approach to DQD
[Architecture diagram: Question 1 and Question 2 are the inputs; each is encoded by an LSTM sentence representation; the two representations feed an MLP classifier, which produces the output.]
12. Solution Outline
• Represent Q1 as a sentence embedding using an LSTM
• Similarly, represent Q2 as a sentence embedding
• Feed their concatenation to a Multi-Layer Perceptron (MLP) classifier
• Use the output of the MLP classifier as the label prediction (a minimal code sketch follows the diagram below)
[Model diagram: Question 1 / Question 2 → Embedding 1 / Embedding 2 → Representation 1 / Representation 2 → Dense Layer → y]
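A minimal TF1-style sketch of this outline, assuming illustrative hyperparameters and hypothetical placeholder names (q1_ids, q2_ids, labels); it produces the predict_layer_logits consumed by the loss code below:

import tensorflow as tf

vocab_size, embed_dim, hidden_dim, num_classes = 50000, 100, 128, 2
q1_ids = tf.placeholder(tf.int32, [None, None], name="q1_ids")  #[batch, time]
q2_ids = tf.placeholder(tf.int32, [None, None], name="q2_ids")
labels = tf.placeholder(tf.float32, [None, num_classes], name="labels")
embeddings = tf.get_variable("embeddings", [vocab_size, embed_dim])

def encode(token_ids):
    #shared LSTM encoder: both questions reuse the same weights
    with tf.variable_scope("encoder", reuse=tf.AUTO_REUSE):
        vectors = tf.nn.embedding_lookup(embeddings, token_ids)
        cell = tf.nn.rnn_cell.BasicLSTMCell(hidden_dim)
        _, state = tf.nn.dynamic_rnn(cell, vectors, dtype=tf.float32)
        return state.h  #final hidden state as the sentence embedding

#concatenate the two sentence embeddings and classify with an MLP
pair = tf.concat([encode(q1_ids), encode(q2_ids)], axis=1)
hidden = tf.layers.dense(pair, hidden_dim, activation=tf.nn.relu)
predict_layer_logits = tf.layers.dense(hidden, num_classes)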
15. Sample code for computing loss
#compute prediction probabilities (for reporting only; the loss takes raw logits)
prediction = tf.nn.softmax(predict_layer_logits, name="softmax_probs")
#compute loss: softmax_cross_entropy_with_logits applies softmax internally
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=predict_layer_logits, labels=labels))
#compute accuracy
correctPred = tf.equal(tf.argmax(prediction, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correctPred, tf.float32), name="accuracy")
#invoke optimizer
optimizer = tf.train.AdamOptimizer().minimize(loss)
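A hedged sketch of driving these ops in a TF1 session (q1_batch, q2_batch, y_batch are assumed numpy arrays matching the placeholders from the earlier sketch):

#run one training step
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    _, batch_loss, batch_acc = sess.run(
        [optimizer, loss, accuracy],
        feed_dict={q1_ids: q1_batch, q2_ids: q2_batch, labels: y_batch})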
16. Going Beyond DQD
• Finding a pre-existing canned answer generated by humans from CQA forums, i.e., Question-Answer pairs
• What if the answer is not present in canned form (you don't have Question-Answer pairs)?
• Create a machine to answer a question:
- given a book/passage/document as context, and
- asked a question whose answer is present explicitly or implicitly in the context
20. Reading Comprehension
Passage (P) + Question (Q) → Answer (A)
Passage (P): Alyssa got to the beach after a long trip. She's from Charlotte. She traveled from Atlanta. She's now in Miami. She went to Miami to visit some friends. But she wanted some time to herself at the beach, so she went there first. After going swimming and laying out, she went to her friend Ellen's house. Ellen greeted Alyssa and they both had some lemonade to drink. Alyssa called her friends Kristin and Rachel to meet at Ellen's house…
Question (Q): What city is Alyssa in?
Answer (A): Miami
21. Answer Extraction
Example from SQuAD dataset
Passage (P): The Rhine (Romansh: Rein, German: Rhein, French: le Rhin, Dutch: Rijn) is a European river that begins in the Swiss canton of Graubünden in the southeastern Swiss Alps, forms part of the Swiss-Austrian, Swiss-Liechtenstein, Swiss-German and then the Franco-German border, then flows through the Rhineland and eventually empties into the North Sea in the Netherlands. The biggest city on the river Rhine is Cologne, Germany, with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi), with an average discharge of about 2,900 m³/s (100,000 cu ft/s).
Question (Q): What river is larger than the Rhine?
Answer (A): Danube
24. Answer Extraction
• Represent the question text through a question embedding
• Represent the relevant passage text through a context embedding
• Use their joint representation to arrive at the answer
[Diagram: answering requires generalization between question and passage]
Q: When did James Dean die?
P: In 1955, actor James Dean was killed in a two-car collision near Cholame, Calif.
25. Encoding Text For Machine Comprehension
Use neural encoding models for estimating the probability of word type a from
document d answering query q:
• P(a | d, q) ∝ exp(W(a) · g(d, q)), s.t. a ∈ d
• where W(a) indexes row a of weight matrix W and function g(d, q) returns a
vector embedding of a document and query pair
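A hedged numpy sketch of this scoring rule (function and variable names are illustrative): score every word type with W(a) · g(d, q), mask out words not in the document, and normalize:

import numpy as np

def answer_probs(W, g_dq, doc_word_ids):
    #W: [vocab_size, dim] weight matrix; g_dq: [dim] embedding of the (d, q) pair
    scores = W @ g_dq  #W(a) . g(d, q) for every word type a
    masked = np.full(W.shape[0], -np.inf)
    masked[doc_word_ids] = scores[doc_word_ids]  #enforce a ∈ d
    exp = np.exp(masked - masked[doc_word_ids].max())  #numerically stable softmax
    return exp / exp.sum()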
26. Machine Reading Comprehension Techniques
• Tons of Techniques
• Attentive Reader
• Attention Sum Reader
• Gated Attention Reader
• Match LSTM with Answer Pointer
• Microsoft Resnet
• Just check out the SQuAD leaderboard
• https://rajpurkar.github.io/SQuAD-explorer/
27. Memory Network
• Encode the question, encode each sentence in the passage, compute an
attention over the sentences, aggregate the result (possibly recurse), do a
softmax over answer options
• A lot of extensions (dynamic memory networks, key-value memory networks, …),
some of them pretty specific to bAbI, some of them decent
• As we will see, better performing methods skip the encoding step, reasoning
directly over words
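A hedged numpy sketch of one memory-network hop as described above (all names illustrative): attend over sentence encodings with the question, aggregate, and softmax over answer options:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def memory_hop(sentence_encodings, question_encoding, answer_matrix):
    #sentence_encodings: [num_sents, dim]; question_encoding: [dim]
    #answer_matrix: [num_options, dim] embeddings of the answer options
    attention = softmax(sentence_encodings @ question_encoding)  #over sentences
    memory = attention @ sentence_encodings  #aggregate the attended sentences
    updated = question_encoding + memory  #could recurse here for multiple hops
    return softmax(answer_matrix @ updated)  #distribution over answer options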
29. Attentive Reader
• Pretty similar to memory networks
- but attention is over words, not sentences
- the modeling decisions are more sane (e.g., final softmax is with a word
embedding similarity, not a fixed output space)
• Also, only one step, not multiple steps
30. Attentive Reader
s(i) is the attention on token i, computed by the similarity of y(i) with u; you then do a softmax similarity with the answer candidates
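A hedged numpy sketch of that attention step (dot-product similarity stands in for the paper's tanh-based scoring; all names are illustrative):

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attentive_read(Y, u, candidates):
    #Y: [num_tokens, dim] token encodings y(i); u: [dim] query encoding
    s = softmax(Y @ u)  #s(i): attention on token i from similarity of y(i) with u
    r = s @ Y  #attention-weighted document representation
    return softmax(candidates @ r)  #softmax similarity with answer candidates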