Question answering is one of the most challenging tasks in the quest towards AI. Given unstructured text, we focus on building NLP systems for question answering, addressing two important real-world problems in QA:
(1) Given two questions, can we identify whether they are duplicates? We use the Quora question pairs dataset and, using TensorFlow, build a model which can detect duplicate questions.
(2) Given a passage, can machines read the passage and answer questions associated with it? This is the task of reading comprehension, which humans do easily. We use the Stanford Question Answering Dataset (SQuAD) and, with TensorFlow, build a model for the reading comprehension task.
4. Why is QA important for AI?
• QA has many applications
- Search, Dialogue, Information Extraction, Summarization, etc.
• Some popular demonstrations
- IBM Watson (Jeopardy), Siri
• Important for Artificial General Intelligence (AGI)
• QA is an AI-complete problem
• Tons of interesting sub-problems and things left to do
WIKIPEDIA – “QA systems automatically answer questions posed by
humans in a natural language”
5. Two Tasks to Ponder in QA
• Duplicate Question Detection (DQD) task
- Given a huge community question answering (CQA) archive, can we use NLP techniques to identify duplicate questions?
• Machine Reading Comprehension (MRC) Task
- Given a passage of text, answer questions related to information contained in
the text
7. Why is DQD important?
• CQA forums, e.g., StackExchange, Quora
- rich repositories of crowd-sourced wisdom
- expert knowledge available 24x7
• Users typically have similar informational needs
• Many of the new questions arising in these forums have already been asked and answered
• Identifying duplicate questions is essential for reusing existing answers
DQD – “Given a huge community question answering archive, can we use NLP techniques to identify duplicate questions?”
8. Why is DQD difficult?
• Two questions with identical information needs can have widely different surface forms
Q1a: Does drinking coffee increase blood pressure?
Q1b: Is there a link between caffeine intake and hypertension?
• Questions with high coarse-grained semantic similarity need not be duplicates
Q2a: ‘Can you suggest some good multi-cuisine restaurants in Paris?’
Q2b: ‘Can you recommend good multi-cuisine restaurants in London?’
9. NLP tasks Related to DQD
• Textual Entailment - does Text T entail Hypothesis H?
- T: If you help the needy, God will help you!
- H: Giving money to a poor man has good consequences
• Semantic Text Similarity Detection
• Paraphrase Identification - Whether two sentences convey similar
meaning
- Ganges is the longest river in India;
- Ganges river is 400 miles long, longest in India
But none of these solve the DQD problem completely!
10. Dataset for DQD
• Quora CQA forum Dataset - 400,000 potential question duplicate pairs
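As a hedged illustration (the file and column names below are assumptions about the commonly distributed TSV release), loading the pairs might look like:

#load the Quora question pairs (file/column names assumed)
import pandas as pd
df = pd.read_csv("quora_duplicate_questions.tsv", sep="\t")
q1_text = df["question1"].astype(str)
q2_text = df["question2"].astype(str)
is_dup = df["is_duplicate"].values  #1 = duplicate pair, 0 = not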
11. A neural approach to DQD
[Architecture diagram: Question 1 and Question 2 are the inputs; each is encoded by an LSTM sentence representation; the two representations feed an MLP classifier, which produces the output.]
12. Solution Outline
• Represent Q1 as a sentence embedding using an LSTM
• Similarly, represent Q2 as a sentence embedding
• Feed their concatenation to a Multi-Layer Perceptron (MLP) classifier
• Use the output of the MLP classifier as the label prediction (a minimal code sketch follows the diagram below)
[Model diagram: Question 1 / Question 2 → Embedding 1 / Embedding 2 → Representation 1 / Representation 2 → Dense Layer → y]
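A minimal TF1-style sketch of this outline, assuming illustrative hyperparameters and hypothetical placeholder names (q1_ids, q2_ids, labels); it produces the predict_layer_logits consumed by the loss code below:

import tensorflow as tf

vocab_size, embed_dim, hidden_dim, num_classes = 50000, 100, 128, 2
q1_ids = tf.placeholder(tf.int32, [None, None], name="q1_ids")  #[batch, time]
q2_ids = tf.placeholder(tf.int32, [None, None], name="q2_ids")
labels = tf.placeholder(tf.float32, [None, num_classes], name="labels")
embeddings = tf.get_variable("embeddings", [vocab_size, embed_dim])

def encode(token_ids):
    #shared LSTM encoder: both questions reuse the same weights
    with tf.variable_scope("encoder", reuse=tf.AUTO_REUSE):
        vectors = tf.nn.embedding_lookup(embeddings, token_ids)
        cell = tf.nn.rnn_cell.BasicLSTMCell(hidden_dim)
        _, state = tf.nn.dynamic_rnn(cell, vectors, dtype=tf.float32)
        return state.h  #final hidden state as the sentence embedding

#concatenate the two sentence embeddings and classify with an MLP
pair = tf.concat([encode(q1_ids), encode(q2_ids)], axis=1)
hidden = tf.layers.dense(pair, hidden_dim, activation=tf.nn.relu)
predict_layer_logits = tf.layers.dense(hidden, num_classes)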
15. Sample code for computing loss
#compute prediction probabilities (for reporting only; the loss takes raw logits)
prediction = tf.nn.softmax(predict_layer_logits, name="softmax_probs")
#compute loss: softmax_cross_entropy_with_logits applies softmax internally
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=predict_layer_logits, labels=labels))
#compute accuracy
correctPred = tf.equal(tf.argmax(prediction, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correctPred, tf.float32), name="accuracy")
#invoke optimizer
optimizer = tf.train.AdamOptimizer().minimize(loss)
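A hedged sketch of driving these ops in a TF1 session (q1_batch, q2_batch, y_batch are assumed numpy arrays matching the placeholders from the earlier sketch):

#run one training step
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    _, batch_loss, batch_acc = sess.run(
        [optimizer, loss, accuracy],
        feed_dict={q1_ids: q1_batch, q2_ids: q2_batch, labels: y_batch})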
16. Going Beyond DQD
• Finding a pre-existing canned answer generated by humans from CQA forums, i.e., Question-Answer pairs
• What if the answer is not present in canned form (you don't have Question-Answer pairs)?
• Create a machine to answer a question:
- given a book/passage/document as context, and
- asked a question whose answer is present explicitly or implicitly in the context
20. Reading Comprehension
Passage (P) + Question (Q) → Answer (A)
Passage (P): Alyssa got to the beach after a long trip. She's from Charlotte. She traveled from Atlanta. She's now in Miami. She went to Miami to visit some friends. But she wanted some time to herself at the beach, so she went there first. After going swimming and laying out, she went to her friend Ellen's house. Ellen greeted Alyssa and they both had some lemonade to drink. Alyssa called her friends Kristin and Rachel to meet at Ellen's house…
Question (Q): What city is Alyssa in?
Answer (A): Miami
21. Answer Extraction
Example from SQuAD dataset
Passage (P): The Rhine (Romansh: Rein, German: Rhein, French: le Rhin, Dutch: Rijn) is a European river that begins in the Swiss canton of Graubünden in the southeastern Swiss Alps, forms part of the Swiss-Austrian, Swiss-Liechtenstein, Swiss-German and then the Franco-German border, then flows through the Rhineland and eventually empties into the North Sea in the Netherlands. The biggest city on the river Rhine is Cologne, Germany, with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi), with an average discharge of about 2,900 m³/s (100,000 cu ft/s).
Question (Q): What river is larger than the Rhine?
Answer (A): Danube
24. Answer Extraction
• Represent the question text through a question embedding
• Represent the relevant passage text through a context embedding
• Use their joint representation to arrive at the answer
[Diagram: answering requires generalization between question and passage]
Q: When did James Dean die?
P: In 1955, actor James Dean was killed in a two-car collision near Cholame, Calif.
25. Encoding Text For Machine Comprehension
Use neural encoding models for estimating the probability of word type a from
document d answering query q:
• P(a | d, q) ∝ exp(W(a) · g(d, q)), s.t. a ∈ d
• where W(a) indexes row a of weight matrix W and function g(d, q) returns a
vector embedding of a document and query pair
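A hedged numpy sketch of this scoring rule (function and variable names are illustrative): score every word type with W(a) · g(d, q), mask out words not in the document, and normalize:

import numpy as np

def answer_probs(W, g_dq, doc_word_ids):
    #W: [vocab_size, dim] weight matrix; g_dq: [dim] embedding of the (d, q) pair
    scores = W @ g_dq  #W(a) . g(d, q) for every word type a
    masked = np.full(W.shape[0], -np.inf)
    masked[doc_word_ids] = scores[doc_word_ids]  #enforce a ∈ d
    exp = np.exp(masked - masked[doc_word_ids].max())  #numerically stable softmax
    return exp / exp.sum()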
26. Machine Reading Comprehension Techniques
• Tons of Techniques
• Attentive Reader
• Attention Sum Reader
• Gated Attention Reader
• Match LSTM with Answer Pointer
• Microsoft Resnet
• Just check out the SQuAD leaderboard
• https://rajpurkar.github.io/SQuAD-explorer/
27. Memory Network
• Encode the question, encode each sentence in the passage, compute an
attention over the sentences, aggregate the result (possibly recurse), do a
softmax over answer options
• A lot of extensions (dynamic memory networks, key-value memory networks, …),
some of them pretty specific to bAbI, some of them decent
• As we will see, better performing methods skip the encoding step, reasoning
directly over words
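A hedged numpy sketch of one memory-network hop as described above (all names illustrative): attend over sentence encodings with the question, aggregate, and softmax over answer options:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def memory_hop(sentence_encodings, question_encoding, answer_matrix):
    #sentence_encodings: [num_sents, dim]; question_encoding: [dim]
    #answer_matrix: [num_options, dim] embeddings of the answer options
    attention = softmax(sentence_encodings @ question_encoding)  #over sentences
    memory = attention @ sentence_encodings  #aggregate the attended sentences
    updated = question_encoding + memory  #could recurse here for multiple hops
    return softmax(answer_matrix @ updated)  #distribution over answer options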
29. Attentive Reader
• Pretty similar to memory networks
- but attention is over words, not sentences
- the modeling decisions are more sane (e.g., final softmax is with a word
embedding similarity, not a fixed output space)
• Also, only one step, not multiple steps
30. Attentive Reader
s(i) is the attention on token i, computed by the similarity of y(i) with u; you then do a softmax similarity with the answer candidates
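A hedged numpy sketch of that attention step (dot-product similarity stands in for the paper's tanh-based scoring; all names are illustrative):

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attentive_read(Y, u, candidates):
    #Y: [num_tokens, dim] token encodings y(i); u: [dim] query encoding
    s = softmax(Y @ u)  #s(i): attention on token i from similarity of y(i) with u
    r = s @ Y  #attention-weighted document representation
    return softmax(candidates @ r)  #softmax similarity with answer candidates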