Perception and Intelligence Laboratory
Seoul National University

End-to-End Memory Networks
Sukhbaatar, S., Weston, J., & Fergus, R. (NIPS 2015)
Junho Cho, 2016.08.05
Memory networks: reasoning with a long-term memory component that can be read and written.
What to solve?
Reading a long story and answering questions: a problem that requires long-term memory.
What about RNNs?
• The memory (encoded by hidden states and weights) is typically too small.
• They cannot accurately remember facts from the past (knowledge is compressed into dense vectors).
• They are not even able to reproduce the input as output (Zaremba & Sutskever, 2014).
What is a Memory Network (MemNN)?
• First introduced in 2014 (ICLR 2015, [Memory Networks])
• A class of models that combines a large memory with a learning component that can read and write to it.
• Incorporates reasoning with attention over memory.
< End-to-End Memory Network >
End-to-End Memory Network (MemN2N): a neural network with a recurrent attention model over an external memory.
FYI, the memory component is external and is not changed after the sentences are embedded into memory.
Full Scheme
A, B, C, and W are trained jointly.
1. Store the story sentences in the input memory.
2. Embed the question and take its inner product with each memory vector. Memories related to the question receive more attention.
3. Knowing which memories to attend to, take a weighted sum over the output memory vectors. Add the embedded question to this output and predict the answer as a one-hot vector.
4. All calculations are done in the embedding space.
Full Scheme

$a = \text{Softmax}(W(o + u))$, where
$m_i = \sum_j A x_{ij}$
$c_i = \sum_j C x_{ij}$
$u = \sum_j B q_j$
$p_i = \text{Softmax}(u^T m_i)$
$o = \sum_i p_i c_i$
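A minimal numpy sketch of these one-hop equations. The matrices here are random stand-ins; in the actual model A, B, C, and W are learned end-to-end:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

V, d = 177, 20                       # vocabulary size, embedding dimension
rng = np.random.default_rng(0)
A, B, C = (rng.normal(0, 0.1, (d, V)) for _ in range(3))  # stand-ins for learned embeddings
W = rng.normal(0, 0.1, (V, d))       # stand-in for the learned output matrix

def forward(sentences_bow, question_bow):
    """One hop. sentences_bow: (n, V) BoW sentence vectors; question_bow: (V,)."""
    m = sentences_bow @ A.T          # m_i = sum_j A x_ij      -> (n, d)
    c = sentences_bow @ C.T          # c_i = sum_j C x_ij      -> (n, d)
    u = B @ question_bow             # u   = sum_j B q_j       -> (d,)
    p = softmax(m @ u)               # p_i = Softmax(u^T m_i)  -> (n,)
    o = p @ c                        # o   = sum_i p_i c_i     -> (d,)
    return softmax(W @ (o + u))      # a   = Softmax(W(o + u)) -> (V,)
```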
Tasks: bAbI
• From Facebook AI Research
• 20 tasks for testing text understanding and reasoning
• Generated by simulation; humans should get 100%.
• Each task has 1,000 questions and 1,000 answers.
• Sample task: factoid QA with two supporting facts

John is in the school.         (SUPPORTING FACT)
Jason is in the office.
John picked up the football.   (SUPPORTING FACT)
Jason went to the kitchen.
Q: Where is the football?
A: School

To answer the question, the two supporting facts are important; the other sentences are just distractions. The memory network should focus on the supporting facts.
Tasks: bAbI
• 20 tasks (the full task list is shown as a table in the slides)
Tasks: bAbI
• Dynamic Memory Network (not introduced today) DEMO: http://yerevann.com/dmn-ui/#/
• Example test story + predictions:
Antoine went to the kitchen. Antoine got the milk. Antoine travelled to the office. Antoine dropped the milk. Sumit picked up the football. Antoine went to the bathroom. Sumit moved to the kitchen.
• Where is the milk now? A: office
• Where is the football? A: kitchen
• Where is Antoine? A: bathroom
• Where is Sumit? A: kitchen
• Where was Antoine before the bathroom? A: office
Challenging
• The model must be able to infer through supporting facts.
• Supporting facts are labeled at the sentence level, and can also be given for fully supervised learning.
• In this paper, only the answer is given as ground truth: the memory network must predict the answer by inferring which sentences are the supporting facts.
• This is useful in realistic QA tasks and language modeling.

John is in the school.         (SUPPORTING FACT)
Jason is in the office.
John picked up the football.   (SUPPORTING FACT)
Jason went to the kitchen.
Q: Where is the football?
A: School
End-to-end Memory Network (MemN2N)
• Presented at NIPS 2015
• A new end-to-end model that:
  • Reads from memory with soft attention
  • Performs multiple lookups (hops) on memory
  • Trains end-to-end with back-propagation
  • Needs supervision only on the final output
• It is based on MemNN, which had hard attention:
  • Requires explicit supervision of attention during training
  • Only feasible for simple tasks
Problem Statement
1. bAbI: synthetic QA tasks (TODAY)
2. Language modeling
Input: context sentences and a question
𝑥1 = Mary journeyed to the den.
𝑥2 = Mary went back to the kitchen.
𝑥3 = John journeyed to the bedroom.
𝑥4 = Mary discarded the milk.
Q: Where was the milk before the den?
(slide from TaeHoon Kim)
Each word 𝑥𝑖𝑗 ∈ ℝ^V is a one-hot vector, with V = 177.
Sentences and Question: BoW Representation
A sentence 𝑥𝑖 is a combination of words. For "Mary journeyed to the den":
𝑥𝑖 = (𝑥𝑖1, 𝑥𝑖2, …, 𝑥𝑖𝑛), with
𝑥𝑖1 = mary, 𝑥𝑖2 = journeyed, 𝑥𝑖3 = to, 𝑥𝑖4 = the, 𝑥𝑖5 = den.
Each word 𝑥𝑖𝑗 ∈ ℝ^V is a one-hot Bag-of-Words (BoW) vector (V = 177):
𝑥𝑖1 = mary      → (1, 0, 0, 0, …, 0)ᵀ
𝑥𝑖2 = journeyed → (0, 1, 0, 0, …, 0)ᵀ
𝑥𝑖3 = to        → (0, 0, 1, 0, …, 0)ᵀ
𝑥𝑖4 = the       → (0, 0, 0, 1, …, 0)ᵀ
𝑥𝑖5 = den       → (0, 0, 0, 0, …, 1)ᵀ
A single sentence is thus a set of one-hot BoW vectors.
(slides from TaeHoon Kim)
Sentences and Question: BoW Representation
Input: context sentences and a question, each represented as a set of BoW vectors:
𝑥1 = Mary journeyed to the den.
𝑥2 = Mary went back to the kitchen.
𝑥3 = John journeyed to the bedroom.
𝑥4 = Mary discarded the milk.
Q: Where was the milk before the den?
(slide from TaeHoon Kim)
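A small sketch of how a sentence could be turned into this BoW representation. The toy vocabulary here is an assumption for illustration; the bAbI vocabulary has V = 177 words:

```python
import numpy as np

vocab = ["mary", "journeyed", "to", "the", "den"]  # toy vocabulary (V = 5; bAbI uses V = 177)
index = {w: k for k, w in enumerate(vocab)}
V = len(vocab)

def one_hot(word):
    v = np.zeros(V)
    v[index[word]] = 1.0
    return v

# A sentence x_i is the collection of one-hot vectors of its words:
x_i = [one_hot(w) for w in "mary journeyed to the den".split()]

# Summing them gives the sentence's bag-of-words vector, used as input to the embeddings:
bow = np.sum(x_i, axis=0)
```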
Full Scheme

$a = \text{Softmax}(W(o + u))$, where
$m_i = \sum_j A x_{ij}$
$c_i = \sum_j C x_{ij}$
$u = \sum_j B q_j$
$p_i = \text{Softmax}(u^T m_i)$
$o = \sum_i p_i c_i$
Memory: each memory vector is built from one sentence

$m_i = \sum_j A x_{ij}$
$m_1 = A x_{11} + A x_{12} + A x_{13} + A x_{14} + A x_{15}$
        (mary)     (journeyed)   (to)      (the)     (den)

𝑨: embedding matrix
Dimension check: A is (d × V), x is V = 177, m is d = 20 or 50.
Thus, # of memory vectors = # of sentences.
(slide from TaeHoon Kim)
Embedding to Memory

$m_i = \sum_j A x_{ij}$
$m_1 = A x_{11} + A x_{12} + A x_{13} + A x_{14} + A x_{15}$
𝑨: embedding matrix
Embedding to Memory

𝑥1 = Mary journeyed to the den.
𝑥2 = Mary went back to the kitchen.
𝑥3 = John journeyed to the bedroom.
𝑥4 = Mary discarded the milk.

Each sentence is embedded by 𝑨 into a d-dimensional memory vector, giving 𝑚1, 𝑚2, 𝑚3, 𝑚4 ($m_i = \sum_j A x_{ij}$).
In the bAbI task, a simpler form of the memory component is used.
Memory: only what is needed is used as input
Input memory
In practice, the core of the memory is the embedding matrix 𝑨: it embeds the BoW representations into memory vectors ($m_i = \sum_j A x_{ij}$), and 𝑨 is learned during training.
# of sentences: n < 320; # of memory vectors: n, but the maximum capacity is restricted to the most recent 50 sentences.
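A short sketch of this embedding step. A is random here as a stand-in for the learned matrix; the 50-sentence capacity limit follows the slide:

```python
import numpy as np

d, V = 20, 177
rng = np.random.default_rng(0)
A = rng.normal(0, 0.1, (d, V))       # stand-in for the learned embedding matrix

def embed_memory(sentences_bow, capacity=50):
    """m_i = sum_j A x_ij for each sentence, keeping only the most recent `capacity`."""
    recent = sentences_bow[-capacity:]   # memory restricted to the last 50 sentences
    return recent @ A.T                  # (n, V) @ (V, d) -> (n, d) memory vectors
```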
Question

$u = \sum_j B q_j$
Q: Where was the milk before the den?
Attention model on external memory
Based on the question, decide which memory vector to attend to:
$u = \sum_j B q_j$
$p_i = \text{Softmax}(u^T m_i)$

𝑥1 = Mary journeyed to the den.
𝑥2 = Mary went back to the kitchen.
𝑥3 = John journeyed to the bedroom.
𝑥4 = Mary discarded the milk.
Q: Where was the milk before the den?

𝑝𝑖 determines how much attention goes to each memory vector given the question q: if 𝑢 and 𝑚𝑖 are related, 𝑚𝑖 is attended to in memory and 𝑝𝑖 is higher.
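A hedged sketch of this attention step over the memory vectors (numpy, with the shapes defined above):

```python
import numpy as np

def attention(u, m):
    """p_i = Softmax(u^T m_i): attention weight for each memory, given question u.

    u: (d,) embedded question; m: (n, d) memory vectors.
    """
    scores = m @ u                     # inner product of u with every m_i
    e = np.exp(scores - scores.max())  # numerically stable softmax
    return e / e.sum()                 # higher p_i for memories related to the question

# The output is then the weighted sum over the output memories: o = attention(u, m) @ c
```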
Output

$c_i = \sum_j C x_{ij}$
Purpose of each embedding matrix:
• B: embeds the question
• A: embeds the input memory vectors, used to decide what to attend to
• C: embeds the output vectors, combined based on the attention
$o = \sum_i p_i c_i$
(slide from TaeHoon Kim)
Output: summarized information 𝑜 + question information 𝑢
• o: the response vector from memory, weighted by the probability vector from the input
• u: the internal state, i.e. the embedded input (question)
The answer is derived by considering both, as 𝑜 + 𝑢.
(slide from TaeHoon Kim)
Output: the actual answer word 𝑎
$a = \text{Softmax}(W(o + u))$
W: a (V × d) matrix, mapping from the embedding space back to the vocabulary
(slide from TaeHoon Kim)
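A small sketch of reading off the predicted answer word from $a$. The `vocab` list is a stand-in; in practice it is the 177-word bAbI vocabulary:

```python
import numpy as np

def predict_word(W, o, u, vocab):
    """a = Softmax(W(o + u)); return the highest-probability vocabulary word."""
    logits = W @ (o + u)               # (V, d) @ (d,) -> (V,)
    e = np.exp(logits - logits.max())
    a = e / e.sum()
    return vocab[int(np.argmax(a))]
```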
Model scheme
• Input sentences 𝑥1, 𝑥2, …, 𝑥𝑛 are taken.
• Sentences are embedded into memory vectors 𝑚𝑖 and 𝑐𝑖 using embedding matrices 𝐴 and 𝐶.
• The question is embedded into the internal state 𝑢.
• Matching is performed between 𝑢 and each 𝑚𝑖 with a Softmax function.
• The output is calculated by the relation $o = \sum_i p_i c_i$.
• Another Softmax produces the final prediction after summing 𝑜 with 𝑢.
Recurrent attention model with external memory
(slides from TaeHoon Kim)
The 1-hop, 2-hop, and 3-hop models all apply the same recurrence:
$u^{k+1} = o^k + u^k$
The model is recurrent over the memory component; more hops allow better inference over multiple supporting facts.
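A hedged sketch of the K-hop recurrence, reusing the one-hop pieces above. Layer-wise weight sharing (one A and one C for all hops) is assumed here for brevity:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward_k_hops(sentences_bow, question_bow, A, B, C, W, hops=3):
    """K-hop MemN2N forward pass with layer-wise (shared) A and C."""
    m = sentences_bow @ A.T           # input memory vectors m_i
    c = sentences_bow @ C.T           # output memory vectors c_i
    u = B @ question_bow              # u^1: embedded question
    for _ in range(hops):
        p = softmax(m @ u)            # attention for this hop
        o = p @ c                     # hop output o^k
        u = o + u                     # u^{k+1} = o^k + u^k
    return softmax(W @ u)             # final prediction, Softmax(W u^{K+1})
```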
Recurrent attention model with external memory
Weight sharing, constrained to ease training and reduce the number of parameters:
1) Adjacent: $C^k = A^{k+1}$, $W^T = C^K$, $B = A^1$
2) Layer-wise (RNN-like): $A^i = A^j$, $C^i = C^j$ for all hops
$u^{k+1} = o^k + u^k$ (u acts like the hidden state in an RNN)
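A sketch of how the adjacent tying scheme could be set up; the per-hop matrices here are illustrative random stand-ins for the learned ones:

```python
import numpy as np

d, V, K = 20, 177, 3
rng = np.random.default_rng(0)

# One (d x V) embedding per layer boundary: A^1 ... A^{K+1}
A_layers = [rng.normal(0, 0.1, (d, V)) for _ in range(K + 1)]

A = A_layers[:K]        # A^k used at hop k
C = A_layers[1:]        # adjacent tying: C^k = A^{k+1}
B = A[0]                # question embedding tied to the first input embedding, B = A^1
W = C[-1].T             # answer matrix tied to the last output embedding, W^T = C^K
```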
Memory Network & LSTM
$u^{k+1} = o^k + u^k$ (u acts like the cell state in an LSTM)
Recurrent attention model with external memory
Factoid QA with two supporting facts:

John is in the school.         (SUPPORTING FACT)
Jason is in the office.        (NOT USED)
John picked up the football.   (SUPPORTING FACT)
Jason went to the kitchen.     (NOT USED)
Q: Where is the football?
A: School

- MemNN: fully supervised with supporting facts
- MemN2N: weakly supervised with only the answer; the supporting facts are not used
Result
• The best MemN2N models are close to the supervised MemNN.
• All beat the weakly supervised baseline model.
• Joint training helps.
• More hops improve performance.
Result
More hops are better: each hop attends to a different memory unit, and the model succeeds in focusing on the correct supporting sentences.

20 bAbI Tasks:
Model             Test Acc   Failed tasks
MemNN             93.3%      4
LSTM              49%        20
MemN2N, 1 hop     74.82%     17
MemN2N, 2 hops    84.4%      11
MemN2N, 3 hops    87.6%      11
Conclusion
• A neural network with explicit memory and a recurrent attention mechanism for reading that memory.
• Trained via backpropagation, jointly on all tasks.
• Trainable with weak supervision:
  • No supervision of supporting facts is needed, so it can be used in a wider range of settings.
  • Performs better than other models with the same level of supervision.
• In language modeling, performs better than LSTM and RNN; increasing the number of hops gives better results.
• Still fails on some bAbI tasks.
• Not yet applied to large memories.
Thank you