Perception and Intelligence Laboratory
Seoul National University

End-to-End Memory Networks
Sukhbaatar, S., Weston, J., & Fergus, R. (NIPS 2015)
Junho Cho, 2016.08.05
Memory networks: reasoning with a long-term memory component that can be read and written.
What to solve?
Reading a long story and answering questions: a problem that requires long-term memory.
What about RNNs?
• The memory (encoded by hidden states and weights) is typically too small.
• They cannot accurately remember facts from the past (knowledge is compressed into dense vectors).
• They are not even able to reproduce the input as output (Zaremba & Sutskever, 2014).
What is a Memory Network (MemNN)?
• First introduced in 2014 (ICLR 2015, [Memory Networks])
• A class of models that combines a large memory with a learning component that can read and write to it.
• Incorporates reasoning with attention over memory.
< End-to-End Memory Network >
End-to-End Memory Network (MemN2N): a neural network with a recurrent attention model over an external memory.
FYI, the memory component is external and is not changed after the sentences are embedded into memory.
Full Scheme
A, B, C, and W are trained jointly.
1. Store the story sentences in the input memory.
2. Embed the question and take its inner product with each memory vector. Memories related to the question receive more attention.
3. Knowing which memories to attend to, take a weighted sum over the output memory vectors. Add the embedded question to this output and predict the answer as a one-hot vector.
4. All calculations are done in the embedding space.
Full Scheme

$a = \text{Softmax}(W(o + u))$, where
$m_i = \sum_j A x_{ij}$
$c_i = \sum_j C x_{ij}$
$u = \sum_j B q_j$
$p_i = \text{Softmax}(u^T m_i)$
$o = \sum_i p_i c_i$
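A minimal numpy sketch of these one-hop equations. The matrices here are random stand-ins; in the actual model A, B, C, and W are learned end-to-end:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

V, d = 177, 20                       # vocabulary size, embedding dimension
rng = np.random.default_rng(0)
A, B, C = (rng.normal(0, 0.1, (d, V)) for _ in range(3))  # stand-ins for learned embeddings
W = rng.normal(0, 0.1, (V, d))       # stand-in for the learned output matrix

def forward(sentences_bow, question_bow):
    """One hop. sentences_bow: (n, V) BoW sentence vectors; question_bow: (V,)."""
    m = sentences_bow @ A.T          # m_i = sum_j A x_ij      -> (n, d)
    c = sentences_bow @ C.T          # c_i = sum_j C x_ij      -> (n, d)
    u = B @ question_bow             # u   = sum_j B q_j       -> (d,)
    p = softmax(m @ u)               # p_i = Softmax(u^T m_i)  -> (n,)
    o = p @ c                        # o   = sum_i p_i c_i     -> (d,)
    return softmax(W @ (o + u))      # a   = Softmax(W(o + u)) -> (V,)
```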
Tasks: bAbI
• From Facebook AI Research
• 20 tasks for testing text understanding and reasoning
• Generated by simulation; humans should get 100%.
• Each task has 1,000 questions and 1,000 answers.
• Sample task: factoid QA with two supporting facts

John is in the school.         (SUPPORTING FACT)
Jason is in the office.
John picked up the football.   (SUPPORTING FACT)
Jason went to the kitchen.
Q: Where is the football?
A: School

To answer the question, the two supporting facts are important; the other sentences are just distractions. The memory network should focus on the supporting facts.
Tasks: bAbI
• 20 tasks (the full task list is shown as a table in the slides)
Tasks: bAbI
• Dynamic Memory Network (not introduced today) DEMO: http://yerevann.com/dmn-ui/#/
• Example test story + predictions:
Antoine went to the kitchen. Antoine got the milk. Antoine travelled to the office. Antoine dropped the milk. Sumit picked up the football. Antoine went to the bathroom. Sumit moved to the kitchen.
• Where is the milk now? A: office
• Where is the football? A: kitchen
• Where is Antoine? A: bathroom
• Where is Sumit? A: kitchen
• Where was Antoine before the bathroom? A: office
Challenging
• The model must be able to infer through supporting facts.
• Supporting facts are labeled at the sentence level, and can also be given for fully supervised learning.
• In this paper, only the answer is given as ground truth: the memory network must predict the answer by inferring which sentences are the supporting facts.
• This is useful in realistic QA tasks and language modeling.

John is in the school.         (SUPPORTING FACT)
Jason is in the office.
John picked up the football.   (SUPPORTING FACT)
Jason went to the kitchen.
Q: Where is the football?
A: School
End-to-end Memory Network (MemN2N)
• Presented at NIPS 2015
• A new end-to-end model that:
  • Reads from memory with soft attention
  • Performs multiple lookups (hops) on memory
  • Trains end-to-end with back-propagation
  • Needs supervision only on the final output
• It is based on MemNN, which had hard attention:
  • Requires explicit supervision of attention during training
  • Only feasible for simple tasks
Problem Statement
1. bAbI: synthetic QA tasks (TODAY)
2. Language modeling
Input: context sentences and a question
𝑥1 = Mary journeyed to the den.
𝑥2 = Mary went back to the kitchen.
𝑥3 = John journeyed to the bedroom.
𝑥4 = Mary discarded the milk.
Q: Where was the milk before the den?
(slide from TaeHoon Kim)
Each word 𝑥𝑖𝑗 ∈ ℝ^V is a one-hot vector, with V = 177.
Sentences and Question: BoW Representation
A sentence 𝑥𝑖 is a combination of words. For "Mary journeyed to the den":
𝑥𝑖 = (𝑥𝑖1, 𝑥𝑖2, …, 𝑥𝑖𝑛), with
𝑥𝑖1 = mary, 𝑥𝑖2 = journeyed, 𝑥𝑖3 = to, 𝑥𝑖4 = the, 𝑥𝑖5 = den.
Each word 𝑥𝑖𝑗 ∈ ℝ^V is a one-hot Bag-of-Words (BoW) vector (V = 177):
𝑥𝑖1 = mary      → (1, 0, 0, 0, …, 0)ᵀ
𝑥𝑖2 = journeyed → (0, 1, 0, 0, …, 0)ᵀ
𝑥𝑖3 = to        → (0, 0, 1, 0, …, 0)ᵀ
𝑥𝑖4 = the       → (0, 0, 0, 1, …, 0)ᵀ
𝑥𝑖5 = den       → (0, 0, 0, 0, …, 1)ᵀ
A single sentence is thus a set of one-hot BoW vectors.
(slides from TaeHoon Kim)
Sentences and Question: BoW Representation
Input: context sentences and a question, each represented as a set of BoW vectors:
𝑥1 = Mary journeyed to the den.
𝑥2 = Mary went back to the kitchen.
𝑥3 = John journeyed to the bedroom.
𝑥4 = Mary discarded the milk.
Q: Where was the milk before the den?
(slide from TaeHoon Kim)
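A small sketch of how a sentence could be turned into this BoW representation. The toy vocabulary here is an assumption for illustration; the bAbI vocabulary has V = 177 words:

```python
import numpy as np

vocab = ["mary", "journeyed", "to", "the", "den"]  # toy vocabulary (V = 5; bAbI uses V = 177)
index = {w: k for k, w in enumerate(vocab)}
V = len(vocab)

def one_hot(word):
    v = np.zeros(V)
    v[index[word]] = 1.0
    return v

# A sentence x_i is the collection of one-hot vectors of its words:
x_i = [one_hot(w) for w in "mary journeyed to the den".split()]

# Summing them gives the sentence's bag-of-words vector, used as input to the embeddings:
bow = np.sum(x_i, axis=0)
```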
Full Scheme

$a = \text{Softmax}(W(o + u))$, where
$m_i = \sum_j A x_{ij}$
$c_i = \sum_j C x_{ij}$
$u = \sum_j B q_j$
$p_i = \text{Softmax}(u^T m_i)$
$o = \sum_i p_i c_i$
Memory: each memory vector is built from one sentence

$m_i = \sum_j A x_{ij}$
$m_1 = A x_{11} + A x_{12} + A x_{13} + A x_{14} + A x_{15}$
        (mary)     (journeyed)   (to)      (the)     (den)

𝑨: embedding matrix
Dimension check: A is (d × V), x is V = 177, m is d = 20 or 50.
Thus, # of memory vectors = # of sentences.
(slide from TaeHoon Kim)
Embedding to Memory

$m_i = \sum_j A x_{ij}$
$m_1 = A x_{11} + A x_{12} + A x_{13} + A x_{14} + A x_{15}$
𝑨: embedding matrix
Embedding to Memory

𝑥1 = Mary journeyed to the den.
𝑥2 = Mary went back to the kitchen.
𝑥3 = John journeyed to the bedroom.
𝑥4 = Mary discarded the milk.

Each sentence is embedded by 𝑨 into a d-dimensional memory vector, giving 𝑚1, 𝑚2, 𝑚3, 𝑚4 ($m_i = \sum_j A x_{ij}$).
In the bAbI task, a simpler form of the memory component is used.
Memory: only what is needed is used as input
Input memory
In practice, the core of the memory is the embedding matrix 𝑨: it embeds the BoW representations into memory vectors ($m_i = \sum_j A x_{ij}$), and 𝑨 is learned during training.
# of sentences: n < 320; # of memory vectors: n, but the maximum capacity is restricted to the most recent 50 sentences.
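A short sketch of this embedding step. A is random here as a stand-in for the learned matrix; the 50-sentence capacity limit follows the slide:

```python
import numpy as np

d, V = 20, 177
rng = np.random.default_rng(0)
A = rng.normal(0, 0.1, (d, V))       # stand-in for the learned embedding matrix

def embed_memory(sentences_bow, capacity=50):
    """m_i = sum_j A x_ij for each sentence, keeping only the most recent `capacity`."""
    recent = sentences_bow[-capacity:]   # memory restricted to the last 50 sentences
    return recent @ A.T                  # (n, V) @ (V, d) -> (n, d) memory vectors
```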
Question

$u = \sum_j B q_j$
Q: Where was the milk before the den?
Attention model on external memory
Based on the question, decide which memory vector to attend to:
$u = \sum_j B q_j$
$p_i = \text{Softmax}(u^T m_i)$

𝑥1 = Mary journeyed to the den.
𝑥2 = Mary went back to the kitchen.
𝑥3 = John journeyed to the bedroom.
𝑥4 = Mary discarded the milk.
Q: Where was the milk before the den?

𝑝𝑖 determines how much attention goes to each memory vector given the question q: if 𝑢 and 𝑚𝑖 are related, 𝑚𝑖 is attended to in memory and 𝑝𝑖 is higher.
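A hedged sketch of this attention step over the memory vectors (numpy, with the shapes defined above):

```python
import numpy as np

def attention(u, m):
    """p_i = Softmax(u^T m_i): attention weight for each memory, given question u.

    u: (d,) embedded question; m: (n, d) memory vectors.
    """
    scores = m @ u                     # inner product of u with every m_i
    e = np.exp(scores - scores.max())  # numerically stable softmax
    return e / e.sum()                 # higher p_i for memories related to the question

# The output is then the weighted sum over the output memories: o = attention(u, m) @ c
```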
Output

$c_i = \sum_j C x_{ij}$
Purpose of each embedding matrix:
• B: embeds the question
• A: embeds the input memory vectors, used to decide what to attend to
• C: embeds the output vectors, combined based on the attention
$o = \sum_i p_i c_i$
(slide from TaeHoon Kim)
Output: summarized information 𝑜 + question information 𝑢
• o: the response vector from memory, weighted by the probability vector from the input
• u: the internal state, i.e. the embedded input (question)
The answer is derived by considering both, as 𝑜 + 𝑢.
(slide from TaeHoon Kim)
Output: the actual answer word 𝑎
$a = \text{Softmax}(W(o + u))$
W: a (V × d) matrix, mapping from the embedding space back to the vocabulary
(slide from TaeHoon Kim)
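A small sketch of reading off the predicted answer word from $a$. The `vocab` list is a stand-in; in practice it is the 177-word bAbI vocabulary:

```python
import numpy as np

def predict_word(W, o, u, vocab):
    """a = Softmax(W(o + u)); return the highest-probability vocabulary word."""
    logits = W @ (o + u)               # (V, d) @ (d,) -> (V,)
    e = np.exp(logits - logits.max())
    a = e / e.sum()
    return vocab[int(np.argmax(a))]
```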
Model scheme
• Input sentences 𝑥1, 𝑥2, …, 𝑥𝑛 are taken.
• Sentences are embedded into memory vectors 𝑚𝑖 and 𝑐𝑖 using embedding matrices 𝐴 and 𝐶.
• The question is embedded into the internal state 𝑢.
• Matching is performed between 𝑢 and each 𝑚𝑖 with a Softmax function.
• The output is calculated by the relation $o = \sum_i p_i c_i$.
• Another Softmax produces the final prediction after summing 𝑜 with 𝑢.
Recurrent attention model with external memory
(slides from TaeHoon Kim)
The 1-hop, 2-hop, and 3-hop models all apply the same recurrence:
$u^{k+1} = o^k + u^k$
The model is recurrent over the memory component; more hops allow better inference over multiple supporting facts.
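A hedged sketch of the K-hop recurrence, reusing the one-hop pieces above. Layer-wise weight sharing (one A and one C for all hops) is assumed here for brevity:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward_k_hops(sentences_bow, question_bow, A, B, C, W, hops=3):
    """K-hop MemN2N forward pass with layer-wise (shared) A and C."""
    m = sentences_bow @ A.T           # input memory vectors m_i
    c = sentences_bow @ C.T           # output memory vectors c_i
    u = B @ question_bow              # u^1: embedded question
    for _ in range(hops):
        p = softmax(m @ u)            # attention for this hop
        o = p @ c                     # hop output o^k
        u = o + u                     # u^{k+1} = o^k + u^k
    return softmax(W @ u)             # final prediction, Softmax(W u^{K+1})
```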
Recurrent attention model with external memory
Weight sharing, constrained to ease training and reduce the number of parameters:
1) Adjacent: $C^k = A^{k+1}$, $W^T = C^K$, $B = A^1$
2) Layer-wise (RNN-like): $A^i = A^j$, $C^i = C^j$ for all hops
$u^{k+1} = o^k + u^k$ (u acts like the hidden state in an RNN)
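A sketch of how the adjacent tying scheme could be set up; the per-hop matrices here are illustrative random stand-ins for the learned ones:

```python
import numpy as np

d, V, K = 20, 177, 3
rng = np.random.default_rng(0)

# One (d x V) embedding per layer boundary: A^1 ... A^{K+1}
A_layers = [rng.normal(0, 0.1, (d, V)) for _ in range(K + 1)]

A = A_layers[:K]        # A^k used at hop k
C = A_layers[1:]        # adjacent tying: C^k = A^{k+1}
B = A[0]                # question embedding tied to the first input embedding, B = A^1
W = C[-1].T             # answer matrix tied to the last output embedding, W^T = C^K
```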
Memory Network & LSTM
$u^{k+1} = o^k + u^k$ (u acts like the cell state in an LSTM)
Recurrent attention model with external memory
Factoid QA with two supporting facts:

John is in the school.         (SUPPORTING FACT)
Jason is in the office.        (NOT USED)
John picked up the football.   (SUPPORTING FACT)
Jason went to the kitchen.     (NOT USED)
Q: Where is the football?
A: School

- MemNN: fully supervised with supporting facts
- MemN2N: weakly supervised with only the answer; the supporting facts are not used
Result
• The best MemN2N models are close to the supervised MemNN.
• All beat the weakly supervised baseline model.
• Joint training helps.
• More hops improve performance.
Result
More hops are better: each hop attends to a different memory unit, and the model succeeds in focusing on the correct supporting sentences.

20 bAbI Tasks:
Model             Test Acc   Failed tasks
MemNN             93.3%      4
LSTM              49%        20
MemN2N, 1 hop     74.82%     17
MemN2N, 2 hops    84.4%      11
MemN2N, 3 hops    87.6%      11
Conclusion
• A neural network with explicit memory and a recurrent attention mechanism for reading that memory.
• Trained via backpropagation, jointly on all tasks.
• Trainable with weak supervision:
  • No supervision of supporting facts is needed, so it can be used in a wider range of settings.
  • Performs better than other models with the same level of supervision.
• In language modeling, performs better than LSTM and RNN; increasing the number of hops gives better results.
• Still fails on some bAbI tasks.
• Not yet applied to large memories.
Thank you