Episodic Memory Reader:
Learning What to Remember
for Question Answering from Streaming Data
Moonsu Han1*, Minki Kang1*, Hyunwoo Jung1,3 and Sung Ju Hwang1,2
KAIST1, Daejeon, South Korea
AITRICS2, Seoul, South Korea
NAVER Clova3, Seongnam, South Korea
1
Motivation
2
When interacting with users, an agent should remember information about the user.
However, the agent does not know in advance what information it will receive or what questions will be asked.
User
Agent
It was a hard day. I do not like noisy places.
How about a rest at home?
I sometimes read books when I have a break.
……
Today is a holiday.
How about taking a break and
reading a book at home?
(Clova WAVE)
John
I like potatoes.
What is my name?
NEUT: John
Memory Scalability Problem
3
The most reliable way to preserve information is to store all of it in external memory.
However, the agent cannot retain everything due to limits on memory capacity.
[Figure: the agent (Clova WAVE) reads from and writes to an external memory of tagged user facts, e.g. POS: Library, NEG: Noisy, POS: Baseball, NEUT: Senior, NEUT: John, POS: Potato, …]
Learning What to Remember from Streaming Data
4
Motivated by this, we define a new problem that arises in the real world and must be addressed to build a conversational agent.
We cast it as a novel question answering (QA) task in which the machine does not know when questions about the streaming data will be asked.
[Figure: streaming data (Data 1, Data 2, …, supporting facts) is written to and read from an external memory by the QA model, which must answer Query 1 … Query T]
Learning What to Remember from Streaming Data
5
The model can answer unknown queries by storing incoming data until the external
memory is full.
[Figure: incoming data (Data 1–4, Supporting fact 1) is written to the external memory; the QA model reads the memory to answer Query 1]
Learning What to Remember from Streaming Data
6
When the memory is full, the model should determine which memory entry or
incoming data to discard.
[Figure: the full memory holds mostly ordinary data (Data 1–4) and a single supporting fact; an incoming supporting fact must be written]
In this situation the decision is easy: almost all memory entries are useless, so any of them can be replaced.
Learning What to Remember from Streaming Data
7
When the memory is full of supporting facts, however, it is difficult to decide which
memory entry to delete.
[Figure: the memory holds only supporting facts (1, 2, 7, 8, 9); which memory entry is no longer needed?]
Therefore, we need a model that learns both the general importance of a data instance
and which data matters at which time.
Problem Definition
8
Given a data stream X = {x(1), …, x(T)}, a model should learn a function F: X → M
that maps it to a set of memory entries M = {m(1), …, m(N)}, where T ≫ N.
How can the model learn such a function that maximizes performance on an unseen
future task, without knowing at the time what problem it will be given?
[Figure: F: X → M maps the stream x(1), …, x(T) to the memory slots, e.g. m(1): x(1), m(2): x(8), m(3): x(T−9), …, m(N): x(T−3)]
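The setup above can be sketched as a fixed-capacity memory driven by a replacement policy. This is a minimal illustration of the F: X → M formulation, not the paper's implementation; all names here are made up.

```python
# A data stream of length T must fit into N memory slots (T >> N); a policy F
# decides, for each write into a full memory, which existing entry to evict.

class ExternalMemory:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []

    def write(self, x, choose_victim):
        """Store x; when full, first delete the entry selected by the policy."""
        if len(self.entries) >= self.capacity:
            victim = choose_victim(self.entries, x)  # index into self.entries
            del self.entries[victim]
        self.entries.append(x)

def fifo_policy(entries, new_x):
    return 0  # always evict the oldest entry

memory = ExternalMemory(capacity=3)
for t in range(6):                       # stream x(1)..x(6) with N = 3
    memory.write(f"x({t + 1})", fifo_policy)

print(memory.entries)                    # only the 3 most recent items remain
```

EMR replaces the hand-written `fifo_policy` with a learned policy that scores every entry against the current input.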
Difference from Existing Methods
9
Our question answering task requires a model that can handle streaming data
sequentially without knowing the query in advance.
This problem is difficult for existing QA methods because they lack scalability.
[Figure: an existing QA model must consume the entire stream, Model(x(1)), Model(x(2)), …, Model(x(T)), together with Model(Q), to produce an answer]
Episodic Memory Reader (EMR)
10
To solve this novel QA task, we propose the Episodic Memory Reader (EMR), which
sequentially reads the streaming data and stores its information in external memory.
When the memory is full, EMR replaces the entries that are least important for
answering unseen future questions.
[Figure: at each time step t, EMR reads the current input x(t) and the memory entries m1–m4, writes to the external memory, and at time T+1 the QA model answers the query from the retained entries]
Learning What to Remember Using RL
11
We use reinforcement learning to learn which information is important.
When the agent takes a good action, the QA model returns a positive reward, and
that action is reinforced.
[Figure: the Episodic Memory Reader (agent) observes the state (current input and external memory), takes a replacement action, and the environment, built from a pre-trained QA model over the stream, evaluates the action and returns a reward]
The Components of Proposed Model
12
EMR consists of a data encoder, a memory encoder, and a value network, which together
estimate the relative importance of memory entries.
It learns to retain important information in order to maximize QA accuracy at a
future time point.
[Figure: EMR architecture: the data encoder feeds the memory encoder; a multi-layer perceptron policy network (actor) outputs π(m|s) for replacement, a GRU-cell value network (critic) outputs the value V, and the QA model produces the reward (e.g. F1 score, accuracy) from the query and answer]
Data Encoder
13
The data encoder encodes each input into a memory vector representation.
The encoder architecture depends on the input type.
[Figure: for text input, an embedding layer followed by a biGRU encodes the words; the final state e(t) = m6(t) is written to the external memory]
Data Encoder
14
For image input, a CNN encodes the data into the memory representation e(t) = m6(t).
[Figure: CNN-based data encoder writing to the external memory]
Memory Encoder
15
The memory encoder computes the replacement probability from the relative
importance of the memory entries.
We devise three types of memory encoder:
1. EMR-Independent
2. EMR-biGRU
3. EMR-Transformer
[Figure: the memory encoder reads entries m1(t)–m6(t) and the policy network outputs π(i | M(t), e(t); θ), selecting an entry to replace]
Memory Encoder (EMR-Independent)
16
Similar to [Gülçehre18], EMR-Independent captures the importance of each memory
entry relative to the new data instance, independently of the other entries.
[Gülçehre18] Ç. Gülçehre, S. Chandar, K. Cho, Y. Bengio, Dynamic Neural Turing Machine with Continuous and Discrete Addressing Schemes. NC 2018
[Figure: EMR-Independent computes per-entry scores α_i(t), γ_i(t), v_i(t−1), and g_i(t); a multi-layer perceptron policy network outputs π(i | M(t), e(t); θ)]
Its major drawback is that each memory entry is evaluated only against the current input, not against the other entries.
Memory Encoder (EMR-biGRU)
17
EMR-biGRU computes the importance of each memory entry considering relative
relationships between memory entries using a bidirectional GRU.
[Figure: a biGRU over the memory entries m1(t)–m6(t) produces hidden states h1(t)–h6(t); a multi-layer perceptron policy network outputs π(i | M(t), e(t); θ)]
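The idea can be illustrated with a toy sketch: each entry's score depends on its neighbors through forward and backward recurrent states, and a softmax over the per-entry scores gives the replacement policy π. For brevity this uses a trivial scalar recurrence in place of a real GRU, and a sum in place of the MLP; it is an illustration of the structure, not the paper's model.

```python
import math

def recur(xs):                       # minimal stand-in for one GRU direction
    h, out = 0.0, []
    for x in xs:
        h = math.tanh(0.5 * h + x)   # state carries neighbor information forward
        out.append(h)
    return out

def replacement_policy(memory_values):
    fwd = recur(memory_values)                    # left-to-right context
    bwd = recur(memory_values[::-1])[::-1]        # right-to-left context
    scores = [f + b for f, b in zip(fwd, bwd)]    # toy "MLP": just a sum
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]              # softmax -> pi(i | M, e)

pi = replacement_policy([0.2, -1.5, 0.4, 0.1])
print(pi)    # one deletion probability per memory entry; sums to 1
```

The point of the bidirectional pass is that changing one entry changes the scores of its neighbors, which EMR-Independent cannot do.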
Memory Encoder (EMR-biGRU)
18
In contrast to EMR-Independent, the importance of each memory entry is learned in
relation to its neighbors rather than independently.
[Figure: biGRU memory encoder over m1(t)–m6(t) and MLP policy network outputting π(i | M(t), e(t); θ)]
Memory Encoder (EMR-Transformer)
19
EMR-Transformer computes the relative importance of each memory entry using a
self-attention mechanism from [Vaswani17].
[Vaswani17] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is All you Need. NIPS 2017
[Figure: a self-attention memory encoder produces h1(t)–h6(t) from the entries m1(t)–m6(t); a multi-layer perceptron policy network outputs π(i | M(t), e(t); θ)]
Value Network
20
We train with one of two reinforcement learning algorithms: A3C [Mnih16] or REINFORCE [Williams92].
We adopt Deep Sets [Zaheer17] to build a permutation-invariant set representation h(t) of the memory.
[Zaheer17] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Póczos, R. R. Salakhutdinov, A. J. Smola, Deep Sets. NIPS 2017
[Williams92] R. J. Williams, Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. ML 1992
[Mnih16] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, D. Silver, K. Kavukcuoglu, Asynchronous Methods for Deep Reinforcement Learning. ICML 2016
[Figure: the value network aggregates the encoded entries h1(t)–h6(t) into h(t), feeds it through a GRU cell together with h(t−1), and a multi-layer perceptron outputs the value V(t)]
Value Network
Example of EMR in Action
21
For better understanding, we summarize how our model works on a TriviaQA example.
We assume EMR-Transformer with an external memory that can hold 100 words.
Q: Which US state lends its name to a baked pudding,
made with ice cream, sponge and meringue?
Meringue is type of dessert often associated with Italian
Swiss and French cuisine made from whipped egg whites
and sugar
Sentences are streamed from the Wikipedia document at a constant rate of 20 words per time step.
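The streaming setup above can be sketched as splitting the document into fixed-size 20-word chunks, one per time step. The helper name is illustrative.

```python
# Split a document into fixed-size word chunks and stream them one per step.
def stream_chunks(document, words_per_step=20):
    words = document.split()
    for i in range(0, len(words), words_per_step):
        yield " ".join(words[i:i + words_per_step])

doc = " ".join(f"w{i}" for i in range(45))          # toy 45-word "document"
chunks = list(stream_chunks(doc))
print(len(chunks), [len(c.split()) for c in chunks])  # 3 [20, 20, 5]
```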
Example of EMR in Action
22
Each streamed chunk is encoded and stored in the external memory.
The model writes data representations to the memory until it becomes full.
[Figure: the streamed words ("Meringue is type … sugar") pass through the embedding layer and biGRU of the data encoder, producing e(t), which is written to the external memory as m1(1)]
Example of EMR in Action
23
Once the memory is full, the model must decide which memory entry to delete.
To this end, the Episodic Memory Reader (EMR) "reads" the memory entries and the
current input using the memory encoder.
[Figure: the memory encoder reads the entries m1(6)–m6(6) together with the encoded current input]
Example of EMR in Action
24
Then the memory encoder outputs the policy: for each entry, the probability that it
should be deleted as the least important one.
[Figure: the policy π(m|s) assigns each entry a deletion probability; here m3(6) receives the highest probability]
Example of EMR in Action
25
The entry with the highest deletion probability is removed and the remaining entries are shifted forward.
The new data instance is then "written" into the last slot of the external memory.
[Figure: after deleting m3, the memory holds m1(7), m2(7), m4(7), m5(7), with the new input m6(6) waiting to be written]
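The replacement step described above can be sketched in a few lines (illustrative, not the paper's code): delete the entry the policy scores highest, shift the later entries forward, and write the new encoding into the last slot.

```python
def replace_and_write(memory, probs, new_entry):
    """Evict the entry with the highest deletion probability, append the new one."""
    victim = max(range(len(probs)), key=probs.__getitem__)
    del memory[victim]          # later entries shift toward the front
    memory.append(new_entry)    # the new data occupies the last slot
    return memory

mem = ["m1", "m2", "m3", "m4", "m5", "m6"]
probs = [0.05, 0.10, 0.60, 0.10, 0.10, 0.05]   # m3 judged least important
print(replace_and_write(mem, probs, "m_new"))
# ['m1', 'm2', 'm4', 'm5', 'm6', 'm_new']
```

At training time the victim is sampled from the policy rather than taken greedily (see the training slides below).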
Example of EMR in Action
26
The same deletion-and-write step is repeated for each incoming data instance while the memory remains full.
[Figure: the new entry is written into the last slot, yielding m1(7)–m6(7)]
Example of EMR in Action
27
When it encounters the question, the QA model outputs the answer using the
sentences retained in the external memory.
[Figure: the question and the external memory m1(T)–m5(T) feed the QA model]
Sentences in the external memory:
"Meringue is type of dessert often associated with French swiss and Italian cuisine made from whipped egg whites and sugar or aquafaba and sugar and into egg whites used for decoration on pie or spread on sheet or baked Alaska base and baked swiss meringue is hydrates from refined sugar …"
QA model
(BERT)
Q: Which US state lends its name to a baked pudding, made
with ice cream, sponge and meringue?
A: Alaska
Training the Episodic Memory Reader
28
The model is trained after each data stream ends.
During training, the entry to delete is sampled stochastically from the policy.
[Figure: EMR reads and writes the external memory while streaming x1, …, xT and the query q; the policy π(m|s) and the pre-trained QA model (BERT) outputs are recorded in a history]
Training the Episodic Memory Reader
29
After each memory update, task performance is evaluated.
To evaluate the policy, we provide the future query at every time step, but only
during training.
For TriviaQA the metric is the F1 score, which serves as the reward for
reinforcement learning.
[Figure: at every step the pre-trained QA model (BERT) answers the query from the current memory; the resulting F1 score is used as the reward]
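The F1 reward can be sketched as the standard SQuAD-style token-level F1 between the predicted and gold answer strings (the exact text normalization used in the paper may differ).

```python
from collections import Counter

def f1_reward(prediction, ground_truth):
    """Token-level F1 between two answer strings, used as an RL reward."""
    pred, gold = prediction.lower().split(), ground_truth.lower().split()
    common = Counter(pred) & Counter(gold)   # multiset intersection of tokens
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

print(f1_reward("Versailles castle", "Versailles"))  # 0.666...
```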
Training the Episodic Memory Reader
30
The reward and action probabilities are then stored for later training steps.
This process repeats until the data stream ends.
[Figure: rewards and action probabilities are stored in the history while the stream continues]
Training the Episodic Memory Reader
31
At the end of the stream, the QA loss is computed from the QA model, and the RL loss
is computed by the reinforcement learning algorithm from the stored history.
The QA model and the Episodic Memory Reader are then trained jointly.
[Figure: the stored history yields the RL loss; the QA model (BERT) yields the QA loss; both are used for joint training]
Reinforcement Learning Loss
32
The RL loss follows the standard actor-critic formulation, with an entropy term
added to encourage exploration.
For the i-th history entry we store the action probability p_i, the value V_i, the reward r_i, and the entropy e_i = −Σ_j p_ij log p_ij. With discount factor 0.99:

R_i = r_i + 0.99 · R_{i−1}                    (discounted return)
L_value = Σ_i ½ (R_i − V_i)²
L_policy = Σ_i [ −(r_i + 0.99 · V_{i+1} − V_i) · log p_i − 0.01 · e_i ]
RL Loss = L_value + L_policy
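The loss above can be computed from a stored history as follows. This is a plain-Python sketch of the actor-critic objective on the slide (returns are accumulated backward over the episode, the standard convention); variable names are illustrative.

```python
import math

GAMMA, BETA = 0.99, 0.01   # discount factor and entropy weight from the slide

def rl_loss(history):
    """history: list of (action_probs, chosen_index, value, reward) tuples."""
    # Discounted returns R_i, accumulated backward over the episode.
    returns, R = [], 0.0
    for (_, _, _, r) in reversed(history):
        R = r + GAMMA * R
        returns.append(R)
    returns.reverse()

    l_value, l_policy = 0.0, 0.0
    for i, (probs, a, V, r) in enumerate(history):
        V_next = history[i + 1][2] if i + 1 < len(history) else 0.0
        advantage = r + GAMMA * V_next - V            # one-step TD advantage
        entropy = -sum(p * math.log(p) for p in probs if p > 0)
        l_policy += -advantage * math.log(probs[a]) - BETA * entropy
        l_value += 0.5 * (returns[i] - V) ** 2        # critic regression target
    return l_value + l_policy
```

In practice the advantage and return are treated as constants (detached) when differentiating the policy term, which this scalar sketch does not need to model.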
Experiment - Baselines
33
We compare against three rule-based scheduling policies and one learned baseline:
• FIFO (First-In First-Out) • Uniform • LIFO (Last-In First-Out)
• EMR-Independent: an EMR whose policy scores each entry independently, based on
Dynamic Least Recently Used addressing from [Gülçehre18].
[Figure: each policy π(m|s) selects a memory entry, e.g. m5(6), to replace]
We evaluate our EMR-biGRU and EMR-Transformer against these baselines.
[Gülçehre18] Ç. Gülçehre, S. Chandar, K. Cho, Y. Bengio, Dynamic Neural Turing Machine with Continuous and Discrete Addressing Schemes. NC 2018
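The rule-based baselines can be sketched as follows (illustrative helpers, not the paper's code). Here a policy sees the current memory (oldest-first) plus the incoming item and returns the index of the element to discard, with index == len(memory) meaning "drop the incoming item itself".

```python
import random

def fifo(memory, incoming):
    return 0                                  # discard the oldest memory entry

def lifo(memory, incoming):
    return len(memory)                        # discard the incoming item itself

def uniform(memory, incoming, rng=random):
    return rng.randrange(len(memory) + 1)     # discard uniformly at random

def stream(policy, items, capacity):
    memory = []
    for x in items:
        if len(memory) < capacity:
            memory.append(x)
            continue
        victim = policy(memory, x)
        if victim < len(memory):              # evict an entry, append the new item
            del memory[victim]
            memory.append(x)
    return memory

print(stream(fifo, list(range(10)), 4))   # [6, 7, 8, 9]  (newest survive)
print(stream(lifo, list(range(10)), 4))   # [0, 1, 2, 3]  (oldest survive)
```

Under this view, LIFO keeps the first N items of the stream, which explains its behavior on TriviaQA reported later.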
Dataset (bAbI Dataset)
34
We evaluate our models and baselines on three question answering datasets.
[Weston15] S. Sukhbaatar, A. Szlam, J. Weston, R. Fergus: End-To-End Memory Networks. NIPS 2015
• bAbI [Weston15]: a synthetic dataset for episodic question answering,
consisting of 20 tasks with a small vocabulary.
Original Task 2 (index: context):
1  Mary journeyed to the bathroom
2  Sandra went to the garden
6  Sandra put down the milk there
…
Where is the milk? Garden [2, 6]
8  Daniel went to the garden
17 Daniel dropped the football
…
Where is the football? Bedroom [12, 17]

Noisy Task 2 (index: context):
1  Sandra moved to the kitchen
2  Wolves are afraid of cats
6  Mary is green
…
Where is the milk? Garden [1, 4]
38 Mice are afraid of wolves
42 Mary journeyed to the kitchen
…
Where is the apple? Kitchen [34, 42]
→ These tasks can be solved by remembering facts about a person or an object.
Dataset (TriviaQA Dataset)
35
We evaluate our models and baselines on three question answering datasets.
[Joshi17] M. Joshi, E. Choi, D. S. Weld, L. Zettlemoyer, TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension. ACL 2017
• TriviaQA [Joshi17]: a realistic text-based question answering dataset with
95K question-answer pairs over 662K documents.
→ It requires high-level reasoning and the capability to read a large number of
sentences per document.
[Context]
001 World War I (WWI or WW1), also known as the First World War, or the Great War, was a global war originating in …
002 More than 70 million military personnel, including 60 million Europeans, were mobilised in one of the largest wars in …
550 Some war memorials date the end of the war as being when the Versailles Treaty was signed in 1919, …
770 Britain, rationing was finally imposed in early 1918, limited to meat, sugar, and fats (butter and margarine), but not bread.
……
[Question]
Where was the peace treaty signed that brought World War I to an end?
[Answer]
Versailles castle
Dataset (TVQA Dataset)
36
We evaluate our models and baselines on three question answering datasets.
[Lei18] J. Lei, L. Yu, M. Bansal, T. L. Berg, TVQA: Localized, Compositional Video Question Answering. EMNLP 2018
[Video Clip]
[Question] What is Kutner writing on when talking to Lawrence?
[Answer 1] Kutner is writing on a clipboard.
[Answer 2] Kutner is writing on a laptop.
[Answer 3] Kutner is writing on a notepad.
[Answer 4] Kutner is writing on an index card.
[Answer 5] Kutner is writing on his hand.
… …
• TVQA [Lei18]: a localized, compositional video question answering dataset
containing 153K question-answer pairs from 22K clips across 6 TV series.
→ It requires a model that can understand multi-modal information.
Experiment on bAbI Dataset
37
We combine our model with MemN2N [Weston15] on bAbI dataset.
[Weston15] S. Sukhbaatar, A. Szlam, J. Weston, R. Fergus: End-To-End Memory Networks. NIPS 2015
In this experiment, we vary the memory size to evaluate the efficiency of our model.
[Figure: sentences from the stream are written to the external memory by EMR; the QA model (MemN2N) reads the memory to answer each query]
Result (Accuracy)
38
Both of our models (EMR-biGRU and EMR-Transformer) outperform the baselines.
Our methods retain the supporting facts even with a small number of memory entries.
[Charts: accuracy on the Original and Noisy bAbI tasks across memory sizes]
Result (Solvable)
39
We also report how many supporting facts the models retain in the external memory.
The two EMR variants significantly outperform EMR-Independent as well as the
rule-based memory scheduling policies.
[Charts: fraction of solvable questions on the Original and Noisy tasks]
Experiment on TriviaQA Dataset
40
We combine our model with BERT [Devlin18] on TriviaQA dataset.
[Devlin18] J. Devlin, M. Chang, K. Lee, K. Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL-HLT 2019
Since TriviaQA does not provide answer-span indices, we extract them ourselves.
Each memory entry holds 20 words, and the memory holds at most 400 words.
[Figure: the Wikipedia document is streamed as 20-word chunks; EMR writes selected chunks to the external memory, and the QA model (BERT) reads them to answer the query]
Result (Accuracy)
41
Both our models (EMR-biGRU and EMR-Transformer) outperform the baselines.
Model             Exact Match   F1 score
FIFO                    24.53      27.22
Uniform                 28.30      34.39
LIFO                    46.23      50.10
EMR-Independent         38.05      41.15
EMR-biGRU               52.20      57.57
EMR-Transformer         48.43      53.81
Unlike the other rule-based scheduling policies, LIFO performs quite well because
most answers appear in the earlier parts of the documents.
[Charts: TriviaQA results and the distribution of answer indices over document length]
Experiment on TVQA Dataset
42
We use the multi-stream model from [Lei18] as the QA model.
[Lei18] J. Lei, L. Yu, M. Bansal, T. L. Berg, TVQA: Localized, Compositional Video Question Answering. EMNLP 2018
Each subtitle is attached to the frame at which it starts.
One frame and its corresponding subtitle are jointly embedded into each memory entry.
[Figure: frame-subtitle pairs from the video clip are streamed; EMR writes selected pairs to the external memory, and the QA model (multi-stream) reads them to answer the query]
Result (Accuracy)
43
Both our models (EMR-biGRU and EMR-Transformer) outperform the baselines.
Our methods retain the frames and subtitles even with a small number of memory entries.
Example of TVQA Result
44
[Question] < 00:55.00 ~ 01:06.33 >
Who enters the coffee shop after Ross shows everyone the paper?
[Answer]
1) Joey 2) Rachel 3) Monica 4) Chandler 5) Phoebe
[Video Clip: frames from 00:01 to 01:33]
[Subtitle]
00:03  UNKNAME: Hey. I got some bad news. What?
00:05  UNKNAME: That’s no way to sell newspapers ...
(Ellipsis)
01:31  UNKNAME: Your food is abysmal!
[Memory Information after Reading the Streaming Data]
m0  UNKNAME: No. Monica’s restaurant got a horrible review ...
m1  UNKNAME: I didn’t want her to see it, so I ran around and ...
m2  Joey: This is bad. And I’ve had bad reviews.
m3  Monica: Oh, my God! Look at all the newspapers.
m4  UNKNAME: They say there’s no such thing as …
(m3 is highlighted as the entry supporting the answer)
Visualization of Memory Entries
45
EMR learns general importance: it stores in the external memory the information
needed to answer queries that are unknown at storage time.
Conclusion
46
• We propose a novel task of learning to remember important instances from
streaming data, and instantiate it as a question answering task.
• The Episodic Memory Reader (EMR) learns general importance by considering the
relative importance of memory entries, without knowing the queries, in order to
maximize QA performance.
• Results show that our models retain the information needed for answering even
with a small number of memory entries relative to the stream length.
• We believe this work is an essential step toward building real-world
conversational agents.
Code available at https://github.com/h19920918/emr
Thank you
Q&A
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 

Recently uploaded (20)

06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 

Episodic Memory Reader: Learning What to Remember for Question Answering from Streaming Data

Learning What to Remember from Streaming Data
6
When the memory is full, the model should determine which memory entry or incoming data instance to discard.
In this situation, the decision is easy: almost all memory entries are useless, so replacing one of them costs little.
(Figure: the external memory holds mostly irrelevant data entries alongside the supporting facts, so an entry can safely be replaced.)
Learning What to Remember from Streaming Data
7
When the memory is instead full of supporting facts, it is difficult to decide which memory entry to delete: which entry is no longer needed?
Therefore, we need a model that learns the general importance of each data instance, and which data is important at what time.
(Figure: the external memory contains only supporting facts, making the replacement decision hard.)
Problem Definition
8
Given a data stream 𝑋 = {𝑥(1), …, 𝑥(𝑇)}, a model should learn a function 𝐹: 𝑋 → 𝑀 that maps the stream to a set of memory entries 𝑀 = {𝑚(1), …, 𝑚(𝑁)}, where 𝑇 ≫ 𝑁.
How can it learn a function that maximizes the performance on an unseen future task, without knowing at the time which problems it will be given?
(Figure: the streaming data 𝑋 is mapped by 𝐹 into the external memory 𝑀, which is later used to answer the query 𝑄.)
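The mapping 𝐹 above can be sketched as a simple eviction loop. This is a minimal illustration, not the paper's implementation; `score_fn` is a hypothetical stand-in for the learned importance policy.

```python
def stream_to_memory(stream, capacity, score_fn):
    """Keep at most `capacity` entries, evicting the lowest-scoring one
    whenever a new instance arrives and the memory is full.

    score_fn(memory, i) -> importance of entry i (a stand-in for the
    learned policy; here any callable works)."""
    memory = []
    for x in stream:
        memory.append(x)
        if len(memory) > capacity:
            # Evict the entry judged least important.
            evict = min(range(len(memory)), key=lambda i: score_fn(memory, i))
            memory.pop(evict)
    return memory
```

With `score_fn` returning the entry's index, the loop degenerates to FIFO; the point of EMR is to replace this rule with a learned policy.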
Difference from Existing Methods
9
Our question answering task requires a model that can handle streaming data sequentially without knowing the query.
It is difficult to solve this problem with existing QA methods due to their lack of scalability.
(Figure: a conventional QA model must process every instance 𝑥(1), …, 𝑥(𝑇) together with the query 𝑄 to produce an answer.)
Episodic Memory Reader (EMR)
10
To solve this novel QA task, we propose the Episodic Memory Reader (EMR), which sequentially reads the streaming data and stores the information in an external memory.
When the external memory is full, it replaces the memories that are less important for answering unseen questions.
(Figure: at each timestep, EMR reads the incoming instance and writes to the external memory; at query time, the QA model produces the answer from the retained entries.)
Learning What to Remember Using RL
11
We use reinforcement learning to learn which information is important.
The idea is that when the agent takes a good replacement action, the QA model returns a positive reward, and that action is reinforced.
(Figure: the agent (EMR) observes the current input and the external memory as its state, performs a replacement action, and is evaluated by a pre-trained QA model that provides the reward.)
The Components of the Proposed Model
12
EMR consists of a data encoder, a memory encoder, and policy/value networks that output the relative importance of the memory entries.
It learns how to retain important information in order to maximize its QA accuracy at a future timepoint.
(Figure: data encoder → memory encoder → policy network (actor) outputting π(𝑚 | 𝑠) and value network (critic) outputting 𝑉; the QA model provides the reward, e.g. F1 score or accuracy.)
Data Encoder
13
The data encoder encodes the input data into a memory vector representation; the encoder model varies with the type of the input.
For text input, an embedding layer followed by a bidirectional GRU produces the representation 𝑒(𝑡), which is written to the external memory.
Data Encoder
14
For image input, a CNN produces the memory vector representation 𝑒(𝑡) instead.
Memory Encoder
15
The memory encoder computes the replacement probability by considering the importance of the memory entries.
We devise three different types of memory encoder:
1. EMR-Independent
2. EMR-biGRU
3. EMR-Transformer
(Figure: the memory encoder reads the external memory, and the policy network outputs the replacement policy 𝜋(𝑖 | 𝑀(𝑡), 𝑒(𝑡); 𝜃).)
Memory Encoder (EMR-Independent)
16
Similar to [Gülçehre18], EMR-Independent captures the relative importance of each memory entry independently, with respect to the new data instance only.
Its major drawback is that the evaluation of each memory entry depends only on the input.
[Gülçehre18] Ç. Gülçehre, S. Chandar, K. Cho, Y. Bengio. Dynamic Neural Turing Machine with Continuous and Discrete Addressing Schemes. Neural Computation 2018.
Memory Encoder (EMR-biGRU)
17
EMR-biGRU computes the importance of each memory entry by modeling the relative relationships between memory entries with a bidirectional GRU.
Memory Encoder (EMR-biGRU)
18
With the bidirectional GRU, however, the importance of each memory entry is learned in relation to its neighbors rather than independently.
Memory Encoder (EMR-Transformer)
19
EMR-Transformer computes the relative importance of each memory entry using the self-attention mechanism from [Vaswani17].
[Vaswani17] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin. Attention Is All You Need. NIPS 2017.
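The self-attention this encoder relies on can be sketched in a few lines. This is a minimal single-head version; the absence of learned query/key/value projections and the dimensions are simplifications, not the paper's exact configuration.

```python
import numpy as np

def self_attention(memory):
    """Scaled dot-product self-attention over memory entries.

    memory: (N, d) array, one row per memory entry.
    Returns an (N, d) array of context vectors, one per entry."""
    d = memory.shape[1]
    scores = memory @ memory.T / np.sqrt(d)           # (N, N) similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)     # row-wise softmax
    return weights @ memory                            # attended entries
```

Each output row summarizes one entry in the context of all others, which is exactly the property EMR-Independent lacks.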
Value Network
20
We train with one of two reinforcement learning methods: A3C [Mnih16] or REINFORCE [Williams92].
We adopt Deep Sets [Zaheer17] to build the set representation ℎ(𝑡) of the memory that feeds the value network.
[Mnih16] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, D. Silver, K. Kavukcuoglu. Asynchronous Methods for Deep Reinforcement Learning. ICML 2016.
[Williams92] R. J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine Learning 1992.
[Zaheer17] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Póczos, R. R. Salakhutdinov, A. J. Smola. Deep Sets. NIPS 2017.
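A Deep Sets readout can be sketched as a sum-then-transform; `phi` and `rho` stand in for the learned networks (illustrative callables only, not the paper's architecture).

```python
def deep_sets(memory, phi, rho):
    """Permutation-invariant set summary: rho( sum_i phi(m_i) ).

    memory: iterable of memory entries.
    phi:    per-entry encoder (learned network in the real model).
    rho:    readout applied to the pooled representation."""
    pooled = sum(phi(m) for m in memory)
    return rho(pooled)
```

Because the sum ignores ordering, the value estimate depends only on the set of retained entries, as intended for a memory whose slots have no canonical order.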
Example of EMR in Action
21
For better understanding, we summarize the working process of our model on a TriviaQA example.
We assume EMR-Transformer with an external memory that can hold 100 words; sentences are streamed from a Wikipedia document at a constant rate of 20 words per timestep.
Q: Which US state lends its name to a baked pudding, made with ice cream, sponge and meringue?
Streamed text: "Meringue is type of dessert often associated with Italian Swiss and French cuisine made from whipped egg whites and sugar"
Example of EMR in Action
22
The streamed sentence is encoded and stored in the external memory.
The model writes the data representation to the memory until the memory becomes full.
(Figure: the 20-word chunk passes through the embedding layer and the biGRU of the data encoder, and the resulting encoding 𝑒(𝑡) is written to the first memory entry.)
Example of EMR in Action
23
Once the memory is full, the model needs to decide which memory entry to delete.
To this end, the Episodic Memory Reader "reads" the memory entries and the current input data using the memory encoder.
Example of EMR in Action
24
Then, the memory encoder outputs the policy: the action probability of deleting each entry, so that the least important one can be removed.
(Figure: the policy assigns the highest deletion probability to entry 𝑚₃.)
Example of EMR in Action
25
The entry with the highest deletion probability is deleted, and the remaining entries are shifted up.
Example of EMR in Action
26
The new data instance is then "written" into the last entry of the external memory.
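The replacement step above can be sketched as a single list operation. This greedy version picks the argmax for clarity; during training the deletion index is sampled stochastically from the policy.

```python
def replace_and_write(memory, delete_probs, new_entry):
    """Delete the entry with the highest deletion probability, shift the
    remaining entries up, and write the new instance into the last slot.

    memory:       list of memory entries.
    delete_probs: one deletion probability per entry (the policy output).
    new_entry:    encoding of the incoming data instance."""
    evict = max(range(len(memory)), key=lambda i: delete_probs[i])
    return memory[:evict] + memory[evict + 1:] + [new_entry]
```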
Example of EMR in Action
27
When the model encounters the question, the QA model (BERT) outputs the answer using the sentences retained in the external memory.
Sentences in the external memory: "Meringue is type of dessert often associated with French swiss and Italian cuisine made from whipped egg whites and sugar or aquafaba … used for decoration on pie or spread on sheet or baked Alaska base and baked …"
Q: Which US state lends its name to a baked pudding, made with ice cream, sponge and meringue?
A: Alaska
Training the Episodic Memory Reader
28
During training, the model is updated after each data stream, and the entry to delete is selected stochastically according to the policy.
(Figure: EMR reads and writes the external memory over the stream 𝑥(1), …, 𝑥(𝑇), and the pre-trained QA model (BERT) evaluates the result at the query 𝑞.)
Training the Episodic Memory Reader
29
After each memory update, the performance on the task is evaluated; to evaluate the policy, we provide the future query at every timestep, but only at training time.
On TriviaQA, the metric is the F1 score, which is used as the reward for reinforcement learning.
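The reward can be computed as the usual token-level F1 between the predicted and gold answers; a simplified sketch, omitting the text normalization (lowercasing, punctuation and article stripping) typically applied in QA evaluation.

```python
from collections import Counter

def f1_score(prediction, ground_truth):
    """Token-level F1 between a predicted and a gold answer string."""
    pred_tokens = prediction.split()
    gold_tokens = ground_truth.split()
    # Multiset intersection counts each shared token at most min(count) times.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```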
Training the Episodic Memory Reader
30
The reward and the action probabilities are then stored for the later training step.
This process is repeated until the data stream ends.
Training the Episodic Memory Reader
31
At the end of the stream, the QA loss is computed from the QA model, and the RL loss is computed by the reinforcement learning algorithm from the stored history.
The QA model and the Episodic Memory Reader are then trained jointly.
Reinforcement Learning Loss
32
The RL loss is the basic actor-critic loss; we also add an entropy term to encourage exploration of various possibilities.
For the i-th history entry, with action probability 𝑝ᵢ, value 𝑉ᵢ, reward 𝑟ᵢ, and entropy 𝑒ᵢ = −Σⱼ 𝑝ᵢⱼ log 𝑝ᵢⱼ:
𝑅ᵢ = 0.99 · 𝑅ᵢ₋₁ + 𝑟ᵢ
𝐿_value = Σᵢ ½ (𝑅ᵢ − 𝑉ᵢ)²
𝐿_policy = Σᵢ [−(𝑟ᵢ + 0.99 · 𝑉ᵢ₊₁ − 𝑉ᵢ) · log 𝑝ᵢ − 0.01 · 𝑒ᵢ]
RL loss = 𝐿_value + 𝐿_policy
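A sketch of evaluating these losses over a stored history, in pure NumPy with no gradient bookkeeping; the constants follow the slide, and `entropies` is assumed to hold 𝑒ᵢ = −Σⱼ 𝑝ᵢⱼ log 𝑝ᵢⱼ.

```python
import numpy as np

def rl_loss(rewards, values, log_probs, entropies, gamma=0.99, beta=0.01):
    """Actor-critic loss from the slide, evaluated as a scalar.

    rewards, log_probs, entropies: 1-D arrays of length T over the history.
    values: 1-D array of length T + 1 (one extra bootstrap entry V_{i+1})."""
    T = len(rewards)
    # Discounted returns accumulated forward, as R_i = gamma * R_{i-1} + r_i.
    returns = np.zeros(T)
    R = 0.0
    for i in range(T):
        R = gamma * R + rewards[i]
        returns[i] = R
    l_value = 0.5 * np.sum((returns - values[:T]) ** 2)
    # TD error r_i + gamma * V_{i+1} - V_i as the advantage estimate.
    advantage = rewards + gamma * values[1:T + 1] - values[:T]
    l_policy = np.sum(-advantage * log_probs - beta * entropies)
    return l_value + l_policy
```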
Experiment – Baselines
33
We compare our EMR-biGRU and EMR-Transformer against several baselines:
• FIFO (First-In First-Out)
• LIFO (Last-In First-Out)
• Uniform (uniformly random replacement)
• EMR-Independent: an EMR whose policy treats each entry independently, based on Dynamic Least Recently Used from [Gülçehre18].
[Gülçehre18] Ç. Gülçehre, S. Chandar, K. Cho, Y. Bengio. Dynamic Neural Turing Machine with Continuous and Discrete Addressing Schemes. Neural Computation 2018.
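The rule-based baselines reduce to fixed index-selection rules over the current memory; a minimal sketch (function names are ours, not from the paper).

```python
import random

def fifo(memory):
    """First-In First-Out: always evict the oldest entry."""
    return 0

def lifo(memory):
    """Last-In First-Out: always evict the newest entry,
    i.e. effectively drop the incoming data."""
    return len(memory) - 1

def uniform(memory):
    """Evict a uniformly random entry."""
    return random.randrange(len(memory))
```

Each function returns the index to delete; EMR replaces these fixed rules with a learned policy over the same interface.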
Dataset (bAbI Dataset)
34
We evaluate our models and baselines on three question answering datasets.
• bAbI [Weston15]: a synthetic dataset for episodic question answering, consisting of 20 tasks with a small vocabulary. It can be solved by remembering a person or an object.
Original Task 2 example: "Mary journeyed to the bathroom. Sandra went to the garden. Sandra put down the milk there. … Where is the milk? → Garden (supporting facts 2, 6)."
We also use a Noisy Task 2 variant, in which unrelated sentences such as "Wolves are afraid of cats." are interleaved with the supporting facts.
[Weston15] S. Sukhbaatar, A. Szlam, J. Weston, R. Fergus. End-To-End Memory Networks. NIPS 2015.
Dataset (TriviaQA Dataset)
35
• TriviaQA [Joshi17]: a realistic text-based question answering dataset with 95K question–answer pairs over 662K documents. It requires high-level reasoning and the capability to read a large number of sentences per document.
Example: [Question] Where was the peace treaty signed that brought World War I to an end? [Answer] Versailles castle, answered from a long Wikipedia article on World War I.
[Joshi17] M. Joshi, E. Choi, D. S. Weld, L. Zettlemoyer. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension. ACL 2017.
Dataset (TVQA Dataset)
36
• TVQA [Lei18]: a localized, compositional video question answering dataset containing 153K question–answer pairs from 22K clips across 6 TV series. It requires a model able to understand multi-modal information.
Example: [Question] What is Kutner writing on when talking to Lawrence? with five candidate answers (a clipboard, a laptop, a notepad, an index card, his hand).
[Lei18] J. Lei, L. Yu, M. Bansal, T. L. Berg. TVQA: Localized, Compositional Video Question Answering. EMNLP 2018.
Experiment on bAbI Dataset
37
We combine our model with MemN2N [Weston15] as the QA model on the bAbI dataset.
In this experiment, we vary the memory size to evaluate the efficiency of our model.
(Figure: sentences are streamed into EMR, which maintains the external memory that MemN2N reads at query time.)
[Weston15] S. Sukhbaatar, A. Szlam, J. Weston, R. Fergus. End-To-End Memory Networks. NIPS 2015.
Result (Accuracy)
38
Both of our models (EMR-biGRU and EMR-Transformer) outperform the baselines on the original and noisy tasks.
Our methods are able to retain the supporting facts even with a small number of memory entries.
Result (Solvable)
39
We also report how many supporting facts the models retain in the external memory.
The two EMR variants significantly outperform EMR-Independent as well as the rule-based memory scheduling policies, on both the original and noisy tasks.
Experiment on TriviaQA Dataset
40
We combine our model with BERT [Devlin18] as the QA model on the TriviaQA dataset.
Since TriviaQA does not provide span annotations, we extract the indices of the answer span ourselves.
We embed 20 words into each memory entry and hold 400 words at maximum.
[Devlin18] J. Devlin, M. Chang, K. Lee, K. Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL-HLT 2019.
Result (Accuracy)
41
Both our models (EMR-biGRU and EMR-Transformer) outperform the baselines on TriviaQA.

Model             ExactMatch  F1 score
FIFO                   24.53     27.22
Uniform                28.30     34.39
LIFO                   46.23     50.10
EMR-Independent        38.05     41.15
EMR-biGRU              52.20     57.57
EMR-Transformer        48.43     53.81

Unlike the other rule-based scheduling policies, LIFO performs quite well, because most answers appear in the earlier part of the documents.
Experiment on TVQA Dataset
42
We use the multi-stream model from [Lei18] as the QA model with ours.
Each subtitle is attached to the frame at which the subtitle starts, and one frame and its corresponding subtitle are jointly embedded into each memory entry.
[Lei18] J. Lei, L. Yu, M. Bansal, T. L. Berg. TVQA: Localized, Compositional Video Question Answering. EMNLP 2018.
Result (Accuracy)
43
Both our models (EMR-biGRU and EMR-Transformer) outperform the baselines on TVQA.
Our methods are able to retain the relevant frames and subtitles even with a small number of memory entries.
Example of TVQA Result
44
[Question] <00:55.00–01:06.33> Who enters the coffee shop after Ross shows everyone the paper?
[Answer candidates] 1) Joey 2) Rachel 3) Monica 4) Chandler 5) Phoebe
After reading the streaming data, the memory retains the relevant subtitles, e.g. 𝑚₂: "Joey: This is bad. And I've had bad reviews." and 𝑚₃: "Monica: Oh, my God! Look at all the newspapers."
Visualization of Memory Entries
45
EMR learns general importance: it stores in the external memory the information needed to solve unknown queries.
Conclusion
46
• We propose a novel task of learning to remember important instances from streaming data, and demonstrate it on question answering.
• The Episodic Memory Reader (EMR) learns general importance by considering the relative importance between memory entries, without knowing the queries, in order to maximize the performance of the QA task.
• Results show that our models retain the information needed for answering even with a small number of memory entries relative to the length of the streams.
• We believe that our work can be an essential part toward building a real-world conversation agent.
Code available at https://github.com/h19920918/emr