2019/09/05
LGCNS AI Tech Talk for NLU (feat. KorQuAD)
- Moonsu Han and Minki Kang, KAIST
- Episodic Memory Reader: Learning What to Remember for Question Answering from Streaming Data
Learning to Reason in Round-based Games: Multi-task Sequence Generation for P... (Deren Lei)
Sequential reasoning is a complex human ability. While extensive previous research has focused on gaming AI for a single continuous game, round-based decision making that extends over a sequence of games remains less explored. CounterStrike: Global Offensive (CS:GO), as a round-based game with abundant expert demonstrations, provides an excellent environment for multi-player round-based sequential reasoning. In this work, we propose a Sequence Reasoner with a Round Attribute Encoder and a Multi-Task Decoder to interpret the strategies behind round-based purchasing decisions. We adopt few-shot learning to sample multiple rounds in a match, and a modified version of Reptile, a model-agnostic meta-learning algorithm, for the meta-learning loop. We formulate each round as a multi-task sequence generation problem. Our state representations combine an action encoder, a team encoder, player features, a round attribute encoder, and economy encoders to help our agent learn to reason in this specific multi-player round-based scenario. A complete ablation study and a comparison with the greedy approach certify the effectiveness of our model. Our research opens doors for interpretable AI that understands episodic and long-term purchasing strategies beyond the gaming community.
Methodological study of opinion mining and sentiment analysis techniques (ijsc)
Decision making at both the individual and the organizational level is always accompanied by a search for others' opinions on the matter. The tremendous growth of opinion-rich resources such as reviews, forum discussions, blogs, micro-blogs, and Twitter provides a rich anthology of sentiments. This user-generated content can serve as a boon to the market if its semantic orientations are analyzed. Opinion mining and sentiment analysis are the formalizations for studying and construing opinions and sentiments. The digital ecosystem has itself paved the way for recording huge volumes of opinionated data. This paper is an attempt to review and evaluate the various techniques used for opinion and sentiment analysis.
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ... (Simplilearn)
This Deep Learning interview questions and answers presentation will help you prepare for Deep Learning interviews. It is ideal both for beginners and for professionals preparing for Deep Learning, Machine Learning, or Data Science interviews. Learn the most important Deep Learning interview questions and answers, and know what will set you apart in the interview process.
Some of the important Deep Learning interview questions are listed below:
1. What is Deep Learning?
2. What is a Neural Network?
3. What is a Multilayer Perceptron (MLP)?
4. What is Data Normalization and why do we need it?
5. What is a Boltzmann Machine?
6. What is the role of Activation Functions in a neural network?
7. What is a cost function?
8. What is Gradient Descent?
9. What do you understand by Backpropagation?
10. What is the difference between Feedforward Neural Network and Recurrent Neural Network?
11. What are some applications of Recurrent Neural Network?
12. What are Softmax and ReLU functions?
13. What are hyperparameters?
14. What will happen if the learning rate is set too low or too high?
15. What is Dropout and Batch Normalization?
16. What is the difference between Batch Gradient Descent and Stochastic Gradient Descent?
17. Explain Overfitting and Underfitting and how to combat them.
18. How are weights initialized in a network?
19. What are the different layers in a CNN?
20. What is Pooling in a CNN and how does it work?
Simplilearn’s Deep Learning course will transform you into an expert in deep learning techniques using TensorFlow, the open-source software library designed for machine learning and deep neural network research. With our deep learning course, you’ll master deep learning and TensorFlow concepts, learn to implement algorithms, build artificial neural networks, and traverse layers of data abstraction to understand the power of data, preparing you for your new role as a deep learning scientist.
Why Deep Learning?
TensorFlow is one of the most popular software platforms used for deep learning and contains powerful tools to help you build and implement artificial neural networks.
Advancements in deep learning are being seen in smartphone applications, creating efficiencies in the power grid, driving advancements in healthcare, improving agricultural yields, and helping us find solutions to climate change.
There is booming demand for skilled deep learning engineers across a wide range of industries, making this deep learning course with TensorFlow training well-suited for professionals at the intermediate to advanced level of experience. We recommend this deep learning online course particularly for the following professionals:
1. Software engineers
2. Data scientists
3. Data analysts
4. Statisticians with an interest in deep learning
Learn more at: https://www.simplilearn.com
Tactile Brain-Computer Interface Using Classification of P300 Responses Evoke... (Takumi Kodama)
Kodama T, Makino S, Rutkowski TM. Tactile Brain-Computer Interface Using Classification of P300 Responses Evoked by Full Body Spatial Vibrotactile Stimuli. In: Asia-Pacific Signal and Information Processing Association, 2016 Annual Summit and Conference (APSIPA ASC 2016). APSIPA. Jeju, Korea: IEEE Press; 2016.
Testing nanometer memories: a review of architectures, applications, and chal... (IJECEIAES)
Newer defects in memories arising from shrinking manufacturing technologies demand improved memory testing methodologies. The percentage of memories on chips continues to rise, and with shrinking technologies (from 10 nm down to 1.8 nm) the structure of memories is becoming denser. Due to this dense structure and the significant portion of a chip they occupy, nanometer memories are highly susceptible to defects. High-frequency specifications, the complexity of internal connections, and process variation due to newer manufacturing technologies further increase the probability of physical failure of memories. Memories need to be defect-free for the chip to operate successfully. Therefore, testing embedded memories has become crucial and accounts for a significant share of test costs. Considering these factors, researchers have proposed multiple approaches to test nanometer memories, including new fault models, march algorithms, memory built-in self-test (MBIST) architectures, and validation strategies. This paper surveys the methodologies presented in recent times, discusses the core principles used in them along with their benefits, and finally discusses the key open issues in each and the scope for future research.
This presentation discusses decision trees as a machine learning technique. It introduces the problem with several examples: cricket player selection, medical C-section diagnosis, and a mobile phone price predictor. It discusses the ID3 algorithm and how the decision tree is induced, and covers the definition and use of concepts such as Entropy and Information Gain.
This slide contains information about memory and its parameters, the classification of memory, allocation policies, cache memory, virtual memory, paging, segmentation, and pipelining.
Textbook Question Answering (TQA) with Multi-modal Context Graph Understandin... (LGCNSairesearch)
2019/09/05
LGCNS AI Tech Talk for NLU (feat.KorQuAD)
- Daesik Kim, Seoul National University
- Accepted at ACL 2019
- Textbook Question Answering (TQA) with Multi-modal Context Graph Understanding and Self-supervised Open-set Comprehension
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round-table discussion of vector databases, unstructured data, AI, big data, real-time systems, robots, and Milvus.
A lively discussion with the NJ Gen AI Meetup Lead, Prasad, and Procure.FYI's Co-Found
Adjusting primitives for graph: SHORT REPORT / NOTES (Subhajit Sahu)
Graph algorithms such as PageRank often operate on Compressed Sparse Row (CSR), an adjacency-list-based graph representation.
Multiply with different modes (map)
1. Performance of sequential vs. OpenMP-based vector multiply.
2. Comparing various launch configs for CUDA-based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs. bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential vs. OpenMP-based vector element sum.
2. Performance of memcpy vs. in-place CUDA-based vector element sum.
3. Comparing various launch configs for CUDA-based vector element sum (memcpy).
4. Comparing various launch configs for CUDA-based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA-based vector element sum (in-place).
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... (John Andrews)
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
The Building Blocks of QuestDB, a Time Series Database (javier ramirez)
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf (GetInData)
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source) Copilot?
How can we build one?
Architecture and evaluation
Techniques to optimize the PageRank algorithm usually fall into two categories. One tries to reduce the work per iteration, and the other tries to reduce the number of iterations; these goals are often at odds with one another. Skipping computation on vertices that have already converged can save iteration time. Skipping in-identical vertices, i.e. vertices with the same in-links, helps avoid duplicate computations and thus can also reduce iteration time. Road networks often contain chains which can be short-circuited before the PageRank computation to improve performance, since the final ranks of chain nodes are easy to calculate; this can reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which can reduce the iteration time and the number of iterations, and also enables multi-iteration concurrency in the PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
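To make one of these ideas concrete, here is a minimal Python sketch of skipping computation on already-converged vertices during PageRank iterations; the dictionary-based graph representation, tolerance, and damping values are illustrative assumptions, not the notes' actual implementation.

```python
# Sketch: PageRank that skips vertices whose rank has already converged.
# `graph` maps each vertex to the list of its in-neighbors (illustrative only).
def pagerank_skip_converged(graph, out_degree, d=0.85, tol=1e-10, iters=100):
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    converged = set()
    for _ in range(iters):
        active = False
        for v in graph:
            if v in converged:
                continue                      # no work for converged vertices
            new = (1 - d) / n + d * sum(rank[u] / out_degree[u]
                                        for u in graph[v])
            if abs(new - rank[v]) < tol:
                converged.add(v)              # freeze this vertex from now on
            else:
                active = True
            rank[v] = new
        if not active:
            break                             # every vertex has converged
    return rank
```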
Episodic Memory Reader: Learning What to Remember for Question Answering from Streaming Data
1. Episodic Memory Reader: Learning What to Remember for Question Answering from Streaming Data
Moonsu Han¹*, Minki Kang¹*, Hyunwoo Jung¹,³ and Sung Ju Hwang¹,²
¹KAIST, Daejeon, South Korea
²AITRICS, Seoul, South Korea
³NAVER Clova, Seongnam, South Korea
2. Motivation
When interacting with users, an agent should remember information about the user.
The agent does not know in advance when, and what, information and questions will be given.
Example dialogue with an agent (Clova WAVE):
User: It was a hard day. I do not like noisy places.
Agent: How about a rest at home?
User: I sometimes read books when I have a break.
...
User: Today is a holiday.
Agent: How about taking a break and reading a book at home?
3. Memory Scalability Problem
The best way to preserve the information is to store all of it in an external memory.
However, the agent cannot retain and learn from all of it due to limits on memory capacity.
[Diagram: the agent (Clova WAVE) reads from and writes to an external memory holding entries such as "NEUT: John", "POS: Library", "NEG: Noisy", "POS: Baseball", "NEUT: Senior", "POS: Potato", ..., while the user says "I like potatoes." and asks "What is my name?"]
4. Learning What to Remember from Streaming Data
Inspired by this motivation, we define a new problem that can arise in the real world and that should be addressed to build a conversational agent.
We cast it as a novel question answering (QA) task in which the machine does not know when the questions will be given while it reads the streaming data.
[Diagram: a QA model reads from and writes to an external memory as data instances and supporting facts stream in (Data 1, Data 2, ..., Data 95, Data 150, Data 154, Supporting facts 1, ..., T); at query time it must produce Answer T for Query T.]
5. Learning What to Remember from Streaming Data
The model can answer unknown queries by storing the incoming data until the external memory is full.
[Diagram: incoming instances (Data 1-4, Supporting fact 1) are written to the external memory one by one; when Query 1 arrives, the QA model reads the stored entries and outputs Answer 1 from Supporting fact 1.]
6. Learning What to Remember from Streaming Data
When the memory is full, the model should determine which memory entry or incoming data instance to discard.
In this situation it can easily decide which entry to replace, since almost all memory entries are useless.
[Diagram: the external memory holds Data 1-4 and supporting facts; the incoming supporting fact replaces one of the uninformative data entries.]
7. Learning What to Remember from Streaming Data
When the memory is full of supporting facts, however, it is difficult to decide which memory entry to delete.
[Diagram: the external memory holds only supporting facts (1, 2, 7, 8, 9, ...); which memory entry is no longer needed?]
Therefore, we need a model that learns both the general importance of a data instance and which data is important at what time.
8. Problem Definition
Given a data stream X = {x^(1), ..., x^(T)}, a model should learn a function F: X → M that maps it to a set of memory entries M = {m^(1), ..., m^(N)}, where T ≫ N.
How can it learn such a function so as to maximize the performance on an unseen future task, without knowing at the time what problems it will be given?
[Diagram: the stream x^(1), ..., x^(T) is mapped by F: X → M into the external memory M, e.g. m^(1): x^(1), m^(2): x^(8), m^(3): x^(T−9), ..., m^(N): x^(T−3); a query Q is then answered from M, and the result determines the performance.]
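To make the constraint concrete, below is a minimal Python sketch of such a fixed-capacity mapping; `StreamingMemory`, `policy`, and `select_eviction` are names introduced here for illustration, not the authors' API.

```python
# Minimal sketch of the fixed-capacity mapping F: X -> M described above.
class StreamingMemory:
    """Holds at most N memory entries while T >> N instances stream in."""
    def __init__(self, capacity, policy):
        self.capacity = capacity      # N
        self.entries = []             # current memory M
        self.policy = policy          # decides what to discard when full

    def observe(self, x):
        if len(self.entries) < self.capacity:
            self.entries.append(x)    # memory not yet full: just write
            return
        # Memory full: the policy picks an index among the N entries, with
        # index N meaning "drop the incoming instance x itself".
        i = self.policy.select_eviction(self.entries, x)
        if i < self.capacity:
            del self.entries[i]
            self.entries.append(x)
```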
9. Difference from Existing Methods
Our question answering task requires a model that can handle streaming data sequentially without knowing the query in advance.
It is difficult to solve this problem with existing QA methods due to their lack of scalability.
[Diagram: an existing QA model must process the entire stream, running Model(x^(1)), Model(x^(2)), ..., Model(x^(T)) and Model(Q), before it can answer.]
10. Episodic Memory Reader (EMR)
To solve this novel QA task, we propose the Episodic Memory Reader (EMR), which sequentially reads the streaming data and stores the information in an external memory.
When the external memory is full, it replaces the memories that are less important for answering unseen questions.
[Diagram: at each time step, EMR reads the incoming instance (x^(T−9), ..., x^(T)) and updates the external memory entries m_1, ..., m_4; when the query arrives at step t_{T+1}, the QA model produces the answer from the memory.]
11. Learning What to Remember Using RL
We use reinforcement learning to learn which information is important.
The intuition is that if the agent takes a good action, the QA model returns a high reward, and that action is reinforced.
[Diagram: the EMR agent observes the streaming data and the current input (the state), acts by replacing a memory entry, and the pre-trained QA model evaluates the resulting memory to produce the reward.]
12. The Components of the Proposed Model
EMR consists of a data encoder, a memory encoder, and actor-critic heads (a policy network and a value network) that output the relative importance of the memory entries.
It learns how to retain important information in order to maximize QA accuracy at a future time point.
[Diagram: the data encoder embeds the input and writes it to the external memory; the memory encoder feeds a policy network (actor: a multi-layer perceptron producing the policy π(m | s)) and a value network (critic: a GRU cell plus a multi-layer perceptron producing the value V); the QA model answers the query, and its score (e.g. F1 score or accuracy) is the reward.]
13. Data Encoder
The data encoder encodes the input data into a memory vector representation.
The encoder model varies with the type of input.
[Diagram, text input: the words of a chunk ("...fens also known several...") pass through an embedding layer and a biGRU; the resulting representation e^(t) is written as the new memory entry m_6^(t) of the external memory.]
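As a rough illustration, a text data encoder of this shape could look as follows in PyTorch; the layer sizes and the use of the final biGRU states are assumptions, not the paper's exact configuration.

```python
# Sketch of a text data encoder: embedding layer + bidirectional GRU,
# producing one vector e^(t) per streamed chunk (assumed sizes).
import torch
import torch.nn as nn

class TextDataEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bigru = nn.GRU(emb_dim, hidden_dim, batch_first=True,
                            bidirectional=True)

    def forward(self, token_ids):                # (batch, seq_len)
        x = self.embed(token_ids)                 # (batch, seq_len, emb_dim)
        _, h = self.bigru(x)                      # h: (2, batch, hidden_dim)
        # Concatenate final forward/backward states into the entry e^(t).
        return torch.cat([h[0], h[1]], dim=-1)    # (batch, 2 * hidden_dim)
```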
14. Data Encoder
The data encoder encodes the input data into a memory vector representation; the encoder model varies with the type of input.
[Diagram, image input: a CNN encodes the image into e^(t), which is written as the new memory entry m_6^(t) of the external memory.]
15. Memory Encoder
The memory encoder computes the replacement probability by considering the importance of the memory entries.
We devise 3 different types of memory encoder:
1. EMR-Independent
2. EMR-biGRU
3. EMR-Transformer
[Diagram: the memory encoder reads the entries m_1^(t), ..., m_6^(t) and the new input; the policy network outputs π(i | M^(t), e^(t); θ), the probability of replacing each entry.]
16. Memory Encoder (EMR-Independent)
Similar to [Gülçehre18], EMR-Independent captures the relative importance of each memory entry with respect to the new data instance, independently of the other entries.
[Gülçehre18] Ç. Gülçehre, S. Chandar, K. Cho, Y. Bengio, Dynamic Neural Turing Machine with Continuous and Discrete Addressing Schemes. NC 2018
[Diagram: for each entry m_i^(t), attention weights α_i^(t) against the new input are combined with gates γ_i^(t), g_i^(t) and the previous values v_i^(t−1); a multi-layer perceptron then outputs π(i | M^(t), e^(t); θ).]
The major drawback is that the evaluation of each memory entry depends only on the input.
18. Memory Encoder (EMR-biGRU)
EMR-biGRU computes the importance of each memory entry with a bidirectional GRU, considering the relative relationships between memory entries.
In contrast to EMR-Independent, the importance of each entry is thus learned in relation to its neighbors rather than independently.
[Diagram: entries m_1^(t), ..., m_6^(t) pass through a biGRU producing hidden states h_1^(t), ..., h_6^(t); a multi-layer perceptron then outputs π(i | M^(t), e^(t); θ).]
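A minimal PyTorch sketch of this encoder, assuming the memory slots are stacked into one tensor; the scoring MLP and sizes are illustrative.

```python
# Sketch of the EMR-biGRU memory encoder: a biGRU over the memory slots yields
# context-aware states, and an MLP scores each slot for replacement.
import torch
import torch.nn as nn

class BiGRUMemoryEncoder(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.bigru = nn.GRU(dim, dim, batch_first=True, bidirectional=True)
        self.scorer = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                    nn.Linear(dim, 1))

    def forward(self, memory):                 # memory: (batch, N, dim); the new
        h, _ = self.bigru(memory)               # instance can be an extra slot
        logits = self.scorer(h).squeeze(-1)     # (batch, N): one score per slot
        return torch.softmax(logits, dim=-1)    # replacement policy over slots
```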
19. Memory Encoder (EMR-Transformer)
EMR-Transformer computes the relative importance of each memory entry using the self-attention mechanism from [Vaswani17].
[Vaswani17] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is All you Need. NIPS 2017
[Diagram: entries m_1^(t), ..., m_6^(t) are encoded by self-attention into h_1^(t), ..., h_6^(t); a multi-layer perceptron then outputs π(i | M^(t), e^(t); θ).]
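The Transformer variant swaps the biGRU for self-attention; a single-layer sketch under the same assumptions (the paper's exact depth and head count are not reproduced here):

```python
# Sketch of the EMR-Transformer memory encoder: multi-head self-attention over
# the memory slots, then an MLP scoring each slot for replacement.
import torch
import torch.nn as nn

class TransformerMemoryEncoder(nn.Module):
    def __init__(self, dim, heads=4):            # dim must be divisible by heads
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.scorer = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                    nn.Linear(dim, 1))

    def forward(self, memory):                    # memory: (batch, N, dim)
        h, _ = self.attn(memory, memory, memory)  # each slot attends to all slots
        logits = self.scorer(h).squeeze(-1)       # (batch, N)
        return torch.softmax(logits, dim=-1)      # replacement policy over slots
```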
20. Value Network
Two types of reinforcement learning are used: A3C [Mnih16] or REINFORCE [Williams92].
We adopt Deep Sets [Zaheer17] to build the set representation h^(t) of the memory.
[Zaheer17] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Póczos, R. R. Salakhutdinov, A. J. Smola, Deep Sets. NIPS 2017
[Williams92] R. J. Williams, Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. ML 1992
[Mnih16] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, D. Silver, K. Kavukcuoglu, Asynchronous Methods for Deep Reinforcement Learning. ICML 2016
[Diagram: the per-entry states h_1^(t), ..., h_6^(t) from the memory encoder are pooled into the set representation h^(t); a GRU cell (carrying h^(t−1)) and a multi-layer perceptron produce the value estimate V^(t).]
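A sketch of this critic under the same assumptions, using sum pooling as the Deep Sets aggregator and one GRU cell for the temporal state:

```python
# Sketch of the value network: permutation-invariant pooling over the per-slot
# states, a GRU cell carrying history across time steps, and a scalar head.
import torch
import torch.nn as nn

class ValueNetwork(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.phi = nn.Linear(dim, dim)       # per-element transform (Deep Sets)
        self.cell = nn.GRUCell(dim, dim)     # carries state from step t-1 to t
        self.head = nn.Linear(dim, 1)        # scalar value V^(t)

    def forward(self, slot_states, prev_h):  # slot_states: (batch, N, dim)
        pooled = torch.relu(self.phi(slot_states)).sum(dim=1)  # set repr. h^(t)
        h = self.cell(pooled, prev_h)
        return self.head(h).squeeze(-1), h   # value and new hidden state
```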
21. Example of EMR in Action
For a better understanding, we summarize the working process of our model on a TriviaQA example.
We assume EMR-Transformer with an external memory that can hold 100 words.
Q: Which US state lends its name to a baked pudding, made with ice cream, sponge and meringue?
Sentences are streamed from the Wikipedia document at a constant rate of 20 words per time step, e.g.: "Meringue is type of dessert often associated with Italian Swiss and French cuisine made from whipped egg whites and sugar".
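The chunking itself is straightforward; a tiny illustrative helper (not the paper's exact tokenization):

```python
# Split a document into fixed-size word chunks that arrive one per time step.
def stream_chunks(document, words_per_step=20):
    words = document.split()
    for i in range(0, len(words), words_per_step):
        yield " ".join(words[i:i + words_per_step])
```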
22. Example of EMR in Action
Each streamed chunk is encoded and stored in the external memory.
The data representations are written to the memory until it becomes full.
[Diagram: the chunk "Meringue is type of dessert ... and sugar" passes through the embedding layer and biGRU of the data encoder, and the result e^(t) is written as memory entry m_1^(1).]
23. Example of EMR in Action
If the memory is full, the model needs to decide which memory entry should be deleted.
To this end, the Episodic Memory Reader (EMR) "reads" the memory entries and the current input data using the memory encoder.
[Diagram: the data encoder output and the entries m_1^(6), ..., m_6^(6) are read by the memory encoder.]
24. Example of EMR in Action
The memory encoder then outputs the policy: the action probability for deleting the least important entry.
[Diagram: the policy π(m | s) over m_1^(6), ..., m_6^(6) assigns the highest deletion probability to m_3^(6).]
25. Example of EMR in Action
The entry with the highest deletion probability is removed and the remaining entries are shifted up.
[Diagram: m_3^(6) is deleted; the remaining entries are re-indexed as m_1^(7), m_2^(7), m_4^(7), m_5^(7), freeing the last slot.]
26. Example of EMR in Action
The new data instance is then "written" into the last entry of the external memory.
[Diagram: the new representation is written as m_5^(7) into the freed slot, completing the updated memory m_1^(7), ..., m_6^(7).]
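Putting the read/delete/write cycle together, one replacement step could look like the sketch below; it always keeps the incoming instance, whereas the actual policy may also consider discarding it. `encoder` is any of the memory encoders sketched above.

```python
# Sketch of one EMR step once the memory is full: score the slots, sample the
# entry to delete from the policy, shift the rest up, write the new entry last.
import torch

def replace_step(memory, new_entry, encoder):
    # memory: (N, dim), new_entry: (dim,)
    probs = encoder(memory.unsqueeze(0)).squeeze(0)    # (N,) replacement policy
    i = torch.multinomial(probs, 1).item()             # sample entry to delete
    kept = torch.cat([memory[:i], memory[i + 1:]], 0)  # delete and shift up
    memory = torch.cat([kept, new_entry.unsqueeze(0)], 0)  # write at last slot
    return memory, probs[i].log()                      # log-prob for the RL loss
```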
27. Example of EMR in Action
When it encounters the question, the QA model outputs the answer to the given question using the sentences held in the external memory.
Sentences in the external memory:
"Meringue is type of dessert often associated with French swiss and Italian cuisine made from whipped egg whites and sugar or aquafaba and sugar and into egg whites used for decoration on pie or spread on sheet or baked Alaska base and baked swiss meringue is hydrates from refined sugar ..."
QA model (BERT):
Q: Which US state lends its name to a baked pudding, made with ice cream, sponge and meringue?
A: Alaska
28. Training the Episodic Memory Reader
For training, the model is updated after each data stream.
The entry to delete is selected stochastically according to the policy.
[Diagram: EMR reads and writes the external memory as the chunks x_1, ..., x_T and the query q stream in from the Wikipedia document; the pre-trained QA model (BERT) and a history buffer sit alongside, and the policy π(m | s) drives the replacement.]
29. Training the Episodic Memory Reader
After each memory update, the performance on the task is evaluated.
To evaluate the policy, we provide the future query at every time step, but only during training.
On TriviaQA the measure is the F1 score, which serves as the reward for reinforcement learning.
[Diagram: at each step, the pre-trained QA model (BERT) answers the query from the current memory, and the resulting F1 score is the reward.]
30. Training the Episodic Memory Reader
The reward and the action probabilities are then stored in the history for the subsequent training steps.
This process is repeated until the data stream ends.
[Diagram: rewards (F1 scores) and action probabilities are appended to the history buffer at every step.]
31. Training the Episodic Memory Reader
At the end of the stream, the QA loss is computed from the QA model, and the RL loss is computed by the reinforcement learning algorithm from the stored history.
The QA model and the Episodic Memory Reader are then trained jointly.
[Diagram: the QA loss from the QA model (BERT) and the RL loss from the stored history are combined to update both components.]
32. Reinforcement Learning Loss
The RL loss is the standard actor-critic loss, plus an entropy term that encourages the policy to explore various possibilities.
For the i-th history entry, with action probability p_i, value V_i and reward r_i:
    R_i = 0.99 · R_{i−1} + r_i
    L_value = Σ_i (1/2) · (R_i − V_i)²
    L_policy = Σ_i −(r_i + 0.99 · V_{i+1} − V_i) · log p_i − 0.01 · e_i,  with entropy term e_i = Σ_j p_ij · log p_ij
    RL loss = L_value + L_policy
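The same loss computed from the stored history, as a PyTorch sketch; the 0.99 discount and 0.01 entropy weight follow the slide, while everything else (tensor layout, V_{T+1} = 0) is an illustrative reconstruction.

```python
# Actor-critic RL loss over one stream, from per-step log p_i, V_i, r_i, e_i.
import torch

def rl_loss(log_probs, values, rewards, entropies, gamma=0.99):
    # All arguments are 1-D tensors of length T (one element per time step).
    returns, R = [], 0.0
    for r in reversed(rewards.tolist()):   # accumulate discounted return R_i
        R = gamma * R + r
        returns.append(R)
    returns = torch.tensor(list(reversed(returns)))

    value_loss = 0.5 * (returns - values).pow(2).sum()          # L_value

    next_values = torch.cat([values[1:], values.new_zeros(1)])  # V_{T+1} = 0
    advantage = rewards + gamma * next_values - values          # TD advantage
    policy_loss = (-advantage.detach() * log_probs
                   - 0.01 * entropies).sum()                    # L_policy

    return value_loss + policy_loss                             # RL loss
```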
33. Experiment - Baselines
We compare our EMR-biGRU and EMR-Transformer against several baselines:
• FIFO (First-In First-Out) • Uniform • LIFO (Last-In First-Out)
• EMR-Independent: an EMR that computes the policy independently per entry, based on the Dynamic Least Recently Used addressing from [Gülçehre18].
[Diagram: FIFO always evicts the oldest entry, LIFO always evicts the newest entry, and Uniform evicts a uniformly random entry.]
[Gülçehre18] Ç. Gülçehre, S. Chandar, K. Cho, Y. Bengio, Dynamic Neural Turing Machine with Continuous and Discrete Addressing Schemes. NC 2018
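As a toy sketch, the rule-based baselines reduce to trivial eviction rules; each function returns the index of the slot to evict among n occupied slots.

```python
# Rule-based memory scheduling baselines (illustrative helpers that plug into
# a replacement step like the one sketched earlier).
import random

def fifo(n):        # First-In First-Out: always evict the oldest entry
    return 0

def lifo(n):        # Last-In First-Out: always evict the newest entry
    return n - 1

def uniform(n):     # Uniform: evict a uniformly random entry
    return random.randrange(n)
```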
34. Dataset (bAbI Dataset)
We evaluate our models and the baselines on three question answering datasets.
• bAbI [Weston15]: a synthetic dataset for episodic question answering, consisting of 20 tasks with a small vocabulary.
[Weston15] S. Sukhbaatar, A. Szlam, J. Weston, R. Fergus, End-To-End Memory Networks. NIPS 2015
Original Task 2 (index: context):
  1: Mary journeyed to the bathroom
  2: Sandra went to the garden
  6: Sandra put down the milk there
  ...
  Where is the milk? Garden [2, 6]
  8: Daniel went to the garden
  17: Daniel dropped the football
  ...
  Where is the football? Bedroom [12, 17]
Noisy Task 2 (index: context):
  1: Sandra moved to the kitchen
  2: Wolves are afraid of cats
  6: Mary is green
  ...
  Where is the milk? Garden [1, 4]
  38: Mice are afraid of wolves
  42: Mary journeyed to the kitchen
  ...
  Where is the apple? Kitchen [34, 42]
→ It can be solved by remembering a person or an object.
35. Dataset (TriviaQA Dataset)
We evaluate our models and the baselines on three question answering datasets.
• TriviaQA [Joshi17]: a realistic text-based question answering dataset, including 95K question-answer pairs from 662K documents.
[Joshi17] M. Joshi, E. Choi, D. S. Weld, L. Zettlemoyer, TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension. ACL 2017
[Context]
001 World War I (WWI or WW1), also known as the First World War, or the Great War, was a global war originating in ...
002 More than 70 million military personnel, including 60 million Europeans, were mobilised in one of the largest wars in ...
550 Some war memorials date the end of the war as being when the Versailles Treaty was signed in 1919, ...
770 In Britain, rationing was finally imposed in early 1918, limited to meat, sugar, and fats (butter and margarine), but not bread.
...
[Question] Where was the peace treaty signed that brought World War I to an end?
[Answer] Versailles castle
→ It requires a model with high-level reasoning and the capability to read a large number of sentences per document.
36. Dataset (TVQA Dataset)
We evaluate our models and the baselines on three question answering datasets.
• TVQA [Lei18]: a localized, compositional video question answering dataset containing 153K question-answer pairs from 22K clips in 6 TV series.
[Lei18] J. Lei, L. Yu, M. Bansal, T. L. Berg, TVQA: Localized, Compositional Video Question Answering. EMNLP 2018
[Video Clip]
[Question] What is Kutner writing on when talking to Lawrence?
[Answer 1] Kutner is writing on a clipboard.
[Answer 2] Kutner is writing on a laptop.
[Answer 3] Kutner is writing on a notepad.
[Answer 4] Kutner is writing on an index card.
[Answer 5] Kutner is writing on his hand.
→ It requires a model that is able to understand multi-modal information.
37. Experiment on bAbI Dataset
We combine our model with MemN2N [Weston15] as the QA model on the bAbI dataset.
In this experiment, we vary the memory size to evaluate the efficiency of our model.
[Weston15] S. Sukhbaatar, A. Szlam, J. Weston, R. Fergus, End-To-End Memory Networks. NIPS 2015
[Diagram: sentences 1, ..., N stream in; EMR reads and writes the external memory, and the QA model (MemN2N) answers each query from the stored sentences.]
38. Result (Accuracy)
Both of our models (EMR-biGRU and EMR-Transformer) outperform the baselines.
Our methods are able to retain the supporting facts even with a small number of memory entries.
[Charts: accuracy on the original and noisy bAbI tasks.]
39. Result (Solvable)
We also report how many supporting facts the models retain in the external memory.
The two EMR variants significantly outperform EMR-Independent as well as the rule-based memory scheduling policies.
[Charts: solvable rates on the original and noisy bAbI tasks.]
40. Experiment on TriviaQA Dataset
We combine our model with BERT [Devlin18] as the QA model on the TriviaQA dataset.
Since TriviaQA does not provide span annotations, we extract the indices of the answer spans ourselves.
We embed 20 words into each memory entry and hold 400 words at maximum.
[Devlin18] J. Devlin, M. Chang, K. Lee, K. Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL-HLT 2019
[Diagram: 20-word chunks of the Wikipedia document stream in; EMR keeps a subset of chunks (e.g. chunks 3, 7, 17, ..., N) in the external memory, from which the QA model (BERT) answers the query.]
41. Result (Accuracy)
Both of our models (EMR-biGRU and EMR-Transformer) outperform the baselines.

Result on TriviaQA:
Model            | ExactMatch | F1 score
FIFO             |   24.53    |  27.22
Uniform          |   28.30    |  34.39
LIFO             |   46.23    |  50.10
EMR-Independent  |   38.05    |  41.15
EMR-biGRU        |   52.20    |  57.57
EMR-Transformer  |   48.43    |  53.81

LIFO performs quite well, unlike the other rule-based scheduling policies, because most answers appear in the earlier part of the documents.
[Chart: distribution of answer indices over document length.]
42. Experiment on TVQA Dataset
The multi-stream model from [Lei18] is used as the QA model with ours.
Each subtitle is attached to the frame at which it starts, and one frame and its corresponding subtitle are jointly embedded into each memory entry.
[Lei18] J. Lei, L. Yu, M. Bansal, T. L. Berg, TVQA: Localized, Compositional Video Question Answering. EMNLP 2018
[Diagram: frame-subtitle pairs stream in; EMR keeps a subset of pairs in the external memory, from which the multi-stream QA model answers the query.]
43. Result (Accuracy)
Both of our models (EMR-biGRU and EMR-Transformer) outperform the baselines.
Our methods are able to retain the relevant frames and subtitles even with a small number of memory entries.
44. Example of TVQA Result
[Question] <00:55.00 ~ 01:06.33> Who enters the coffee shop after Ross shows everyone the paper?
[Answer] 1) Joey 2) Rachel 3) Monica 4) Chandler 5) Phoebe
[Video Clip] frames at 00:01, 00:02, 00:03, ..., 1:32, 1:33
[Subtitle]
00:03 UNKNAME: Hey. I got some bad news. What?
00:05 UNKNAME: That's no way to sell newspapers ...
(Ellipsis)
01:31 UNKNAME: Your food is abysmal!
[Memory information after reading the streaming data]
m0: UNKNAME: No. Monica's restaurant got a horrible review ...
m1: UNKNAME: I didn't want her to see it, so I ran around and ...
m2: Joey: This is bad. And I've had bad reviews.
m3: Monica: Oh, my God! Look at all the newspapers.
m4: UNKNAME: They say there's no such thing as ...
45. Visualization of Memory Entries
EMR learns general importance: it keeps in the external memory the information needed to solve queries it has not yet seen.
[Figure: visualization of which memory entries are retained over time.]
46. Conclusion
• We propose a novel task of learning to remember important instances from streaming data, and demonstrate it on question answering.
• The Episodic Memory Reader (EMR) learns general importance by considering the relative importance between memory entries, without knowing the queries, in order to maximize performance on the QA task.
• Results show that our models retain the information needed for answering even with a number of memory entries that is small relative to the length of the stream.
• We believe this work can be an essential step toward building a real-world conversational agent.
Code available at https://github.com/h19920918/emr