A presentation on
End to End Memory Networks (MemN2N)
Slides: 26
Time: 15 minutes
IE 594 Data Science 2
University of Illinois at Chicago, February 2017
Under the guidance of,
Prof. Dr. Ashkan Sharabiani
By,
Ashish Menkudale
The kitchen is north of the hallway.
The bathroom is west of the bedroom.
The den is east of the hallway.
The office is south of the bedroom.
How do you go from den to kitchen?
2
The kitchen is north of the hallway.
The bathroom is west of the bedroom.
The den is east of the hallway.
The office is south of the bedroom.
How do you go from den to kitchen?
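[Diagram: map of the rooms. Kitchen north of the hallway, den east of the hallway, bathroom west of the bedroom, office south of the bedroom. The path from the den goes west to the hallway, then north to the kitchen.]
West, North.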
3
Brian is frog.
Lily is grey.
Brian is yellow.
Julius is green.
What color is Greg?
Greg is frog.
4
Brian is frog.
Lily is grey.
Brian is yellow.
Julius is green.
What color is Greg?
Greg is frog.
Yellow.
5
External Global Memory
[Diagram labels: Memory Module, Controller Module, Input, Output, Read, Write.]
Dedicated separate memory module.
Memory can be a stack or a list/set of vectors.
Control module accesses memory (read, write).
Advantage: stable, scalable.
Charles Babbage (1791-1871): invented the Analytical Engine concept.
Konrad Zuse (1910-1995): invented the stored-program concept.
6
Warren Sturgis McCulloch (1898-1969): computational model for neural networks.
Memory Networks
• Memory network with large external memory.
required for low level tasks like object recognition.
• Writes everything to the memory, but reads only relevant information.
• Attempts to add a long-term memory component to make it more like artificial intelligence.
• Two types:
• Strongly supervised memory network: hard addressing.
• Weakly supervised memory network: soft addressing.
• Hard addressing: select the memory with the maximum inner product between the internal state and the memory contents.
Mary is in garden.
John is in office. Q: Where is John?
Bob is in kitchen.
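Hard addressing as described above can be sketched in a few lines. This is only an illustration: the three-dimensional vectors and the helper name `hard_address` are made up for the example and are not taken from the slides.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def hard_address(state, memories):
    """Return the index of the memory with the largest inner product with the state."""
    scores = [dot(state, m) for m in memories]
    return max(range(len(scores)), key=scores.__getitem__)

sentences = ["Mary is in garden.", "John is in office.", "Bob is in kitchen."]
# Hypothetical memory vectors, one per stored sentence.
memories = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
# Hypothetical internal state for the question "Where is John?".
state = [0.1, 0.9, 0.2]

best = hard_address(state, memories)
print(sentences[best])
```

Because the max is a discrete choice, it cannot be trained by plain back-propagation, which is why this variant needs supervision on which memory to pick.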
Walter Pitts (1923-1969): computational model for neural networks.
7
Memory Vectors
Example: Constructing memory vectors with bag of words (BoW)
Embed each word
Sum embedding vectors
“Sam drops apple” → $V_{\text{Sam}} + V_{\text{drops}} + V_{\text{apple}} = m$
(the $V$ terms are the embedding vectors; $m$ is the memory vector)
Example: Temporal structure. Add special words that encode time and include them in the bag of words.
1. Sam moved to garden.
2. Sam moved to kitchen.
3. Sam drops apple.
$V_{\text{Sam}} + V_{\text{drops}} + V_{\text{apple}} + V_{\text{time}} = m$
where $V_{\text{time}}$ is the time embedding of the sentence's time stamp.
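A minimal sketch of the two constructions above, assuming a tiny hand-written embedding table. The numbers, the `embed_sentence` helper and the `time_embedding` table are hypothetical; in a trained model the embeddings are learned.

```python
def embed_sentence(words, embedding, time_vec=None):
    """Bag of words: sum the word embeddings, optionally adding a time embedding."""
    dim = len(next(iter(embedding.values())))
    m = [0.0] * dim
    for word in words:
        m = [a + b for a, b in zip(m, embedding[word])]
    if time_vec is not None:
        m = [a + b for a, b in zip(m, time_vec)]
    return m

# Hypothetical 2-dimensional word embeddings.
embedding = {"Sam": [1.0, 0.0], "drops": [0.0, 2.0], "apple": [0.5, 0.5]}
# Hypothetical time embedding for time stamp 3 (the third sentence).
time_embedding = {3: [0.0, -1.0]}

m = embed_sentence(["Sam", "drops", "apple"], embedding)
m_timed = embed_sentence(["Sam", "drops", "apple"], embedding, time_embedding[3])
m_shuffled = embed_sentence(["apple", "drops", "Sam"], embedding)
```

Here `m` and `m_shuffled` are identical, which shows that a plain bag of words ignores word order; the time embedding only distinguishes when a sentence was stored, not the order of words inside it.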
8
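Memory Networks
[Diagram: the memory holds “Bob is in kitchen.”, “Mary is in garden.” and “John is in office.”; the input is “Where is John?”. Each sentence and the question are embedded. The controller's internal state vector is multiplied (inner product) with each memory, Max selects “John is in office”, its embedding is added to the state, and the decoder outputs “Office”.]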
9
Issues with Memory Network
• Requires explicit supervision of attention during training.
Need to say which memory the model should use.
• Need a model that just requires supervision at output.
No supervision of attention required.
• Only feasible for simple tasks.
Severely limits application of model.
10
End-To-End Memory Networks
• Soft attention version of the Memory Network.
• Flexible read-only memory.
• Multiple memory lookups (hops).
• Can consider multiple memories before deciding the output.
• More reasoning power.
• End-to-end training.
• Only needs final output for training.
• Simple back-propagation.
11
Sainbayar Sukhbaatar
Arthur Szlam
Jason Weston
Rob Fergus
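MemN2N architecture
[Diagram: the controller module (e.g. an RNN) takes the input and keeps a state. The memory module compares the state with the memory content by dot product, applies a softmax, and returns a weighted sum of the memory content. The controller combines this with its state (sum, linear, Tanh / ReLU) to form the next state and the output, which a loss function compares with the target.]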
12
MemN2N in action : Single memory lookup
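[Diagram: the sentences {x_i} are embedded with A (inputs) and with C (outputs); the question q is embedded with B to give u. The inner product of u with each input, followed by a softmax, gives the probability weights; the weighted sum (∑) of the outputs gives o. The sum of o and u is multiplied by the weight matrix W and passed through a softmax to give the predicted answer.]
Example:
Mary is in garden.
John is in office.
Bob is in kitchen.
Where is John?
Office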
Training: estimate embedding matrices A, B & C and output matrix W.
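The single lookup can be sketched as below. The vectors stand in for sentences and a question that have already been embedded with A, C and B; all numbers are made up, and the function names are hypothetical. Only the forward pass is shown, not the training of A, B, C and W.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def single_hop(u, inputs, outputs):
    """Soft attention: p = softmax(u . m_i), o = sum_i p_i * c_i."""
    p = softmax([dot(u, m) for m in inputs])
    o = [sum(p_i * c[k] for p_i, c in zip(p, outputs)) for k in range(len(u))]
    return p, o

def predict(u, o, W):
    """Answer distribution: softmax(W (o + u))."""
    summed = [a + b for a, b in zip(o, u)]
    return softmax([dot(row, summed) for row in W])

answers = ["garden", "office", "kitchen"]
# Hypothetical embedded sentences: inputs (via A) and outputs (via C).
inputs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
outputs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
u = [0.0, 2.0, 0.0]  # hypothetical embedded question "Where is John?" (via B)
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # one row per answer word

p, o = single_hop(u, inputs, outputs)
a_hat = predict(u, o, W)
best = answers[a_hat.index(max(a_hat))]
print(best)
```

Every step is a smooth function of the embeddings, so the whole lookup can be trained from the final answer alone.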
13
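Multiple Memory Lookups: Multiple Hops
[Diagram: three stacked memory hops over the sentences {x_i}. Hop k embeds the sentences with A_k (Input k) and C_k (Output k) and produces o_k from the state u_k. The question q gives u_1, and each later state is the sum (∑) of the previous state and output: u_2 = u_1 + o_1, u_3 = u_2 + o_2. After the third hop, W maps the result to the predicted answer.]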
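A minimal sketch of stacked hops, assuming the state update u_{k+1} = u_k + o_k and reusing the same made-up embedded memories at every hop. The two facts, the vectors and the name `multi_hop` are hypothetical illustrations, not values from the slides.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def multi_hop(u, hops):
    """hops: list of (inputs, outputs) pairs, one per hop.

    Returns the final state and the attention weights of every hop.
    """
    attention = []
    for inputs, outputs in hops:
        p = softmax([dot(u, m) for m in inputs])
        o = [sum(p_i * c[k] for p_i, c in zip(p, outputs)) for k in range(len(u))]
        u = [a + b for a, b in zip(u, o)]  # u_{k+1} = u_k + o_k
        attention.append(p)
    return u, attention

# Hypothetical dimensions: [football, John, kitchen].
inputs = [
    [1.0, 0.0, 0.0],  # "John picked up the football", matched by football
    [0.0, 1.0, 0.0],  # "John went to the kitchen", matched by John
]
outputs = [
    [0.0, 5.0, 0.0],  # the first fact points the state towards John
    [0.0, 0.0, 5.0],  # the second fact points the state towards kitchen
]
u1 = [3.0, 0.0, 0.0]  # hypothetical embedded question "Where is the football?"

u3, attention = multi_hop(u1, [(inputs, outputs), (inputs, outputs)])
```

In this toy setup the first hop attends mostly to the football fact and the second hop shifts to the kitchen fact, which is the chaining behaviour that multiple hops are meant to provide.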
14
Components
15
I (Input): No conversion; keep the original text X.
G (Generalization): Stores I(X) in the next available memory slot.
O (Output): Loops over all memories.
Find the best match 𝑚i with X.
Find the best match 𝑚j with (𝑚i, X).
Can be extended to any number of hops.
R (Response): Ranks all words in the dictionary given o and returns the best single word.
In fact, an RNN can be used here for better sentence correction.
Weight Tying
16
Weight tying: indicates how the embedding matrices of the input and output components are shared across layers (hops).
Two methods:
Adjacent:
Similar to stacked layers.
The output embedding of one layer is the input embedding of the next layer.
Layer-wise:
The input and output embeddings remain the same for every layer in the architecture.
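The two tying schemes can be sketched as ways of assigning an (input embedding, output embedding) pair to each hop. The function names are hypothetical, and the strings stand in for real embedding matrices.

```python
def adjacent_tying(embeddings):
    """embeddings: K + 1 matrices E_0 .. E_K for K hops.

    Hop k uses input E_{k-1} and output E_k, so the output embedding
    of one hop is the input embedding of the next.
    """
    return [(embeddings[k], embeddings[k + 1]) for k in range(len(embeddings) - 1)]

def layerwise_tying(A, C, num_hops):
    """Every hop shares the same input embedding A and output embedding C."""
    return [(A, C) for _ in range(num_hops)]

# Placeholder names instead of real matrices, for illustration only.
adjacent = adjacent_tying(["E0", "E1", "E2", "E3"])
layerwise = layerwise_tying("A", "C", 3)
print(adjacent)
print(layerwise)
```

Adjacent tying needs K + 1 matrices for K hops, while layer-wise tying needs only two regardless of depth.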
Scoring function
17
Question: Answers are matched to the story using word embeddings.
Word embedding: Maps words into a low-dimensional vector space, which makes it possible to calculate the distance between word vectors.
Allows us to compute a similarity score between sentences and find the pair with the highest correlation.
Match(‘Where is football?’, ‘John picked up the football’).
$q^T U^T U d$: the default embedding-based scoring function used in memory networks.
q – Question.
U – Matrix by which the word embeddings are obtained.
d – Answer.
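With bag-of-words vectors q and d, the score q^T U^T U d equals the inner product of the two summed sentence embeddings, (Uq) · (Ud). The sketch below uses that form; the embedding table (the columns of a hypothetical U) is made up, and words missing from it are treated as zero vectors.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def embed(sentence, columns, dim=2):
    """Compute U x for the bag-of-words vector x of a sentence."""
    total = [0.0] * dim
    for token in sentence.lower().replace("?", "").replace(".", "").split():
        vec = columns.get(token, [0.0] * dim)  # unknown words contribute nothing
        total = [a + b for a, b in zip(total, vec)]
    return total

def score(q, d, columns):
    """s(q, d) = q^T U^T U d = (U q) . (U d)."""
    return dot(embed(q, columns), embed(d, columns))

# Hypothetical columns of U for three words.
columns = {"football": [1.0, 0.0], "john": [0.5, 0.5], "kitchen": [0.0, 1.0]}

s_match = score("Where is football?", "John picked up the football", columns)
s_other = score("Where is football?", "John went to the kitchen", columns)
print(s_match, s_other)
```

The sentence that shares the word "football" with the question gets the higher score, which is the behaviour the Match example above relies on.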
Model Selection
18
Model selection: determines how the story, question and answer sentences are turned into vectors from their word embeddings.
Two possible approaches:
Bag of words model:
Considers each word in a sentence.
Embeds each word and sums resulting vector.
Does not take into account context for each word.
Position Encoding:
Considers the position/context of words within a sentence.
Takes the preceding and following words into account.
Maps it to low dimensional vector space.
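Position encoding can be sketched with the weighting given in the End-to-End Memory Networks paper, l_kj = (1 - j/J) - (k/d)(1 - 2j/J), where j is the 1-based word position, J the number of words, k the 1-based embedding dimension and d the embedding size. The formula comes from the paper rather than from the slide, and the word vectors below are made up.

```python
def position_weights(num_words, dim):
    """l[j][k] = (1 - j/J) - (k/d) * (1 - 2j/J), with 1-based j and k."""
    J, d = num_words, dim
    return [
        [(1 - j / J) - (k / d) * (1 - 2 * j / J) for k in range(1, d + 1)]
        for j in range(1, J + 1)
    ]

def encode(word_vectors):
    """Weight each word vector element-wise by its position, then sum."""
    J, d = len(word_vectors), len(word_vectors[0])
    l = position_weights(J, d)
    return [sum(l[j][k] * word_vectors[j][k] for j in range(J)) for k in range(d)]

a = [1.0, 0.0]  # hypothetical embedding of the first word
b = [0.0, 1.0]  # hypothetical embedding of the second word

forward = encode([a, b])
backward = encode([b, a])
print(forward, backward)
```

Unlike the bag-of-words sum, the two word orders give different sentence vectors.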
Model Refining
Addition of noise.
Increasing the training dataset.
Decisions for Configuration
19
• Number of hops
• Number of epochs
• Embedding size
• Training dataset
• Validation dataset
• Model selection
• Weight tying
RNN viewpoint of MemN2N
Plain RNN: Inputs are fed to the RNN one by one, in order. The RNN has only one chance to look at a given input symbol.
Memory Network: Place all inputs in the memory. Let the model decide which part it reads next.
[Diagram labels: plain RNN with an input sequence; memory network with all input held in memory, an addressing signal from the RNN, and the selected input returned to it.]
20
Advantages of MemN2N over RNN
• More generic input format
• Any set of vectors can be input
• Each vector can be
o BoW of symbols (including location)
o Image feature + feature position
• Location can be 1D, 2D, …
• Variable size
• Out-of-order access to input data
• Less distracted by unimportant inputs
• Longer term memorization
• No vanishing or exploding gradient problems
21
bAbI Project: Task Categories
Training dataset: 1000 questions for each task. Testing dataset: 1000 questions for each task.
23
Demo for bAbI tasks
24
bAbI Project: Benchmark results
25
References
1. GitHub project archive: https://github.com/vinhkhuc/MemN2N-babi-python
2. https://www.msri.org/workshops/796/schedules/20462/documents/2704/assets/24734
3. Dynamic Neural Turing Machine with Soft and Hard Addressing Schemes: https://arxiv.org/pdf/1607.00036.pdf
4. bAbI answers: https://arxiv.org/pdf/1502.05698.pdf
5. Memory Networks by Microsoft Research: https://www.youtube.com/watch?v=ZwvWY9Yy76Q&t=1s
6. Memory Networks (Jenil Shah): https://www.youtube.com/watch?v=BN7Kp0JD04o
7. N-gram vs. SVM vs. generative models: http://stackoverflow.com/questions/20315897/n-grams-vs-other-classifiers-in-text-categorization
8. Paper on results for bAbI tasks by the Facebook AI team: https://papers.nips.cc/paper/5846-end-to-end-memory-networks.pdf
9. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks: https://arxiv.org/pdf/1502.05698.pdf
26
Questions

Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 

MemN2N: End-to-End Memory Networks

  • 1. A presentation on End to End Memory Networks (MemN2N) Slides: 26 Time: 15 minutes IE 594 Data Science 2 University of Illinois at Chicago, February 2017 Under the guidance of, Prof. Dr. Ashkan Sharabiani By, Ashish Menkudale
  • 2. The kitchen is north of the hallway. The bathroom is west of the bedroom. The den is east of the hallway. The office is south of the bedroom. How do you go from den to kitchen?
  • 3. The kitchen is north of the hallway. The bathroom is west of the bedroom. The den is east of the hallway. The office is south of the bedroom. How do you go from den to kitchen? [Diagram: layout of the kitchen, hallway, bathroom, bedroom, den and office; labels: West, North.] Answer: West, North.
  • 4. Brian is a frog. Lily is grey. Brian is yellow. Julius is green. Greg is a frog. What color is Greg?
  • 5. Brian is a frog. Lily is grey. Brian is yellow. Julius is green. Greg is a frog. What color is Greg? Answer: Yellow.
  • 6. External Global Memory. [Diagram labels: Input, Controller Module, Memory Module (Read, Write), Output.] Dedicated, separate memory module. Memory can be a stack or a list/set of vectors. The control module accesses memory (read, write). Advantage: stable, scalable. Charles Babbage (1791-1871): invented the analytical engine concept. Konrad Zuse (1910-1995): invented the stored-program concept.
  • 7. Memory Networks. • Memory network with large external memory, required for low-level tasks like object recognition. • Writes everything to the memory, but reads only relevant information. • Attempts to add a long-term memory component to make it more like artificial intelligence. • Two types: • Strongly supervised memory network: hard addressing. • Weakly supervised memory network: soft addressing. • Hard addressing: max of the inner product between the internal state and memory contents. Example: Mary is in garden. John is in office. Bob is in kitchen. Q: Where is John? Warren Sturgis McCulloch (1898-1969) and Walter Pitts (1923-1969): computational model for neural networks.
  • 8. Memory Vectors. Example: constructing memory vectors with bag of words (BoW). Embed each word, then sum the embedding vectors: "Sam drops apple" gives $V_{Sam} + V_{drops} + V_{apple} = m$ (embedding vectors summed into a memory vector). Example: temporal structure. Use special words for time and include them in the bag of words. 1. Sam moved to garden. 2. Sam moved to kitchen. 3. Sam drops apple. $V_{Sam} + V_{drops} + V_{apple} + V_{time} = m$, where $V_{time}$ is the time embedding of the sentence's time stamp.
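The bag-of-words construction on this slide can be sketched in a few lines. This is a minimal illustration, not the presenter's code: the embedding size `DIM`, the random initial values, and the helper names `add` and `memory_vector` are all assumptions made for the example. In a trained model the embeddings would be learned.

```python
import random

random.seed(0)
DIM = 4  # embedding size (hypothetical choice for the example)

vocab = ["Sam", "moved", "to", "garden", "kitchen", "drops", "apple"]
# Hypothetical random word embeddings V_w; a real model learns these.
V = {w: [random.gauss(0, 0.1) for _ in range(DIM)] for w in vocab}
# One hypothetical embedding per time stamp (sentence index in the story).
V_time = {t: [random.gauss(0, 0.1) for _ in range(DIM)] for t in (1, 2, 3)}


def add(a, b):
    """Element-wise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]


def memory_vector(sentence, time_stamp=None):
    """Sum the word embeddings; optionally add the time embedding."""
    m = [0.0] * DIM
    for word in sentence.split():
        m = add(m, V[word])
    if time_stamp is not None:
        m = add(m, V_time[time_stamp])
    return m


m_plain = memory_vector("Sam drops apple")
m_timed = memory_vector("Sam drops apple", time_stamp=3)
```

Because the sum ignores word order, "Sam drops apple" and "apple drops Sam" map to the same vector (up to floating-point rounding); the time embedding only distinguishes when a sentence was stored.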
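  • 9. Memory Networks. [Diagram: the input sentences "Bob is in kitchen.", "Mary is in garden.", "John is in office." and the question "Where is John?" are each embedded. In the memory, the controller's internal state vector is multiplied (X) with each memory vector and Max selects "John is in office". The selected memory is embedded, added to the state, and passed to the decoder, which outputs "Office".]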
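Hard addressing, as described in the slides (max of the inner product between the internal state and the memory contents), can be sketched as follows. The three-dimensional vectors are hand-picked, hypothetical values chosen only so that the example is easy to follow; `hard_address` is an illustrative name.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def hard_address(state, memory_vectors):
    """Return the index of the memory with the largest inner product."""
    scores = [dot(state, m) for m in memory_vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


sentences = ["Bob is in kitchen.", "Mary is in garden.", "John is in office."]
# Hypothetical embedded memories, one per sentence.
memory_vectors = [
    [1.0, 0.0, 0.2],
    [0.0, 1.0, 0.1],
    [0.1, 0.0, 1.0],
]
# Hypothetical internal state for the question "Where is John?".
state = [0.0, 0.1, 0.9]

best, scores = hard_address(state, memory_vectors)
print(sentences[best])  # John is in office.
```

The argmax is not differentiable, which is why this variant needs the supporting memory to be labeled during training.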
  • 10. Issues with Memory Networks. • Requires explicit supervision of attention during training: need to say which memory the model should use. • Need a model that requires supervision only at the output, with no supervision of attention. • Only feasible for simple tasks, which severely limits application of the model.
  • 11. End-To-End Memory Networks (MemN2N). • Soft attention version of memory networks. • Flexible read-only memory. • Multiple memory lookups (hops). • Can consider multiple memories before deciding the output. • More reasoning power. • End-to-end training. • Only needs the final output for training. • Simple back-propagation. Authors: Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus.
  • 12. MemN2N architecture. [Diagram labels: Input; Controller module (e.g. RNN) with State; Memory module with Memory content, Dot product, Softmax, Weighted Sum; Sum; Linear; Tanh / ReLU; State; Output; Target; Loss Function.]
  • 13. MemN2N in action: single memory lookup. [Diagram labels: Sentences {x_i}; Embedding A (Inputs); Embedding C (Output); Question q; Embedding B; u; Inner product; Softmax; Probability (Weights); Weighted Sum ∑; o; Weight W; Softmax; Predicted Answer.] Example: Mary is in garden. John is in office. Bob is in kitchen. Where is John? Office. Training: estimate embedding matrices A, B and C and output matrix W.
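A single memory lookup can be sketched as below, following the formulation in the End-To-End Memory Networks paper listed in the references: attention p_i = softmax(u · m_i), output o = Σ p_i c_i, and prediction softmax(W(o + u)). The vectors and the identity-like matrices are hypothetical toy values; in the real model u, m_i and c_i come from the learned embeddings B, A and C.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    top = max(z)
    exps = [math.exp(x - top) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


def single_hop(u, m, c, W):
    """One soft-attention lookup: returns (attention, output, answer probs)."""
    p = softmax([dot(u, m_i) for m_i in m])
    o = [sum(p_i * c_i[k] for p_i, c_i in zip(p, c)) for k in range(len(u))]
    summed = [a + b for a, b in zip(o, u)]
    answer_probs = softmax([dot(row, summed) for row in W])
    return p, o, answer_probs


# Hypothetical embedded memories for: Mary/garden, Bob/kitchen, John/office.
m = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # input side (A)
c = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # output side (C)
u = [0.0, 0.0, 2.0]  # hypothetical embedded question "Where is John?" (B)
answers = ["garden", "kitchen", "office"]
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # one row per answer

p, o, answer_probs = single_hop(u, m, c, W)
predicted = answers[max(range(len(answers)), key=answer_probs.__getitem__)]
print(predicted)  # office
```

Every step is differentiable, so the whole lookup can be trained by back-propagation from the answer alone.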
  • 14. Multiple Memory Lookups: Multiple Hops. [Diagram labels: Sentences {x_i}; Question q; hop 1: A1, C1, Input 1, Output 1, ∑, u1, o1; hop 2: A2, C2, Input 2, Output 2, ∑, u2, o2; hop 3: A3, C3, Input 3, Output 3, ∑, u3, o3; W; Predicted Answer.]
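For reference, the multi-hop computation pictured on this slide can be written out as equations. This follows the formulation in the End-To-End Memory Networks paper listed in the references, with $m_i^k = A^k x_i$, $c_i^k = C^k x_i$ and $u^1 = B q$; the superscript $k$ indexes the hop and $K$ is the total number of hops.

```latex
\begin{aligned}
p_i^{k} &= \operatorname{softmax}_i\!\big((u^{k})^{\top} m_i^{k}\big), \\
o^{k}   &= \sum_i p_i^{k}\, c_i^{k}, \\
u^{k+1} &= u^{k} + o^{k}, \qquad k = 1, \dots, K-1, \\
\hat{a} &= \operatorname{softmax}\!\big(W\,(o^{K} + u^{K})\big).
\end{aligned}
```

Each hop reuses the previous hop's state plus what it read, so later hops can attend to memories that only become relevant after an earlier fact has been retrieved.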
  • 15. Components. I (Input): no conversion, keeps the original text X. G (Generalization): stores I(X) in the next available memory slot. O (Output): loops over all memories. Finds the best match $m_i$ with X, then the best match $m_j$ with ($m_i$, X). Can be extended to more hops. R (Response): ranks all words in the dictionary given o and returns the best single word. In fact, an RNN can be used here to produce sentence responses.
  • 16. Weight Tying. Weight tying: determines how the embedding matrices are shared between the input and output components of the layers. Two methods: Adjacent: similar to stacked layers. The output embedding of one layer is the input embedding of the next layer. Layer-wise: the input and output embeddings remain the same for every layer in the architecture.
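The two tying schemes differ only in which matrices are shared across hops, which a short sketch makes concrete. The function name `make_embeddings` and the `new_matrix` factory are hypothetical; the sketch covers only the sharing of A and C (in the paper, the adjacent scheme additionally ties the answer matrix to the last output embedding and the question embedding to the first input embedding, and the layer-wise scheme adds a linear map on the state between hops).

```python
def make_embeddings(num_hops, tying, new_matrix):
    """Return per-hop lists (A, C) of input and output embedding matrices.

    new_matrix is a zero-argument factory that creates one fresh matrix.
    Shared matrices are the same Python object, so updating one updates all.
    """
    if tying == "adjacent":
        # num_hops + 1 matrices in a chain: A of hop k+1 is C of hop k.
        chain = [new_matrix() for _ in range(num_hops + 1)]
        return chain[:-1], chain[1:]
    if tying == "layerwise":
        # One A and one C, reused at every hop (RNN-like).
        a, c = new_matrix(), new_matrix()
        return [a] * num_hops, [c] * num_hops
    raise ValueError(f"unknown tying scheme: {tying}")


def zeros_3x5():
    """Hypothetical factory: a 3 x 5 matrix of zeros."""
    return [[0.0] * 5 for _ in range(3)]


A_adj, C_adj = make_embeddings(3, "adjacent", zeros_3x5)
A_lw, C_lw = make_embeddings(3, "layerwise", zeros_3x5)
```

With three hops, adjacent tying uses four distinct matrices while layer-wise tying uses two, regardless of the number of hops.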
  • 17. Scoring function. Question and answers are mapped to the story using word embedding. Word embedding: maps words into a low-dimensional vector space, with the advantage that distances between word vectors can be calculated. This allows us to find a similarity score between sentences and identify the pair with maximum correlation, e.g. Match('Where is football?', 'John picked up the football'). $q^{T} U^{T} U d$: the default scoring model used in memory networks. q: question. U: matrix by which the word embeddings are obtained. d: answer.
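The score $q^{T} U^{T} U d$ is the inner product of the two embedded vectors $Uq$ and $Ud$. A minimal sketch, assuming q and d are bag-of-words count vectors over a small hypothetical vocabulary and U is a randomly initialised (untrained) embedding matrix:

```python
import random

random.seed(0)

vocab = ["where", "is", "football", "john", "picked", "up", "the"]
EMBED_DIM = 3  # hypothetical embedding size

# Hypothetical embedding matrix U (EMBED_DIM x |vocab|); learned in practice.
U = [[random.gauss(0, 1) for _ in vocab] for _ in range(EMBED_DIM)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def bow(text):
    """Bag-of-words count vector over the vocabulary."""
    words = text.lower().split()
    return [words.count(w) for w in vocab]


def embed(U, x):
    """Compute U x for a count vector x."""
    return [dot(row, x) for row in U]


def score(U, q, d):
    """s(q, d) = q^T U^T U d = (U q) . (U d)."""
    return dot(embed(U, q), embed(U, d))


q = bow("where is football")
d = bow("john picked up the football")
s = score(U, q, d)
```

Since U is random here, the value of `s` carries no meaning; training adjusts U so that a question scores higher against its supporting sentence than against unrelated ones.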
  • 18. Model Selection. Model selection: determines how to model story, question and answer vectors for word embedding. Two possible approaches: Bag of words model: considers each word in a sentence. Embeds each word and sums the resulting vectors. Does not take into account the context of each word. Position encoding: considers the position/context of the words in a sentence. Takes the preceding and following words into account. Maps the sentence to a low-dimensional vector space. Model refining: addition of noise; increasing the training dataset.
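Position encoding replaces the plain sum with a position-weighted sum. The sketch below uses the weighting from the End-To-End Memory Networks paper listed in the references, $l_{kj} = (1 - j/J) - (k/d)(1 - 2j/J)$ for word position $j$ of $J$ and embedding dimension $k$ of $d$ (both 1-based); the function names and the toy word vectors are assumptions for illustration.

```python
def position_weights(J, d):
    """Weights l[j][k] for word position j+1 of J and dimension k+1 of d."""
    return [
        [(1 - j / J) - (k / d) * (1 - 2 * j / J) for k in range(1, d + 1)]
        for j in range(1, J + 1)
    ]


def encode(word_vectors):
    """Position-encoded sentence vector: sum over words of l_j * (A x_j)."""
    J, d = len(word_vectors), len(word_vectors[0])
    l = position_weights(J, d)
    return [sum(l[j][k] * word_vectors[j][k] for j in range(J)) for k in range(d)]


def bag_of_words(word_vectors):
    """Plain unweighted sum, for comparison."""
    return [sum(col) for col in zip(*word_vectors)]


# Two hypothetical embedded words, in both orders.
forward = [[1.0, 0.0], [0.0, 1.0]]
backward = [[0.0, 1.0], [1.0, 0.0]]

print(bag_of_words(forward) == bag_of_words(backward))  # True
print(encode(forward) == encode(backward))              # False
```

The bag-of-words sum cannot tell the two orderings apart, while the position-encoded vectors differ.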
  • 19. Decisions for Configuration. • Number of hops • Number of epochs • Embedding size • Training dataset • Validation dataset • Model selection • Weight tying
  • 20. RNN viewpoint of MemN2N. Plain RNN: inputs are fed to the RNN one-by-one in order. The RNN has only one chance to look at a certain input symbol. Memory network: place all inputs in the memory. Let the model decide which part it reads next. [Diagram labels: Plain RNN: Input Sequence, RNN. Memory Network: Memory, RNN, All Input, Selected input, Addressing signal.]
  • 21. Advantages of MemN2N over RNN. • More generic input format • Any set of vectors can be input • Each vector can be o BoW of symbols (including location) o Image feature + feature position • Location can be 1D, 2D, … • Variable size • Out-of-order access to input data • Less distracted by unimportant inputs • Longer term memorization • No vanishing or exploding gradient problems
  • 22. bAbI Project: Task Categories. Training dataset: 1000 questions for each task. Testing dataset: 1000 questions for each task.
  • 25. References
    1. GitHub project archives: https://github.com/vinhkhuc/MemN2N-babi-python
    2. https://www.msri.org/workshops/796/schedules/20462/documents/2704/assets/24734
    3. Dynamic Neural Turing Machine with Soft and Hard Addressing Schemes: https://arxiv.org/pdf/1607.00036.pdf
    4. bAbI answers: https://arxiv.org/pdf/1502.05698.pdf
    5. Memory Networks by Microsoft Research: https://www.youtube.com/watch?v=ZwvWY9Yy76Q&t=1s
    6. Memory Networks (Jenil Shah): https://www.youtube.com/watch?v=BN7Kp0JD04o
    7. N-gram, SVM and generative models, differences: http://stackoverflow.com/questions/20315897/n-grams-vs-other-classifiers-in-text-categorization
    8. Paper on results for bAbI tasks by Facebook AI team: https://papers.nips.cc/paper/5846-end-to-end-memory-networks.pdf
    9. Towards AI-complete question answering: a set of prerequisite toy tasks: https://arxiv.org/pdf/1502.05698.pdf