Akshay Sehgal (www.akshaysehgal.com)
LSTM
Long Short Term Memory
Akshay Sehgal, Lead Data Scientist @ Reliance Industries
Pre-requisites

• Neural Networks using Keras

• Forward pass & computation graphs

• Back propagation

• Basics of RNN

• Activation functions
How to handle sequence data?
• Examples: text, stock prices, sensor signals, DNA, customer purchase behaviour, sound signals

• Bag of words doesn’t preserve order/sequence in data 

• Modelling sequential data requires a ‘temporal’ architecture to simulate ‘memory’

• The idea is to process the sequence one ‘time step’ at a time, feeding the encoding from the previous step back in at each step (recurrence)

• Applications include predictive models, natural language understanding, POS tagging, machine translation, natural language generation, etc.
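To make the bag-of-words point concrete, here is a minimal sketch using only Python's standard library. The two example sentences and the helper name `bag_of_words` are made up for illustration.

```python
from collections import Counter

def bag_of_words(sentence):
    """Count word occurrences, discarding word order."""
    return Counter(sentence.lower().split())

a = "the dog bit the man"
b = "the man bit the dog"

# Both sentences map to the same bag even though their meanings differ.
same_bag = bag_of_words(a) == bag_of_words(b)
# The word sequences themselves are different.
same_sequence = a.split() == b.split()
print(same_bag, same_sequence)
```

The bags are identical while the sequences are not, which is the information a sequence model needs to keep.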
What is an RNN?
An RNN (Recurrent Neural Network) can be seen as a
layer in a neural network that encodes sequential
data into a vector representation. That vector can then
be used for downstream tasks such as classification, or
simply as an encoding. In other words, it is a way to
automate feature engineering for sequential data.
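The encoding idea can be sketched in a few lines of plain Python. This assumes the common simple-RNN update h_t = tanh(W_x x_t + W_h h_{t-1} + b), which the slides do not spell out; the function name `rnn_encode` and all weight values are hypothetical and untrained.

```python
import math

def rnn_encode(sequence, w_x, w_h, b):
    """Encode a sequence of vectors into one hidden-state vector.

    Assumed update: h_t = tanh(W_x x_t + W_h h_{t-1} + b), with h_0 = 0.
    """
    hidden_size = len(b)
    h = [0.0] * hidden_size
    for x in sequence:
        new_h = []
        for j in range(hidden_size):
            total = b[j]
            total += sum(w_x[j][k] * x[k] for k in range(len(x)))
            total += sum(w_h[j][k] * h[k] for k in range(hidden_size))
            new_h.append(math.tanh(total))
        h = new_h
    return h

# Toy weights (hypothetical, not trained): 2 input features, 3 hidden units.
w_x = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.2]]
w_h = [[0.1, 0.0, 0.2], [0.0, 0.3, -0.1], [0.2, -0.2, 0.1]]
b = [0.0, 0.1, -0.1]

seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
encoding = rnn_encode(seq, w_x, w_h, b)
reversed_encoding = rnn_encode(seq[::-1], w_x, w_h, b)
print(encoding)
print(reversed_encoding)
```

Feeding the same items in reverse order gives a different vector, so unlike a bag of words the encoding depends on order.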
Why don’t RNNs work in practice?
• Long-term dependencies are not captured: as
the number of time steps increases, the RNN
struggles to connect information from early steps to later ones

• The vanishing gradient problem causes loss of
long-term memory, so training emphasises
short-term dependencies
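A small numeric sketch of the vanishing gradient, assuming a scalar RNN h_t = tanh(w * h_{t-1} + x_t). The weight 0.9, the constant input 0.5 and the function name are illustrative choices, not values from the slides.

```python
import math

def gradient_through_time(w, inputs):
    """Return d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1} + x_t)."""
    h = 0.0
    grad = 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
    return grad

w = 0.9  # hypothetical recurrent weight
short_grad = gradient_through_time(w, [0.5] * 5)
long_grad = gradient_through_time(w, [0.5] * 50)
print(short_grad, long_grad)
```

Each step multiplies the gradient by a factor smaller than 1 in magnitude here, so the influence of the first step on the last shrinks roughly exponentially with sequence length.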
How do LSTMs work?
• LSTMs add a long-term memory so that certain information is remembered longer than the rest. This allows
them to retain knowledge over longer sequences.

• They pass 2 states from one time step to the next instead of 1: the hidden state and the cell state. Their
computation is a bit more complex than that of RNNs.
[Figure panels: RNN chain, LSTM chain]
LSTM cell architecture
• An LSTM's architecture consists of 3 gates: forget
gate, input gate and output gate

• Tanh acts as a squashing function (output between -1 and 1)
while sigmoid acts as a decision function, or gate (output between 0 and 1)

• The cell state is a channel that runs along the LSTM
chain, carrying information from one time step to
the next, changed only through the gates
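For reference, the gates described in the following slides are commonly written as the equations below. The notation is an assumption, since the slides give no formulas: x_t is the input, h_t the hidden state, c_t the cell state, [h_{t-1}, x_t] the concatenation, σ the sigmoid and ⊙ element-wise multiplication.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) && \text{(candidate values)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```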
The Cell State
The cell state is a conveyor belt that carries information
from one time step to the next. The forget gate removes
information from it, the input gate adds new information
to it, and the output gate reads from it. How much passes
through each gate depends on a sigmoid function: 0 means
let nothing through, 1 means let everything through, and
values in between let part of it through.
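The gating behaviour of the sigmoid can be illustrated in isolation. In this sketch the helper `gate` and the example values are hypothetical; in a real LSTM the gate inputs come from learned weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gate(values, gate_logits):
    """Scale each value by a sigmoid gate, which lies strictly between 0 and 1."""
    return [v * sigmoid(z) for v, z in zip(values, gate_logits)]

cell_state = [2.0, -1.0, 0.5]
# Large negative logit: gate near 0 (block). Large positive: near 1 (pass).
# Zero logit: gate is exactly 0.5 (pass half).
logits = [-20.0, 20.0, 0.0]
gated = gate(cell_state, logits)
print(gated)
```

The first entry is almost erased, the second passes nearly unchanged and the third is halved.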
The Forget Gate
Let's say that the previous few time steps encode
information about the gender of the subject. This is useful for
predicting the next few words while the subject stays the same.
But when a new subject enters, we no longer want to retain
the information about gender. This is what the
forget gate is trained to do.

It concatenates the previous hidden state with the current
input, multiplies the result by a weight matrix and adds a bias,
then applies a sigmoid function. The output is multiplied
element-wise with the previous cell state.
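The forget-gate computation described above can be sketched as follows. The function names, the list-of-rows weight layout and all numeric values are assumptions chosen so that one unit is kept and the other is forgotten.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(weights, vector, bias):
    """Compute W v + b for a list-of-rows matrix W."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

def forget_gate(h_prev, x, c_prev, w_f, b_f):
    """Return the gate values f_t and the gated cell state f_t * c_{t-1}."""
    concat = h_prev + x                       # concatenation [h_{t-1}, x_t]
    f = [sigmoid(z) for z in affine(w_f, concat, b_f)]
    kept = [fj * cj for fj, cj in zip(f, c_prev)]
    return f, kept

# Hypothetical toy values: 2 hidden units, 1 input feature.
h_prev = [0.2, -0.4]
x = [1.0]
c_prev = [1.5, -0.8]
w_f = [[0.0, 0.0, 10.0],     # unit 0: strongly "keep" when x is positive
       [0.0, 0.0, -10.0]]    # unit 1: strongly "forget" when x is positive
b_f = [0.0, 0.0]

f, kept = forget_gate(h_prev, x, c_prev, w_f, b_f)
print(f, kept)
```

With these weights the first cell-state entry survives almost intact while the second is driven close to zero.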
The Input Gate
The input gate decides what new information is saved to the cell state. It performs the same operation
as the forget gate (a sigmoid over the weighted concatenation of hidden state and input, plus bias), but with its
own weights. Instead of being applied to the cell state directly, its output is multiplied element-wise with a
candidate vector: the tanh (squashed) of a separately weighted concatenation of hidden state and input (plus bias).
The product is then added to the cell state, which has already been updated by the forget gate.
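A sketch of the input-gate step under the same assumptions as the usual LSTM formulation: the gate and the candidate each have their own weights. The names `input_gate_update`, `w_i`, `w_c` and the numeric values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(weights, vector, bias):
    """Compute W v + b for a list-of-rows matrix W."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

def input_gate_update(h_prev, x, c_after_forget, w_i, b_i, w_c, b_c):
    """Add gated candidate values to a cell state already scaled by the forget gate."""
    concat = h_prev + x                                    # [h_{t-1}, x_t]
    i = [sigmoid(z) for z in affine(w_i, concat, b_i)]     # how much to write
    g = [math.tanh(z) for z in affine(w_c, concat, b_c)]   # what to write
    c = [cf + ij * gj for cf, ij, gj in zip(c_after_forget, i, g)]
    return i, g, c

# Hypothetical toy values: 2 hidden units, 1 input feature.
h_prev = [0.0, 0.0]
x = [1.0]
c_after_forget = [0.5, -0.5]
w_i = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]   # zero logits: gate is 0.5 everywhere
b_i = [0.0, 0.0]
w_c = [[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]]  # candidates tanh(1) and tanh(-1)
b_c = [0.0, 0.0]

i, g, c = input_gate_update(h_prev, x, c_after_forget, w_i, b_i, w_c, b_c)
print(i, g, c)
```

Here half of each candidate value is written, so the cell state moves from 0.5 to about 0.88 in the first unit and from -0.5 to about -0.88 in the second.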
The Output Gate
Finally, we decide what the LSTM cell outputs. The cell
state itself is passed on to the next time step as it is;
the output computed here is the hidden state, which serves
both as the cell's output and as the hidden state for the
next time step. It is obtained by applying a sigmoid
function to the weighted concatenation of the previous
hidden state and the current input (plus bias), and then
multiplying the result element-wise with the squashed (tanh)
version of the cell state, which holds what was remembered
and what was forgotten.
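Putting the three gates together, one full LSTM time step can be sketched as below. This follows the commonly used formulation rather than any specific library; the parameter dictionary, its key names and all values are hypothetical and untrained.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(weights, vector, bias):
    """Compute W v + b for a list-of-rows matrix W."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step; returns the new hidden state and cell state."""
    concat = h_prev + x                                              # [h_{t-1}, x_t]
    f = [sigmoid(z) for z in affine(p["W_f"], concat, p["b_f"])]     # forget gate
    i = [sigmoid(z) for z in affine(p["W_i"], concat, p["b_i"])]     # input gate
    g = [math.tanh(z) for z in affine(p["W_c"], concat, p["b_c"])]   # candidate
    o = [sigmoid(z) for z in affine(p["W_o"], concat, p["b_o"])]     # output gate
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c

# Hypothetical, untrained parameters: 2 hidden units, 1 input feature.
# Each weight row has 3 columns: 2 for h_{t-1} and 1 for x_t.
params = {
    "W_f": [[0.1, 0.0, 0.5], [0.0, 0.1, -0.5]], "b_f": [1.0, 1.0],
    "W_i": [[0.2, 0.1, 0.3], [0.1, 0.2, 0.3]], "b_i": [0.0, 0.0],
    "W_c": [[0.3, -0.1, 0.8], [-0.1, 0.3, -0.8]], "b_c": [0.0, 0.0],
    "W_o": [[0.1, 0.1, 0.4], [0.1, 0.1, -0.4]], "b_o": [0.0, 0.0],
}

h, c = [0.0, 0.0], [0.0, 0.0]
for x in [[1.0], [0.5], [1.0]]:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

Both `h` and `c` are carried into the next step, but only `h` is squashed and gated by the output gate, so its entries always stay between -1 and 1.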
Machine Translation
LSTMs can be used as an encoder and a decoder for
machine translation or for a question-answering bot.
Reading Material
• https://arxiv.org/pdf/1506.00019.pdf
• https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/
• http://www.bioinf.jku.at/publications/older/2604.pdf
• https://github.com/oxford-cs-deepnlp-2017/lectures
