LSTM (Long short-term memory)
Team 19
Olu Amusan
Ryan Moye
YingHsuan Lo
Table of Contents
1. History
2. Architecture
3. Variants
4. Vanishing Gradient Problem
5. Training in Supervised Fashion
6. CTC Score Function
History
1995-1997: Sepp Hochreiter and Jürgen Schmidhuber proposed the LSTM and showed that it addresses the vanishing gradient problem.
1999-2000: Felix Gers introduced the forget gate. Gers, Schmidhuber, and Cummins added peephole connections.
2009: An LSTM-based model won the ICDAR connected handwriting recognition competition.
2013: LSTM networks recorded a 17.7% phoneme error rate on the classic TIMIT natural speech dataset.
2014-2016: Kyunghyun Cho et al. created the gated recurrent unit (GRU), and Google started using an LSTM for speech recognition on Google Voice and in the Allo conversation app.
2017: Facebook performed some 4.5 billion automatic translations daily with LSTM networks. Microsoft reported reaching 94.9% recognition accuracy.
2019: A new RNN derived using Legendre polynomials outperformed the LSTM on some memory-related benchmarks. An LSTM model climbed to third place on the Large Text Compression Benchmark.
Architecture
Common Architecture of an LSTM Model
The Long Short-Term Memory (LSTM) model is a type of Recurrent Neural Network (RNN). What sets RNNs and LSTMs apart from typical feedforward neural networks is that they have feedback connections: the state computed at one time step is fed back in at the next.
An LSTM unit is commonly composed of four parts: a cell, an input gate, an output gate, and a forget gate.
● The cell is the memory part of the LSTM unit and is responsible for keeping track of the dependencies between the elements in the input sequence.
● The input gate controls how much of a new value flows into the cell.
● The output gate controls how much of the value in the cell is used to compute the output activation of the LSTM unit.
● The forget gate controls how much of a value stays in the cell.
● The weights of the connections into and out of the gates determine how the gates operate.
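The roles of the cell and the three gates can be sketched as a single time step of a one-unit LSTM. This is only an illustrative sketch, not the deck's own code: the function name `lstm_step`, the scalar (single-unit) simplification, and the parameter names such as `wf`, `uf`, `bf` are all assumptions made for the example.

```python
import math

def sigmoid(x):
    """Logistic function, squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single-unit LSTM (hypothetical scalar sketch).

    x      : current input
    h_prev : previous output activation
    c_prev : previous cell state (the memory)
    p      : dict of weights, names are illustrative assumptions
    """
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g   # keep part of the old memory, admit part of the new value
    h = o * math.tanh(c)     # expose part of the cell as the output activation
    return h, c

names = ["wf", "uf", "bf", "wi", "ui", "bi", "wo", "uo", "bo", "wg", "ug", "bg"]
params = {name: 0.5 for name in names}  # arbitrary illustrative weights

h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

Because the output is a sigmoid gate multiplied by a tanh, `h` always stays strictly between -1 and 1, while the cell state `c` is free to accumulate over time.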
[Figure: a feedback network compared with a feedforward network]
Activation Functions
Sigmoid Function: $\sigma(x) = \frac{1}{1 + e^{-x}}$
Hyperbolic Tangent Function: $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
Variants
Variations on the LSTM Model
There are quite a few variations on the LSTM model. Here are a few examples:
● Peephole LSTM:
○ A peephole LSTM has peephole connections, which allow the gates to access the constant error carousel (CEC), whose activation is the cell state.
● Peephole Convolutional LSTM:
○ Similar to the peephole LSTM, but this model uses convolutions inside the unit, making it better suited to processing image and video data.
● Gated Recurrent Units (GRUs):
○ This model follows the common architecture of an LSTM model, but does not have an output gate.
● Multiplicative LSTM (mLSTM):
○ This is a more complex model that has achieved state-of-the-art results in natural language processing.
○ OpenAI’s unsupervised sentiment neuron is based on mLSTMs.
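To make the GRU entry above more concrete, here is a rough sketch of one GRU time step for a single unit. It follows the commonly cited formulation with an update gate and a reset gate and no output gate. The function name `gru_step`, the scalar simplification, and the parameter names are assumptions for illustration, and note that some references swap the roles of `z` and `1 - z` in the final blend.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h_prev, p):
    """One time step of a single-unit GRU (hypothetical scalar sketch)."""
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])  # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])  # reset gate
    # Candidate state: the reset gate decides how much of the past to use.
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # No output gate and no separate cell: the state itself is the output.
    return (1.0 - z) * h_prev + z * h_cand

gru_names = ["wz", "uz", "bz", "wr", "ur", "br", "wh", "uh", "bh"]
gru_params = {name: 0.5 for name in gru_names}  # arbitrary illustrative weights

state = 0.0
for x in [1.0, -1.0, 0.5]:
    state = gru_step(x, state, gru_params)
print(state)
```

Compared with the LSTM, this unit keeps a single state value rather than a separate cell state and output, which is one reason a GRU has fewer parameters.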
Vanishing gradient issue in training RNN
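1. Backpropagation: the error is propagated backward through every time step.
2. Chain rule: the gradient with respect to an early time step is a product with one factor per time step.
3. Long sequence: when those factors are small, the product shrinks toward zero, so training is biased to capture shorter-term dependencies.
Solution 1: Choose the right activation function.
Solution 2: Initialize the weights differently.
Solution 3: Use gated cells, making each node a more complex unit with gates controlling what information is passed through → LSTM.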
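The effect of multiplying one chain-rule factor per time step can be seen in a small numerical sketch. It assumes a toy one-unit recurrent network, h_t = tanh(w * h_{t-1} + x_t), which is not taken from the slides; the function name `gradient_through_time` and the values of `w` and the inputs are illustrative choices.

```python
import math

def gradient_through_time(w, inputs, h0=0.0):
    """Track d h_t / d h_0 for the toy RNN h_t = tanh(w * h_{t-1} + x_t).

    By the chain rule, each step multiplies the running gradient by
    d h_t / d h_{t-1} = w * (1 - h_t ** 2).
    """
    h = h0
    grad = 1.0
    history = []
    for x in inputs:
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)  # one chain-rule factor per time step
        history.append(grad)
    return history

grads = gradient_through_time(w=0.9, inputs=[0.5] * 50)
for step in (1, 10, 50):
    print(step, grads[step - 1])
```

Each factor here is smaller than 1 in magnitude, so the influence of the initial state on later states decays roughly geometrically with the number of time steps.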
[Figure: LSTM cell, from Figure 4.2 in “Supervised Sequence Labelling with Recurrent Neural Networks” by Alex Graves]
● Optimization algorithm: gradient descent combined with backpropagation (through time).
● Common optimizers: Adam, RMSprop, etc.
● An LSTM is a differentiable function approximator that is typically trained with gradient descent. Recently, non-gradient-based training methods for LSTMs have also been considered.
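As a rough illustration of what an optimizer such as Adam does with a gradient, the sketch below applies the standard Adam update rule to a one-parameter toy loss, (theta - 3)^2. The toy loss, the function name `adam_minimize`, and the hyperparameter values are assumptions chosen for the example; in LSTM training the gradient would instead come from backpropagation through the network.

```python
import math

def adam_minimize(grad_fn, theta, lr=0.05, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=2000):
    """Minimize a one-parameter function with the Adam update rule."""
    m = 0.0  # running mean of gradients (first moment)
    v = 0.0  # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)  # bias correction for the zero start
        v_hat = v / (1.0 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta

# Toy loss (theta - 3)^2 has gradient 2 * (theta - 3) and its minimum at 3.
theta_final = adam_minimize(lambda th: 2.0 * (th - 3.0), theta=0.0)
print(theta_final)
```

Plain gradient descent would use `theta -= lr * g` directly; Adam rescales each step by running estimates of the gradient's mean and magnitude.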
Connectionist Temporal Classification (CTC)
● Widely used in speech recognition due to its simplicity in training and efficiency in decoding.
● Rule (applied to the per-frame outputs):
1. Merge repeated frames. Ex: ex ex ex → ex
2. Remove blank frames (null).
● Issues:
○ Each output is decided independently.
○ The input frames must be aligned with the corresponding output labels.
● Pipeline (from the slide figure):
○ Paired training data: inputs X1, X2, X3, X4 and target W.
○ Encoder (LSTM): maps inputs X1 X2 X3 X4 X5 to hidden states h1 h2 h3 h4 h5.
○ Classifier: token distribution = Softmax(hi) over a vocabulary of size V, trained with cross-entropy.
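The collapsing rule and the per-frame Softmax classifier above can be sketched as a simple greedy decoder. This is a minimal sketch, not the full CTC algorithm (it does not compute the CTC loss or sum over alignments); the blank symbol `"_"`, the function names, and the toy scores are assumptions for illustration. Repeated frames are merged before blanks are removed, so that a blank between two identical labels keeps them separate.

```python
import math
from itertools import groupby

def softmax(scores):
    """Turn a list of raw scores into a probability distribution."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

def ctc_collapse(frames, blank="_"):
    """Merge consecutive repeats, then drop blank frames."""
    merged = [label for label, _ in groupby(frames)]
    return [label for label in merged if label != blank]

def greedy_decode(frame_scores, vocab, blank="_"):
    """Pick the most probable token in each frame independently, then collapse."""
    best = []
    for scores in frame_scores:
        probs = softmax(scores)
        best.append(vocab[probs.index(max(probs))])
    return ctc_collapse(best, blank)

print(ctc_collapse(["ex", "ex", "ex"]))
print("".join(ctc_collapse(list("hh_e_ll_lo"))))
```

Choosing the best token frame by frame reflects the issue noted above: each output is decided independently of the others.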
Successful examples in reinforcement learning
● Dota 2 (OpenAI Five):
○ Each of OpenAI Five’s networks contains a single-layer, 1024-unit LSTM that sees the current game state (extracted from Valve’s Bot API) and emits actions through several possible action heads.
● Learning Dexterity:
○ The policy is represented as a recurrent neural network with memory, namely an LSTM with an additional hidden layer with ReLU activations inserted between the inputs and the LSTM.
[Video: demo of Learning Dexterity]
[Slide labels: Time series prediction, CTC score function, Alternatives]
● Sign language translation
● Speech recognition
● Handwriting recognition
● Drug design
● Robot control
● Time series prediction
● Rhythm learning
● Music composition
● Grammar learning
● Human action recognition
● Protein homology detection
● Predicting subcellular localization of proteins
There is much more to do with LSTM!
Resources
Main article: Long short-term memory
Supplemental resources:
● Learning Dexterous In-Hand Manipulation
● OpenAI Five vs. Dota 2
● End-to-End Speech Recognition Using a High Rank LSTM-CTC Based Model
● Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks
● Supervised Sequence Labelling with Recurrent Neural Networks
Thanks for Listening!
Any Questions?
