RNN Explore
Yan Kang
RNN, LSTM, GRU Comparison on Time Series Data.
1.
RNN Explore RNN, LSTM, GRU, Hyperparameters By Yan Kang
2.
CONTENT: 1. Three Recurrent Cells 2. Hyperparameters 3. Experiments and Results 4. Conclusion
3.
RNN Cells
4.
Why RNN? Standard Neural Network: only accepts fixed-size vectors as input and output. Images from: https://en.wikipedia.org/wiki/Artificial_neural_network and http://agustis-place.blogspot.com/2010/01/4th-eso-msc-computer-assisted-task-unit.html?_sm_au_=iVVJSQ4WZH27rJM0
10.
Vanilla RNN Image from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
12.
Achieve it in 1 min: h_t = tanh(x_t · U + h_{t-1} · W + b) Image from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/ Vanilla RNN
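This one-line update can be sketched in NumPy (a toy illustration with made-up sizes and random weights, not the code used in the experiments):

```python
import numpy as np

def rnn_step(x_t, h_prev, U, W, b):
    """One vanilla RNN step: h_t = tanh(x_t @ U + h_prev @ W + b)."""
    return np.tanh(x_t @ U + h_prev @ W + b)

# Toy dimensions: input size 3, hidden size 4 (illustrative only).
rng = np.random.default_rng(0)
U = rng.normal(size=(3, 4)) * 0.1   # input-to-hidden weights
W = rng.normal(size=(4, 4)) * 0.1   # hidden-to-hidden weights
b = np.zeros(4)

h = np.zeros(4)                      # initial hidden state
for x_t in rng.normal(size=(5, 3)):  # unroll over a length-5 sequence
    h = rnn_step(x_t, h, U, W, b)
print(h.shape)  # (4,)
```

The same hidden-to-hidden weights W are reused at every time step, which is what makes the unit "recurrent".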
13.
LSTM Image from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
20.
LSTM Limitation? Redundant gates/parameters: "The output gate was the least important for the performance of the LSTM. When removed, h_t simply becomes tanh(C_t), which was sufficient for retaining most of the LSTM's performance." -- Google, "An Empirical Exploration of Recurrent Network Architectures" Image from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
22.
LSTM Limitation? Redundant gates/parameters: The LSTM unit computes the new memory content without any separate control of the amount of information flowing from the previous time step. -- “Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling” Image from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
23.
GRU (LSTM vs GRU) Image from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
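For comparison, the GRU's two gates and candidate state can be sketched the same way (biases omitted for brevity; the weight names Wz/Uz/Wr/Ur/Wh/Uh and all sizes are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step:
    z = sigmoid(x W_z + h U_z)         update gate
    r = sigmoid(x W_r + h U_r)         reset gate
    h~ = tanh(x W_h + (r * h) U_h)     candidate state
    h_t = (1 - z) * h + z * h~         interpolate old and new state
    """
    z = sigmoid(x_t @ Wz + h_prev @ Uz)
    r = sigmoid(x_t @ Wr + h_prev @ Ur)
    h_tilde = np.tanh(x_t @ Wh + (r * h_prev) @ Uh)
    return (1.0 - z) * h_prev + z * h_tilde

rng = np.random.default_rng(0)
Ws = [rng.normal(size=(3, 4)) * 0.1 for _ in range(3)]  # input weights
Us = [rng.normal(size=(4, 4)) * 0.1 for _ in range(3)]  # recurrent weights

h = np.zeros(4)
for x_t in rng.normal(size=(5, 3)):
    h = gru_step(x_t, h, Ws[0], Us[0], Ws[1], Us[1], Ws[2], Us[2])
print(h.shape)  # (4,)
```

Note the single state vector h and two gates, versus the LSTM's separate cell state C_t and three gates, which is where the parameter saving comes from.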
25.
Hyperparameters
26.
Number of Layers Other than using only one recurrent cell, there is another very common way to construct the recurrent units. Stacked RNN: Image from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
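Stacking can be sketched by feeding each layer's hidden state to the next layer as its input (toy NumPy sketch with illustrative sizes, not the experimental code):

```python
import numpy as np

def step(x_t, h_prev, U, W):
    """Vanilla recurrent step, reused for every layer."""
    return np.tanh(x_t @ U + h_prev @ W)

rng = np.random.default_rng(3)
# Layer 1 maps input size 3 -> hidden 4; layer 2 maps hidden 4 -> hidden 4.
U1, W1 = rng.normal(size=(3, 4)) * 0.1, rng.normal(size=(4, 4)) * 0.1
U2, W2 = rng.normal(size=(4, 4)) * 0.1, rng.normal(size=(4, 4)) * 0.1

h1, h2 = np.zeros(4), np.zeros(4)
for x_t in rng.normal(size=(6, 3)):
    h1 = step(x_t, h1, U1, W1)  # layer 1 consumes the raw input
    h2 = step(h1, h2, U2, W2)   # layer 2 consumes layer 1's hidden state
print(h2.shape)  # (4,)
```

Each layer keeps its own hidden state and weights; the top layer's state is what the classifier would read.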
28.
Hidden Size RNN LSTM GRU Hidden size: hidden state size in RNN; cell state and hidden state sizes in LSTM; hidden state size in GRU. The larger it is, the more complicated a model the recurrent unit can memorize and represent. Image from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
30.
Batch Size Optimization function: B = |X| gives gradient descent; 1 <= B < |X| gives (mini-batch) stochastic gradient descent. Batch size B is the number of instances used to update the weights once. Image from: https://www.quora.com/Whats-the-difference-between-gradient-descent-and-stochastic-gradient-descent
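Splitting a dataset into batches of B instances can be sketched with a hypothetical helper (not from the slides):

```python
def minibatches(instances, batch_size):
    """Yield successive batches of `batch_size` instances;
    the last batch may be smaller if the dataset doesn't divide evenly."""
    for start in range(0, len(instances), batch_size):
        yield instances[start:start + batch_size]

data = list(range(10))                         # 10 toy instances
print([len(b) for b in minibatches(data, 4)])  # [4, 4, 2]
```

With batch_size = len(data) this degenerates to full-batch gradient descent: one update per epoch.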
33.
Learning Rate Optimization function: the learning rate ε_t controls how much the weights are changed in each update; decrease it when getting close to the target. Two learning rate updating methods were used in the experiments: first, after each epoch the learning rate decays by 1/2; second, after every 5 epochs the learning rate decays by 1/2. Image from: https://www.quora.com/Whats-the-difference-between-gradient-descent-and-stochastic-gradient-descent
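Both decay schedules can be expressed with one hypothetical helper that halves the rate every `decay_every` epochs (the two experimental schedules correspond to decay_every = 1 and decay_every = 5):

```python
def decayed_lr(base_lr, epoch, decay_every):
    """Halve the learning rate once every `decay_every` epochs."""
    return base_lr * 0.5 ** (epoch // decay_every)

print(decayed_lr(0.1, 0, 5))  # 0.1
print(decayed_lr(0.1, 5, 5))  # 0.05
```

The base learning rate of 0.1 here is a placeholder; the slides do not state the starting value.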
36.
Experiments & Results
37.
Variable Length: sequences s_0, s_1, s_2, s_3 (with labels l_0, l_1, l_2, l_3) are zero-padded at the end to the length of the longest, giving padded sequences s_0', s_1', s_2', s_3' that form Batch 0. Variable Length vs Sliding Window
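The zero-padding step can be sketched with a hypothetical pad_batch helper:

```python
def pad_batch(sequences, pad_value=0.0):
    """Zero-pad variable-length sequences to the longest one in the batch."""
    max_len = max(len(s) for s in sequences)
    return [s + [pad_value] * (max_len - len(s)) for s in sequences]

batch = pad_batch([[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]])
print([len(s) for s in batch])  # [3, 3, 3]
```

After padding, every sequence in the batch has equal length, so they stack into one rectangular tensor for the recurrent network.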
38.
Sliding window: each sequence s_i is sliced into overlapping windows w_i0, w_i1, w_i2, ...; the windows are grouped into Batch 0, Batch 1, and so on. Variable Length vs Sliding Window
39.
Sliding Window: Advantages: Each sequence might generate tens or even hundreds of subsequences. With the same batch size as the variable length method, this means more batches per epoch and more weight updates per epoch, i.e. a faster convergence rate per epoch. Disadvantages: 1) Time consuming, with a longer time per epoch; 2) Assigning the same label to all subsequences may be biased and may prevent the network from converging. Variable Length vs Sliding Window
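Generating the subsequences can be sketched with a hypothetical helper (window and step sizes here are illustrative, not the values used in the experiments):

```python
def sliding_windows(sequence, window, step=1):
    """Slice a sequence into fixed-size, possibly overlapping subsequences.
    Every subsequence inherits the label of the full sequence."""
    return [sequence[i:i + window]
            for i in range(0, len(sequence) - window + 1, step)]

s = list(range(8))                   # one toy sequence of length 8
ws = sliding_windows(s, 5)
print(len(ws))                       # 4 subsequences: 8 - 5 + 1
```

One length-8 sequence already yields 4 training instances, which is why this method multiplies the number of batches per epoch.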
40.
Variable Length vs Sliding Window Variable Length: AUSLAN Dataset 2565 instances
41.
Sliding Window: AUSLAN Dataset 2565 instances Variable Length vs Sliding Window
42.
Variable Length: Character Trajectories Dataset 2858 instances Variable Length vs Sliding Window
43.
Sliding Window: Character Trajectories Dataset 2858 instances Variable Length vs Sliding Window
44.
Variable Length: Japanese Vowels Dataset 640 instances Variable Length vs Sliding Window
45.
Sliding Window: Japanese Vowels Dataset 640 instances Variable Length vs Sliding Window
46.
RNN vs LSTM vs GRU GRU is a simpler variant of LSTM that shares many of the same properties; both of them can prevent vanishing gradients and "remember" long-term dependencies. Both of them outperform vanilla RNN on almost all the datasets, whether using Sliding Window or Variable Length. But GRU has fewer parameters than LSTM, and thus may train a bit faster or need fewer iterations to generalize. As shown in the plots, GRU does converge slightly faster.
47.
RNN vs LSTM vs GRU (convergence plots)
51.
Hyperparameter Comparisons • Learning Rate • Batch Size • Number of Layers • Hidden Size
52.
Learning Rate Two learning rate updating methods were used in the experiments: • First, after each epoch the learning rate decays by 1/2; 24 epochs in total. • Second, after every 5 epochs the learning rate decays by 1/2; 120 epochs in total. The left side of the following plots uses 24 epochs, and the right side uses 120 epochs. Because of the change in the learning rate updating mechanism, some configurations that do not converge on the left (24 epochs) work pretty well on the right (120 epochs).
53.
Learning Rate Japanese Vowels, Sliding Window, LSTM 24 epochs 120 epochs
54.
Learning Rate Japanese Vowels, Sliding Window, GRU 24 epochs 120 epochs
55.
Learning Rate Japanese Vowels, Variable Length, LSTM 24 epochs 120 epochs
56.
Learning Rate Japanese Vowels, Variable Length, GRU 24 epochs 120 epochs
57.
Batch Size A larger batch size means that each weight update uses more instances, so it has lower bias but also a slower convergence rate. On the contrary, a small batch size updates the weights more frequently, so it converges faster but has higher bias. What we ought to do is find the balance between the convergence rate and the risk.
58.
Batch Size Japanese Vowels Sliding Window
59.
Batch Size Japanese Vowels Variable Length
60.
Batch Size UWave Full Length Sliding Window
61.
Number of layers Multi-layer RNNs are more difficult to converge: as the number of layers increases, convergence gets slower. And even when they do converge, we don't gain much from the larger number of hidden units, at least on the Japanese Vowels dataset; the final accuracy doesn't seem better than that of one-layer recurrent networks. This matches some papers' results that stacked RNNs can be replaced by one layer with a larger hidden size.
62.
Number of layers Japanese Vowels Sliding Window
63.
Number of layers Japanese Vowels Variable Length
64.
Number of layers UWave Full length Sliding Window
65.
Hidden Size On both Japanese Vowels and UWave, the larger the hidden size on LSTM and GRU, the better the final accuracy. Different hidden sizes share a similar convergence rate on LSTM and GRU. But the trade-off of a larger hidden size is that it takes a longer time per epoch to train the network. There is some abnormal behavior on vanilla RNN, which might be caused by vanishing gradients.
66.
Hidden Size Japanese Vowels Sliding Window
67.
Hidden Size Japanese Vowels Variable Length
68.
Hidden Size UWave Full Length Sliding Window
69.
Conclusion
70.
Conclusion In this presentation, we first discussed: • What RNN, LSTM and GRU are, and why we use them. • The definitions of the four hyperparameters. And through roughly 800 experiments, we analyzed: • The difference between Sliding Window and Variable Length. • The differences among RNN, LSTM and GRU. • The influence of the number of layers. • The influence of hidden size. • The influence of batch size. • The influence of learning rate. Generally speaking, GRU works better than LSTM, and, because it suffers from vanishing gradients, vanilla RNN works worst. Sliding window is good for datasets with limited instances, where 1) the sequences may have repetitive features or 2) a subsequence can capture the key features of the full sequence. All four hyperparameters play an important role in tuning the network.
72.
Limitations However, there are still some limitations: 1. Variable length: • The sequence length is too long (~100-300 for most datasets, some even larger than 1000). 2. Sliding window: • It ignores the continuity between the sliced subsequences. • Biased labeling may cause similar subsequences to be labeled differently. Luckily, these two limitations can be solved simultaneously -- by Truncated Gradient.
76.
What's next? Truncated gradient: • Slice the sequences in a special order so that, between neighboring batches, each instance of the batch is continuous. • Instead of initializing the states in each batch randomly around zero, as Sliding Window does, the states from the last batch are used to initialize the next batch's state. • So even though the recurrent units are unrolled over a short range (e.g. 20 steps), the states can be passed through and the former "memory" is preserved. (Diagram: sequences s_0, s_1, s_2, s_3 sliced into windows w_00, w_01, ...; Batch 1 is initialized with the states from Batch 0.)
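The state hand-off can be demonstrated with a toy NumPy forward pass: carrying the final state of one chunk into the next reproduces the full unroll exactly (illustrative weights and sizes, not the experimental setup):

```python
import numpy as np

def rnn_chunk(xs, h0, W, U):
    """Unroll a short chunk (a truncated-BPTT window) and return the final
    state, which initializes the next chunk instead of a fresh random state."""
    h = h0
    for x_t in xs:
        h = np.tanh(x_t @ U + h @ W)
    return h

rng = np.random.default_rng(1)
U = rng.normal(size=(3, 4)) * 0.1
W = rng.normal(size=(4, 4)) * 0.1
seq = rng.normal(size=(40, 3))      # one long sequence

h = np.zeros(4)
for chunk in np.split(seq, 2):      # two contiguous 20-step chunks
    h = rnn_chunk(chunk, h, W, U)   # state flows across the chunk boundary

# Carrying the state gives the same final state as one full unroll:
print(np.allclose(h, rnn_chunk(seq, np.zeros(4), W, U)))  # True
```

The gradient, unlike the forward state, is still truncated at the chunk boundary; only the "memory" survives, which is exactly the trade-off truncated gradient makes.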
77.
What's next? Averaged outputs to do classification: • Right now, we use the last time step's output for the softmax and then use Cross Entropy to estimate each class's probability. • Using the averaged outputs of all time steps, or weighted averaged outputs, might be a good choice to try. Prediction (sequence modeling): • We already did the sequence-to-sequence model with an l2-norm loss function. • What needs to be done is finding a proper way to analyze the predicted sequence.
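The proposed change is a one-line difference over the per-step outputs (random placeholder logits here; the softmax helper and shapes are assumptions for illustration):

```python
import numpy as np

# Per-time-step output vectors from an unrolled recurrent network:
# shape = (time_steps, num_classes), random placeholders for this sketch.
rng = np.random.default_rng(2)
outputs = rng.normal(size=(10, 3))

last_step = outputs[-1]           # current approach: softmax on the last step
averaged = outputs.mean(axis=0)   # proposal: average outputs over all steps

def softmax(z):
    """Numerically stable softmax over one logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

print(round(softmax(averaged).sum(), 6))  # 1.0
```

A weighted average (e.g. weighting later steps more heavily) would slot in by replacing `mean` with a weighted sum over the time axis.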
79.
THANK YOU Thanks to Dmitriy for his instructions, and to Feipeng and Xi for the discussions.
80.
Questions?