
RNN Explore

RNN, LSTM, GRU Comparison on Time Series Data.



  1. RNN Explore: RNN, LSTM, GRU, Hyperparameters. By Yan Kang
  2. CONTENT: 1. Three Recurrent Cells 2. Hyperparameters 3. Experiments and Results 4. Conclusion
  3. RNN Cells
  4. Why RNN? Standard Neural Network: Images from: https://en.wikipedia.org/wiki/Artificial_neural_network
  5. Why RNN? Standard Neural Network: only accepts a fixed-size vector as input and output. Images from: https://en.wikipedia.org/wiki/Artificial_neural_network
  6. Why RNN? Standard Neural Network: only accepts a fixed-size vector as input and output. Images from: https://en.wikipedia.org/wiki/Artificial_neural_network
  7. Why RNN? Standard Neural Network: only accepts a fixed-size vector as input and output. Images from: https://en.wikipedia.org/wiki/Artificial_neural_network
  8. Why RNN? Standard Neural Network: only accepts a fixed-size vector as input and output. X Images from: https://en.wikipedia.org/wiki/Artificial_neural_network http://agustis-place.blogspot.com/2010/01/4th-eso-msc-computer-assisted-task-unit.html?_sm_au_=iVVJSQ4WZH27rJM0
  9. Why RNN? Standard Neural Network: only accepts a fixed-size vector as input and output. X Images from: https://en.wikipedia.org/wiki/Artificial_neural_network http://agustis-place.blogspot.com/2010/01/4th-eso-msc-computer-assisted-task-unit.html?_sm_au_=iVVJSQ4WZH27rJM0
  10. Vanilla RNN Image from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
  11. Vanilla RNN Image from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
  12. Vanilla RNN. Implement it in one minute: h_t = tanh(x_t * U_i + h_{t-1} * W_s + b). Image from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
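The one-line update above translates directly into code. A minimal NumPy sketch of the vanilla RNN step (the sizes and initialization are hypothetical, not the exact experiment code):

```python
import numpy as np

def rnn_step(x_t, h_prev, U, W, b):
    # one vanilla RNN update: h_t = tanh(x_t @ U + h_{t-1} @ W + b)
    return np.tanh(x_t @ U + h_prev @ W + b)

# hypothetical sizes: input dim 3, hidden size 5
rng = np.random.default_rng(0)
U = rng.normal(size=(3, 5)) * 0.1
W = rng.normal(size=(5, 5)) * 0.1
b = np.zeros(5)

h = np.zeros(5)                       # initial hidden state
for x_t in rng.normal(size=(10, 3)):  # a length-10 input sequence
    h = rnn_step(x_t, h, U, W, b)
```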
  13. LSTM Image from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
  14. LSTM Image from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
  15. LSTM Image from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
  16. LSTM Image from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
  17. LSTM Image from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
  18. LSTM Image from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
  19. LSTM Image from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
  20. LSTM Limitation? Redundant gates/parameters:
  21. LSTM Limitation? Redundant gates/parameters: "The output gate was the least important for the performance of the LSTM. When removed, h_t simply becomes tanh(C_t), which was sufficient for retaining most of the LSTM's performance." -- Google, "An Empirical Exploration of Recurrent Network Architectures" Image from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
  22. LSTM Limitation? Redundant gates/parameters: "The LSTM unit computes the new memory content without any separate control of the amount of information flowing from the previous time step." -- "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling" Image from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
  23. GRU Image from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
  24. GRU (LSTM vs GRU, side by side) Image from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
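For comparison with the LSTM diagrams above, a minimal NumPy sketch of one GRU step, following the Cho et al. formulation (biases omitted for brevity; the weight matrices are assumed given):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h, Uz, Wz, Ur, Wr, Uh, Wh):
    z = sigmoid(x @ Uz + h @ Wz)             # update gate
    r = sigmoid(x @ Ur + h @ Wr)             # reset gate
    h_cand = np.tanh(x @ Uh + (r * h) @ Wh)  # candidate state
    return (1.0 - z) * h + z * h_cand        # blend: no output gate, no separate cell state
```

Only three weight blocks (update, reset, candidate) against the LSTM's four, which is where the parameter savings come from.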
  25. Hyperparameters
  26. Number of Layers. Besides using a single recurrent cell, there is another very common way to arrange the recurrent units. Image from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
  27. Number of Layers. Besides using a single recurrent cell, there is another very common way to arrange the recurrent units: the stacked RNN. Image from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
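A stacked recurrent network is just one cell's output sequence feeding the next cell. A sketch using the Keras API (the layer sizes are hypothetical, and this is not necessarily the framework used in these experiments):

```python
import tensorflow as tf

# hypothetical sizes: 22 input features per time step, 95 classes
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, return_sequences=True,  # pass the full sequence upward
                         input_shape=(None, 22)),    # variable-length input
    tf.keras.layers.LSTM(64),                        # top layer emits its last state
    tf.keras.layers.Dense(95, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```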
  28. Hidden Size (RNN / LSTM / GRU). Hidden size: the hidden state size in RNN; the cell state and hidden state sizes in LSTM; the hidden state size in GRU. Image from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
  29. Hidden Size (RNN / LSTM / GRU). Hidden size: the hidden state size in RNN; the cell state and hidden state sizes in LSTM; the hidden state size in GRU. The larger it is, the more complex the patterns the recurrent unit can memorize and represent. Image from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
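The hidden size also drives the parameter count, differently for each cell. A rough count under the usual weight shapes (one weight block per gate or candidate; the 12-feature input is hypothetical):

```python
def recurrent_param_count(input_dim, hidden, blocks):
    # each gate/candidate block: hidden x (input_dim + hidden) weights + hidden biases
    return blocks * (hidden * (input_dim + hidden) + hidden)

# vanilla RNN has 1 block, GRU 3 (update, reset, candidate), LSTM 4 (i, f, o, candidate)
for name, blocks in [("RNN", 1), ("GRU", 3), ("LSTM", 4)]:
    print(name, recurrent_param_count(12, 64, blocks))
```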
  30. Batch Size. Optimization function: Image from: https://www.quora.com/Whats-the-difference-between-gradient-descent-and-stochastic-gradient-descent
  31. Batch Size. Optimization function: B = |X| gives (full-batch) gradient descent; B = 1 gives stochastic gradient descent; 1 < B < |X| gives mini-batch stochastic gradient descent. Image from: https://www.quora.com/Whats-the-difference-between-gradient-descent-and-stochastic-gradient-descent
  32. Batch Size. Optimization function: batch size B is the number of instances used for one weight update. Image from: https://www.quora.com/Whats-the-difference-between-gradient-descent-and-stochastic-gradient-descent
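A minimal sketch of how the batch size B partitions the data per weight update (shuffled mini-batching assumed; not the exact experiment code):

```python
import numpy as np

def minibatches(X, y, batch_size, rng):
    # yield shuffled batches of B instances; each batch triggers one weight update
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        sel = idx[start:start + batch_size]
        yield X[sel], y[sel]

# B = len(X): full-batch gradient descent; B = 1: SGD; 1 < B < len(X): mini-batch SGD
```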
  33. Learning Rate. Optimization function: Image from: https://www.quora.com/Whats-the-difference-between-gradient-descent-and-stochastic-gradient-descent
  34. Learning Rate. Optimization function: the learning rate ε_t controls how much the weights change in each update. Decrease it when getting close to the target. Image from: https://www.quora.com/Whats-the-difference-between-gradient-descent-and-stochastic-gradient-descent
  35. Learning Rate. Optimization function: two learning-rate schedules were used in the experiments. In the first, the learning rate decays by 1/2 after each epoch; in the second, it decays by 1/2 after every 5 epochs. Image from: https://www.quora.com/Whats-the-difference-between-gradient-descent-and-stochastic-gradient-descent
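A sketch of the two schedules, assuming a decay factor of 1/2 and a hypothetical initial rate of 0.1 (epoch counts taken from the results section):

```python
def decayed_lr(lr0, epoch, every_n):
    # halve the learning rate once per `every_n` epochs (decay factor 1/2 assumed)
    return lr0 * 0.5 ** (epoch // every_n)

schedule_1 = [decayed_lr(0.1, e, 1) for e in range(24)]   # decay each epoch, 24 epochs
schedule_2 = [decayed_lr(0.1, e, 5) for e in range(120)]  # decay every 5 epochs, 120 epochs
```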
  36. Experiments & Results
  37. Variable Length vs Sliding Window. Variable length: sequences s_1, s_2, s_3, s_4 of different lengths are zero-padded to a common length and stacked into Batch 0; each padded sequence s_i' keeps its original label l_i. (diagram)
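A minimal sketch of the zero-padding step in the diagram above (assuming each sequence is a [T_i, D] array):

```python
import numpy as np

def pad_batch(seqs):
    # zero-pad variable-length sequences (each [T_i, D]) to the longest length T
    T = max(len(s) for s in seqs)
    D = seqs[0].shape[1]
    padded = np.zeros((len(seqs), T, D))
    for i, s in enumerate(seqs):
        padded[i, :len(s)] = s  # the tail stays zero, as in the diagram
    return padded
```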
  38. Variable Length vs Sliding Window. Sliding window: each sequence s_i is sliced into overlapping fixed-length subsequences w_i1, w_i2, ..., and consecutive windows are grouped into Batch 0, Batch 1, .... (diagram)
  39. Variable Length vs Sliding Window. Sliding Window: Advantages: each sequence can generate tens or even hundreds of subsequences. With the same batch size as the variable-length method, that means more batches per epoch and more weight updates per epoch, i.e. a faster convergence rate per epoch. Disadvantages: 1) time-consuming, each epoch takes longer; 2) assigning the same label to every subsequence can be biased and may keep the network from converging.
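A minimal slicing sketch for the sliding-window method (assumes every sequence is at least one window long; window and stride are free parameters):

```python
import numpy as np

def sliding_windows(seq, label, window, stride=1):
    # slice one sequence ([T, D]) into overlapping subsequences that all inherit its label
    starts = range(0, len(seq) - window + 1, stride)
    subs = np.stack([seq[i:i + window] for i in starts])
    return subs, np.full(len(subs), label)
```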
  40. Variable Length vs Sliding Window. Variable Length: AUSLAN Dataset, 2565 instances
  41. Variable Length vs Sliding Window. Sliding Window: AUSLAN Dataset, 2565 instances
  42. Variable Length vs Sliding Window. Variable Length: Character Trajectories Dataset, 2858 instances
  43. Variable Length vs Sliding Window. Sliding Window: Character Trajectories Dataset, 2858 instances
  44. Variable Length vs Sliding Window. Variable Length: Japanese Vowels Dataset, 640 instances
  45. Variable Length vs Sliding Window. Sliding Window: Japanese Vowels Dataset, 640 instances
  46. RNN vs LSTM vs GRU. GRU is a simpler variant of LSTM that shares many of the same properties: both mitigate vanishing gradients and can "remember" long-term dependencies, and both outperform vanilla RNN on almost all the datasets, whether using sliding window or variable length. But GRU has fewer parameters than LSTM, and thus may train a bit faster or need fewer iterations to generalize. As shown in the plots, GRU does converge slightly faster.
  47. RNN vs LSTM vs GRU
  48. RNN vs LSTM vs GRU
  49. RNN vs LSTM vs GRU
  50. RNN vs LSTM vs GRU
  51. Hyperparameters Comparisons • Learning Rate • Batch Size • Number of Layers • Hidden Size
  52. Learning Rate. Two learning-rate schedules were used in the experiments: • In the first, the learning rate decays by 1/2 after each epoch, for 24 epochs in total. • In the second, it decays by 1/2 after every 5 epochs, for 120 epochs in total. The left side of the following plots uses 24 epochs, the right side 120 epochs. Because of the changed schedule, some configurations that do not converge on the left (24 epochs) work quite well on the right (120 epochs).
  53. Learning Rate. Japanese Vowels, Sliding Window, LSTM: 24 epochs vs 120 epochs
  54. Learning Rate. Japanese Vowels, Sliding Window, GRU: 24 epochs vs 120 epochs
  55. Learning Rate. Japanese Vowels, Variable Length, LSTM: 24 epochs vs 120 epochs
  56. Learning Rate. Japanese Vowels, Variable Length, GRU: 24 epochs vs 120 epochs
  57. Batch Size. A larger batch size means each update uses more instances, so the gradient estimate is less noisy, but the weights are updated less often and convergence per epoch is slower. Conversely, a small batch size updates the weights more frequently, so it converges faster but with noisier gradient estimates. What we ought to do is find the balance between convergence rate and gradient noise.
  58. Batch Size. Japanese Vowels, Sliding Window
  59. Batch Size. Japanese Vowels, Variable Length
  60. Batch Size. UWave Full Length, Sliding Window
  61. Number of Layers. Multi-layer RNNs are more difficult to converge: as the number of layers increases, convergence slows. And even when they do converge, we do not gain much from the larger number of hidden units, at least on the Japanese Vowels dataset; the final accuracy does not seem better than that of one-layer recurrent networks. This matches results in some papers that stacked RNNs can be replaced by a single layer with a larger hidden size.
  62. Number of Layers. Japanese Vowels, Sliding Window
  63. Number of Layers. Japanese Vowels, Variable Length
  64. Number of Layers. UWave Full Length, Sliding Window
  65. Hidden Size. On both Japanese Vowels and UWave, the larger the hidden size of LSTM and GRU, the better the final accuracy, and different hidden sizes share a similar convergence rate on LSTM and GRU. The trade-off of a larger hidden size is that each epoch takes longer to train. There is some abnormal behavior on vanilla RNN, which might be caused by vanishing gradients.
  66. Hidden Size. Japanese Vowels, Sliding Window
  67. Hidden Size. Japanese Vowels, Variable Length
  68. Hidden Size. UWave Full Length, Sliding Window
  69. Conclusion
  70. Conclusion. In this presentation, we first discussed: • what RNN, LSTM and GRU are, and why we use them; • the definitions of the four hyperparameters. Then, through roughly 800 experiments, we analyzed: • the difference between sliding window and variable length; • the differences among RNN, LSTM and GRU; • the influence of the number of layers; • the influence of hidden size; • the influence of batch size; • the influence of learning rate.
  71. Conclusion. In this presentation, we first discussed: • what RNN, LSTM and GRU are, and why we use them; • the definitions of the four hyperparameters. Then, through roughly 800 experiments, we analyzed: • the difference between sliding window and variable length; • the differences among RNN, LSTM and GRU; • the influence of the number of layers; • the influence of hidden size; • the influence of batch size; • the influence of learning rate. Generally speaking, GRU works better than LSTM, and vanilla RNN works worst because it suffers from vanishing gradients. Sliding window works well for datasets with few instances whose sequences 1) have repetitive features, or 2) whose subsequences capture the key features of the full sequence. All four hyperparameters play an important role in tuning the network.
  72. Limitations. However, there are still some limitations: 1. Variable length: • the sequence length is too long (~100-300 for most datasets, some even longer than 1000).
  73. Limitations. However, there are still some limitations: 1. Variable length: • the sequence length is too long (~100-300 for most datasets, some even longer than 1000). 2. Sliding window: • it ignores the continuity between the sliced subsequences; • biased labeling may cause similar subsequences to be labeled differently.
  74. Limitations. However, there are still some limitations: 1. Variable length: • the sequence length is too long (~100-300 for most datasets, some even longer than 1000). 2. Sliding window: • it ignores the continuity between the sliced subsequences; • biased labeling may cause similar subsequences to be labeled differently. Luckily, these two limitations can be solved simultaneously.
  75. Limitations. However, there are still some limitations: 1. Variable length: • the sequence length is too long (~100-300 for most datasets, some even longer than 1000). 2. Sliding window: • it ignores the continuity between the sliced subsequences; • biased labeling may cause similar subsequences to be labeled differently. Luckily, these two limitations can be solved simultaneously -- by truncated gradients.
  76. What's next? Truncated gradient: • Slice the sequences in a special order so that, between neighboring batches, each position of the batch continues the same sequence. • Unlike sliding window, which initializes each batch's states randomly around zero, the states from the last batch are used to initialize the next batch's states. • So even though the recurrent units are unrolled over a short range (e.g. 20 steps), the states are passed through and the earlier "memory" is preserved; a sketch of the state carry-over follows below. (diagram: sequences s_1..s_4 sliced into windows w_ij; the states at the end of Batch 0 initialize Batch 1)
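A minimal sketch of that state carry-over (the step function and chunk layout are assumptions; a real framework would also detach the state at chunk borders so gradients stay truncated):

```python
import numpy as np

def run_truncated(chunks, h0, step_fn):
    # chunks: temporally ordered [B, k, D] slices of the same B sequences;
    # each chunk is unrolled only k steps, but state flows across chunk borders.
    h = h0  # only the very first chunk starts from the (near-zero) initial state
    for chunk in chunks:
        for t in range(chunk.shape[1]):
            h = step_fn(chunk[:, t, :], h)
        # here the gradient would be cut (state detached), while the forward
        # state h still initializes the next chunk
    return h
```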
  77. What's next? Averaged outputs for classification: • Right now, we apply softmax to the last time step's output and use cross-entropy to estimate each class's probability. • Using the average, or a weighted average, of the outputs of all time steps might be a good choice to try.
  78. What's next? Averaged outputs for classification: • Right now, we apply softmax to the last time step's output and use cross-entropy to estimate each class's probability. • Using the average, or a weighted average, of the outputs of all time steps might be a good choice to try; a sketch follows below. Prediction (sequence modeling): • We already built a sequence-to-sequence model with an L2-norm loss function. • What remains is finding a proper way to analyze the predicted sequences.
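A sketch of the two classification heads, the current last-step softmax and the proposed averaged variant (per-step logits of shape [T, C] are assumed):

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def predict_last(outputs):
    # current approach: classify from the last time step's output only
    return softmax(outputs[-1])  # outputs: [T, C] per-step logits

def predict_averaged(outputs, weights=None):
    # proposed alternative: plain or weighted average over all time steps
    return softmax(np.average(outputs, axis=0, weights=weights))
```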
  79. THANK YOU. Thanks to Dmitriy for his guidance, and to Feipeng and Xi for the discussions.
  80. Questions?
