
Skipping and Repeating Samples in Recurrent Neural Networks


Recurrent Neural Networks (RNNs) continue to show outstanding performance in sequence modeling tasks. However, training RNNs on long sequences often faces challenges like slow inference, vanishing gradients and difficulty in capturing long-term dependencies. In backpropagation through time settings, these issues are tightly coupled with the large, sequential computational graph resulting from unfolding the RNN in time. We introduce the Skip RNN model, which extends existing RNN models by learning to skip state updates, thus shortening the effective size of the computational graph. This model can also be encouraged to perform fewer state updates through a budget constraint. We evaluate the proposed model on various tasks and show how it can reduce the number of required RNN updates while preserving, and sometimes even improving, the performance of the baseline RNN models.

https://imatge-upc.github.io/skiprnn-2017-telecombcn/
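
As a rough illustration of the skip mechanism described above, the sketch below wraps a standard GRU cell so that, at every time step, the state is either updated or copied depending on a learned update gate. The class and variable names (SkipGRU, update_prob) are illustrative only and are not taken from the authors' released code; in training, the hard threshold on the gate would be handled with the straight-through estimator discussed later in the slides.

    import torch
    import torch.nn as nn

    class SkipGRU(nn.Module):
        """Sketch of a Skip RNN layer: update or copy the state at each step."""
        def __init__(self, input_size, hidden_size):
            super().__init__()
            self.cell = nn.GRUCell(input_size, hidden_size)
            self.update_prob = nn.Linear(hidden_size, 1)  # predicts the increment for the update probability

        def forward(self, x):  # x: (seq_len, batch, input_size)
            s = x.new_zeros(x.size(1), self.cell.hidden_size)
            u_tilde = x.new_ones(x.size(1), 1)  # update probability, starts at 1 so the first sample is used
            states, gates = [], []
            for x_t in x:
                u = (u_tilde > 0.5).float()                  # binary update gate (straight-through estimator when training)
                s = u * self.cell(x_t, s) + (1.0 - u) * s    # update (u = 1) or copy (u = 0) the state
                delta = torch.sigmoid(self.update_prob(s))   # increment for the update probability
                u_tilde = u * delta + (1.0 - u) * torch.clamp(u_tilde + delta, max=1.0)
                states.append(s)
                gates.append(u)
            return torch.stack(states), torch.stack(gates)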

-------------------------------

Deep networks commonly perform better than shallow ones, but allocating the proper amount of computation for each particular input sample remains an open problem. This issue is particularly challenging in sequential tasks, where the required complexity may vary for different tokens in the input sequence. Adaptive Computation Time (ACT) was proposed as a method for dynamically adapting the computation at each step for Recurrent Neural Networks (RNNs). ACT introduces two main modifications to the regular RNN formulation: (1) more than one RNN step may be executed between the moment an input sample is fed to the layer and the moment this layer generates an output, and (2) this number of steps is dynamically predicted depending on the input token and the hidden state of the network. In our work, we aim at gaining intuition about the contribution of these two factors to the overall performance boost observed when augmenting RNNs with ACT. We design a new baseline, Repeat-RNN, which performs a constant number of RNN state updates larger than one before generating an output. Surprisingly, such a uniform distribution of the computational resources matches the performance of ACT in the studied tasks. We hope that this finding motivates new research efforts towards designing RNN architectures that are able to dynamically allocate computational resources.

https://imatge-upc.github.io/danifojo-2018-repeatrnn/
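
The Repeat-RNN baseline described in the abstract is even simpler: every input token is processed a fixed number of times before the layer emits an output. A minimal sketch with a GRU cell and illustrative names (the released implementations are in PyTorch and TensorFlow, but this snippet is not taken from them):

    import torch
    import torch.nn as nn

    class RepeatRNN(nn.Module):
        """Sketch of Repeat-RNN: a constant number of state updates per input token."""
        def __init__(self, input_size, hidden_size, repeats=2):
            super().__init__()
            self.cell = nn.GRUCell(input_size, hidden_size)
            self.repeats = repeats  # fixed, interpretable hyperparameter (ACT predicts this number dynamically)

        def forward(self, x):  # x: (seq_len, batch, input_size)
            s = x.new_zeros(x.size(1), self.cell.hidden_size)
            outputs = []
            for x_t in x:
                for _ in range(self.repeats):  # repeat the same input before producing an output
                    s = self.cell(x_t, s)
                outputs.append(s)
            return torch.stack(outputs)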



  1. 1. Xavier Giro-i-Nieto xavier.giro@upc.edu Associate Professor Intelligent Data Science and Artificial Intelligence Center Universitat Politecnica de Catalunya (UPC) DEEP LEARNING WORKSHOP Dublin City University 21-22 May 2018 Skipping and Repeating Samples in RNNs Case study III #InsightDL2018
  2. 2. bit.ly/InsightDL2018 #InsightDL2018 Our team @ UPC IDEAI Barcelona Victor Campos Amaia Salvador Amanda Duarte Dani Fojo Görkem Çamli Akis Linardos Santiago Pascual Xavi Giró Miriam Bellver Marta R. Costa-jussà Ferran Marqués Jordi Torres (BSC) Marc Assens Sandra Roca Miquel Llobet Antonio Bonafonte
  3. 3. bit.ly/InsightDL2018 #InsightDL2018 Kevin McGuinness Shih-Fu Chang Noel E. O’Connor Cathal Gurrin Eva Mohedano Carles Ventura Our partners @ academia Francisco Roldan Marta Coll Paula Gómez Alejandro Woodward Marc Górriz
  4. 4. bit.ly/InsightDL2018 #InsightDL2018 Dèlia Fernández Elisenda Bou Sebastian Palacio Eduard Ramon Andreu Girbau Carlos Arenas Our team @ industry
  5. 5. 2016 20172015 Convolutional Neural Networks (CNNs) Evolution Strategies Recurrent Neural Networks (RNN) Adversarial Training 2018 Unsupervised Learning Reinforcement Learning
  6. 6. 2016 20172015 Convolutional Neural Networks (CNNs) Evolution Strategies Recurrent Neural Networks (RNN) Adversarial Training 2018 Unsupervised Learning Reinforcement Learning
  7. 7. bit.ly/InsightDL2018 #InsightDL2018 Motivation: Dynamic Computation 2+2
  8. 8. bit.ly/InsightDL2018 #InsightDL2018 Outline Recurrent Neural Networks (RNNs) SkipRNN [ICLR 2018] RepeatRNN [ICLRW 2018]
  9. 9. bit.ly/InsightDL2018 #InsightDL2018 Outline Recurrent Neural Networks (RNNs) SkipRNN [ICLR 2018] RepeatRNN [ICLRW 2018]
  10. 10. bit.ly/InsightDL2018 #InsightDL2018 Feed-forward Neural Network INPUT(x) OUTPUT(y) FeedForward Hidden States h1 & h2 Feed-forward Weights (Wi ) Figure: Hugo Larochelle
  11. 11. bit.ly/InsightDL2018 #InsightDL2018 Recurrent Neural Network
  12. 12. bit.ly/InsightDL2018 #InsightDL2018 Recurrent Neural Network Recurrent Weights (U) Feed-forward Weights (W)
  13. 13. bit.ly/InsightDL2018 #InsightDL2018 Recurrent Neural Network Updated state Previous state INPUT(x)
  14. 14. bit.ly/InsightDL2018 #InsightDL2018 Recurrent Neural Network time Unfold (rotation 90°)
  15. 15. bit.ly/InsightDL2018 #InsightDL2018 Recurrent Neural Network S x1 s1 S x2 s2 S xT sT … time
  16. 16. bit.ly/InsightDL2018 #InsightDL2018 Recurrent Neural Network CNN CNN CNN... RNN RNN RNN... Video sequences (e.g. action recognition)
  17. 17. bit.ly/InsightDL2018 #InsightDL2018 Recurrent Neural Network Word sequences (e.g. machine translation)
  18. 18. bit.ly/InsightDL2018 #InsightDL2018 Recurrent Neural Network Spectrogram sequences (e.g. speech recognition) T H _ _ E … D _ O _ _ G
  19. 19. bit.ly/InsightDL2018 #InsightDL2018 Outline Recurrent Neural Networks (RNNs) SkipRNN [ICLR 2018] RepeatRNN [ICLRW 2018]
  20. 20. bit.ly/InsightDL2018 #InsightDL2018 Skip RNN: Learning to Skip State Updates in RNNs Víctor Campos Jordi Torres Xavier Giró-i-Nieto Shih-Fu Chang Brendan Jou Victor Campos, Brendan Jou, Xavier Giro-i-Nieto, Jordi Torres, and Shih-Fu Chang. “Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks”, ICLR 2018.
  21. 21. bit.ly/InsightDL2018 #InsightDL2018 Motivation Used Unused CNN CNN CNN... RNN RNN RNN...
  22. 22. bit.ly/InsightDL2018 #InsightDL2018 SkipRNN S x1 s1 time
  23. 23. bit.ly/InsightDL2018 #InsightDL2018 SkipRNN S x1 s1 time COPY
  24. 24. bit.ly/InsightDL2018 #InsightDL2018 SkipRNN S x1 s1 x2 time COPY
  25. 25. bit.ly/InsightDL2018 #InsightDL2018 SkipRNN S x1 s1 S x2 s1 time COPY
  26. 26. bit.ly/InsightDL2018 #InsightDL2018 SkipRNN S x1 s1 S x2 s1 time COPY UPDATE
  27. 27. bit.ly/InsightDL2018 #InsightDL2018 SkipRNN S x1 s1 S x2 s1 time S x3 s3 COPY UPDATE
  28. 28. bit.ly/InsightDL2018 #InsightDL2018 Limiting computation: cost per sample λ (hyperparameter); u_t = 1 if the sample is used, 0 otherwise. Intuition: the network can be encouraged to perform fewer updates by adding a penalty whenever u_t = 1 (the corresponding loss is written out after the last slide).
  29. 29. bit.ly/InsightDL2018 #InsightDL2018 Experiments: Sequential MNIST Used Unused
  30. 30. bit.ly/InsightDL2018 #InsightDL2018 Experiments: Sequential MNIST. Used/unused samples for the Skip LSTM (λ = 10^-4) along training: 11 epochs (~30% acc), 51 epochs (~50% acc), 101 epochs (~70% acc), 400 epochs (~95% acc).
  31. 31. bit.ly/InsightDL2018 #InsightDL2018 Experiments: Sequential MNIST cost per sample (hyperparameter) Random baseline
  32. 32. bit.ly/InsightDL2018 #InsightDL2018 Experiments: Sequential MNIST Better accuracy... Ours
  33. 33. bit.ly/InsightDL2018 #InsightDL2018 Experiments: Sequential MNIST ...with less computation Ours
  34. 34. bit.ly/InsightDL2018 #InsightDL2018 Experiments: Action Localization CNN CNN CNN... RNN RNN RNN... better
  35. 35. bit.ly/InsightDL2018 #InsightDL2018 Experiments: Action Localization CNN CNN CNN... RNN RNN RNN... faster
  36. 36. bit.ly/InsightDL2018 #InsightDL2018 Open science https://git.io/skip-rnn
  37. 37. bit.ly/InsightDL2018 #InsightDL2018 Outline Recurrent Neural Networks (RNNs) SkipRNN [ICLR 2018] RepeatRNN [ICLRW 2018]
  38. 38. Reproducing and Analyzing Adaptive Computation Time in PyTorch and TensorFlow Víctor Campos Xavier Giró-i-Nieto Dani Fojo Fojo, Daniel, Víctor Campos, and Xavier Giró-i-Nieto. "Comparing Fixed and Adaptive Computation Time for Recurrent Neural Networks." ICLR Workshop 2018.
  39. 39. bit.ly/InsightDL2018 #InsightDL2018 Adaptive Computation Time (ACT) Graves, Alex. "Adaptive computation time for recurrent neural networks." arXiv preprint arXiv:1603.08983 (2016). REPEAT 3 REPEAT 2
  40. 40. bit.ly/InsightDL2018 #InsightDL2018 Adaptive Computation Time (ACT)
  41. 41. bit.ly/InsightDL2018 #InsightDL2018 ACT vs RepeatRNN ACT RepeatRNN
  42. 42. bit.ly/InsightDL2018 #InsightDL2018 Experiment: One-hot addition
  43. 43. bit.ly/InsightDL2018 #InsightDL2018 Experiment: One-hot addition Ours
  44. 44. bit.ly/InsightDL2018 #InsightDL2018 Experiment: One-hot addition Interpretable hyperparameter Ours
  45. 45. bit.ly/InsightDL2018 #InsightDL2018 Experiment: One-hot addition Faster training…. Ours
  46. 46. bit.ly/InsightDL2018 #InsightDL2018 Experiment: One-hot addition ...and faster inference Ours
  47. 47. bit.ly/InsightDL2018 #InsightDL2018 Open Science bit.ly/repeat-rnn
  48. 48. bit.ly/InsightDL2018 #InsightDL2018 Outline Recurrent Neural Networks (RNNs) SkipRNN [ICLR 2018] RepeatRNN [ICLRW 2018]
  49. 49. bit.ly/InsightDL2018 #InsightDL2018 Acknowledgements
  50. 50. bit.ly/InsightDL2018 #InsightDL2018 Deep Learning courses @ UPC (videos) ● MSc course (2017) ● BSc course (2018) ● 1st edition (2016) ● 2nd edition (2017) ● 3rd edition (2018) ● 1st edition (2017) ● 2nd edition (2018) Next edition Autumn 2018. Next edition Winter/Spring 2019. Summer School (starts 28/06).
  51. 51. @DocXavi Xavier Giro-i-Nieto Download slides from: http://bit.ly/InsightDL2018 We are looking for both industrial & academic partners: xavier.giro@upc.edu #InsightDL2018 Click to comment:
  52. 52. Questions?
  53. 53. bit.ly/InsightDL2018 #InsightDL2018 SkipRNN Intuition: introduce a binary update state gate, u_t, deciding whether the RNN state is updated or copied: s_t = S(s_{t-1}, x_t) if u_t = 1 (update operation); s_t = s_{t-1} if u_t = 0 (copy operation). (The full recursion is written out after the last slide.)
  54. 54. bit.ly/InsightDL2018 #InsightDL2018 SkipRNN Update state gate ∈ {0, 1}
  55. 55. bit.ly/InsightDL2018 #InsightDL2018 SkipRNN Update state gate ∈ {0, 1} Update state probability ∈ [0, 1]
  56. 56. bit.ly/InsightDL2018 #InsightDL2018 SkipRNN Update state gate ∈ {0, 1} Update state probability ∈ [0, 1] Increment for the update state probability
  57. 57. bit.ly/InsightDL2018 #InsightDL2018 SkipRNN Update state gate ∈ {0, 1} Update state probability ∈ [0, 1] Increment for the update state probability
  58. 58. bit.ly/InsightDL2018 #InsightDL2018 SkipRNN Update state (ut = 1) Copy state (ut = 0)
  59. 59. bit.ly/InsightDL2018 #InsightDL2018 SkipRNN Update state (ut = 1) Copy state (ut = 0)
  60. 60. bit.ly/InsightDL2018 #InsightDL2018 Straight Through Estimator f_binarize(x) as a function of x: forward pass is a hard step from 0 to 1. G. Hinton, Neural Networks for Machine Learning, Coursera video lectures 2012
  61. 61. bit.ly/InsightDL2018 #InsightDL2018 Straight Through Estimator f_binarize(x) as a function of x: forward pass (hard step) vs. backward pass treated as the identity. G. Hinton, Neural Networks for Machine Learning, Coursera video lectures 2012
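
Backup slides 53-57 show the Skip RNN formulation as images only. Written out, following the notation of the ICLR 2018 paper (u_t is the binary update gate, ũ_t the update probability, Δũ_t its increment, and S the cell update function):

    u_t     = f_binarize(ũ_t)                                      (update gate, in {0, 1})
    s_t     = u_t · S(s_{t-1}, x_t) + (1 - u_t) · s_{t-1}          (update or copy the state)
    Δũ_t    = σ(W_p · s_t + b_p)                                   (increment, in [0, 1])
    ũ_{t+1} = u_t · Δũ_t + (1 - u_t) · (ũ_t + min(Δũ_t, 1 - ũ_t))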
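
Slide 28 sketches the budget constraint. With λ the cost per sample (the hyperparameter mentioned on the slide) and u_t = 1 when the sample is used and 0 otherwise, the training objective from the paper penalizes each state update:

    L = L_task + λ · Σ_t u_t

so larger values of λ push the network towards performing fewer updates.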
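
Slides 60-61 illustrate the straight-through estimator used to backpropagate through the binarization of the update gate: the forward pass applies a hard threshold, while the backward pass lets the gradient flow as if the operation were the identity. A minimal PyTorch-style sketch (the class name BinarizeSTE is illustrative):

    import torch

    class BinarizeSTE(torch.autograd.Function):
        """Forward: hard threshold to {0, 1}. Backward: straight-through (identity) gradient."""
        @staticmethod
        def forward(ctx, x):
            return (x > 0.5).float()

        @staticmethod
        def backward(ctx, grad_output):
            return grad_output  # pretend the binarization was the identity

    # usage: u_t = BinarizeSTE.apply(u_tilde)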
