
Recurrent Neural Networks (D2L2 2017 UPC Deep Learning for Computer Vision)


https://telecombcn-dl.github.io/2017-dlcv/

Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.



  1. [course site] Recurrent Neural Networks. Xavier Giro-i-Nieto (xavier.giro@upc.edu), Associate Professor, Universitat Politecnica de Catalunya (Technical University of Catalonia). #DLUPC
  2. Acknowledgments: Santiago Pascual.
  3. Outline: 1. The importance of context 2. Where is the memory? 3. Vanilla RNN 4. Problems 5. Gating methodology (a. LSTM, b. GRU).
  4. The importance of context. ● Recall the 5th digit of your phone number. ● Sing your favourite song starting from the third line. ● Recall the 10th character of the alphabet. You probably went back to the beginning of the stream in each case… because in sequences, order matters! Idea: retain the information while preserving the importance of order.
  5. Recall from Day 1 Lecture 2… Feed-forward networks: every output y and hidden activation hi is computed from the sequence of forward activations out of the input x.
  6. Where is the Memory? Given a sequence of samples, predict sample x[t+1] knowing the previous values {x[t], x[t-1], x[t-2], …, x[t-τ]}.
  7. Where is the Memory? Feed-forward approach: a static window of size L, slid one time-step at a time. [Figure: the window {x[t-L], …, x[t-1], x[t]} feeds the network to predict x[t+1].]
  8. Where is the Memory? Feed-forward approach (cont.). [Figure: the window {x[t-L+1], …, x[t], x[t+1]} predicts x[t+2].]
  9. Where is the Memory? Feed-forward approach (cont.). [Figure: the window {x[t-L+2], …, x[t+1], x[t+2]} predicts x[t+3].]
  10. Where is the Memory? Problems with the feed-forward + static window approach: ● What happens if we increase L? → Fast growth of the number of parameters! ● Decisions are independent between time-steps! ○ The network does not care about what happened at the previous time-step; only the present window matters → doesn't look good. ● Cumbersome padding when there are not enough samples to fill the window of size L. ○ It can't work with variable sequence lengths. [Figure: windows of growing size L, 2L, 3L.] A minimal sketch of the windowing is shown below.
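To make the static-window idea concrete, here is a minimal sketch (the toy signal, the window size L, and the array names are illustrative assumptions, not from the slides) of how (window, next-sample) training pairs would be built for a feed-forward predictor:

```python
import numpy as np

# Hypothetical 1-D signal; the window size L is arbitrary here.
x = np.sin(np.linspace(0, 20, 200))
L = 10

# Slide a fixed window of L past samples one step at a time;
# each window is paired with the sample that immediately follows it.
windows = np.array([x[t - L:t] for t in range(L, len(x))])   # shape: (N, L)
targets = np.array([x[t] for t in range(L, len(x))])         # shape: (N,)

# Each row of `windows` is an independent input to a feed-forward net;
# nothing links consecutive rows, and L caps the available context.
print(windows.shape, targets.shape)
```

Every row is treated as an independent example, which is exactly the limitation the next slides address.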
  11. Recurrent Neural Network. Solution: build specific connections that capture the temporal evolution → shared weights in time. ● Give volatile memory to the network. [Figure: feed-forward, fully-connected layers.]
  12. Recurrent Neural Network. Solution (cont.): the hidden units are also fully connected in time through a recurrent matrix. [Figure: feed-forward layers plus recurrent connections.] A minimal sketch of the recurrence is shown below.
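As a minimal NumPy sketch of the recurrence described above (the weight names W and U, the dimensions, and the tanh non-linearity are illustrative assumptions), a vanilla RNN reuses the same input matrix and recurrent matrix at every time-step:

```python
import numpy as np

input_dim, hidden_dim = 3, 5
rng = np.random.default_rng(0)

# Shared weights, reused at every time-step.
W = rng.standard_normal((hidden_dim, input_dim)) * 0.1   # input-to-hidden
U = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1  # hidden-to-hidden (recurrent matrix)
b = np.zeros(hidden_dim)

def rnn_forward(x_seq):
    """x_seq: (T, input_dim) -> hidden states of shape (T, hidden_dim)."""
    h = np.zeros(hidden_dim)
    states = []
    for x_t in x_seq:                       # propagation through time
        h = np.tanh(W @ x_t + U @ h + b)    # same W, U, b at every step
        states.append(h)
    return np.stack(states)

states = rnn_forward(rng.standard_normal((7, input_dim)))
print(states.shape)  # (7, 5)
```

The hidden state h is the "volatile memory" carried from one time-step to the next.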
  13. Recurrent Neural Network. [Figure: the recurrent network unfolded in time (a 90° rotation); front view and side view.]
  14. Recurrent Neural Network. Hence we have two data flows: forward through the neural layers + propagation through time. BEWARE: we have extra depth now! Every time-step is an extra level of depth (like a deeper stack of layers in a feed-forward fashion!).
  15. Recurrent Neural Network. Hence we have two data flows: forward through the layers + propagation through time.
  16. Recurrent Neural Network. Forward through the layers + propagation through time: ○ the last time-step includes the context of our decisions recursively.
  17. Recurrent Neural Network. Forward through the layers + propagation through time: ○ the last time-step includes the context of our decisions recursively.
  18. Recurrent Neural Network. Backpropagation Through Time (BPTT): the training method has to take the time operations into account. T is the maximum number of time-steps to back-propagate through; in Keras this is specified when defining the "input shape" of the RNN layer, as (batch size, sequence length (T), input_dim). The total error at the output is the sum of the errors at each time-step, and the total gradient is the sum of the gradients at each time-step. A minimal Keras sketch is shown below.
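As a minimal Keras sketch of the point above (the layer widths, the values of T and input_dim, and the choice of SimpleRNN are illustrative assumptions), the sequence length and feature dimension are declared in the input shape, while the batch size is left implicit:

```python
from tensorflow import keras
from tensorflow.keras import layers

T, input_dim = 20, 8   # sequence length (BPTT horizon) and feature size

model = keras.Sequential([
    keras.Input(shape=(T, input_dim)),   # batches are shaped (batch_size, T, input_dim)
    layers.SimpleRNN(32),                # vanilla recurrent layer
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```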
  19. Recurrent Neural Network. Main problems: ● Long-term memory (remembering time-steps quite far back) vanishes quickly because of the recursive operation with the recurrent matrix U. ● During training, gradients explode or vanish easily because of the depth in time → exploding/vanishing gradients! A small numerical illustration is shown below.
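The following toy illustration (the matrix sizes and scales are assumptions, and the tanh derivatives are deliberately ignored, which in practice only makes vanishing worse) shows why repeatedly multiplying by the recurrent matrix U during BPTT makes gradient norms shrink or blow up exponentially with the number of time-steps:

```python
import numpy as np

rng = np.random.default_rng(1)
hidden_dim, T = 5, 50

for scale in (0.5, 1.5):   # contractive vs. expansive recurrent matrix
    # Scaled orthogonal matrix, so its largest singular value equals `scale`.
    U = scale * np.linalg.qr(rng.standard_normal((hidden_dim, hidden_dim)))[0]
    grad = np.ones(hidden_dim)
    for _ in range(T):                 # BPTT repeatedly multiplies by U^T
        grad = U.T @ grad
    print(f"scale={scale}: gradient norm after {T} steps = {np.linalg.norm(grad):.3e}")

# Norms collapse toward 0 (vanishing) or diverge (exploding) exponentially in T.
```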
  20. Long Short-Term Memory (LSTM). Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural Computation 9, no. 8 (1997): 1735-1780.
  21. Long Short-Term Memory (LSTM). The New York Times, "When A.I. Matures, It May Call Jürgen Schmidhuber 'Dad'" (November 2016).
  22. Long Short-Term Memory (LSTM). Jürgen Schmidhuber @ NIPS 2016, Barcelona.
  23. Long Short-Term Memory (LSTM). Jürgen Schmidhuber @ NIPS 2016, Barcelona.
  24. Long Short-Term Memory (LSTM). Jürgen Schmidhuber @ NIPS 2016, Barcelona.
  25. Gating method. Solutions: 1. Change the way past information is kept → create the notion of a cell state: a memory unit that keeps long-term information in a safer way by protecting it from recursive operations. 2. Make every RNN unit able to forget whatever may not be useful anymore by clearing that information from the cell state (optimized clearing mechanism). 3. Make every RNN unit able to decide whether the current time-step information matters or not, to accept or discard it (optimized reading mechanism). 4. Make every RNN unit able to output its decisions whenever it is ready to do so (optimized output mechanism).
  26. Long Short-Term Memory (LSTM). Three gates, governed by sigmoid units (bounded between [0, 1]), control the incoming and outgoing information through a product. Figure: Christopher Olah, "Understanding LSTM Networks" (2015).
  27. Long Short-Term Memory (LSTM). Forget gate: a sigmoid applied to the concatenation of the previous hidden state and the current input. It makes every RNN unit able to forget whatever may not be useful anymore by clearing that information from the cell state (optimized clearing mechanism). Figure: Christopher Olah, "Understanding LSTM Networks" (2015) / Slide: Alberto Montes.
  28. Long Short-Term Memory (LSTM). Input gate layer and new contribution to the cell state (a classic neuron): they make every RNN unit able to decide whether the current time-step information matters or not, to accept or discard it (optimized reading mechanism). Figure: Christopher Olah, "Understanding LSTM Networks" (2015) / Slide: Alberto Montes.
  29. Long Short-Term Memory (LSTM). Forget + input gates = update of the cell state (the memory): the cell keeps what the forget gate retains plus what the input gate accepts (optimized reading mechanism). Figure: Christopher Olah, "Understanding LSTM Networks" (2015) / Slide: Alberto Montes.
  30. Long Short-Term Memory (LSTM). Output gate layer: the output goes to the next layer and the next time-step. It makes every RNN unit able to output its decisions whenever it is ready to do so (optimized output mechanism). Figure: Christopher Olah, "Understanding LSTM Networks" (2015) / Slide: Alberto Montes.
  31. Long Short-Term Memory (LSTM). Complete cell, combining the gates above. Figure: Christopher Olah, "Understanding LSTM Networks" (2015) / Slide: Alberto Montes. A step-by-step sketch of the cell update is shown below.
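The following is a minimal NumPy sketch of one LSTM update, following the gate walkthrough in slides 27-30 (the weight layout, the dictionary keys, and the dimensions are illustrative assumptions, not the lecture's notation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM update. Each W[k] maps the concatenated [h_prev, x_t] to one gate/activation."""
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W["f"] @ z + b["f"])        # forget gate: what to erase from the cell state
    i = sigmoid(W["i"] @ z + b["i"])        # input gate: what to accept from this time-step
    c_tilde = np.tanh(W["c"] @ z + b["c"])  # candidate new content (classic neuron)
    c = f * c_prev + i * c_tilde            # cell state update (the memory)
    o = sigmoid(W["o"] @ z + b["o"])        # output gate: what to expose
    h = o * np.tanh(c)                      # output to the next layer & next time-step
    return h, c

# Tiny usage example with random weights.
input_dim, hidden_dim = 4, 6
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((hidden_dim, hidden_dim + input_dim)) * 0.1 for k in "fico"}
b = {k: np.zeros(hidden_dim) for k in "fico"}
h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
for x_t in rng.standard_normal((10, input_dim)):
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)  # (6,) (6,)
```

Note that the cell state c is only touched by element-wise products and sums, which is what protects long-term information from the recursive matrix operations.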
  32. Long Short-Term Memory (LSTM) cell: 3 gates + the input activation. Compared to a non-gated RNN, an LSTM has four times as many parameters because of the additional neurons that govern the gates. A quick count is sketched below.
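A quick back-of-the-envelope count under the usual parameterisation (one weight block per gate plus one for the candidate, each of shape hidden × (hidden + input) plus a bias; the sizes below are arbitrary):

```python
# Parameter count of one LSTM layer vs. a plain (vanilla) RNN layer of the same size.
input_dim, hidden_dim = 8, 32

rnn_params  = hidden_dim * (hidden_dim + input_dim) + hidden_dim          # W, U, b
lstm_params = 4 * (hidden_dim * (hidden_dim + input_dim) + hidden_dim)    # f, i, o gates + candidate

print(rnn_params, lstm_params, lstm_params / rnn_params)   # the ratio is exactly 4
```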
  33. Long Short-Term Memory (LSTM) cell. Updating an LSTM cell requires 6 computations, grouped into: 1. the gates, 2. the activation units, and 3. the cell state. [Figure: computation flow.]
  34. Gated Recurrent Unit (GRU). Cho, Kyunghyun, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." EMNLP 2014. GRUs obtain performance similar to LSTMs with one gate fewer.
  35. Gated Recurrent Unit (GRU). Cho, Kyunghyun, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." EMNLP 2014. GRUs obtain performance similar to LSTMs with one gate fewer. A parameter comparison is sketched below.
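As a rough comparison of the two gated cells in Keras (the layer sizes are arbitrary, and reset_after=False is used so the GRU matches the original Cho et al. formulation), the GRU uses three weight blocks where the LSTM uses four:

```python
from tensorflow import keras
from tensorflow.keras import layers

T, input_dim, units = 20, 8, 32

def count(layer):
    # Build a one-layer model just to query its parameter count.
    model = keras.Sequential([keras.Input(shape=(T, input_dim)), layer])
    return model.count_params()

lstm = layers.LSTM(units)                    # 4 blocks: forget, input, output gates + candidate
gru  = layers.GRU(units, reset_after=False)  # 3 blocks: update, reset gates + candidate
print("LSTM params:", count(lstm))           # 4 * (units*(units+input_dim) + units)
print("GRU  params:", count(gru))            # 3 * (units*(units+input_dim) + units)
```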
  36. Outline: 1. The importance of context 2. Where is the memory? 3. Vanilla RNN 4. Problems 5. Gating methodology (a. LSTM, b. GRU).
  37. Thanks! Q&A? Follow me at https://imatge.upc.edu/web/people/xavier-giro, @DocXavi, /ProfessorXavi.
