RNN and LSTM
(Oct 12, 2016)
YANG Jiancheng
Outline
• I. Vanilla RNN
• II. LSTM
• III. GRU and Other Structures
• I. Vanilla RNN
In theory, RNNs are absolutely capable of handling such “long-term dependencies.” A human could carefully pick parameters for them to solve toy problems of this form. Sadly, in practice, RNNs don’t seem to be able to learn them.
GREAT Intro: Understanding LSTM Networks
• I. Vanilla RNN
WILDML has a series of articles introducing RNNs (4 articles, 2 GitHub repos).
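To make the vanilla RNN concrete (the equations are not spelled out on these slides), here is a minimal NumPy sketch of the forward pass, assuming the common tanh hidden-state / softmax output formulation; the function name, parameter names, and dimensions are illustrative, not taken from the slides.

import numpy as np

def rnn_forward(x_seq, h0, Wxh, Whh, Why, bh, by):
    """Vanilla RNN forward pass over a sequence of input vectors."""
    h, hs, ys = h0, [], []
    for x in x_seq:
        # hidden state: h_t = tanh(Wxh x_t + Whh h_{t-1} + bh)
        h = np.tanh(Wxh @ x + Whh @ h + bh)
        # output distribution: y_t = softmax(Why h_t + by)
        o = Why @ h + by
        y = np.exp(o - o.max())
        y /= y.sum()
        hs.append(h)
        ys.append(y)
    return ys, hs

# Illustrative usage with random parameters.
input_dim, hidden_dim, output_dim = 8, 16, 8
rng = np.random.default_rng(0)
Wxh = 0.1 * rng.standard_normal((hidden_dim, input_dim))
Whh = 0.1 * rng.standard_normal((hidden_dim, hidden_dim))
Why = 0.1 * rng.standard_normal((output_dim, hidden_dim))
bh, by = np.zeros(hidden_dim), np.zeros(output_dim)
x_seq = [rng.standard_normal(input_dim) for _ in range(5)]
ys, hs = rnn_forward(x_seq, np.zeros(hidden_dim), Wxh, Whh, Why, bh, by)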
• I. Vanilla RNN
• Back Prop Through Time (BPTT)
• I. Vanilla RNN
• Back Prop Through Time (BPTT)
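The BPTT equations on these two slides appeared as figures. As a hedged reconstruction following the standard derivation (the one used in [2]; the notation below is assumed, not copied from the slides), the gradient of the loss at step t with respect to the shared recurrent weights W sums contributions flowing back through every earlier step:

\[
\frac{\partial E_t}{\partial W} \;=\; \sum_{k=0}^{t} \frac{\partial E_t}{\partial \hat{y}_t}\,\frac{\partial \hat{y}_t}{\partial h_t} \left( \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}} \right) \frac{\partial h_k}{\partial W}
\]

Here E_t is the loss at step t, \hat{y}_t the prediction, h_t the hidden state, and W is shared across time steps. The product of Jacobians \partial h_j / \partial h_{j-1} is what the vanishing-gradient argument on the next slide is about.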
• I. Vanilla RNN
• Gradient Vanishing Problem
tanh and its derivative (figure). Source: http://nn.readthedocs.org/en/rtd/transfer/
Unrolled through time, RNNs tend to be very deep.
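A small numeric illustration of the argument (a sketch, not from the slides): tanh'(z) = 1 - tanh(z)^2 never exceeds 1, so each backward step multiplies the gradient by a Jacobian diag(1 - h_t^2) W_hh whose norm is typically below 1, and the product over many time steps decays toward zero.

import numpy as np

rng = np.random.default_rng(0)
hidden_dim, steps = 16, 50
Whh = 0.1 * rng.standard_normal((hidden_dim, hidden_dim))  # recurrent weights

h = np.zeros(hidden_dim)
jac_product = np.eye(hidden_dim)   # will accumulate dh_t / dh_0
norms = []
for t in range(steps):
    h = np.tanh(Whh @ h + rng.standard_normal(hidden_dim))
    J = np.diag(1.0 - h ** 2) @ Whh   # Jacobian dh_t / dh_{t-1}
    jac_product = J @ jac_product     # chain rule across time steps
    norms.append(np.linalg.norm(jac_product))

# The norm typically collapses toward zero long before 50 steps.
print(norms[0], norms[9], norms[-1])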
• II. LSTM
• Differences between LSTM and Vanilla RNN
The repeating module of a vanilla RNN contains a single tanh layer; an LSTM’s repeating module has four interacting layers and a separate cell state.
• II. LSTM
• Core Idea Behind LSTMs
• Cell state: runs straight down the entire chain, with only minor linear interactions
• Gates: sigmoid layers plus pointwise multiplication that optionally let information through
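The cell-state and gate diagrams on this slide were figures; as a reminder of the standard formulation in [1] (the notation below is assumed, not taken from the slide), the cell state is updated only through elementwise interactions controlled by sigmoid gates:

\[
C_t \;=\; f_t \odot C_{t-1} \;+\; i_t \odot \tilde{C}_t, \qquad f_t \;=\; \sigma\!\left(W_f\,[h_{t-1}, x_t] + b_f\right) \in (0, 1)
\]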
• II. LSTM
• Step-by-Step Walk Through
(the gate activations are sigmoid outputs, ranging over 0 ~ 1)
• II. LSTM
• Step-by-Step Walk Through
(again, the gate activations range over 0 ~ 1)
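The step-by-step figures (forget gate, input gate and candidate values, cell-state update, output gate) did not survive extraction. Below is a minimal NumPy sketch of a single LSTM step following the standard equations in [1]; the parameter names, the concatenated-input convention, and the shapes are illustrative assumptions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, C_prev, Wf, Wi, Wc, Wo, bf, bi, bc, bo):
    """One LSTM step; each W* maps the concatenation [h_prev, x] to hidden_dim."""
    z = np.concatenate([h_prev, x])
    f = sigmoid(Wf @ z + bf)           # forget gate, output in 0 ~ 1
    i = sigmoid(Wi @ z + bi)           # input gate, output in 0 ~ 1
    C_tilde = np.tanh(Wc @ z + bc)     # candidate cell values
    C = f * C_prev + i * C_tilde       # new cell state
    o = sigmoid(Wo @ z + bo)           # output gate, output in 0 ~ 1
    h = o * np.tanh(C)                 # new hidden state
    return h, C

# Illustrative usage with random parameters.
hidden_dim, input_dim = 16, 8
rng = np.random.default_rng(0)
Ws = [0.1 * rng.standard_normal((hidden_dim, hidden_dim + input_dim)) for _ in range(4)]
bs = [np.zeros(hidden_dim) for _ in range(4)]
h, C = lstm_step(rng.standard_normal(input_dim),
                 np.zeros(hidden_dim), np.zeros(hidden_dim), *Ws, *bs)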
• III. GRU and Other Structures
• Gated Recurrent Unit (GRU), with a minimal step sketch after this list
• Combines the forget and input gates into a single “update gate.”
• Merges the cell state and hidden state
• Other changes
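Below is a minimal NumPy sketch of one GRU step under the standard update/reset-gate formulation; as with the LSTM sketch, the names and shapes are illustrative assumptions, not taken from the slides.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, Wz, Wr, Wh, bz, br, bh):
    """One GRU step: a single update gate replaces forget/input, with no separate cell state."""
    concat = np.concatenate([h_prev, x])
    z = sigmoid(Wz @ concat + bz)                                   # update gate
    r = sigmoid(Wr @ concat + br)                                   # reset gate
    h_tilde = np.tanh(Wh @ np.concatenate([r * h_prev, x]) + bh)    # candidate state
    return (1.0 - z) * h_prev + z * h_tilde                         # merged hidden/cell state

Gating conventions vary slightly across papers (some swap the roles of z and 1 - z); the version above follows the formulation described in [1].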
• III. GRU and Other Structures
• Variants on Long Short-Term Memory
Greff et al. (2015) do a nice comparison of popular variants, finding that they’re all about the same.
Bibliography
• [1] Understanding LSTM Networks (colah’s blog)
• [2] Backpropagation Through Time and Vanishing Gradients (WILDML)
Thanks for listening!
