RNN & LSTM
Neural Network for Sequential Data
- Jeff Hu -
Machine Learning Categories
• Supervised
• Unsupervised
• Reinforcement Learning
Supervised Machine Learning
• Training Set: Inputs + Outputs
• Learn a link between the inputs and the outputs
• Linear and logistic regression
• Support vector machine
• K-nearest neighbors (k-NN)
• Naive Bayes
• Neural network
• Gradient boosting
• Classification trees and random forest
Unsupervised Machine Learning
• Training Set: Inputs
• Find structure in the inputs (clusters, components, compressed representations)
• K-means
• Hierarchical clustering
• Mixture models
• PCA
• ICA
• Auto-encoder
Reinforcement Learning
• Training Set: N/A (the agent learns from its own experience)
• Learn how to act so as to earn the greatest reward
• Utility learning
• Q-learning
K-Nearest Neighbor
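The k-NN idea from the supervised list can be sketched in a few lines: classify a point by a majority vote among its k closest training points. The toy 2-D dataset and k=3 below are made up for illustration.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    # Euclidean distance from x to every training point
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k closest points
    nearest = np.argsort(dists)[:k]
    # Majority vote over their labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy 2-D dataset: class 0 clustered near (0,0), class 1 near (5,5)
X = np.array([[0.0, 0.2], [0.1, 0.0], [0.3, 0.1],
              [5.0, 5.1], [4.9, 5.0], [5.2, 4.8]])
y = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X, y, np.array([0.2, 0.1])))  # near the class-0 cluster -> 0
print(knn_predict(X, y, np.array([5.0, 5.0])))  # near the class-1 cluster -> 1
```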
Neural Network > Machine Learning ?
• Consider hidden relationships between features!
Recurrent Neural Network (RNN)
Benefits
• Deal with sequential information
• Perform the same task for every element of a sequence
• Has memory
• Can be unrolled like a chain
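A minimal sketch of what "the same task for every element" and "unrolled like a chain" mean in code, assuming a vanilla tanh RNN cell (the hidden/input sizes and random weights are illustrative):

```python
import numpy as np

def rnn_step(x_t, h_prev, Wxh, Whh, b):
    """One unrolled step: the same weights are reused at every position."""
    return np.tanh(Wxh @ x_t + Whh @ h_prev + b)

rng = np.random.default_rng(0)
hidden, features, steps = 4, 3, 5
Wxh = rng.normal(size=(hidden, features)) * 0.1
Whh = rng.normal(size=(hidden, hidden)) * 0.1
b = np.zeros(hidden)

h = np.zeros(hidden)                   # initial memory
for t in range(steps):                 # unroll the chain over the sequence
    x_t = rng.normal(size=features)
    h = rnn_step(x_t, h, Wxh, Whh, b)  # h carries information forward

print(h.shape)  # (4,)
```

The hidden state `h` is the "memory": it is the only thing passed from one element of the sequence to the next.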
Application
• Language Modeling and Generating Text
• Machine Translation
• Speech Recognition
• Generating Image Descriptions
Training
• Backpropagation Through Time (BPTT)
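As a sketch of BPTT, consider a scalar RNN h_t = tanh(w*h_{t-1} + u*x_t) with a squared loss on the last state: walking backwards through the unrolled chain and summing the per-step contributions gives the weight gradients (the inputs and weights below are arbitrary, and the result is checked against a numerical gradient):

```python
import numpy as np

def forward(w, u, xs):
    hs = [0.0]
    for x in xs:
        hs.append(np.tanh(w * hs[-1] + u * x))
    return hs

def bptt(w, u, xs, target):
    """Backpropagation Through Time for the scalar RNN h_t = tanh(w*h_{t-1} + u*x_t)."""
    hs = forward(w, u, xs)
    loss = 0.5 * (hs[-1] - target) ** 2
    dw = du = 0.0
    dh = hs[-1] - target              # gradient at the last hidden state
    for t in range(len(xs), 0, -1):   # walk backwards through time
        da = dh * (1 - hs[t] ** 2)    # back through tanh
        dw += da * hs[t - 1]          # same w at every step: gradients sum
        du += da * xs[t - 1]
        dh = da * w                   # pass the gradient to the previous step
    return loss, dw, du

xs, target, w, u = [0.5, -1.0, 0.3], 0.2, 0.7, -0.4
loss, dw, du = bptt(w, u, xs, target)

# Sanity-check dw against a central-difference numerical gradient
eps = 1e-6
num_dw = (bptt(w + eps, u, xs, target)[0] - bptt(w - eps, u, xs, target)[0]) / (2 * eps)
print(abs(dw - num_dw) < 1e-6)  # True
```

The repeated multiplication by `w` in the backward pass is also where vanishing/exploding gradients come from, which motivates the LSTM below.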
Variations
• Bidirectional RNNs
• Deep (Bidirectional) RNNs
• LSTM networks
Long Short-Term Memory Network (LSTM)
Memory Problem of RNN
• Sometimes we need more context
• A plain RNN struggles to connect information from further back in the sequence
Benefits of LSTM
• Can learn long-term dependencies
Difference between RNN & LSTM
• RNN: single layer (tanh)
• LSTM: four interactive layers
Cell state
• The conveyor belt
Gates (3 in total for LSTM)
• A way to optionally let information through
• E.g. A sigmoid neural net layer & a pointwise multiplication operation
Optional Math – Sigmoid function
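The sigmoid used by the gates squashes any real number into (0, 1), which is what makes it usable as a soft "how much to let through" dial:

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into (0, 1): 0 means 'block', 1 means 'pass'."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))    # 0.5
print(sigmoid(10.0))   # ~1.0: gate fully open
print(sigmoid(-10.0))  # ~0.0: gate fully closed
```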
Step 1: Forget Gate Layer
• Decide what info to throw away
• Look at h[t-1] and x[t] and output a number between 0 and 1 that decides how much of the cell state C[t-1] to keep
• E.g. when we see a new subject, we want to forget the gender of the old subject
Step 2: Input Gate Layer
• Decide what info to add
• A sigmoid: decide which value to update
• A tanh layer: create a new candidate value C~[t]
• E.g. add the gender of the new subject
Step 3: Combine step 1 & 2
• Multiply the old state by f[t]: forget the things we decided to forget
• Add i[t] * C~[t]: add the new candidate values (scaled)
Step 4: Filter/output the Cell state
• Decide what to output
• sigmoid: decide which parts to output
• tanh: push the values to be between -1 and 1
• Multiply them to output only the parts we decided to
• E.g. output info relevant to a verb
• E.g. output whether the subject is singular or plural
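Steps 1-4 above can be combined into one sketch of an LSTM forward step. Stacking the four layers into a single matrix W, and the sizes used below, are illustrative choices, not the only way to write it:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM step; W maps [h_prev, x_t] to all four layers at once."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    H = h_prev.size
    f = sigmoid(z[0:H])              # step 1: forget gate, 0..1 per cell entry
    i = sigmoid(z[H:2*H])            # step 2: input gate
    C_tilde = np.tanh(z[2*H:3*H])    # step 2: candidate values C~[t]
    o = sigmoid(z[3*H:4*H])          # step 4: output gate
    C = f * C_prev + i * C_tilde     # step 3: forget the old, add the new (scaled)
    h = o * np.tanh(C)               # step 4: output a filtered, squashed cell state
    return h, C

rng = np.random.default_rng(1)
H, D = 4, 3                          # hidden size, input size (made up)
W = rng.normal(size=(4 * H, H + D)) * 0.1
b = np.zeros(4 * H)

h, C = np.zeros(H), np.zeros(H)
for t in range(5):
    h, C = lstm_step(rng.normal(size=D), h, C, W, b)
print(h.shape, C.shape)
```

Note that the cell state C is updated only by a pointwise multiply and add, which is the "conveyor belt" that lets information flow far along the sequence.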
Variants on LSTM (1)
• Peephole connections:
 let the gate layers look at the cell state (entirely or partially)
Variants on LSTM (2)
• Coupled forget and input gates:
 the forget and add decisions are made together rather than separately
 C[t] = f[t] * C[t-1] + (1 - f[t]) * C~[t]
Variants on LSTM (3)
• Gated Recurrent Unit (GRU):
 combine the forget and input gates into a single “update gate”
 merge the cell state and the hidden state
 simpler, and popular in practice
RNN / LSTM Effectiveness
Multiple types of RNN use cases
Turing-Complete
• An RNN runs a fixed program with certain inputs and some internal variables, and can in principle simulate arbitrary programs
• Andrej Karpathy (Ph.D. @ Stanford)
Non-sequential data
• Though the data is not in the form of sequences, we can still use an RNN by processing it sequentially
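For example, a static image can be read row by row, imposing a sequence order so the same recurrent cell applies; the `rnn_step` helper and the sizes here are illustrative:

```python
import numpy as np

def rnn_step(x_t, h_prev, Wxh, Whh):
    return np.tanh(Wxh @ x_t + Whh @ h_prev)

rng = np.random.default_rng(3)
image = rng.normal(size=(8, 8))      # a static 8x8 "image", not a sequence
H = 5
Wxh = rng.normal(size=(H, 8)) * 0.1
Whh = rng.normal(size=(H, H)) * 0.1

h = np.zeros(H)
for row in image:                    # impose an order: read the image row by row
    h = rnn_step(row, h, Wxh, Whh)

print(h.shape)                       # a fixed-size summary of the whole image
```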
Some cool RNN/LSTM applications
• http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Great references
• [1] RNN: http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
• [2] LSTM: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
• [3] RNN Effectiveness: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
• [4] Backpropagation: http://cs231n.github.io/optimization-2/#backprop
• [5] ML categories: http://enhancedatascience.com/2017/07/19/machine-learning-explained-supervised-learning-unsupervised-learning-and-reinforcement-learning/