
RNN, LSTM and Seq-2-Seq Models

Presented by Jayeol Chun and Sang-Hyun Eun
June 9, 2016.


  1. Introductory Presentation on RNN, LSTM and Seq-2-Seq Models
     by Jayeol Chun and Sang-Hyun Eun

     1. Brief Overview of Theory behind RNN

     Q: What is RNN?
  2. Feed-Forward vs. Feed-Back: Static vs. Dynamic

     Unlike a feed-forward architecture such as the Convolutional Neural Network (CNN), which contains no cycles, a Recurrent Neural Network (RNN) maintains persistence of information by feeding the outputs of previous computations into later computations. This makes it well suited for processing sequences, and thus a natural tool for NLP.

     Basic RNN Computation in Theory

        class RNN:
            # ...
            def step(self, x):
                # update the hidden state
                self.h = np.tanh(np.dot(self.W_hh, self.h) + np.dot(self.W_xh, x))
                # compute the output vector
                y = np.dot(self.W_hy, self.h)
                return y

        # main instruction
        rnn = RNN()
        y = rnn.step(x)  # x is an input vector, y is the RNN's output vector

     Point to take away: quite simple!

     Challenge: the Unstable Gradient Problem

     "The gradient in deep neural networks is unstable, tending to either explode or vanish in earlier layers." In at least some deep neural networks, the gradient tends to get smaller as we move backward through the hidden layers. This means that neurons in the earlier layers learn much more slowly than neurons in the later layers.

     Question: the more hidden layers, the better?
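     To see why this simple step function runs into trouble, it helps to unroll it over a sequence: each time step adds another tanh layer between the earliest input and the final output, so an RNN processing a long sequence behaves like a very deep network. Below is a minimal, untrained sketch of such an unrolling; the layer sizes, the random initialization, and the one_hot helper are illustrative assumptions filled in around the slide's code, not part of the original.

        import numpy as np

        np.random.seed(0)
        hidden_size, vocab_size = 8, 4          # illustrative sizes, not from the slides

        class RNN:
            def __init__(self):
                # small random weights; a real model would learn these
                self.W_hh = np.random.randn(hidden_size, hidden_size) * 0.01
                self.W_xh = np.random.randn(hidden_size, vocab_size) * 0.01
                self.W_hy = np.random.randn(vocab_size, hidden_size) * 0.01
                self.h = np.zeros(hidden_size)

            def step(self, x):
                # same update rule as above
                self.h = np.tanh(np.dot(self.W_hh, self.h) + np.dot(self.W_xh, x))
                return np.dot(self.W_hy, self.h)

        def one_hot(i, n=vocab_size):
            v = np.zeros(n)
            v[i] = 1.0
            return v

        rnn = RNN()
        for char_id in [0, 1, 2, 2]:            # e.g. the character indices of 'h', 'e', 'l', 'l'
            y = rnn.step(one_hot(char_id))      # self.h carries information across steps

     Only the input changes from call to call; the hidden state self.h is what links one computation to the next, and it is also the path along which gradients must flow backward through every time step during training.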
  3. "Backpropagation computes gradients by the chain rule. This has the effect of multiplying n of these small numbers to compute the gradients of the front layers in an n-layer network, meaning that the gradient (error signal) decreases exponentially with n and the front layers train very slowly."
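     As a quick numeric illustration of the quoted argument (not from the slides): the derivative of tanh is at most 1 and typically well below it, so a product of many such local derivatives shrinks roughly exponentially with depth. A tiny sketch:

        import numpy as np

        # Backpropagating through n tanh layers multiplies n local derivatives together.
        # tanh'(z) = 1 - tanh(z)**2 is at most 1 and is about 0.42 at z = 1.
        z = 1.0
        local_grad = 1.0 - np.tanh(z) ** 2      # ~0.42
        for n in (1, 5, 10, 20):
            print(n, local_grad ** n)           # ~0.42, ~0.013, ~1.7e-4, ~2.9e-8

     The mirror image also happens: if the weights along the path have magnitudes well above 1, the same product can grow instead of shrink, which is the "explode" half of the unstable gradient problem.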
     2. Long Short Term Memory Network (LSTM)

  4. The LSTM is the most commonly used type of RNN that addresses the above challenge, and it can learn to recognize context-sensitive languages.

     The key is the cell state. It runs straight down the entire chain, with only minor linear interactions, and its information is updated through structures called gates. There are 3 main types of gates:

     Forget Gate Layer - a sigmoid layer that chooses what information to forget.
     Input Gate Layer - chooses which values to update and what new values to add.
     Output Gate Layer - based on the cell state, filters it to decide which values to output.
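     To make the three gates concrete, here is a minimal NumPy sketch of a single LSTM step. It is not code from the presentation: the weight shapes and random initialization are illustrative assumptions, and bias terms are omitted for brevity.

        import numpy as np

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        np.random.seed(0)
        n_hidden, n_input = 8, 4                       # illustrative sizes
        # one weight matrix per gate, acting on the concatenated [h_prev, x]
        W_f, W_i, W_c, W_o = (np.random.randn(n_hidden, n_hidden + n_input) * 0.01
                              for _ in range(4))

        def lstm_step(x, h_prev, c_prev):
            z = np.concatenate([h_prev, x])
            f = sigmoid(np.dot(W_f, z))                # forget gate: what to erase from the cell state
            i = sigmoid(np.dot(W_i, z))                # input gate: which values to update
            c_tilde = np.tanh(np.dot(W_c, z))          # candidate values to add
            c = f * c_prev + i * c_tilde               # cell state: only minor, linear interactions
            o = sigmoid(np.dot(W_o, z))                # output gate: which values to expose
            h = o * np.tanh(c)                         # filtered cell state becomes the new hidden state
            return h, c

        h, c = np.zeros(n_hidden), np.zeros(n_hidden)
        x = np.eye(n_input)[0]                         # a one-hot input, e.g. 'h'
        h, c = lstm_step(x, h, c)

     The additive update of the cell state (f * c_prev + i * c_tilde) is what lets information, and gradients, travel along the chain without being squashed through a nonlinearity at every step.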
     3. Sequence to Sequence Model

  5. The Seq-2-Seq Model consists of two RNNs: an encoder that processes the input and maps it to a vector, and a decoder that generates the output sequence of symbols from that vector representation. Specifically, the encoder maps a variable-length source sequence to a fixed-length vector, and the decoder maps the vector representation back to a variable-length target sequence of symbols. The two networks are trained jointly to maximize the conditional probability of the target sequence given a source sequence. (In the figure on the original slide, each box represents a cell of the RNN.)
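     A minimal sketch of that encoder-decoder structure, using the plain RNN step from earlier rather than the LSTM cells typically used in practice. All names, sizes, and the greedy decoding loop here are illustrative assumptions, and the networks are untrained, so this shows only the shape of the computation.

        import numpy as np

        np.random.seed(0)
        n_hidden, n_vocab = 8, 4                         # illustrative sizes

        def rnn_params():
            return (np.random.randn(n_hidden, n_hidden) * 0.01,   # W_hh
                    np.random.randn(n_hidden, n_vocab) * 0.01,    # W_xh
                    np.random.randn(n_vocab, n_hidden) * 0.01)    # W_hy

        enc_W_hh, enc_W_xh, _ = rnn_params()             # encoder (its per-step outputs are unused)
        dec_W_hh, dec_W_xh, dec_W_hy = rnn_params()      # decoder

        def encode(source_ids):
            # map a variable-length source sequence to a fixed-length vector
            h = np.zeros(n_hidden)
            for i in source_ids:
                x = np.eye(n_vocab)[i]
                h = np.tanh(np.dot(enc_W_hh, h) + np.dot(enc_W_xh, x))
            return h                                     # the fixed-length summary of the source

        def decode(h, max_len=5, start_id=0):
            # unroll the decoder from the encoder's final state, feeding back its own predictions
            out, prev = [], start_id
            for _ in range(max_len):
                x = np.eye(n_vocab)[prev]
                h = np.tanh(np.dot(dec_W_hh, h) + np.dot(dec_W_xh, x))
                prev = int(np.argmax(np.dot(dec_W_hy, h)))
                out.append(prev)
            return out

        print(decode(encode([0, 1, 2, 2])))              # untrained, so the output is arbitrary

     In the real model the encoder and decoder are trained jointly, exactly as the slide says, so that the decoder's output distribution maximizes the probability of the target sequence given the source.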
  6. Example: Sample RNN / Seq-2-Seq Code
  7. In [1]:

        import tensorflow as tf
        from tensorflow.models.rnn import rnn, rnn_cell
        import numpy as np

        char_rdic = ['h', 'e', 'l', 'o']                       # id -> char
        char_dic = {w: i for i, w in enumerate(char_rdic)}     # char -> id
        sample = [char_dic[c] for c in "hello"]                # to index

        x_data = np.array([[1, 0, 0, 0],   # h
                           [0, 1, 0, 0],   # e
                           [0, 0, 1, 0],   # l
                           [0, 0, 1, 0]],  # l
                          dtype='f')

        # Configuration
        char_vocab_size = len(char_dic)
        rnn_size = 4            # char_vocab_size; 1-hot coding (one of 4)
        time_step_size = 4      # 'hell' -> predict 'ello'
        batch_size = 1          # one sample

        # RNN model
        rnn_cell = rnn_cell.BasicRNNCell(rnn_size)
        state = tf.zeros([batch_size, rnn_cell.state_size])
        X_split = tf.split(0, time_step_size, x_data)
        outputs, state = tf.nn.seq2seq.rnn_decoder(X_split, state, rnn_cell)
        print(state)
        print(outputs)

        # logits: list of 2D Tensors of shape [batch_size x num_decoder_symbols].
        # targets: list of 1D batch-sized int32 Tensors of the same length as logits.
        # weights: list of 1D batch-sized float-Tensors of the same length as logits.
        logits = tf.reshape(tf.concat(1, outputs), [-1, rnn_size])
        targets = tf.reshape(sample[1:], [-1])
        weights = tf.ones([time_step_size * batch_size])

        loss = tf.nn.seq2seq.sequence_loss_by_example([logits], [targets], [weights])
        cost = tf.reduce_sum(loss) / batch_size
        train_op = tf.train.RMSPropOptimizer(0.01, 0.9).minimize(cost)

        # Launch the graph in a session
        with tf.Session() as sess:
            # you need to initialize all variables
            tf.initialize_all_variables().run()
            for i in range(100):
                sess.run(train_op)
                result = sess.run(tf.arg_max(logits, 1))
                print(result, [char_rdic[t] for t in result])
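     Note that this listing targets the TensorFlow 0.x API that was current in June 2016; names such as tensorflow.models.rnn.rnn_cell, tf.nn.seq2seq, and tf.initialize_all_variables were later renamed or removed, so the cell will not run unmodified on modern TensorFlow releases.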
  8. Output:

        Tensor("rnn_decoder/BasicRNNCell_3/Tanh:0", shape=(1, 4), dtype=float32)
        [<tf.Tensor 'rnn_decoder/BasicRNNCell/Tanh:0' shape=(1, 4) dtype=float32>,
         <tf.Tensor 'rnn_decoder/BasicRNNCell_1/Tanh:0' shape=(1, 4) dtype=float32>,
         <tf.Tensor 'rnn_decoder/BasicRNNCell_2/Tanh:0' shape=(1, 4) dtype=float32>,
         <tf.Tensor 'rnn_decoder/BasicRNNCell_3/Tanh:0' shape=(1, 4) dtype=float32>]
        (array([2, 0, 0, 0]), ['l', 'h', 'h', 'h'])
        (array([2, 0, 3, 0]), ['l', 'h', 'o', 'h'])
        (array([2, 2, 3, 0]), ['l', 'l', 'o', 'h'])
        (array([2, 2, 3, 2]), ['l', 'l', 'o', 'l'])
        (array([2, 2, 2, 2]), ['l', 'l', 'l', 'l'])
        (array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
        (array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])

     Each prediction line is printed repeatedly for consecutive iterations of the 100-step training loop; only the distinct predictions are shown above. By the final iterations the model has converged to 'e', 'l', 'l', 'o', i.e. it predicts "ello" from "hell".
  11. 4. RNN Applications

      - Language Modeling
      - Conversation Modeling / Question Answering
      - Machine Translation
      - Speech Recognition
      - Image / Video Captioning
      - Image / Music Generation

      5. References

      - http://colah.github.io/posts/2015-08-Understanding-LSTMs/
      - https://en.wikipedia.org/wiki/Convolutional_neural_network
      - https://en.wikipedia.org/wiki/Recurrent_neural_network
      - http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
      - https://github.com/hunkim/ml
      - https://www.tensorflow.org/versions/r0.9/tutorials/index.html
      - http://karpathy.github.io/2015/05/21/rnn-effectiveness/
      - http://arxiv.org/pdf/1409.3215v3.pdf
      - http://arxiv.org/pdf/1406.1078v3.pdf

      Code References:

      - https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/kernel_tests/rnn_test.py
      - https://github.com/hans/ipython-notebooks/blob/master/tf/TF%20tutorial.ipynb
      - https://github.com/tensorflow/tensorflow/blob/master/tensorflow/models/rnn/ptb/ptb_word_lm.py
      - https://gist.github.com/karpathy/d4dee566867f8291f086#file-min-char-rnn-py
