RNN Language Model
in TensorFlow
Chia-Wen Cheng
2017.03.27
Recurrent Neural Networks
[Diagram: a neural network cell takes input Xt, produces output Yt, and carries a cell state Ct from one step to the next. Unrolled, the same cell is applied at every time step: X0 → Y0 (state C0), X1 → Y1 (state C1), ..., XT → YT (state CT): an unrolled recurrent neural network.]
RNN Language Model (RNNLM)
Sentence: An apple a day keeps the doctor away.
[Diagram: at each step the RNN reads the previous word and predicts the next one: X0 = <START> → Y0 = An, X1 = An → Y1 = apple, X2 = apple → Y2 = a, ..., XT-1 = doctor → YT-1 = away, XT = away → YT = <END>.]
How to train an RNN language model?
1. Pre-process training data
2. RNN model
a. Look up word embedding
b. Run RNN
c. Calculate the loss between the output and the target
d. Calculate gradients and update variables (Backpropagation)
Example training data after pre-processing (each sentence becomes a sequence of word IDs):
An apple a day keeps the doctor away.   →  34 12 39 10 2 44 98 11 39 45
I like to eat fruits.                   →  34 9 88 11 78 34 45 45 45 45
We are learning RNN language models.    →  34 98 72 35 5 17 29 45 45 45
Pre-process training data
1. Remove punctuation
2. Convert words to lowercase
3. Wrap each sentence as <START> sentence <END>
4. Make all sentences the same length (padding or cutting)
5. Build the vocabulary (choose the most common words + <UNK>)
6. Map words to IDs (map any word that is not in the vocabulary to <UNK>'s ID); a code sketch follows the example below
Example, step by step:
We are learning RNN language models.
→ We are learning RNN language models                         (punctuation removed)
→ we are learning rnn language models                         (lowercased)
→ <START> we are learning rnn language models <END>
→ <START> we are learning rnn language models <END> <END> <END>   (padded to length 10)

Vocabulary: 0 <UNK>, 1 <START>, 2 <END>, 3 models, 4 are, 5 we, 6 language, 7 learning, 8 rnn

Final ID sequence: 1 5 4 7 8 6 3 2 2 2
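A minimal pure-Python sketch of steps 1-6 above. The helper name preprocess, the max_len value, and the single example sentence are illustrative, not from the PTB tutorial's reader code, and the IDs it produces depend on word frequencies, so they can differ from the toy vocabulary shown above.

import collections
import string

def preprocess(sentence, max_len=10):
    # 1. remove punctuation, 2. lowercase, 3. wrap with <START> ... <END>
    stripped = sentence.translate(str.maketrans("", "", string.punctuation))
    words = ["<START>"] + stripped.lower().split() + ["<END>"]
    # 4. pad with <END> (or cut) so every sentence has the same length
    return (words + ["<END>"] * max_len)[:max_len]

sentences = ["We are learning RNN language models."]
tokenized = [preprocess(s) for s in sentences]

# 5. build the vocabulary from the most common words, plus <UNK>
counter = collections.Counter(w for sent in tokenized for w in sent)
vocab = ["<UNK>"] + [w for w, _ in counter.most_common()]
word_to_id = {w: i for i, w in enumerate(vocab)}

# 6. map words to IDs; words outside the vocabulary map to <UNK>'s ID (0)
ids = [[word_to_id.get(w, 0) for w in sent] for sent in tokenized]
print(ids)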
Class PTBModel():
● Parameters setting
● Define RNN
● Inputs: look up word embedding
● Run RNN
● Calculate the loss between the output and the target
● Calculate gradients and update variables (Backpropagation)
Parameters setting
[Diagram: the input sequence X0 X1 X2 ... X9 fed to the unrolled RNN.]
● Vocabulary size = 10
● RNN hidden size = 5
● num_steps = 10: perform backpropagation after 10 steps
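A toy configuration object matching the numbers on this slide (the class name ToyConfig and batch_size are illustrative; the tutorial's own SmallConfig uses much larger values such as vocab_size = 10000 and hidden_size = 200):

class ToyConfig(object):
    vocab_size = 10    # vocabulary size
    hidden_size = 5    # RNN hidden size
    num_steps = 10     # unroll length: backpropagate every 10 steps
    batch_size = 1     # assumed here for the toy example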
Define RNN
[Diagram: the input X flows through stacked attn_cell layers that are grouped into a single cell, which produces Y.]
● Define the basic layer: BasicLSTMCell(size, ...)
● Stack multiple layers: tf.contrib.rnn.MultiRNNCell(...)
● Remember to initialize the RNN state! E.g. cell.zero_state(...)
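A minimal sketch of these three points in TensorFlow 1.x. In the tutorial, attn_cell additionally wraps each layer in a DropoutWrapper during training; the layer count and batch size here are illustrative.

import tensorflow as tf

hidden_size = 5     # RNN hidden size from the parameters slide
num_layers = 2      # illustrative
batch_size = 1      # illustrative

def lstm_cell():
    # the basic layer
    return tf.contrib.rnn.BasicLSTMCell(hidden_size, forget_bias=0.0,
                                        state_is_tuple=True)

# stack multiple layers
cell = tf.contrib.rnn.MultiRNNCell(
    [lstm_cell() for _ in range(num_layers)], state_is_tuple=True)

# remember to initialize the RNN state
initial_state = cell.zero_state(batch_size, tf.float32)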
Inputs: look up word embedding
● embedding is a tensor of shape [vocabulary_size, embedding_size]
● tf.nn.embedding_lookup(...): the word IDs are embedded into vector representations.
[Diagram: the input word IDs X are looked up in the embedding matrix (embedding_size = 3 in the example) to produce the vectors Y.]
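A sketch of the lookup, continuing the sketch above (it reuses batch_size); input_ids is an assumed placeholder standing in for the ID sequences produced in pre-processing, which the tutorial instead gets from its input pipeline:

vocabulary_size = 10   # from the parameters slide
embedding_size = 3     # matches the toy diagram
num_steps = 10

# word IDs from pre-processing, shape [batch_size, num_steps]
input_ids = tf.placeholder(tf.int32, [batch_size, num_steps])

# the embedding matrix: one embedding_size-dimensional vector per word
embedding = tf.get_variable(
    "embedding", [vocabulary_size, embedding_size], dtype=tf.float32)

# each ID is replaced by its vector: shape [batch_size, num_steps, embedding_size]
inputs = tf.nn.embedding_lookup(embedding, input_ids)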
Run RNN
● Each step produces:
○ cell_output
○ state
● After num_steps, concatenate all the cell_outputs (outputs)
[Diagram: at every step the cell consumes the input and the previous cell state, and emits cell_output and the updated cell state; Y is computed from cell_output.]
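Continuing the sketches above, this mirrors the unrolling loop in the TF 1.x PTB tutorial (cell, inputs, initial_state, num_steps and hidden_size come from the earlier sketches):

outputs = []
state = initial_state
with tf.variable_scope("RNN"):
    for time_step in range(num_steps):
        if time_step > 0:
            # reuse the same weights at every time step
            tf.get_variable_scope().reuse_variables()
        # each step returns cell_output and the updated cell state
        cell_output, state = cell(inputs[:, time_step, :], state)
        outputs.append(cell_output)

# after num_steps, concatenate all cell_outputs:
# shape [batch_size * num_steps, hidden_size]
output = tf.reshape(tf.concat(outputs, 1), [-1, hidden_size])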
Calculate the loss between the output and the target
Example sentence: We are learning RNN language models.
At the step where the input is "language", the RNN output Y gives the logits, i.e. the unnormalized distribution P(next word | input) over the vocabulary (P(language | input), P(models | input), ...). The target, taken from input_.targets, is the next word "models", whose one-hot vector is [0 0 0 1 0 0 0 0 0 0] (ID 3). The loss compares the predicted distribution with this target.
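A sketch of the softmax projection and per-word cross-entropy loss in the style of the TF 1.x tutorial; targets is an assumed placeholder standing in for input_.targets, and sequence_loss_by_example lives in tf.contrib.legacy_seq2seq in that TensorFlow version:

# next-word IDs, shape [batch_size, num_steps]
targets = tf.placeholder(tf.int32, [batch_size, num_steps])

# project the RNN output to logits over the vocabulary
softmax_w = tf.get_variable(
    "softmax_w", [hidden_size, vocabulary_size], dtype=tf.float32)
softmax_b = tf.get_variable("softmax_b", [vocabulary_size], dtype=tf.float32)
logits = tf.matmul(output, softmax_w) + softmax_b

# cross-entropy between the predicted distribution and the target words
loss = tf.contrib.legacy_seq2seq.sequence_loss_by_example(
    [logits],
    [tf.reshape(targets, [-1])],
    [tf.ones([batch_size * num_steps], dtype=tf.float32)])
cost = tf.reduce_sum(loss) / batch_size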
Calculate gradients and update variables (Backpropagation)
● Set the initial learning rate
● Remember to clip gradients in RNNs!
○ RNNs suffer from vanishing / exploding gradients
○ tf.clip_by_global_norm(...)
○ Empirically, confine gradients to (-1, 1) or (-5, 5)
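A sketch of clipping by global norm, continuing the cost defined above; max_grad_norm is a hyperparameter (the tutorial's small configuration uses 5):

max_grad_norm = 5
tvars = tf.trainable_variables()
# clip the gradients of the cost with respect to all trainable variables
grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars), max_grad_norm)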
Calculate gradients and update variables (Backpropagation)
● Optimizers: GradientDescent, Adam (most commonly used), RMSProp (often used for GANs), ...
● train_op = optimizer.apply_gradients(...)
● Call sess.run(train_op), and the whole training procedure starts!
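Continuing the clipping sketch above, one update step might look like this. The tutorial uses GradientDescentOptimizer (Adam or RMSProp are drop-in replacements); learning_rate and the single toy batch are illustrative.

import numpy as np

learning_rate = 1.0
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train_op = optimizer.apply_gradients(zip(grads, tvars))

# toy batch: the ID sequence from the pre-processing example, and the same
# sequence shifted by one word as the next-word targets
x_batch = np.array([[1, 5, 4, 7, 8, 6, 3, 2, 2, 2]])
y_batch = np.array([[5, 4, 7, 8, 6, 3, 2, 2, 2, 2]])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # one training step
    _, step_cost = sess.run([train_op, cost],
                            feed_dict={input_ids: x_batch, targets: y_batch})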
Vanishing/Exploding gradients
Source: http://www.wildml.com/2015/10/recurrent-neural-networks-tutorial-part-3-backpropagation-through-time-and-vanishing-gradients/
If you want to dive deeper into RNNs, you can watch Prof. Bengio's video lecture:
http://videolectures.net/deeplearning2016_bengio_neural_networks/
Run Code
$ git clone https://github.com/tensorflow/models.git
$ cd models/tutorials/rnn/ptb
$ wget http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz
$ tar zxvf simple-examples.tgz
$ python ptb_word_lm.py --data_path=./simple-examples/data/
TensorFlow official tutorial about recurrent neural networks:
https://www.tensorflow.org/tutorials/recurrent
Experience in training recurrent neural networks
1. Try the simplest model first, e.g. only one layer, the Adam optimizer, ...
2. When the validation loss goes flat:
   a. Decrease the learning rate -> train for more epochs
   b. Change your model:
      i. Delay inputs
      ii. Increase the size of the hidden layer
      iii. Increase the number of hidden layers
      iv. Try more powerful models, e.g. a bidirectional LSTM
3. Sometimes non-deep-learning methods can achieve stable and good-enough performance.
