RNN Language Model
in TensorFlow
Chia-Wen Cheng
2017.03.27
Recurrent Neural Networks
[Diagram: a neural network cell takes input Xt, produces output Yt, and carries a cell state Ct from one step to the next. Unrolled, the same cell is applied at every time step: X0 → Y0 (state C0), X1 → Y1 (state C1), ..., XT → YT (state CT): an unrolled recurrent neural network.]
RNN Language Model (RNNLM)
Sentence: An apple a day keeps the doctor away.
[Diagram: at each step the RNN reads the previous word and predicts the next one: X0 = <START> → Y0 = An, X1 = An → Y1 = apple, X2 = apple → Y2 = a, ..., XT-1 = doctor → YT-1 = away, XT = away → YT = <END>.]
How to train an RNN language model?
1. Pre-process training data
2. RNN model
a. Look up word embedding
b. Run RNN
c. Calculate the loss between the output and the target
d. Calculate gradients and update variables (Backpropagation)
Example training data after pre-processing (each sentence becomes a sequence of word IDs):
An apple a day keeps the doctor away.   →  34 12 39 10 2 44 98 11 39 45
I like to eat fruits.                   →  34 9 88 11 78 34 45 45 45 45
We are learning RNN language models.    →  34 98 72 35 5 17 29 45 45 45
Pre-process training data
1. Remove punctuation
2. Convert words to lowercase
3. Wrap each sentence as <START> sentence <END>
4. Make all sentences the same length (padding or cutting)
5. Build the vocabulary (choose the most common words + <UNK>)
6. Map words to IDs (map any word that is not in the vocabulary to <UNK>'s ID); a code sketch follows the example below
Example, step by step:
We are learning RNN language models.
→ We are learning RNN language models                         (punctuation removed)
→ we are learning rnn language models                         (lowercased)
→ <START> we are learning rnn language models <END>
→ <START> we are learning rnn language models <END> <END> <END>   (padded to length 10)

Vocabulary: 0 <UNK>, 1 <START>, 2 <END>, 3 models, 4 are, 5 we, 6 language, 7 learning, 8 rnn

Final ID sequence: 1 5 4 7 8 6 3 2 2 2
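A minimal pure-Python sketch of steps 1-6 above. The helper name preprocess, the max_len value, and the single example sentence are illustrative, not from the PTB tutorial's reader code, and the IDs it produces depend on word frequencies, so they can differ from the toy vocabulary shown above.

import collections
import string

def preprocess(sentence, max_len=10):
    # 1. remove punctuation, 2. lowercase, 3. wrap with <START> ... <END>
    stripped = sentence.translate(str.maketrans("", "", string.punctuation))
    words = ["<START>"] + stripped.lower().split() + ["<END>"]
    # 4. pad with <END> (or cut) so every sentence has the same length
    return (words + ["<END>"] * max_len)[:max_len]

sentences = ["We are learning RNN language models."]
tokenized = [preprocess(s) for s in sentences]

# 5. build the vocabulary from the most common words, plus <UNK>
counter = collections.Counter(w for sent in tokenized for w in sent)
vocab = ["<UNK>"] + [w for w, _ in counter.most_common()]
word_to_id = {w: i for i, w in enumerate(vocab)}

# 6. map words to IDs; words outside the vocabulary map to <UNK>'s ID (0)
ids = [[word_to_id.get(w, 0) for w in sent] for sent in tokenized]
print(ids)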
Class PTBModel():
● Parameters setting
● Define RNN
● Inputs: look up word embedding
● Run RNN
● Calculate the loss between the output and the target
● Calculate gradients and update variables (Backpropagation)
Parameters setting
[Diagram: the input sequence X0 X1 X2 ... X9 fed to the unrolled RNN.]
● Vocabulary size = 10
● RNN hidden size = 5
● num_steps = 10: perform backpropagation after 10 steps
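A toy configuration object matching the numbers on this slide (the class name ToyConfig and batch_size are illustrative; the tutorial's own SmallConfig uses much larger values such as vocab_size = 10000 and hidden_size = 200):

class ToyConfig(object):
    vocab_size = 10    # vocabulary size
    hidden_size = 5    # RNN hidden size
    num_steps = 10     # unroll length: backpropagate every 10 steps
    batch_size = 1     # assumed here for the toy example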
Define RNN
[Diagram: the input X flows through stacked attn_cell layers that are grouped into a single cell, which produces Y.]
● Define the basic layer: BasicLSTMCell(size, ...)
● Stack multiple layers: tf.contrib.rnn.MultiRNNCell(...)
● Remember to initialize the RNN state! E.g. cell.zero_state(...)
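A minimal sketch of these three points in TensorFlow 1.x. In the tutorial, attn_cell additionally wraps each layer in a DropoutWrapper during training; the layer count and batch size here are illustrative.

import tensorflow as tf

hidden_size = 5     # RNN hidden size from the parameters slide
num_layers = 2      # illustrative
batch_size = 1      # illustrative

def lstm_cell():
    # the basic layer
    return tf.contrib.rnn.BasicLSTMCell(hidden_size, forget_bias=0.0,
                                        state_is_tuple=True)

# stack multiple layers
cell = tf.contrib.rnn.MultiRNNCell(
    [lstm_cell() for _ in range(num_layers)], state_is_tuple=True)

# remember to initialize the RNN state
initial_state = cell.zero_state(batch_size, tf.float32)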
Inputs: look up word embedding
● embedding is a tensor of shape [vocabulary_size, embedding_size]
● tf.nn.embedding_lookup(...): the word IDs are embedded into vector representations.
[Diagram: the input word IDs X are looked up in the embedding matrix (embedding_size = 3 in the example) to produce the vectors Y.]
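A sketch of the lookup, continuing the sketch above (it reuses batch_size); input_ids is an assumed placeholder standing in for the ID sequences produced in pre-processing, which the tutorial instead gets from its input pipeline:

vocabulary_size = 10   # from the parameters slide
embedding_size = 3     # matches the toy diagram
num_steps = 10

# word IDs from pre-processing, shape [batch_size, num_steps]
input_ids = tf.placeholder(tf.int32, [batch_size, num_steps])

# the embedding matrix: one embedding_size-dimensional vector per word
embedding = tf.get_variable(
    "embedding", [vocabulary_size, embedding_size], dtype=tf.float32)

# each ID is replaced by its vector: shape [batch_size, num_steps, embedding_size]
inputs = tf.nn.embedding_lookup(embedding, input_ids)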
Run RNN
● Each step produces:
○ cell_output
○ state
● After num_steps, concatenate all the cell_outputs (outputs)
[Diagram: at every step the cell consumes the input and the previous cell state, and emits cell_output and the updated cell state; Y is computed from cell_output.]
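Continuing the sketches above, this mirrors the unrolling loop in the TF 1.x PTB tutorial (cell, inputs, initial_state, num_steps and hidden_size come from the earlier sketches):

outputs = []
state = initial_state
with tf.variable_scope("RNN"):
    for time_step in range(num_steps):
        if time_step > 0:
            # reuse the same weights at every time step
            tf.get_variable_scope().reuse_variables()
        # each step returns cell_output and the updated cell state
        cell_output, state = cell(inputs[:, time_step, :], state)
        outputs.append(cell_output)

# after num_steps, concatenate all cell_outputs:
# shape [batch_size * num_steps, hidden_size]
output = tf.reshape(tf.concat(outputs, 1), [-1, hidden_size])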
Calculate the loss between the output and the target
Example sentence: We are learning RNN language models.
At the step where the input is "language", the RNN output Y gives the logits, i.e. the unnormalized distribution P(next word | input) over the vocabulary (P(language | input), P(models | input), ...). The target, taken from input_.targets, is the next word "models", whose one-hot vector is [0 0 0 1 0 0 0 0 0 0] (ID 3). The loss compares the predicted distribution with this target.
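A sketch of the softmax projection and per-word cross-entropy loss in the style of the TF 1.x tutorial; targets is an assumed placeholder standing in for input_.targets, and sequence_loss_by_example lives in tf.contrib.legacy_seq2seq in that TensorFlow version:

# next-word IDs, shape [batch_size, num_steps]
targets = tf.placeholder(tf.int32, [batch_size, num_steps])

# project the RNN output to logits over the vocabulary
softmax_w = tf.get_variable(
    "softmax_w", [hidden_size, vocabulary_size], dtype=tf.float32)
softmax_b = tf.get_variable("softmax_b", [vocabulary_size], dtype=tf.float32)
logits = tf.matmul(output, softmax_w) + softmax_b

# cross-entropy between the predicted distribution and the target words
loss = tf.contrib.legacy_seq2seq.sequence_loss_by_example(
    [logits],
    [tf.reshape(targets, [-1])],
    [tf.ones([batch_size * num_steps], dtype=tf.float32)])
cost = tf.reduce_sum(loss) / batch_size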
Calculate gradients and update variables (Backpropagation)
● Set the initial learning rate
● Remember to clip gradients in RNNs!
○ RNNs suffer from vanishing / exploding gradients
○ tf.clip_by_global_norm(...)
○ Empirically, confine gradients to (-1, 1) or (-5, 5)
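A sketch of clipping by global norm, continuing the cost defined above; max_grad_norm is a hyperparameter (the tutorial's small configuration uses 5):

max_grad_norm = 5
tvars = tf.trainable_variables()
# clip the gradients of the cost with respect to all trainable variables
grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars), max_grad_norm)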
Calculate gradients and update variables (Backpropagation)
● Optimizers: GradientDescent, Adam (most commonly used), RMSProp (often used for GANs), ...
● train_op = optimizer.apply_gradients(...)
● Call sess.run(train_op), and the whole training procedure starts!
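Continuing the clipping sketch above, one update step might look like this. The tutorial uses GradientDescentOptimizer (Adam or RMSProp are drop-in replacements); learning_rate and the single toy batch are illustrative.

import numpy as np

learning_rate = 1.0
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train_op = optimizer.apply_gradients(zip(grads, tvars))

# toy batch: the ID sequence from the pre-processing example, and the same
# sequence shifted by one word as the next-word targets
x_batch = np.array([[1, 5, 4, 7, 8, 6, 3, 2, 2, 2]])
y_batch = np.array([[5, 4, 7, 8, 6, 3, 2, 2, 2, 2]])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # one training step
    _, step_cost = sess.run([train_op, cost],
                            feed_dict={input_ids: x_batch, targets: y_batch})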
Vanishing/Exploding gradients
Source: http://www.wildml.com/2015/10/recurrent-neural-networks-tutorial-part-3-backpropagation-through-time-and-vanishing-gradients/
If you want to dive deeper into RNNs, you can watch Prof. Bengio's video lecture:
http://videolectures.net/deeplearning2016_bengio_neural_networks/
Run Code
$ git clone https://github.com/tensorflow/models.git
$ cd models/tutorials/rnn/ptb
$ wget http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz
$ tar zxvf simple-examples.tgz
$ python ptb_word_lm.py --data_path=./simple-examples/data/
TensorFlow official tutorial about recurrent neural networks:
https://www.tensorflow.org/tutorials/recurrent
Experience in training recurrent neural networks
1. Try the simplest model first, e.g. only one layer, the Adam optimizer, ...
2. When the validation loss goes flat:
   a. Decrease the learning rate -> train for more epochs
   b. Change your model:
      i. Delay inputs
      ii. Increase the size of the hidden layer
      iii. Increase the number of hidden layers
      iv. Try more powerful models, e.g. a bidirectional LSTM
3. Sometimes non-deep-learning methods can achieve stable and good-enough performance.
