4. RNN Language Model (RNNLM)
Sentence: An apple a day keeps the doctor away.
[Figure: the RNN unrolled over the sentence. At each step t the input X_t is the current word and the output Y_t is the predicted next word: X_0 = <START>, Y_0 = An; X_1 = An, Y_1 = apple; ...; X_T = away, Y_T = <END>.]
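A minimal Python sketch (not from the slides) of how the input/target pairs line up, using this sentence:

words = ["<START>", "An", "apple", "a", "day", "keeps",
         "the", "doctor", "away", "<END>"]
inputs, targets = words[:-1], words[1:]   # shift by one position
for t, (x, y) in enumerate(zip(inputs, targets)):
    print("step %d: X = %s -> Y = %s" % (t, x, y))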
5. How to train an RNN language model?
1. Pre-process training data
2. RNN model
a. Look up word embedding
b. Run RNN
c. Calculate the loss between the output and the target
d. Calculate gradients and update variables (Backpropagation)
Example: training sentences and the word-ID sequences they become after pre-processing:
An apple a day keeps the doctor away.  ->  34 12 39 10 2 44 98 11 39 45
I like to eat fruits.                  ->  34 9 88 11 78 34 45 45 45 45
We are learning RNN language models.   ->  34 98 72 35 5 17 29 45 45 45
6. Pre-process training data
1. Remove punctuation
2. Convert words to lowercase
3. Add <START> and <END> tokens: <START> sentence <END>
4. Make all sentences the same length (padding or cutting)
5. Build vocabulary (choose the most common words + <UNK>)
6. Map words to IDs (map words not in the vocabulary to <UNK>'s ID)
We are learning RNN language models.                            (original)
We are learning RNN language models                             (1. punctuation removed)
we are learning rnn language models                             (2. lowercased)
<START> we are learning rnn language models <END>               (3. <START>/<END> added)
<START> we are learning rnn language models <END> <END> <END>   (4. padded to 10 tokens)
Vocabulary (ID -> word):
0 <UNK>
1 <START>
2 <END>
3 models
4 are
5 we
6 language
7 learning
8 rnn
Resulting ID sequence: 1 5 4 7 8 6 3 2 2 2
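A minimal Python sketch of the six pre-processing steps on a toy corpus (this is my own illustration, not the PTB reader code; max_len and vocab_limit are assumed cutoffs):

import collections
import string

sentences = ["An apple a day keeps the doctor away.",
             "I like to eat fruits.",
             "We are learning RNN language models."]
max_len = 10       # length after padding or cutting (assumed)
vocab_limit = 10   # keep at most this many words, incl. special tokens (assumed)

# 1-3: remove punctuation, lowercase, wrap with <START> ... <END>
tokenized = []
for s in sentences:
    s = s.translate(str.maketrans("", "", string.punctuation)).lower()
    tokenized.append(["<START>"] + s.split() + ["<END>"])

# 4: pad with <END> (or cut) so every sentence has max_len tokens
tokenized = [(t + ["<END>"] * max_len)[:max_len] for t in tokenized]

# 5: build the vocabulary from the most common words, plus special tokens
counts = collections.Counter(w for t in tokenized for w in t)
vocab = ["<UNK>", "<START>", "<END>"]
vocab += [w for w, _ in counts.most_common() if w not in vocab]
vocab = vocab[:vocab_limit]
word_to_id = {w: i for i, w in enumerate(vocab)}

# 6: map words to IDs; out-of-vocabulary words get <UNK>'s ID
unk_id = word_to_id["<UNK>"]
id_sequences = [[word_to_id.get(w, unk_id) for w in t] for t in tokenized]
for seq in id_sequences:
    print(seq)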
7. Class PTBModel():
● Parameters setting
● Define RNN
● Look up word embeddings for the inputs
● Run RNN
● Calculate the loss between the output and the target
● Calculate gradients and update variables (Backpropagation)
10. Define RNN
[Figure: input X feeds a stack of attn_cell layers grouped into one cell, producing output Y.]
● Define basic layer: BasicLSTMCell(size,...)
● Stack multiple layers:
tf.contrib.rnn.MultiRNNCell(...)
● Remember to initialize the RNN state!
E.g. cell.zero_state(...)
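A minimal TF 1.x sketch of these calls (hyper-parameter values are assumptions; the dropout wrapping used in the tutorial is omitted):

import tensorflow as tf

size = 200        # hidden units per LSTM layer (assumed)
num_layers = 2    # number of stacked layers (assumed)
batch_size = 20   # (assumed)

def lstm_cell():
    return tf.contrib.rnn.BasicLSTMCell(size, forget_bias=0.0)

# stack multiple LSTM layers into a single cell
cell = tf.contrib.rnn.MultiRNNCell([lstm_cell() for _ in range(num_layers)])

# remember to initialize the RNN state
initial_state = cell.zero_state(batch_size, tf.float32)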
11. Look up word embeddings for the inputs
● embedding is a tensor of shape [vocabulary_size, embedding_size]
● tf.nn.embedding_lookup(...):
The word IDs are embedded into vector representations.
[Figure: each word ID in the input X is mapped to a vector of length embedding_size = 3, giving the embedded output Y.]
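Continuing the TF 1.x sketch above (the placeholder name input_data and the sizes are assumptions):

num_steps = 35        # unrolled sequence length (assumed)
vocab_size = 10000    # (assumed)
embedding_size = 200  # (assumed)

# input_data: [batch_size, num_steps] integer word IDs
input_data = tf.placeholder(tf.int32, [batch_size, num_steps])

# embedding: [vocabulary_size, embedding_size]
embedding = tf.get_variable(
    "embedding", [vocab_size, embedding_size], dtype=tf.float32)

# inputs: [batch_size, num_steps, embedding_size]
inputs = tf.nn.embedding_lookup(embedding, input_data)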
12. Run RNN
● Each step:
○ cell_output
○ state
● After num_steps, concatenate all the cell_output values (outputs)
[Figure: at each step the cell takes the input and the previous cell state and produces cell_output and an updated state; the outputs Y are collected across steps.]
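Continuing the sketch (cell, inputs, and initial_state come from the sketches above; this follows the unrolling loop pattern of the tutorial):

outputs = []
state = initial_state
with tf.variable_scope("RNN"):
    for time_step in range(num_steps):
        if time_step > 0:
            tf.get_variable_scope().reuse_variables()
        # each step returns cell_output and the new state
        (cell_output, state) = cell(inputs[:, time_step, :], state)
        outputs.append(cell_output)

# after num_steps, concatenate all cell_outputs:
# output has shape [batch_size * num_steps, size]
output = tf.reshape(tf.concat(outputs, 1), [-1, size])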
13. Calculate the loss between the output and the target
[Figure: for the sentence "We are learning RNN language models.", when the current input word is "language", the RNN output Y gives the logits, i.e. P(next word | input) such as P(models | input); the target (Input_.target) is the next word "models", the one-hot vector [0 0 0 1 0 0 0 0 0 0] under the earlier vocabulary; the loss compares the predicted distribution with this target.]
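Continuing the sketch: project the outputs to logits and compare them with the targets. The official tutorial uses a sequence-loss helper; a plain sparse softmax cross-entropy is shown here as a simplified stand-in (the placeholder name targets is an assumption):

# project cell outputs to vocabulary-sized logits
softmax_w = tf.get_variable("softmax_w", [size, vocab_size], dtype=tf.float32)
softmax_b = tf.get_variable("softmax_b", [vocab_size], dtype=tf.float32)
logits = tf.matmul(output, softmax_w) + softmax_b

# targets: [batch_size, num_steps] IDs of the next words
targets = tf.placeholder(tf.int32, [batch_size, num_steps])

# cross-entropy between the predicted distribution and the target words
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=tf.reshape(targets, [-1]), logits=logits)
cost = tf.reduce_sum(loss) / batch_size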
14. Calculate gradients and update variables (Backpropagation)
● Set initial learning rate
● Remember to clip gradients in RNNs!
○ RNNs suffer from vanishing / exploding gradients
○ tf.clip_by_global_norm(...)
○ Empirically, confine gradients to (-1, 1) or (-5, 5)
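Continuing the sketch, gradient clipping with tf.clip_by_global_norm (the clip value is an assumption):

max_grad_norm = 5   # clip value (assumed)

tvars = tf.trainable_variables()
grads = tf.gradients(cost, tvars)
# rescale the gradients so their global norm is at most max_grad_norm
grads, _ = tf.clip_by_global_norm(grads, max_grad_norm)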
15. Calculate gradients and update variables (Backpropagation)
● Optimizers: GradientDescent, Adam (most commonly used), RMSProp (often used in GANs), ...
● train_op = optimizer.apply_gradients(...)
● Call sess.run(train_op), and then the whole training procedure starts!
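Continuing the sketch: build the training op from the clipped gradients and run it in a session (the learning rate and the dummy batch are assumptions):

import numpy as np

learning_rate = 1.0   # initial learning rate (assumed)
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train_op = optimizer.apply_gradients(zip(grads, tvars))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # one dummy batch of word IDs, just to show the call;
    # real training loops over epochs and batches of the corpus
    x_batch = np.zeros([batch_size, num_steps], dtype=np.int32)
    y_batch = np.zeros([batch_size, num_steps], dtype=np.int32)
    sess.run(train_op, feed_dict={input_data: x_batch, targets: y_batch})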
17. Run Code
$ git clone https://github.com/tensorflow/models.git
$ cd models/tutorials/rnn/ptb
$ wget http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz
$ tar zxvf simple-examples.tgz
$ python ptb_word_lm.py --data_path=./simple-examples/data/
TensorFlow official tutorial on recurrent neural networks:
https://www.tensorflow.org/tutorials/recurrent
18. Experience in training recurrent neural networks
1. Try the simplest model first, e.g. use only one layer, the Adam optimizer, ...
2. When the validation loss goes flat
a. Decrease the learning rate -> Train more epochs
b. Change your model
i. Delay inputs
ii. Increase the size of hidden layer
iii. Increase the number of hidden layers
iv. Try more powerful models e.g. Bidirectional LSTM
3. Sometimes non-deep-learning methods can achieve stable and good-enough performance.