RNN Language Model in TensorFlow
Chia-Wen Cheng
2017.03.27
Recurrent Neural Networks
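[Figure: a neural network block takes the input Xt, produces the output Yt, and carries the cell state Ct from one step to the next.]
[Figure: an unrolled recurrent neural network. The recurrent block equals a chain of copies of the same network at steps 0, 1, ..., T, with inputs X0 ... XT, outputs Y0 ... YT, and cell states C0 ... CT passed along the chain.]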
RNN Language Model (RNNLM)
Sentence: An apple a day keeps the doctor away.
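[Figure: the unrolled RNN reads the sentence one word at a time and predicts the next word at each step.]
X0 = <START> -> Y0 = An
X1 = An -> Y1 = apple
X2 = apple -> Y2 = a
...
XT-1 = doctor -> YT-1 = away
XT = away -> YT = <END>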
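The input/output pairing in the figure above can be sketched in a few lines of plain Python. This is only an illustration, assuming whitespace tokenization; the variable names are not from the slides.

```python
# Build (input, target) pairs for next-word prediction:
# the targets are the input sequence shifted left by one position.
sentence = "An apple a day keeps the doctor away"
tokens = ["<START>"] + sentence.split() + ["<END>"]

inputs = tokens[:-1]   # X0 .. XT: <START>, An, apple, ..., away
targets = tokens[1:]   # Y0 .. YT: An, apple, a, ..., <END>

for x, y in zip(inputs, targets):
    print(f"{x:>8} -> {y}")
```

Each printed line corresponds to one time step of the unrolled network.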
How to train an RNN language model?
1. Pre-process training data
2. RNN model
   a. Look up word embedding
   b. Run RNN
   c. Calculate the loss between the output and the target
   d. Calculate gradients and update variables (Backpropagation)
An apple a day keeps the doctor away. -> 34 12 39 10 2 44 98 11 39 45
I like to eat fruits. -> 34 9 88 11 78 34 45 45 45 45
We are learning RNN language models. -> 34 98 72 35 5 17 29 45 45 45
Pre-process training data
1. Remove punctuation
2. Convert words to lowercase
3. Add boundary tokens: <START> sentence <END>
4. Make all sentences the same length (pad or truncate)
5. Build the vocabulary (the most common words + <UNK>)
6. Map words to IDs (a word that is not in the vocabulary maps to <UNK>’s ID)
Example:
Original: We are learning RNN language models.
Step 1: We are learning RNN language models
Step 2: we are learning rnn language models
Step 3: <START> we are learning rnn language models <END>
Step 4: <START> we are learning rnn language models <END> <END> <END>
Step 5 (vocabulary):
0 <UNK>
1 <START>
2 <END>
3 models
4 are
5 we
6 language
7 learning
8 rnn
Step 6: 1 5 4 7 8 6 3 2 2 2
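The pre-processing steps above could be sketched as follows. This is a minimal sketch, not the tutorial's reader code: the function name `preprocess` is hypothetical, the vocabulary is copied from the slide rather than built from a corpus, and padding with <END> follows the slide's example.

```python
import string

# Vocabulary copied from the slide; in practice it would be built from the
# most common words in the training corpus.
VOCAB = {"<UNK>": 0, "<START>": 1, "<END>": 2, "models": 3, "are": 4,
         "we": 5, "language": 6, "learning": 7, "rnn": 8}

def preprocess(sentence, vocab, length):
    """Turn a raw sentence into a fixed-length list of word IDs."""
    # Steps 1-2: remove punctuation and convert to lowercase.
    cleaned = sentence.translate(str.maketrans("", "", string.punctuation)).lower()
    # Step 3: add the sentence boundary tokens.
    tokens = ["<START>"] + cleaned.split() + ["<END>"]
    # Step 4: truncate long sentences, pad short ones with <END>.
    tokens = tokens[:length] + ["<END>"] * max(0, length - len(tokens))
    # Step 6: map words to IDs, falling back to <UNK> for unknown words.
    return [vocab.get(tok, vocab["<UNK>"]) for tok in tokens]

ids = preprocess("We are learning RNN language models.", VOCAB, length=10)
print(ids)  # [1, 5, 4, 7, 8, 6, 3, 2, 2, 2]
```

Note that simple truncation drops the closing <END> of a sentence longer than `length`; whether that matters depends on the task.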
class PTBModel():
1. Parameters setting
2. Define RNN
3. Inputs look up word embedding
4. Run RNN
5. Calculate the loss between the output and the target
6. Calculate gradients and update variables (Backpropagation)
Parameters setting
Vocabulary size = 10
RNN hidden size = 5
num_steps = 10 (inputs X0, X1, X2, ..., X9)
Perform backpropagation after every 10 steps
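Backpropagating after a fixed number of steps means the training data is consumed in windows of num_steps words. A rough sketch of that slicing is shown here; the helper `make_windows` is hypothetical and the ID stream is a stand-in, not real data.

```python
def make_windows(ids, num_steps):
    """Split a stream of word IDs into (inputs, targets) windows of num_steps."""
    windows = []
    # Each window needs num_steps inputs plus one extra ID for the last target.
    for start in range(0, len(ids) - num_steps, num_steps):
        x = ids[start:start + num_steps]
        y = ids[start + 1:start + num_steps + 1]  # inputs shifted by one
        windows.append((x, y))
    return windows

stream = list(range(25))  # stand-in for a long sequence of word IDs
windows = make_windows(stream, num_steps=10)
print(len(windows))  # 2
```

Gradients are then computed within each window only, which keeps the cost of backpropagation through time bounded.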
Define RNN
[Figure: the input X passes through stacked attn_cell layers, which together form cell, and produces the output Y.]
● Define the basic layer: BasicLSTMCell(size, ...)
● Stack multiple layers: tf.contrib.rnn.MultiRNNCell(...)
● Remember to initialize the RNN state, e.g. cell.zero_state(...)
Inputs look up word embedding
● embedding is a tensor of shape [vocabulary_size, embedding_size]
● tf.nn.embedding_lookup(...):
The word IDs are embedded into vector representations.
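[Figure: each input word ID in X is replaced by its embedding vector of length embedding_size = 3, which is then fed to the RNN to produce Y.]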
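Conceptually, the embedding lookup described above is row indexing into the embedding matrix. The sketch below shows this in plain Python with randomly initialized values; the local function `embedding_lookup` only mimics the idea and is not TensorFlow's implementation.

```python
import random

vocabulary_size, embedding_size = 10, 3
random.seed(0)

# embedding has shape [vocabulary_size, embedding_size]: one row per word ID.
embedding = [[random.uniform(-0.1, 0.1) for _ in range(embedding_size)]
             for _ in range(vocabulary_size)]

def embedding_lookup(table, ids):
    """Return the embedding row for each word ID."""
    return [table[i] for i in ids]

word_ids = [1, 5, 4, 7, 8, 6, 3, 2, 2, 2]
inputs = embedding_lookup(embedding, word_ids)
print(len(inputs), len(inputs[0]))  # 10 3
```

Repeated IDs (here the padding ID 2) map to the same vector, and during training the rows themselves are learned like any other variable.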
Run RNN
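● Each step returns:
  ○ cell_output
  ○ state
● After num_steps, concatenate all the cell_output values (outputs)
[Figure: at each step the cell takes the input and the previous cell state, and produces cell_output (Y) and the new cell state.]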
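The step-by-step loop can be illustrated with a plain tanh RNN cell in pure Python. This swaps in a much simpler cell than the LSTM used in the slides, and the weights, sizes, and the name `rnn_cell` are assumptions made for the sketch.

```python
import math
import random

random.seed(0)
input_size, hidden_size, num_steps = 3, 5, 10

def rand_matrix(rows, cols):
    """Small random matrix as a list of rows."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

W_x = rand_matrix(hidden_size, input_size)   # input-to-hidden weights
W_h = rand_matrix(hidden_size, hidden_size)  # hidden-to-hidden weights

def rnn_cell(x, state):
    """One step of a plain tanh RNN: returns (cell_output, new_state)."""
    new_state = [
        math.tanh(sum(W_x[i][j] * x[j] for j in range(input_size)) +
                  sum(W_h[i][k] * state[k] for k in range(hidden_size)))
        for i in range(hidden_size)
    ]
    return new_state, new_state  # for a plain RNN the output is the state

inputs = rand_matrix(num_steps, input_size)  # stand-in for embedded words
state = [0.0] * hidden_size                  # zero initial state
outputs = []
for x in inputs:
    cell_output, state = rnn_cell(x, state)
    outputs.append(cell_output)              # collect every step's output
```

After the loop, `outputs` holds one vector of length hidden_size per time step, and `state` is what would be carried into the next window.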
Calculate the loss between the output and the target
Example sentence: We are learning RNN language models.
● input = language
● Y = logits, which give P(language | input), P(models | input), ... for every word in the vocabulary
● input_.targets = models, one-hot encoded as [0 0 0 1 0 0 0 0 0 0]
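The loss for one time step is the cross-entropy between the softmax of the logits and the one-hot target, which reduces to the negative log-probability of the target word. A small sketch with made-up logits (the values are hypothetical; only the target ID 3 for "models" comes from the slides):

```python
import math

def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_id):
    """Negative log-probability assigned to the target word."""
    return -math.log(softmax(logits)[target_id])

# Hypothetical logits over a 10-word vocabulary; ID 3 is "models".
logits = [0.1, 0.0, 0.2, 2.0, 0.3, 0.1, 0.5, 0.0, 0.1, 0.0]
probs = softmax(logits)
loss = cross_entropy(logits, target_id=3)
print(round(probs[3], 3), round(loss, 3))
```

The sequence loss is then typically the sum or average of this quantity over all time steps.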
Calculate gradients and update variables
(Backpropagation)
● Set the initial learning rate
● Remember to clip gradients in RNNs!
  ○ RNNs suffer from vanishing and exploding gradients; clipping guards against the exploding case
  ○ tf.clip_by_global_norm(...)
  ○ Empirically, confine gradients to (-1, 1) or (-5, 5)
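Clipping by global norm rescales all gradients together so that their joint L2 norm does not exceed a threshold. The pure-Python sketch below follows the same formula as tf.clip_by_global_norm, but the function here is a local re-implementation and the gradient values are invented.

```python
import math

def clip_by_global_norm(grads, clip_norm):
    """Rescale all gradients so their joint L2 norm is at most clip_norm."""
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # If the norm is already small enough, the scale is 1 and nothing changes.
    scale = clip_norm / max(global_norm, clip_norm)
    return [[g * scale for g in grad] for grad in grads], global_norm

grads = [[3.0, 4.0], [12.0]]  # hypothetical gradients for two variables
clipped, norm = clip_by_global_norm(grads, clip_norm=5.0)
print(norm)  # 13.0
```

Because every gradient is scaled by the same factor, the direction of the overall update is preserved; only its length is limited.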
Calculate gradients and update variables
(Backpropagation)
● optimizer: GradientDescent, Adam (most commonly used), RMSProp (used in GANs), ...
● train_op = optimizer.apply_gradients(...)
● Call sess.run(train_op) to execute a training step; repeat it over the batches to train the model
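What an optimizer does with (gradient, variable) pairs can be shown with plain gradient descent on a toy problem. The local `apply_gradients` function below is an illustrative stand-in, not TensorFlow's method, and the quadratic objective is an assumption chosen so that the result is easy to check.

```python
def apply_gradients(grads_and_vars, learning_rate):
    """Plain gradient descent: var <- var - learning_rate * grad."""
    return [[v - learning_rate * g for g, v in zip(grad, var)]
            for grad, var in grads_and_vars]

# Minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
weights = [[0.0]]
for step in range(100):
    grads = [[2.0 * (w - 3.0) for w in var] for var in weights]
    weights = apply_gradients(list(zip(grads, weights)), learning_rate=0.1)

print(weights[0][0])  # close to 3.0
```

Each loop iteration plays the role of one training step; Adam and RMSProp differ in how they scale the gradient before the update.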
Vanishing/Exploding gradients
Source: http://www.wildml.com/2015/10/recurrent-neural-networks-tutorial-part-3-backpropagation-through-time-and-vanishing-gradients/
If you want to dive deeper into RNNs, you can watch Prof. Bengio’s video lecture:
http://videolectures.net/deeplearning2016_bengio_neural_networks/
Run Code
$ git clone https://github.com/tensorflow/models.git
$ cd models/tutorials/rnn/ptb
$ wget http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz
$ tar zxvf simple-examples.tgz
$ python ptb_word_lm.py --data_path=./simple-examples/data/
TensorFlow official tutorial about recurrent neural networks:
https://www.tensorflow.org/tutorials/recurrent
Experience in training recurrent neural networks
1. Try the simplest model first, e.g. a single layer with the Adam optimizer.
2. When the validation loss goes flat:
   a. Decrease the learning rate -> train for more epochs
   b. Change your model
      i. Delay inputs
      ii. Increase the size of the hidden layer
      iii. Increase the number of hidden layers
      iv. Try more powerful models, e.g. a bidirectional LSTM
3. Sometimes non-deep-learning methods give stable and good enough performance.
