9. @martin_gorner
Batch normalisation
Center and re-scale logits before the activation function
(decorrelate? no, too complex)
● Compute the average and variance on the mini-batch
● Add a learnable scale α and offset β for each logit, so as to restore expressiveness
“logit” = weighted sum + bias; one scale and one offset per neuron
Try α = stdev(x) and β = avg(x) and you have BN(x) = x
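In formula form, per neuron and over the mini-batch: BN(x) = α · (x − avg(x)) / stdev(x) + β. A minimal numpy sketch (not the talk's code) that also checks the identity claim above:

import numpy as np

def bn(x, alpha, beta, eps=1e-5):
    # x: [batch, neurons]; average and variance per neuron, over the mini-batch
    return alpha * (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps) + beta

x = np.random.randn(100, 10)
# with alpha = stdev(x) and beta = avg(x), BN(x) == x (up to eps)
assert np.allclose(bn(x, x.std(axis=0), x.mean(axis=0)), x, atol=1e-4)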
10. @martin_gorner
Batch normalisation
The mini-batch average and variance depend on the weights, the biases and the images; every image
in the mini-batch goes through the same, single set of weights and biases.
=> BN is differentiable with respect to the weights, biases, α and β.
It can be used as a layer in the network; gradient calculations will still work.
[Diagram: x = weighted sum + bias → batch norm (α, β) → activation fn]
13. @martin_gorner
Batch normalisation done right
[Diagram: x = weighted sum + b → batch norm (α, β) → activation fn]
Biases: no longer useful (the learned offset β plays their role).
When the activation fn is RELU, α is not useful either: it does not modify the output distribution.
Per neuron:   relu    sigmoid
without BN    bias    bias
with BN       β       α, β
+ You can go faster: use a higher learning rate
+ BN also regularises: lower or remove dropout
14. @martin_gorner
Convolutional batch normalisation
[Diagram: two convolutional filters W1[4, 4, 3] and W2[4, 4, 3], each with its own b1, α1, β1 and
b2, α2, β2]
Each neuron or patch has a value:
● per image in the batch
● per x position
● per y position
=> compute the avg and stdev across all batchsize × width × height values
Still, only one bias, scale or offset per neuron.
15. @martin_gorner
Batch normalisation at test time
Stats on what?
● The last batch: no
● All images: yes (but not practical)
● => keep an exponential moving average of the batch stats during training
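The moving average is the standard exponential update; a minimal sketch (the 0.9999 decay matches the TensorFlow code on the next slide):

def ema_update(ema, value, decay=0.9999):
    # shadow = decay * shadow + (1 - decay) * value, as in tf.train.ExponentialMovingAverage
    return decay * ema + (1 - decay) * value

# during training: ema_mean = ema_update(ema_mean, batch_mean)
# at test time: normalise with ema_mean and ema_variance instead of the batch stats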
16. @martin_gorner
Batch normalisation with Tensorflow
def batchnorm_layer(Ylogits, is_test, Offset, Scale, iteration, convolutional=False):
    exp_moving_avg = tf.train.ExponentialMovingAverage(0.9999, iteration)
    if convolutional:  # avg across batch, width and height
        mean, variance = tf.nn.moments(Ylogits, [0, 1, 2])
    else:  # avg across the batch only
        mean, variance = tf.nn.moments(Ylogits, [0])
    update_moving_averages = exp_moving_avg.apply([mean, variance])
    m = tf.cond(is_test, lambda: exp_moving_avg.average(mean), lambda: mean)
    v = tf.cond(is_test, lambda: exp_moving_avg.average(variance), lambda: variance)
    Ybn = tf.nn.batch_normalization(Ylogits, m, v, Offset, Scale, variance_epsilon=1e-5)
    return Ybn, update_moving_averages
● Define one offset and/or scale per neuron
● Apply the activation fn on Ybn
● Don’t forget to execute update_moving_averages (sess.run)
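A possible call site, assuming Ylogits is the pre-activation output of a 200-neuron dense layer (all names besides the function's own arguments are illustrative):

is_test = tf.placeholder(tf.bool)
iteration = tf.placeholder(tf.int32)
offset = tf.Variable(tf.zeros([200]))  # one offset per neuron; with RELU no scale is needed
Ybn, update_ema = batchnorm_layer(Ylogits, is_test, offset, None, iteration)
Y = tf.nn.relu(Ybn)                    # activation fn applied on Ybn
# in the training loop, also run the moving-average update:
# sess.run(update_ema, feed_dict={...})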
The code is on GitHub: goo.gl/DEOe7Z
20. @martin_gorner
Layers
from tensorflow.contrib import layers

# this
Y = layers.relu(X, 200)

# instead of this
W = tf.Variable(tf.zeros([784, 200]))
b = tf.Variable(tf.zeros([200]))
Y = tf.nn.relu(tf.matmul(X, W) + b)
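The one-liner stacks naturally; a sketch of a small network in the same style (layer sizes are illustrative):

Y1 = layers.relu(X, 200)
Y2 = layers.relu(Y1, 100)
Ylogits = layers.linear(Y2, 10)  # no activation on the last layer: the softmax is applied in the loss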
Sample: goo.gl/y1SSFy
21. @martin_gorner
Model function
from tensorflow.contrib import framework, learn, layers, metrics

def model_fn(X, Y_, mode):
    Yn = …  # model layers
    prob = tf.nn.softmax(Yn)
    digi = tf.argmax(prob, 1)
    predictions = {"probabilities": prob, "digits": digi}  # free-form
    evaluations = {'accuracy': metrics.accuracy(digi, Y_)}  # free-form
    loss = tf.nn.softmax_cross_entropy_with_logits(…)
    train = layers.optimize_loss(loss, framework.get_global_step(), 0.003, "Adam")
    return learn.ModelFnOps(mode, predictions, loss, train, evaluations)
X, Y_ are the “features” and “targets”; 0.003 is the learning rate; mode is TRAIN, EVAL or INFER.
Sample: goo.gl/y1SSFy
29. @martin_gorner
Michel C. was born in Paris, France. He is married and has three children. He received an M.S.
in neurosciences from the University Pierre & Marie Curie and the Ecole Normale Supérieure in 1987,
and then spent most of his career in Switzerland, at the Ecole Polytechnique de Lausanne. He
specialized in child and adolescent psychiatry and his first field of research was severe mood disorders
in adolescents, the topic of his PhD in neurosciences (2002). His mother tongue is ? ? ? ? ?
Long term dependencies: a problem
Short context: right after “born in”, a language (English, German, Russian, French …) is likely.
Long context: only the whole paragraph tells you the answer is “French”. Problems…
[Diagram: the RNN state Hn-1 → Hn must carry “Michel C. was born in” all the way to the blank]
30. @martin_gorner
LSTM
LSTM = Long Short Term Memory
[Diagram: LSTM cell with inputs Xt, Ht-1, Ct-1 and outputs Yt, Ht, Ct; legend: merging lines =
concatenation, ×, + = element-wise operations, σ, tanh = neural net layers]
X = Xt | Ht-1
f = σ(X.Wf + bf)
u = σ(X.Wu + bu)
r = σ(X.Wr + br)
X’ = tanh(X.Wc + bc)
Ct = f * Ct-1 + u * X’
Ht = r * tanh(Ct)
Yt = softmax(Ht.W + b)
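The equations map line for line onto code. A minimal numpy sketch of a single LSTM step (not the talk's code; weight shapes follow the vector sizes on the next slide):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # W maps 'f', 'u', 'r', 'c' to [(p+n), n] matrices; b maps them to [n] vectors
    X = np.concatenate([x_t, h_prev])       # X = Xt | Ht-1
    f = sigmoid(X @ W['f'] + b['f'])        # forget gate
    u = sigmoid(X @ W['u'] + b['u'])        # update gate
    r = sigmoid(X @ W['r'] + b['r'])        # result gate
    Xp = np.tanh(X @ W['c'] + b['c'])       # input
    c_t = f * c_prev + u * Xp               # new C
    h_t = r * np.tanh(c_t)                  # new H
    return h_t, c_t

p, n = 4, 8  # illustrative sizes
W = {k: 0.1 * np.random.randn(p + n, n) for k in 'furc'}
b = {k: np.zeros(n) for k in 'furc'}
h, c = lstm_step(np.random.randn(p), np.zeros(n), np.zeros(n), W, b)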
31. @martin_gorner
LSTM
concatenate :  X = Xt | Ht-1             vector size: p+n
forget gate :  f = σ(X.Wf + bf)          vector size: n
update gate :  u = σ(X.Wu + bu)          vector size: n
result gate :  r = σ(X.Wr + br)          vector size: n
input :        X’ = tanh(X.Wc + bc)      vector size: n
new C :        Ct = f * Ct-1 + u * X’    vector size: n
new H :        Ht = r * tanh(Ct)         vector size: n
output :       Yt = softmax(Ht.W + b)    vector size: m
[Diagram: the same LSTM cell as on the previous slide]
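These sizes give the parameter count directly: each of the four weight matrices Wf, Wu, Wr, Wc is (p+n)×n and each bias is n, so one LSTM cell holds 4·((p+n)·n + n) weights. With, for example, p = 128 inputs and n = 512 units, that is 4·((128+512)·512 + 512) = 1,312,768 parameters, plus n·m + m more for the softmax output layer.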
39. @martin_gorner
Bitchin’ batchin’
[Diagram: the training text is cut into parallel lines and batched so that each batch continues the
previous one: Batch 1 holds “The quic” / “seventh” / “Mr. Herm”, Batch 2 “k brown” / “heaven o” /
“ann Zapf”, Batch 3 “fox jump” / “f typogr” / “was the”; the state flows Ht-1 → Ht → Ht+1 → Ht+2
from one batch to the next as training progresses]
for x, y_ in utils.rnn_minibatch_sequencer(codetext, BATCHSIZE, SEQLEN, nb_epochs=10):
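utils.rnn_minibatch_sequencer comes with the talk's code; a minimal sketch of the batching scheme it implements (the real helper in the repo handles epoch boundaries more carefully):

import numpy as np

def rnn_minibatch_sequencer(raw_data, batch_size, sequence_size, nb_epochs):
    # cut the text into batch_size parallel streams, then yield consecutive
    # sequence_size slices: batch n+1 continues the lines of batch n, so the
    # output state of one batch is the correct input state for the next one
    data = np.array(raw_data)
    nb_batches = (len(data) - 1) // (batch_size * sequence_size)
    rounded = nb_batches * batch_size * sequence_size
    xdata = data[:rounded].reshape([batch_size, nb_batches * sequence_size])
    ydata = data[1:rounded + 1].reshape([batch_size, nb_batches * sequence_size])
    for epoch in range(nb_epochs):
        for batch in range(nb_batches):
            x = xdata[:, batch * sequence_size:(batch + 1) * sequence_size]
            y = ydata[:, batch * sequence_size:(batch + 1) * sequence_size]  # x shifted by one char
            yield x, y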
40. @martin_gorner
Language model in Tensorflow
ALPHASIZE = 98
CELLSIZE = 512
NLAYERS = 3
SEQLEN = 30
BATCHSIZE = 100  # illustrative value

Xd = tf.placeholder(tf.uint8, [None, None])    # [BATCHSIZE, SEQLEN] of char codes
X = tf.one_hot(Xd, ALPHASIZE, 1.0, 0.0)
Yd_ = tf.placeholder(tf.uint8, [None, None])   # the same sequences shifted by one char
Y_ = tf.one_hot(Yd_, ALPHASIZE, 1.0, 0.0)
Hin = tf.placeholder(tf.float32, [None, CELLSIZE * NLAYERS])
batchsize = tf.placeholder(tf.int32)

# the model
cells = [tf.nn.rnn_cell.GRUCell(CELLSIZE) for i in range(NLAYERS)]
mcell = tf.nn.rnn_cell.MultiRNNCell(cells, state_is_tuple=False)
Hr, H = tf.nn.dynamic_rnn(mcell, X, initial_state=Hin)

# softmax output layer
Hf = tf.reshape(Hr, [-1, CELLSIZE])
Ylogits = layers.linear(Hf, ALPHASIZE)
Y = tf.nn.softmax(Ylogits)
Yp = tf.argmax(Y, 1)
Yp = tf.reshape(Yp, [batchsize, -1])

# loss and training step (optimizer)
Yflat_ = tf.reshape(Y_, [-1, ALPHASIZE])  # align targets with the flattened logits
loss = tf.nn.softmax_cross_entropy_with_logits(logits=Ylogits, labels=Yflat_)
train_step = tf.train.AdamOptimizer(1e-3).minimize(loss)

# training loop (the sequencer iterates over the epochs; assumes an initialised Session as sess)
inH = np.zeros([BATCHSIZE, CELLSIZE * NLAYERS])
for x, y_ in utils.rnn_minibatch_sequencer(codetext, BATCHSIZE, SEQLEN, nb_epochs=30):
    dic = {Xd: x, Yd_: y_, Hin: inH, batchsize: BATCHSIZE}
    _, y, outH = sess.run([train_step, Yp, H], feed_dict=dic)
    inH = outH
The code is on GitHub:
github.com/martin-gorner/
tensorflow-rnn-shakespeare
41. @martin_gorner
ee o no nonnaoter s ee seih iae r t i r io i ro s
sierota tsohoreroneo rsa esia anehereeo hensh
rho etnrhhs iti saoitns t et rsearh tshseoeh ta
oirhroren e eaetetnesnareeeoaraihss nshtano eter
e oooaoaeee nonn is heh easren ieson httn nihensont
t e n a ooe oerhi neaeehteriseat tiet i i ntsh
orhi e ohhsiea e aht ohr er ra eeo oeeitrot
hethisesaaei o saeii straieiteoeresorh e ooeri
e ninesh sort a es h rs hattnteseato sonoanr sniaase
s rshninsasi na sntennn oti r etnsnrse oh n
r e tiathhnaeeano trrr hhohooon rrt eernre e rnoh
Shakespeare, 0.03 epochs (C1)
42. @martin_gorner
Shakespeare
II WERENI
Are I I wos the wheer boaer.
Tin thim mh cals sate bauut site tar oue tinl
an bsisonetoal yer an fimireeren.
L[IO SI Hns oret bsllssts aaau ton hete me toer
frurtor sheus aed trat
A faler bis tote oadt tou than male, tel mou ce
an cime. ais fauto ws cien whus yas. Ande fert te a
ut wond aal sinr be at saar
0.1 epochs (C3)
43. @martin_gorner
BERENS Hall hat in she the hir meres.
Perstr in ame not of heard, me thin hild of shear and
ant on of mare. I lore wes lour.
DOCHES The chaster'd on not fenst
The laldoos more.
[Ixeln thrish]
And tho priines sith of hamdeling the san wind
Shakespeare, 0.2 epochs (C5)
Stage directions?
44. @martin_gorner
KING LEAR Alas, I am not forsworn both to bod!
And let the firm I have to'st trainoured.
KING HENRY VIII I love not my father.
PORDIA He tash you will have it.
HENRY BLUTIUS Work, thou lovest my son here,
thy father's fath!
CLIOND Why, then, would say, the beasts are
Shakespeare, 1 epoch (C6)
Invented names!
46. @martin_gorner
Shakespeare, 30 epochs (B10)
And sorrow far into the stars of men,
Without a second tears to seek the best and
bed,
With a strange service, and the foul prince of
Rome
[Exeunt MARK ANTONY and LEPIDUS]
Well said, my lord,--
MENENIUS I do not say so.
Well, I will not have no better ways;
49. @martin_gorner
def testGiddenSelfBeShareMecress(self):
with self.test_session() as sess:
tat = tf.contrib.matrix.cast_column_variable([1, 1], [0, 1, 1], [1, 7]],
[[1, 1, 1]].file(file, line_state_will_file))
with self.test_session():
self.assertAllEqual(1, l.ex6)
self.assertEqual(output_graph_def is_output_tensors_op(
tf.pro_context_name.sqrt(sess)
def test_shape(self):
res = values=value_rns[0].eval())
def tempDimpleSeriesGredicsIothasedWouthAverageData(self):
self._testDirector(self):
self._test_inv3_size = 5
with tf.train.ConvolutioBailLors_startswith("save_dir_context.PutIsprint().eval())
return tf.contrib.learn.RUCISLCCS:
# Check the orfloating so that the nimesting object mumputable othersifier.
# dense_keys.tokens_prefix/statch_size of the input1 tensors.
@property
Python code, 0.4 epochs (A3)
● Wrong ([]) nesting
● Correct use of colons
● Hallucinated function names
50. @martin_gorner
# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in [0.1, 2.0, 3.0]]
def __init__(self, expected):
return np.array([[0, 0, 0], [0, 0, 0]])
self.assertAllEqual(tf.placeholder(tf.float32, shape=(3, 3)),(shape, prior.pack(),
tf.float32))
for keys in tensor_list:
return np.array([[0, 0, 0]]).astype(np.float32)
# Check that we have both scalar tensor for being invalid to a vector of 1 indicating
# the total loss of the same shape as the shape of the tensor.
sharded_weights = [[0.0, 1.0]]
# Create the string op to apply gradient terms that also batch.
# The original any operation as a code when we should alw infer to the session case.
Python code, 12 epochs (B10)
● Correct triple ([]) nesting
● Recites the Apache license
● TensorFlow tips!
52. @martin_gorner
Tensorflow: save, restore
saver = tf.train.Saver(keep_checkpoint_every_n_hours=0.1, max_to_keep=5)

with tf.Session() as sess:
    # ... training loop ...
    saver.save(sess, 'file_', global_step=iteration)
# => saves the variables in file_200 and the graph in file_200.meta

with tf.Session() as sess:
    resto = tf.train.import_meta_graph('file_200.meta')
    resto.restore(sess, 'file_200')
# => restores the graph and the variable values

Must name variables explicitly!!!
# when saving
X = tf.placeholder(tf.uint8, name='X')
Y = tf.nn.softmax(Ylogits, name='Y')
# when using the restored graph
y, h = sess.run(['Y:0', 'H:0'], feed_dict={'X:0': y})
53. @martin_gorner
Shakespeare generation
with tf.Session() as sess:
    resto = tf.train.import_meta_graph('shake_200.meta')
    resto.restore(sess, 'shake_200')
    # initial values
    x = np.array([[0]])  # [BATCHSIZE, SEQLEN] with BATCHSIZE=1 and SEQLEN=1
    h = np.zeros([1, INTERNALSIZE * NLAYERS], dtype=np.float32)
    for i in range(100000):
        dic = {'X:0': x, 'Hin:0': h, 'batchsize:0': 1}
        y, h = sess.run(['Y:0', 'H:0'], feed_dict=dic)
        c = my_txtutils.sample_from_probabilities(y, topn=5)
        x = np.array([[c]])  # shape [BATCHSIZE, SEQLEN] with BATCHSIZE=1 and SEQLEN=1
        print(chr(my_txtutils.convert_to_ascii(c)), end="")
[Diagram: generation runs one char at a time: X → network → Y; the sampled char is fed back as the
next X and the output state Ht replaces Ht-1 (the initial state H’0 is zero)]
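my_txtutils.sample_from_probabilities is a helper from the repo; a plausible minimal version keeps only the topn most likely characters and samples among them:

import numpy as np

def sample_from_probabilities(probabilities, topn=5):
    # probabilities: output of the softmax, shape [1, ALPHASIZE]
    p = np.squeeze(probabilities).astype(np.float64)
    p[np.argsort(p)[:-topn]] = 0   # zero out all but the topn largest
    p = p / np.sum(p)              # renormalise
    return np.random.choice(len(p), p=p)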
54. @martin_gorner
Tensorboard
summary_writer = tf.train.SummaryWriter("log/train_" + time)
loss_summary = tf.scalar_summary("batch_loss", loss)
summaries = tf.merge_all_summaries()  # collect all summary ops into one

# in the training loop:
smm = sess.run(summaries, feed_dict=dic)
summary_writer.add_summary(smm, iteration)

Tip: use the time in the logdir name.
Tip: use a second SummaryWriter for validation results.
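Following the second tip, a sketch with a separate writer (same pre-1.0 summary API as above; validation_dic stands for a feed of validation data):

validation_writer = tf.train.SummaryWriter("log/validation_" + time)
# every few iterations, run the same summary ops on validation data:
val_smm = sess.run(summaries, feed_dict=validation_dic)
validation_writer.add_summary(val_smm, iteration)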
56. @martin_gorner
RNN shapes
Text classification
[Diagram: word vectors for “The USA and China have agreed …” feed an RNN (initial state 0); the
final state Hn is classified, e.g. as “geopolitics”]
Words encoded as vectors: “embeddings”
embeddings = tf.Variable(tf.random_uniform([vocab_size, embed_size]))
X = tf.nn.embedding_lookup(embeddings, train_inputs)
Tensorflow sample: goo.gl/m41mNp
Or use constant (pre-trained) embeddings => see Word2Vec
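For concreteness, a shape-annotated sketch of the lookup (sizes are illustrative assumptions):

vocab_size, embed_size = 50000, 128
embeddings = tf.Variable(tf.random_uniform([vocab_size, embed_size], -1.0, 1.0))
train_inputs = tf.placeholder(tf.int32, [None, None])  # [batch, seqlen] word ids
X = tf.nn.embedding_lookup(embeddings, train_inputs)   # [batch, seqlen, embed_size]
# X can go straight into tf.nn.dynamic_rnn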
57. @martin_gorner
Bitchin’ batchin’
Sentences are padded to equal length with ∅; the true length of each goes into sequence_length:
China and the USA have agreed to a new round of talks    12
The quick brown fox jumps over the lazy dog . ∅ ∅        10
Boys will be boys . ∅ ∅ ∅ ∅ ∅ ∅ ∅                         5
Tom , get your coat . We are going out . ∅               11
Math rules the world . Men rule math . ∅ ∅ ∅              9
Hr, H = tf.nn.dynamic_rnn(mcell, X, initial_state=Hin, sequence_length=slen)
[Diagram: with sequence_length set, the state Hn used for the “geopolitics” classification is taken
at each sentence’s true last word, not at the padded end; initial state 0]
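A sketch of what gets fed, assuming the padded word-id batch above (ids and placeholder names are illustrative):

import numpy as np

# two padded sentences, 0 = ∅ padding
x = np.array([[4, 18, 25, 4, 3, 0, 0, 0, 0, 0, 0, 0],      # "Boys will be boys ." -> length 5
              [7, 9, 2, 11, 14, 20, 5, 6, 8, 12, 10, 13]])  # 12-word sentence     -> length 12
slen = np.array([5, 12])
# feed_dict = {X_ids: x, slen_t: slen}
# dynamic_rnn stops updating a row's state past its sequence_length,
# so Hn is the state at each sentence's true last word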
58. @martin_gorner
RNN shapes
Text translation
Words encoded as vectors
[Diagram: an encoder RNN reads “The red cat ate the mouse” (initial state 0); a decoder RNN then
emits “Le chat rouge a mangé la souris”, each output word fed back as the next decoder input, with
∅ as the start/end marker]
tf.nn.sampled_softmax_loss(…): a full softmax over the output vocabulary is slow, the sampled
version is fast
Tensorflow sample: goo.gl/KyKLDv
59. @martin_gorner
RNN shapes
Image captioning (simplified)
Images encoded as vectors, for ex. the output of a convolutional network or auto-encoder
[Diagram: the image vector initialises the RNN state (0 otherwise); the decoder emits “A man on a
beach flying a kite”, each output word fed back as the next input, with ∅ as the start/end marker]
Google’s neural net for image captioning: goo.gl/VgZUQZ
64. @martin_gorner
TF high level API
from tensorflow.contrib import learn

def model_fn(X, Y_, mode):  # X, Y_: the “features” and “targets”
    Yn = …  # model layers
    predictions = {"probabilities": …, "digits": …}  # free-form
    evaluations = {'accuracy': metrics.accuracy(…)}  # free-form
    loss = …
    train = layers.optimize_loss(loss, …)
    return learn.ModelFnOps(mode, predictions, loss, train, evaluations)
Samples: goo.gl/F3i3bf, goo.gl/CofxFM
65. @martin_gorner
Estimator, Experiment, learn_runner
from tensorflow.contrib.learn.python.learn import learn_runner
from tensorflow.contrib.learn.python.learn.utils import saved_model_export_utils

def experiment_fn(job_dir):
    return learn.Experiment(
        estimator=learn.Estimator(model_fn, model_dir=job_dir,
                                  config=learn.RunConfig(save_checkpoints_secs=None,
                                                         save_checkpoints_steps=1000)),
        train_input_fn=…,  # data feed
        eval_input_fn=…,   # data feed
        train_steps=10000,
        eval_steps=1,
        export_strategies=saved_model_export_utils.make_export_strategy(serving_input_fn))

def main(argv=None):
    job_dir = …  # parse argument --job-dir
    learn_runner.run(experiment_fn, job_dir)

if __name__ == '__main__':
    main()
Free stuff!!!
● Tensorboard graphs
● Resume on fail
● Parallel data feeds
● Serving model export
● Distributed training
Samples: goo.gl/F3i3bf, goo.gl/CofxFM
66. @martin_gorner
Data queues for distributed training
# dummy implementation for data that fits in memory
def train_data_input_fn(mnist):
    images = tf.constant(mnist.train.images)
    labels = tf.constant(mnist.train.labels)
    return tf.train.shuffle_batch([images, labels],
                                  batch_size=100, capacity=1100,
                                  min_after_dequeue=1000, enqueue_many=True)

# dummy implementation for data that fits in memory
def eval_data_input_fn(mnist):
    return (tf.constant(mnist.test.images),
            tf.constant(mnist.test.labels))
shuffle_batch inserts queue nodes into the TF graph (100 above is the batch size).
For practical data queuing, use the TFRecords format.
Samples: goo.gl/F3i3bf, goo.gl/CofxFM
67. @martin_gorner
Serving input function
# Online predictions on Cloud ML Engine
def serving_input_fn():
    # Placeholder for data deserialised from JSON: a batch of 28x28 images for MNIST
    inputs = {'A': tf.placeholder(tf.uint8, [None, 28, 28])}
    # Transform the data as needed
    features = [tf.cast(inputs['A'], tf.float32)]
    return input_fn_utils.InputFnOps(features, None, inputs)
Samples: goo.gl/F3i3bf, goo.gl/CofxFM
68. @martin_gorner
Run it
gcloud ml-engine jobs submit training job22 \
  --job-dir=gs://mybucket/job22 \
  --package-path=trainer \
  --module-name=trainer.task \
  --config=config.yaml \
  -- \
  --<custom model arguments here>

Model checkpoints and tensorboard summaries are written to the --job-dir.

config.yaml:
trainingInput:
  scaleTier: STANDARD_1

Deploy the trained model to prod = click click click. Then you get autoscaled serving:

gcloud ml-engine predict \
  --model <model_name> \
  --json-instances mydigits.json
Samples: goo.gl/F3i3bf, goo.gl/CofxFM
70. @martin_gorner
Cloud ML Engine: your TensorFlow models trained in Google’s cloud.
Pre-trained models:
● Cloud Vision API
● Cloud Speech API
● Google Translate API
● Natural Language API
● Video Intelligence API
● Cloud Jobs API (PRIVATE BETA)
Cloud AutoML Vision (ALPHA): just bring your data
Cloud TPU (BETA): ML supercomputing
That’s all folks...
Martin Görner
Google Developer relations
@martin_gorner
Videos, slides, code:
github.com/GoogleCloudPlatform/tensorflow-without-a-phd
Have fun!