#Tensorflow @martin_gorner
deep Science !
deep Code ...
>TensorFlow, deep learning and recurrent neural networks without a PhD_
The superpower: batch normalisation
@martin_gorner
Data “whitening”
Data: large values, different scales, skewed, correlated
@martin_gorner
Data “whitening”
Modified data: centered around zero, rescaled...
Subtract average
Divide by std dev
@martin_gorner
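A minimal NumPy sketch of the same two steps (the data values here are made up to be large and skewed; names are illustrative):

import numpy as np
data = np.random.lognormal(mean=3.0, sigma=1.0, size=(1000, 2))  # large, skewed values
whitened = (data - data.mean(axis=0)) / data.std(axis=0)  # subtract average, divide by std dev
print(whitened.mean(axis=0), whitened.std(axis=0))  # ~[0 0] and ~[1 1]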
Data “whitening”
(A+B)/2
A-B
Modified data: … and decorrelated (that was almost a Principal Component Analysis)
@martin_gorner
Data “whitening”
[new A, new B] = [A, B] × W + B
e.g. W = [[0.05, 0.12], [0.61, -1.23]], B = [-1.45, 0.12]
W ? B ? A network layer can do this !
(W scales and rotates, B shifts)
@martin_gorner
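For intuition, the same affine transform in NumPy, with the numbers from the slide (the data point [A, B] is made up):

import numpy as np
W = np.array([[0.05, 0.12],
              [0.61, -1.23]])  # scales and rotates
B = np.array([-1.45, 0.12])    # shifts
AB = np.array([[1.0, 2.0]])    # one data point [A, B]
new_AB = AB.dot(W) + B         # [new A, new B]: exactly what a network layer computes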
Fully connected network
(diagram: a fully connected network for MNIST digits 0 1 2 … 9: 784 inputs, hidden layers of 200, 100, 60 and 30 neurons, a 10-neuron softmax output; the layers are annotated OK, OK ?, OK ??? — the deeper the layer, the more problematic its input distribution)
@martin_gorner
Without batch normalisation
(figure: a sigmoid activation with "my distribution of inputs" drifting into its flat, saturated region: boo-hoo)
@martin_gorner
Batch normalisation
Center and re-scale logits before the activation function (decorrelate ? no, too complex).
● Compute average and variance on the mini-batch.
● Add a learnable scale α and offset β for each logit so as to restore expressiveness (one of each per neuron).
"logit" = weighted sum + bias.
Try α = stdev(x) and β = avg(x) and you have BN(x) = x.
@martin_gorner
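As a sanity check, a NumPy sketch of the formula for one neuron's logits over a mini-batch (values made up); with α = stdev(x) and β = avg(x), BN gives back x:

import numpy as np

def bn(x, alpha, beta, eps=1e-5):
    return alpha * (x - x.mean()) / np.sqrt(x.var() + eps) + beta

x = np.random.randn(100) * 3.0 + 7.0  # logits of one neuron across a mini-batch
assert np.allclose(bn(x, x.std(), x.mean()), x, atol=1e-3)  # BN(x) = x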
Batch normalisation
The batch average and variance depend on the weights, biases and images; the normalised logits depend on the same weights, biases and images. There is only one set of weights and biases in a mini-batch,
=> BN is differentiable with respect to weights, biases, α and β.
It can be used as a layer in the network; gradient calculations will still work.
Per neuron: x = weighted sum + bias → batch-norm (α, β) → activation fn.
@martin_gorner
With batch normalisation (sigmoid)
(figure: with batch norm inserted before the sigmoid, the distribution of neuron outputs stays centred on the sigmoid's useful range)
@martin_gorner
With batch normalisation (RELU)
(figure: the same with a RELU: batch norm keeps "my distribution of inputs" in the RELU's active range)
@martin_gorner
Batch normalisation done right
Per neuron: x = weighted sum (+ b) → batch-norm (α, β) → activation fn.
Biases: no longer useful, the offset β plays that role.
α is not useful when the activation fn is RELU: it does not modify the output distribution.
Per neuron:   RELU   sigmoid
without BN:   bias   bias
with BN:      β      α, β
+You can go faster: use higher learning rate
+BN also regularises: lower or remove dropout
@martin_gorner
Convolutional batch normalisation
W1[4, 4, 3], W2[4, 4, 3], …
Each neuron or patch has a value:
● per image in the batch
● per x position
● per y position
=> compute avg and stdev across all batchsize x width x height values.
Still, one bias, scale or offset per neuron: b1, α1, β1; b2, α2, β2; …
@martin_gorner
Batch normalisation at test time
Stats on what ?
● Last batch: no
● all images: yes (but not practical)
● => Exponential moving average during training
@martin_gorner
Batch normalisation with Tensorflow
def batchnorm_layer(Ylogits, is_test, Offset, Scale, iteration, convolutional=False):
    exp_moving_avg = tf.train.ExponentialMovingAverage(0.9999, iteration)
    if convolutional:  # avg across batch, width, height
        mean, variance = tf.nn.moments(Ylogits, [0, 1, 2])
    else:
        mean, variance = tf.nn.moments(Ylogits, [0])
    update_moving_averages = exp_moving_avg.apply([mean, variance])
    m = tf.cond(is_test, lambda: exp_moving_avg.average(mean), lambda: mean)
    v = tf.cond(is_test, lambda: exp_moving_avg.average(variance), lambda: variance)
    Ybn = tf.nn.batch_normalization(Ylogits, m, v, Offset, Scale, variance_epsilon=1e-5)
    return Ybn, update_moving_averages

Define one offset and/or scale per neuron; apply the activation fn on Ybn; don't forget to execute update_moving_averages (sess.run).
The code is on GitHub: goo.gl/DEOe7Z
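A hedged usage sketch for one 200-neuron fully connected layer; the names (X, W1, tst, it) and shapes are assumptions, not from the slides:

X = tf.placeholder(tf.float32, [None, 784])
W1 = tf.Variable(tf.truncated_normal([784, 200], stddev=0.1))
offset1 = tf.Variable(tf.zeros([200]))  # one offset per neuron (enough for RELU)
scale1 = tf.Variable(tf.ones([200]))    # add a scale for sigmoid/tanh layers
tst = tf.placeholder(tf.bool)           # is_test: True at test time
it = tf.placeholder(tf.int32)           # training iteration, drives the moving average
Y1l = tf.matmul(X, W1)                  # logits: weighted sum, no bias needed with BN
Y1bn, upd1 = batchnorm_layer(Y1l, tst, offset1, scale1, it, convolutional=False)
Y1 = tf.nn.relu(Y1bn)                   # apply the activation fn on Ybn
# in the training loop: sess.run(upd1, feed_dict=…) to update the moving averages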
Demo
@martin_gorner
99.5%
@martin_gorner
More superpowers
high level API
@martin_gorner
Layers
from tensorflow.contrib import layers
# this
Y = layers.relu(X, 200)
# instead of this
W = tf.Variable(tf.zeros([784, 200]))
b = tf.Variable(tf.zeros([200]))
Y = tf.nn.relu(tf.matmul(X,W) + b)
Sample: goo.gl/y1SSFy
@martin_gorner
Model function
from tensorflow.contrib import learn, layers, metrics, framework

def model_fn(X, Y_, mode):  # "features", "targets" and mode: TRAIN, EVAL or INFER
    Yn = …  # model layers
    prob = tf.nn.softmax(Yn)
    digi = tf.argmax(prob, 1)
    predictions = {"probabilities": prob, "digits": digi}  # free-form
    evaluations = {'accuracy': metrics.accuracy(digi, Y_)}  # free-form
    loss = tf.nn.softmax_cross_entropy_with_logits(…)
    train = layers.optimize_loss(loss, framework.get_global_step(), 0.003, "Adam")  # 0.003 = learning rate
    return learn.ModelFnOps(mode, predictions, loss, train, evaluations)
Sample: goo.gl/y1SSFy
@martin_gorner
Estimator
estimator = learn.Estimator(model_fn=model_fn)
estimator.fit(input_fn=… , steps=10000)
estimator.evaluate(input_fn=…, steps=1)
# => {'accuracy': … }
estimator.predict(input_fn=…)
# => {"probabilities":…, "digits":…}
# input_fn: feeds in batches of features and targets
Sample: goo.gl/y1SSFy
@martin_gorner
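An input_fn can be as simple as a function returning a batch of features and targets; a minimal sketch for in-memory MNIST data (the mnist object and the queue parameters are assumptions, mirroring the data-queues slide later in this deck):

def train_input_fn():
    images = tf.constant(mnist.train.images)  # features
    labels = tf.constant(mnist.train.labels)  # targets
    return tf.train.shuffle_batch([images, labels], 100,  # batch size
                                  capacity=1100, min_after_dequeue=1000,
                                  enqueue_many=True)

estimator.fit(input_fn=train_input_fn, steps=10000)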
Convolutional network
def conv_model(X, Y_, mode):
    XX = tf.reshape(X, [-1, 28, 28, 1])
    Y1 = layers.conv2d(XX, num_outputs=6, kernel_size=[6, 6])
    Y2 = layers.conv2d(Y1, num_outputs=12, kernel_size=[5, 5], stride=2)
    Y3 = layers.conv2d(Y2, num_outputs=24, kernel_size=[4, 4], stride=2)
    Y4 = layers.flatten(Y3)
    Y5 = layers.relu(Y4, 200)
    Ylogits = layers.linear(Y5, 10)
    prob = tf.nn.softmax(Ylogits)
    digi = tf.cast(tf.argmax(prob, 1), tf.uint8)
    predictions = {"probabilities": prob, "digits": digi}  # free-form
    evaluations = {'accuracy': metrics.accuracy(digi, Y_)}  # free-form
    loss = tf.nn.softmax_cross_entropy_with_logits(Ylogits, tf.one_hot(Y_, 10))
    train = layers.optimize_loss(loss, framework.get_global_step(), 0.003, "Adam")
    return learn.ModelFnOps(mode, predictions, loss, train, evaluations)
estimator = learn.Estimator(model_fn=conv_model)
Sample: goo.gl/y1SSFy
Recurrent Neural Networks
@martin_gorner
RNN
(diagram: the RNN cell, inputs Xt, outputs Yt, internal state H of size N; inside, a tanh layer computes the new state and a softmax layer computes the output)
X: inputs, Y: outputs, H: internal state, N: internal size
@martin_gorner
RNN
X = Xt | Ht-1   (| is concatenation)
Ht = tanh(X.WH + bH)
Yt = softmax(Ht.W + b)
@martin_gorner
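One recurrent step, written out in NumPy directly from these equations (sizes p for the input and n for the state, random placeholder weights; the softmax readout is left out):

import numpy as np
p, n = 8, 16
WH = np.random.randn(p + n, n) * 0.1
bH = np.zeros(n)

def rnn_step(x_t, h_prev):
    X = np.concatenate([x_t, h_prev])  # X = Xt | Ht-1
    return np.tanh(X.dot(WH) + bH)     # Ht = tanh(X.WH + bH)

h = rnn_step(np.random.randn(p), np.zeros(n))  # one step from a zero state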
RNN training
(diagram: the cell unrolled in time: starting from H-1 = 0, each step reads Xt and the previous state Ht-1 and produces Yt and the new state Ht, for t = 0 … 5)
The same weights and biases are shared across iterations.
@martin_gorner
Deep RNN
(diagram: a deep RNN: at each time step the bottom cell reads Xt, each cell feeds the cell above, every layer keeps its own state (H, H', H", …), and the top cell produces Yt; all initial states are 0)
L: number of layers
@martin_gorner
Michel C. was born in Paris, France. He is married and has three children. He received an M.S. in neurosciences from the University Pierre & Marie Curie and the Ecole Normale Supérieure in 1987, and then spent most of his career in Switzerland, at the Ecole Polytechnique de Lausanne. He specialized in child and adolescent psychiatry and his first field of research was severe mood disorders in adolescents, the topic of his PhD in neurosciences (2002). His mother tongue is ? ? ? ? ?
Long term dependencies: a problem
Short context: right after "Michel C. was born in", the state Hn only has to carry a few words to suggest English, German, Russian, French …
Long context: by the end of the paragraph, the state would have had to carry "born in Paris, France" all the way to predict "French". Problems…
@martin_gorner
LSTM
LSTM = Long Short Term Memory
(diagram: the LSTM cell: inputs Xt and previous state Ht-1 are concatenated; σ and tanh boxes are neural net layers, ×, + are element-wise operations; the cell state Ct-1 → Ct runs across the top, and the cell emits Ht and Yt)
X = Xt | Ht-1
f = σ(X.Wf + bf)
u = σ(X.Wu + bu)
r = σ(X.Wr + br)
X' = tanh(X.Wc + bc)
Ct = f * Ct-1 + u * X'
Ht = r * tanh(Ct)
Yt = softmax(Ht.W + b)
@martin_gorner
LSTM
                                             vector sizes
X = Xt | Ht-1               concatenate      p+n
f = σ(X.Wf + bf)            forget gate      n
u = σ(X.Wu + bu)            update gate      n
r = σ(X.Wr + br)            result gate      n
X' = tanh(X.Wc + bc)        input            n
Ct = f * Ct-1 + u * X'      new C            n
Ht = r * tanh(Ct)           new H            n
Yt = softmax(Ht.W + b)      output           m

Gru !
@martin_gorner
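The same equations as a NumPy sketch of one LSTM step (placeholder sizes and random weights; the softmax readout is left out):

import numpy as np
def sigma(z): return 1.0 / (1.0 + np.exp(-z))

p, n = 8, 16
Wf, Wu, Wr, Wc = [np.random.randn(p + n, n) * 0.1 for _ in range(4)]
bf, bu, br, bc = [np.zeros(n) for _ in range(4)]

def lstm_step(x_t, h_prev, c_prev):
    X = np.concatenate([x_t, h_prev])  # concatenate
    f = sigma(X.dot(Wf) + bf)          # forget gate
    u = sigma(X.dot(Wu) + bu)          # update gate
    r = sigma(X.dot(Wr) + br)          # result gate
    Xp = np.tanh(X.dot(Wc) + bc)       # input
    c = f * c_prev + u * Xp            # new C
    h = r * np.tanh(c)                 # new H
    return h, c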
GRU
GRU = Gated Recurrent Unit: 2 gates instead of 3 => cheaper
                              vector sizes
X = Xt | Ht-1                 p+n
z = σ(X.Wz + bz)              n
r = σ(X.Wr + br)              n
X' = Xt | r * Ht-1            p+n
X" = tanh(X'.Wc + bc)         n
Ht = (1-z) * Ht-1 + z * X"    n
Yt = softmax(Ht.W + b)        m
(diagram: the GRU cell maps Xt and Ht-1 to Yt and Ht)
@martin_gorner
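And one GRU step in NumPy, following the same equations (placeholder sizes and weights):

import numpy as np
def sigma(z): return 1.0 / (1.0 + np.exp(-z))

p, n = 8, 16
Wz, Wr, Wc = [np.random.randn(p + n, n) * 0.1 for _ in range(3)]
bz, br, bc = [np.zeros(n) for _ in range(3)]

def gru_step(x_t, h_prev):
    X = np.concatenate([x_t, h_prev])       # X = Xt | Ht-1
    z = sigma(X.dot(Wz) + bz)               # update gate
    r = sigma(X.dot(Wr) + br)               # reset gate
    Xp = np.concatenate([x_t, r * h_prev])  # X' = Xt | r * Ht-1
    Xpp = np.tanh(Xp.dot(Wc) + bc)          # X"
    return (1 - z) * h_prev + z * Xpp       # Ht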
Language model in Tensorflow
(diagram: a character-based language model: input characters "S t _ J o h", one-hot encoded, initial state 0, final state H5; target output characters "t _ J o h n")
@martin_gorner
Language model in Tensorflow
ALPHASIZE = 98
CELLSIZE = 512
NLAYERS = 3
SEQLEN = 30

cells = [tf.nn.rnn_cell.GRUCell(CELLSIZE) for i in range(NLAYERS)]
mcell = tf.nn.rnn_cell.MultiRNNCell(cells, state_is_tuple=False)
Hr, H = tf.nn.dynamic_rnn(mcell, X, initial_state=Hin)

dynamic_rnn defines weights and biases internally.
(diagram: the NLAYERS-deep stack of GRU cells unrolled over the time steps X0, X1, … of the sequence; Hin is the incoming state, H the outgoing state, Hr collects the top-layer outputs)
@martin_gorner
Softmax readout layer
ALPHASIZE = 98
CELLSIZE = 512
NLAYERS = 3
SEQLEN = 30

# Hr: [ BATCHSIZE, SEQLEN, CELLSIZE ]
Hf = tf.reshape(Hr, [-1, CELLSIZE])       # [ BATCHSIZE x SEQLEN, CELLSIZE ]
Ylogits = tf.layers.dense(Hf, ALPHASIZE)  # [ BATCHSIZE x SEQLEN, ALPHASIZE ]
Y = tf.nn.softmax(Ylogits)                # [ BATCHSIZE x SEQLEN, ALPHASIZE ]
loss = tf.nn.softmax_cross_entropy_with_logits(Ylogits, Y_)

Tip: handle sequence and batch elements the same.
(diagram: the same softmax readout layer is applied at every time step to produce Y0 Y1 Y2 Y3 Y4 Y5 Y6 Y7)
@martin_gorner
Inputs and outputs
ALPHASIZE = 98
CELLSIZE = 512
NLAYERS = 3
SEQLEN = 30

(diagram: training on "St_Andrew…": input characters "S t _ A n d r e", target output characters "t _ A n d r e w": the same text shifted by one character)
inputs:  [ BATCHSIZE, SEQLEN ], one-hot encoded to [ BATCHSIZE, SEQLEN, ALPHASIZE ]
outputs: [ BATCHSIZE, SEQLEN ], one-hot encoded to [ BATCHSIZE, SEQLEN, ALPHASIZE ]
H: [ BATCHSIZE, CELLSIZE x NLAYERS ]
@martin_gorner
Placeholders, and the rest...
ALPHASIZE = 98
CELLSIZE = 512
NLAYERS = 3
SEQLEN = 30

Xd = tf.placeholder(tf.uint8, [None, None])    # [ BATCHSIZE, SEQLEN ]
X = tf.one_hot(Xd, ALPHASIZE, 1.0, 0.0)        # [ BATCHSIZE, SEQLEN, ALPHASIZE ]
Yd_ = tf.placeholder(tf.uint8, [None, None])   # [ BATCHSIZE, SEQLEN ]
Y_ = tf.one_hot(Yd_, ALPHASIZE, 1.0, 0.0)      # [ BATCHSIZE, SEQLEN, ALPHASIZE ]
Hin = tf.placeholder(tf.float32, [None, CELLSIZE*NLAYERS])  # [ BATCHSIZE, CELLSIZE x NLAYERS ]
# Y, loss, Hout = my_model(X, Y_, Hin)         # Y: [ BATCHSIZE x SEQLEN, ALPHASIZE ]
predictions = tf.argmax(Y, 1)                  # [ BATCHSIZE x SEQLEN ]
predictions = tf.reshape(predictions, [batchsize, -1])  # [ BATCHSIZE, SEQLEN ]
train_step = tf.train.AdamOptimizer(1e-3).minimize(loss)
@martin_gorner
Bitchin’ batchin’
(diagram: the training text is cut into parallel streams so that each batch continues the previous one: batch 1 "The quic | seventh | Mr. Herm", batch 2 "k brown | heaven o | ann Zapf", batch 3 "fox jump | f typogr | was the"; this way the state Ht-1 → Ht → Ht+1 can be carried over from batch to batch)
for x, y_ in utils.rnn_minibatch_sequencer(codetext, BATCHSIZE, SEQLEN,
nb_epochs=10):
@martin_gorner
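For intuition, a simplified sketch of what a sequencer like utils.rnn_minibatch_sequencer must do; the real helper on GitHub also handles epoch boundaries (codetext is assumed to be a list of integer character codes):

import numpy as np

def minibatch_sequencer(codetext, batch_size, seq_len):
    # Cut the text into batch_size parallel streams so that batch k+1
    # continues exactly where batch k stopped: the state Ht can carry over.
    stream_len = len(codetext) // batch_size
    data = np.array(codetext[:stream_len * batch_size]).reshape(batch_size, stream_len)
    for start in range(0, stream_len - seq_len - 1, seq_len):
        x = data[:, start:start + seq_len]
        y = data[:, start + 1:start + seq_len + 1]  # target: the same text shifted by one char
        yield x, y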
Language model in Tensorflow
ALPHASIZE = 98
CELLSIZE = 512
NLAYERS = 3
SEQLEN = 30

Xd = tf.placeholder(tf.uint8, [None, None])
X = tf.one_hot(Xd, ALPHASIZE, 1.0, 0.0)
Yd_ = tf.placeholder(tf.uint8, [None, None])
Y_ = tf.one_hot(Yd_, ALPHASIZE, 1.0, 0.0)
Hin = tf.placeholder(tf.float32, [None, CELLSIZE*NLAYERS])

# the model
cells = [tf.nn.rnn_cell.GRUCell(CELLSIZE) for i in range(NLAYERS)]
mcell = tf.nn.rnn_cell.MultiRNNCell(cells, state_is_tuple=False)
Hr, H = tf.nn.dynamic_rnn(mcell, X, initial_state=Hin)

# softmax output layer
Hf = tf.reshape(Hr, [-1, CELLSIZE])
Ylogits = layers.linear(Hf, ALPHASIZE)
Y = tf.nn.softmax(Ylogits)
Yp = tf.argmax(Y, 1)
Yp = tf.reshape(Yp, [batchsize, -1])

# loss and training step (optimizer)
loss = tf.nn.softmax_cross_entropy_with_logits(Ylogits, Y_)
train_step = tf.train.AdamOptimizer(1e-3).minimize(loss)

# training loop (the sequencer loops over the text for nb_epochs epochs)
inH = np.zeros([BATCHSIZE, CELLSIZE*NLAYERS])
for x, y_ in utils.rnn_minibatch_sequencer(codetext, BATCHSIZE, SEQLEN, nb_epochs=30):
    dic = {Xd: x, Yd_: y_, Hin: inH}
    _, y, outH = sess.run([train_step, Yp, H], feed_dict=dic)
    inH = outH
The code is on GitHub:
github.com/martin-gorner/
tensorflow-rnn-shakespeare
@martin_gorner
ee o no nonnaoter s ee seih iae r t i r io i ro s
sierota tsohoreroneo rsa esia anehereeo hensh
rho etnrhhs iti saoitns t et rsearh tshseoeh ta
oirhroren e eaetetnesnareeeoaraihss nshtano eter
e oooaoaeee nonn is heh easren ieson httn nihensont
t e n a ooe oerhi neaeehteriseat tiet i i ntsh
orhi e ohhsiea e aht ohr er ra eeo oeeitrot
hethisesaaei o saeii straieiteoeresorh e ooeri
e ninesh sort a es h rs hattnteseato sonoanr sniaase
s rshninsasi na sntennn oti r etnsnrse oh n
r e tiathhnaeeano trrr hhohooon rrt eernre e rnoh
Shakespeare, 0.03 epochs (C1)
@martin_gorner
Shakespeare
II WERENI
Are I I wos the wheer boaer.
Tin thim mh cals sate bauut site tar oue tinl
an bsisonetoal yer an fimireeren.
L[IO SI Hns oret bsllssts aaau ton hete me toer
frurtor sheus aed trat
A faler bis tote oadt tou than male, tel mou ce
an cime. ais fauto ws cien whus yas. Ande fert te a
ut wond aal sinr be at saar
0.1 epochs (C3)
@martin_gorner
BERENS Hall hat in she the hir meres.
Perstr in ame not of heard, me thin hild of shear and
ant on of mare. I lore wes lour.
DOCHES The chaster'd on not fenst
The laldoos more.
[Ixeln thrish]
And tho priines sith of hamdeling the san wind
Shakespeare, 0.2 epochs (C5)
Stage directions ?
@martin_gorner
KING LEAR Alas, I am not forsworn both to bod!
And let the firm I have to'st trainoured.
KING HENRY VIII I love not my father.
PORDIA He tash you will have it.
HENRY BLUTIUS Work, thou lovest my son here,
thy father's fath!
CLIOND Why, then, would say, the beasts are
Shakespeare, 1 epoch (C6)
Invented names !
@martin_gorner
Shakespeare
30 epochs (B10)
TITUS ANDRONICUS
ACT I
SCENE III An ante-chamber. The COUNT's palace.
[Enter CLEOMENES, with the Lord SAY]
Chamberlain Let me see your worshing in my hands.
LUCETTA I am a sign of me, and sorrow sounds it.
@martin_gorner
Shakespeare
30 epochs (B10)
And sorrow far into the stars of men,
Without a second tears to seek the best and
bed,
With a strange service, and the foul prince of
Rome
[Exeunt MARK ANTONY and LEPIDUS]
Well said, my lord,--
MENENIUS I do not say so.
Well, I will not have no better ways;
@martin_gorner
diassts_= =tlns==eti.s=tessn_((
sie_s_nts_ens= dondtnenroe dnar taonte
srst anttntoilonttiteaen
detrtstinsenoaolsesnesoairt(
arssserleeeerltrdlesssoeeslslrlslie(e
drnnaleeretteaelreesioe niennoarens
dssnstssaorns sreeoeslrteasntotnnai(ar
dsopelntederlalesdanserl
lts(sitae(e)
Python code, 0.03 epochs (A1)
@martin_gorner
with
self.essors_sigeater(output_dits_allss,
self._train.
for sampated to than ubtexsormations.
expeddions = np.randim(natched_collection,
ranger, mang_ops, samplering)
def assestErrorume_gens(assignex) as
and(sampled_veases):
eved.
Python code, 0.1 epochs (A2)
Python keywords
@martin_gorner
def testGiddenSelfBeShareMecress(self):
with self.test_session() as sess:
tat = tf.contrib.matrix.cast_column_variable([1, 1], [0, 1, 1], [1, 7]],
[[1, 1, 1]].file(file, line_state_will_file))
with self.test_session():
self.assertAllEqual(1, l.ex6)
self.assertEqual(output_graph_def is_output_tensors_op(
tf.pro_context_name.sqrt(sess)
def test_shape(self):
res = values=value_rns[0].eval())
def tempDimpleSeriesGredicsIothasedWouthAverageData(self):
self._testDirector(self):
self._test_inv3_size = 5
with tf.train.ConvolutioBailLors_startswith("save_dir_context.PutIsprint().eval())
return tf.contrib.learn.RUCISLCCS:
# Check the orfloating so that the nimesting object mumputable othersifier.
# dense_keys.tokens_prefix/statch_size of the input1 tensors.
@property
Python code, 0.4 epochs (A3)
Wrong ([]) nesting
Correct use of colons
Hallucinated function names
@martin_gorner
# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in [0.1, 2.0, 3.0]]
def __init__(self, expected):
return np.array([[0, 0, 0], [0, 0, 0]])
self.assertAllEqual(tf.placeholder(tf.float32, shape=(3, 3)),(shape, prior.pack(),
tf.float32))
for keys in tensor_list:
return np.array([[0, 0, 0]]).astype(np.float32)
# Check that we have both scalar tensor for being invalid to a vector of 1 indicating
# the total loss of the same shape as the shape of the tensor.
sharded_weights = [[0.0, 1.0]]
# Create the string op to apply gradient terms that also batch.
# The original any operation as a code when we should alw infer to the session case.
Python code, 12 epochs (B10)
Correct triple ([]) nesting
Recites the Apache license
Tensorflow tips!
@martin_gorner
...and more
Credit to Andrej Karpathy's blog: The Unreasonable Effectiveness of Recurrent Neural Networks
@martin_gorner
Tensorflow: save, restore
saver = tf.train.Saver(keep_checkpoint_every_n_hours=0.1, max_to_keep=5)
with tf.Session() as sess:
    # ... training loop ...
    saver.save(sess, 'file_', global_step=iter)
=> Saves the variables in file_200 and the graph in file_200.meta

with tf.Session() as sess:
    resto = tf.train.import_meta_graph('file_200.meta')
    resto.restore(sess, 'file_200')
=> Restores the graph and the variable values

Must name variables explicitly !!!
# when saving
X = tf.placeholder(tf.uint8, name='X')
Y = tf.nn.softmax(Ylogits, name='Y')
# when using the restored graph
y, h = sess.run(['Y:0', 'H:0'], feed_dict={'X:0': y})
@martin_gorner
Shakespeare generation
with tf.Session() as sess:
    resto = tf.train.import_meta_graph('shake_200.meta')
    resto.restore(sess, 'shake_200')
    # initial values
    x = np.array([[0]])  # [BATCHSIZE, SEQLEN] with BATCHSIZE=1 and SEQLEN=1
    h = np.zeros([1, INTERNALSIZE * NLAYERS], dtype=np.float32)
    for i in range(100000):
        dic = {'X:0': x, 'Hin:0': h, 'batchsize:0': 1}
        y, h = sess.run(['Y:0', 'H:0'], feed_dict=dic)
        c = my_txtutils.sample_from_probabilities(y, topn=5)
        x = np.array([[c]])  # shape [BATCHSIZE, SEQLEN] with BATCHSIZE=1 and SEQLEN=1
        print(chr(my_txtutils.convert_to_ascii(c)), end="")

(diagram: one char at a time: the sampled output character becomes the next input X, the state Ht is carried over)
@martin_gorner
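And a hedged sketch of what a helper like my_txtutils.sample_from_probabilities does: keep the topn most likely characters, renormalise, and sample one of them (sampling instead of taking the argmax keeps the generated text varied):

import numpy as np

def sample_from_probabilities(probabilities, topn=5):
    p = np.squeeze(probabilities).copy()  # shape [ALPHASIZE]
    p[np.argsort(p)[:-topn]] = 0          # zero out all but the topn most likely chars
    p = p / np.sum(p)                     # renormalise
    return np.random.choice(len(p), p=p)  # sample one character index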
Tensorboard
summary_writer = tf.train.SummaryWriter("log/train_" + time)  # tip: use the time in the logdir name
loss_summary = tf.scalar_summary("batch_loss", loss)
summaries = tf.merge_all_summaries()  # merges all summaries defined above into one op
# in the training loop:
smm = sess.run(summaries, feed_dict=dic)
summary_writer.add_summary(smm, iteration)

Tip: use a second SummaryWriter for validation results.
@martin_gorner
RNN shapes
(diagram recap: character-based model: input characters "S t _ J o h", one-hot encoded, predicted output characters "t _ J o h n")
@martin_gorner
RNN shapes
(diagram: text classification: the words of "The USA and China have agreed …", encoded as vectors ("embeddings"), feed the RNN from an initial state 0; only the last output is kept and classifies the text: "geopolitics")
embeddings = tf.Variable(tf.random_uniform([vocab_size, embed_size]))
X = tf.nn.embedding_lookup(embeddings, train_inputs)
Tensorflow sample: goo.gl/m41mNp
Or constant => see Word2Vec
@martin_gorner
Bitchin’ batchin’
China and the USA have agreed to a new round of talks   12
The quick brown fox jumps over the lazy dog .           10
Boys will be boys .                                      5
Tom , get your coat . We are going out .                11
Math rules the world . Men rule math .                   9

(diagram: shorter sentences in the batch are padded with ∅ up to the longest one; the true sequence lengths (seqlen) tell dynamic_rnn where each final output Hn should be read, e.g. "geopolitics")

Hr, H = tf.nn.dynamic_rnn(mcell, X, initial_state=Hin, sequence_length=seqlen)
@martin_gorner
RNN shapes
(diagram: sequence-to-sequence translation: an encoder RNN reads "The red cat ate the mouse" word by word; a decoder RNN then emits "Le chat rouge a mangé la souris", each produced word fed back as the next input; words encoded as vectors)
Text translation
Tensorflow sample: goo.gl/KyKLDv
tf.nn.sampled_softmax_loss(…): a fast, sampled approximation of the slow full softmax over the output vocabulary.
@martin_gorner
RNN shapes
Image captioning (simplified): the image, encoded as a vector (for ex. the output of a convolutional network or auto-encoder), initialises the RNN; the decoder then emits "A man on a beach flying a kite", each word fed back as the next input.
Google's neural net for image captioning: goo.gl/VgZUQZ
@martin_gorner
Image captioning
Google’s neural net for image captioning: goo.gl/VgZUQZ
"A person riding a motorcycle on a dirt road."
"A herd of elephants walking across a dry grass field."
@martin_gorner
Image captioning
Google’s neural net for image captioning: goo.gl/VgZUQZ
"A refrigerator filled with lots of food and drinks."
"A yellow school bus parked in a parking lot."
@martin_gorner
Cloud Machine Learning Engine
@martin_gorner
Data-parallel distributed training
(diagram: model replicas each consume a share of the data; parameter servers hold the weights and apply the asynchronous updates W' = W + ∆W; I ♡ noise)
@martin_gorner
TF high level API
from tensorflow.contrib import learn

def model_fn(X, Y_, mode):  # "features" and "targets"
    Yn = …  # model layers
    predictions = {"probabilities": …, "digits": …}  # free-form
    evaluations = {'accuracy': metrics.accuracy(…)}  # free-form
    loss = …
    train = layers.optimize_loss(loss, …)
    return learn.ModelFnOps(mode, predictions, loss, train, evaluations)
Samples: goo.gl/F3i3bf, goo.gl/CofxFM
@martin_gorner
Estimator, Experiment, learn_runner
from tensorflow.contrib.learn.python.learn.utils import saved_model_export_utils

def experiment_fn(job_dir):
    return learn.Experiment(
        estimator=learn.Estimator(model_fn, model_dir=job_dir,
                                  config=learn.RunConfig(save_checkpoints_secs=None,
                                                         save_checkpoints_steps=1000)),
        train_input_fn=…,  # data feed
        eval_input_fn=…,   # data feed
        train_steps=10000,
        eval_steps=1,
        export_strategies=saved_model_export_utils.make_export_strategy(serving_input_fn))

def main(argv=None):
    job_dir = …  # parse argument --job-dir
    learn_runner.run(experiment_fn, job_dir)

if __name__ == '__main__': main()

Free stuff !!!
● Tensorboard graphs
● Resume on fail
● Parallel data feeds
● Serving model export
● Distributed training (config.yaml: trainingInput: scaleTier: STANDARD_1)
Samples: goo.gl/F3i3bf, goo.gl/CofxFM
@martin_gorner
Data queues for distributed training
# dummy implementation for data that fits in memory
def train_data_input_fn(mnist):
    images = tf.constant(mnist.train.images)
    labels = tf.constant(mnist.train.labels)
    # inserts queue nodes into the TF graph: batch size 100,
    # queue capacity 1100, min 1000 elements after dequeue
    return tf.train.shuffle_batch([images, labels], 100, 1100, 1000,
                                  enqueue_many=True)

# dummy implementation for data that fits in memory
def eval_data_input_fn(mnist):
    return tf.constant(mnist.test.images), tf.constant(mnist.test.labels)

For practical data queuing use the TF Records format.
Samples: goo.gl/F3i3bf, goo.gl/CofxFM
@martin_gorner
Serving input function
# Online predictions on Cloud ML Engine
def serving_input_fn():
    # Placeholder for data deserialised from JSON: a batch of images, for MNIST
    inputs = {'A': tf.placeholder(tf.uint8, [None, 28, 28])}
    # Transform the data as needed
    features = [tf.cast(inputs['A'], tf.float32)]
    return input_fn_utils.InputFnOps(features, None, inputs)
Samples: goo.gl/F3i3bf, goo.gl/CofxFM
@martin_gorner
Run it
gcloud ml-engine jobs submit training job22
    --job-dir=gs://mybucket/job22
    --package-path=trainer
    --module-name=trainer.task
    --config=config.yaml
    --
    --<custom model arguments here>

--job-dir: model checkpoints and tensorboard summaries end up there.
config.yaml:
trainingInput:
  scaleTier: STANDARD_1

Deploy trained model to prod = click click click. Then, autoscaled serving:

gcloud ml-engine predict
    --model <model_name>
    --json-instances mydigits.json
Samples: goo.gl/F3i3bf, goo.gl/CofxFM
@martin_gorner
Demo: aucnet
Retrain Inception yourself: goo.gl/Z9eNek
@martin_gorner
Cloud ML Engine: your TensorFlow models trained in Google's cloud.
Pre-trained models:
● Cloud Vision API
● Cloud Speech API
● Google Translate API
● Natural Language API
● Video Intelligence API
● Cloud Jobs API (PRIVATE BETA)
Cloud AutoML Vision (ALPHA): just bring your data
Cloud TPU (BETA): ML supercomputing

That's all folks...
Martin Görner
Google Developer relations
@martin_gorner

Videos, slides, code: github.com/GoogleCloudPlatform/tensorflow-without-a-phd
Have fun !
More Related Content

What's hot

Ada boost brown boost performance with noisy data
Ada boost brown boost performance with noisy dataAda boost brown boost performance with noisy data
Ada boost brown boost performance with noisy dataShadhin Rahman
 
(研究会輪読) Weight Uncertainty in Neural Networks
(研究会輪読) Weight Uncertainty in Neural Networks(研究会輪読) Weight Uncertainty in Neural Networks
(研究会輪読) Weight Uncertainty in Neural NetworksMasahiro Suzuki
 
Time Series Analysis
Time Series AnalysisTime Series Analysis
Time Series AnalysisAmit Ghosh
 
Understanding variable importances in forests of randomized trees
Understanding variable importances in forests of randomized treesUnderstanding variable importances in forests of randomized trees
Understanding variable importances in forests of randomized treesGilles Louppe
 
K-means, EM and Mixture models
K-means, EM and Mixture modelsK-means, EM and Mixture models
K-means, EM and Mixture modelsVu Pham
 
Tensor train to solve stochastic PDEs
Tensor train to solve stochastic PDEsTensor train to solve stochastic PDEs
Tensor train to solve stochastic PDEsAlexander Litvinenko
 
Discrete Probability Distributions
Discrete  Probability DistributionsDiscrete  Probability Distributions
Discrete Probability DistributionsE-tan
 
Paper finance hosseinkhan_remy
Paper finance hosseinkhan_remyPaper finance hosseinkhan_remy
Paper finance hosseinkhan_remyRémy Hosseinkhan
 
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural NetworksMasahiro Suzuki
 
Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier홍배 김
 
Bias-variance decomposition in Random Forests
Bias-variance decomposition in Random ForestsBias-variance decomposition in Random Forests
Bias-variance decomposition in Random ForestsGilles Louppe
 
(DL hacks輪読)Bayesian Neural Network
(DL hacks輪読)Bayesian Neural Network(DL hacks輪読)Bayesian Neural Network
(DL hacks輪読)Bayesian Neural NetworkMasahiro Suzuki
 
Introduction to Tensorflow
Introduction to TensorflowIntroduction to Tensorflow
Introduction to TensorflowTzar Umang
 
Unbiased Bayes for Big Data
Unbiased Bayes for Big DataUnbiased Bayes for Big Data
Unbiased Bayes for Big DataChristian Robert
 
(DL hacks輪読) Variational Inference with Rényi Divergence
(DL hacks輪読) Variational Inference with Rényi Divergence(DL hacks輪読) Variational Inference with Rényi Divergence
(DL hacks輪読) Variational Inference with Rényi DivergenceMasahiro Suzuki
 
Probability and Statistics
Probability and StatisticsProbability and Statistics
Probability and StatisticsMalik Sb
 

What's hot (20)

Ada boost brown boost performance with noisy data
Ada boost brown boost performance with noisy dataAda boost brown boost performance with noisy data
Ada boost brown boost performance with noisy data
 
(研究会輪読) Weight Uncertainty in Neural Networks
(研究会輪読) Weight Uncertainty in Neural Networks(研究会輪読) Weight Uncertainty in Neural Networks
(研究会輪読) Weight Uncertainty in Neural Networks
 
Time Series Analysis
Time Series AnalysisTime Series Analysis
Time Series Analysis
 
lec18_ref.pdf
lec18_ref.pdflec18_ref.pdf
lec18_ref.pdf
 
PhD defense talk slides
PhD  defense talk slidesPhD  defense talk slides
PhD defense talk slides
 
Understanding variable importances in forests of randomized trees
Understanding variable importances in forests of randomized treesUnderstanding variable importances in forests of randomized trees
Understanding variable importances in forests of randomized trees
 
K-means, EM and Mixture models
K-means, EM and Mixture modelsK-means, EM and Mixture models
K-means, EM and Mixture models
 
Tensor train to solve stochastic PDEs
Tensor train to solve stochastic PDEsTensor train to solve stochastic PDEs
Tensor train to solve stochastic PDEs
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
Discrete Probability Distributions
Discrete  Probability DistributionsDiscrete  Probability Distributions
Discrete Probability Distributions
 
Paper finance hosseinkhan_remy
Paper finance hosseinkhan_remyPaper finance hosseinkhan_remy
Paper finance hosseinkhan_remy
 
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
 
Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier
 
1 - Linear Regression
1 - Linear Regression1 - Linear Regression
1 - Linear Regression
 
Bias-variance decomposition in Random Forests
Bias-variance decomposition in Random ForestsBias-variance decomposition in Random Forests
Bias-variance decomposition in Random Forests
 
(DL hacks輪読)Bayesian Neural Network
(DL hacks輪読)Bayesian Neural Network(DL hacks輪読)Bayesian Neural Network
(DL hacks輪読)Bayesian Neural Network
 
Introduction to Tensorflow
Introduction to TensorflowIntroduction to Tensorflow
Introduction to Tensorflow
 
Unbiased Bayes for Big Data
Unbiased Bayes for Big DataUnbiased Bayes for Big Data
Unbiased Bayes for Big Data
 
(DL hacks輪読) Variational Inference with Rényi Divergence
(DL hacks輪読) Variational Inference with Rényi Divergence(DL hacks輪読) Variational Inference with Rényi Divergence
(DL hacks輪読) Variational Inference with Rényi Divergence
 
Probability and Statistics
Probability and StatisticsProbability and Statistics
Probability and Statistics
 

Similar to TensorFlow RNN Language Model

SURF 2012 Final Report(1)
SURF 2012 Final Report(1)SURF 2012 Final Report(1)
SURF 2012 Final Report(1)Eric Zhang
 
Jörg Stelzer
Jörg StelzerJörg Stelzer
Jörg Stelzerbutest
 
Introduction
IntroductionIntroduction
Introductionbutest
 
TensorFlow for IITians
TensorFlow for IITiansTensorFlow for IITians
TensorFlow for IITiansAshish Bansal
 
Anomaly detection
Anomaly detectionAnomaly detection
Anomaly detection철 김
 
RNNs for Timeseries Analysis
RNNs for Timeseries AnalysisRNNs for Timeseries Analysis
RNNs for Timeseries AnalysisBruno Gonçalves
 
Introduction to Deep Learning, Keras, and Tensorflow
Introduction to Deep Learning, Keras, and TensorflowIntroduction to Deep Learning, Keras, and Tensorflow
Introduction to Deep Learning, Keras, and TensorflowOswald Campesato
 
Introduction to Deep Learning, Keras, and TensorFlow
Introduction to Deep Learning, Keras, and TensorFlowIntroduction to Deep Learning, Keras, and TensorFlow
Introduction to Deep Learning, Keras, and TensorFlowSri Ambati
 
International Journal of Engineering Research and Development (IJERD)
 International Journal of Engineering Research and Development (IJERD) International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
14889574 dl ml RNN Deeplearning MMMm.ppt
14889574 dl ml RNN Deeplearning MMMm.ppt14889574 dl ml RNN Deeplearning MMMm.ppt
14889574 dl ml RNN Deeplearning MMMm.pptManiMaran230751
 
TypeScript and Deep Learning
TypeScript and Deep LearningTypeScript and Deep Learning
TypeScript and Deep LearningOswald Campesato
 
Reasoning about laziness
Reasoning about lazinessReasoning about laziness
Reasoning about lazinessJohan Tibell
 
columbus15_cattaneo.pdf
columbus15_cattaneo.pdfcolumbus15_cattaneo.pdf
columbus15_cattaneo.pdfAhmadM65
 
Deep Learning, Keras, and TensorFlow
Deep Learning, Keras, and TensorFlowDeep Learning, Keras, and TensorFlow
Deep Learning, Keras, and TensorFlowOswald Campesato
 
2 linear regression with one variable
2 linear regression with one variable2 linear regression with one variable
2 linear regression with one variableTanmayVijay1
 

Similar to TensorFlow RNN Language Model (20)

SURF 2012 Final Report(1)
SURF 2012 Final Report(1)SURF 2012 Final Report(1)
SURF 2012 Final Report(1)
 
Jörg Stelzer
Jörg StelzerJörg Stelzer
Jörg Stelzer
 
Introduction
IntroductionIntroduction
Introduction
 
TensorFlow for IITians
TensorFlow for IITiansTensorFlow for IITians
TensorFlow for IITians
 
Anomaly detection
Anomaly detectionAnomaly detection
Anomaly detection
 
RNNs for Timeseries Analysis
RNNs for Timeseries AnalysisRNNs for Timeseries Analysis
RNNs for Timeseries Analysis
 
Introduction to Deep Learning, Keras, and Tensorflow
Introduction to Deep Learning, Keras, and TensorflowIntroduction to Deep Learning, Keras, and Tensorflow
Introduction to Deep Learning, Keras, and Tensorflow
 
Introduction to Deep Learning, Keras, and TensorFlow
Introduction to Deep Learning, Keras, and TensorFlowIntroduction to Deep Learning, Keras, and TensorFlow
Introduction to Deep Learning, Keras, and TensorFlow
 
International Journal of Engineering Research and Development (IJERD)
 International Journal of Engineering Research and Development (IJERD) International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
H2 o berkeleydltf
H2 o berkeleydltfH2 o berkeleydltf
H2 o berkeleydltf
 
14889574 dl ml RNN Deeplearning MMMm.ppt
14889574 dl ml RNN Deeplearning MMMm.ppt14889574 dl ml RNN Deeplearning MMMm.ppt
14889574 dl ml RNN Deeplearning MMMm.ppt
 
MUMS: Bayesian, Fiducial, and Frequentist Conference - Objective Bayesian Ana...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Objective Bayesian Ana...MUMS: Bayesian, Fiducial, and Frequentist Conference - Objective Bayesian Ana...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Objective Bayesian Ana...
 
C++ and Deep Learning
C++ and Deep LearningC++ and Deep Learning
C++ and Deep Learning
 
TypeScript and Deep Learning
TypeScript and Deep LearningTypeScript and Deep Learning
TypeScript and Deep Learning
 
Reasoning about laziness
Reasoning about lazinessReasoning about laziness
Reasoning about laziness
 
Presentation.pdf
Presentation.pdfPresentation.pdf
Presentation.pdf
 
Linear regression
Linear regressionLinear regression
Linear regression
 
columbus15_cattaneo.pdf
columbus15_cattaneo.pdfcolumbus15_cattaneo.pdf
columbus15_cattaneo.pdf
 
Deep Learning, Keras, and TensorFlow
Deep Learning, Keras, and TensorFlowDeep Learning, Keras, and TensorFlow
Deep Learning, Keras, and TensorFlow
 
2 linear regression with one variable
2 linear regression with one variable2 linear regression with one variable
2 linear regression with one variable
 

Recently uploaded

MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝soniya singh
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAbhinavSharma374939
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...RajaP95
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 

Recently uploaded (20)

MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog Converter
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 

TensorFlow RNN Language Model

  • 1. #Tensorflow @martin_gorner deep Science ! deep Code ... >TensorFlow, deep learning and recurrent neural networks without a PhD_ >TensorFlow, deep learning and recurrent neural networks without a PhD_
  • 2. The superpower: batch normalisation
  • 3. @martin_gorner Data “whitening” Data: large values, different scales, skewed, correlated
  • 4. @martin_gorner Data “whitening” Modified data: centered around zero, rescaled... Subtract average Divide by std dev
  • 5. @martin_gorner Data “whitening” (A+B)/2 A-B Modified data: … and decorrelated (that was almost a Principal Component Analysis)
  • 6. @martin_gorner Data “whitening” new A new B = A B x 0.05 0.12 0.61 -1.23 + -1.45 0.12 W ? B ? A network layer can do this ! Scale & rotate shift
  • 7. @martin_gorner Fully connected network 9 ... 0 1 2 softmax 200 100 60 10 30 784 OK OK ? OK ??? OK ??? OK ???
  • 9. @martin_gorner Batch normalisation Center and re-scale logits before the activation function (decorrelate ? no, too complex) Compute average and variance on mini-batch Add learnable scale and offset for each logit so as to restore expressiveness “logit” = weighted sum + bias one of each per neuron Try α=stdev(x) and β=avg(x) and you have BN(x) = x
  • 10. @martin_gorner Batch normalisation depends from: weights, biases, images depends from: same weights and biases, images only one set of weights and biases in a mini-batch => BN is differentiable relatively to weights, biases, α and β It can be used as a layer in the network, gradient calculations will still work Batch-norm α, β x = weighted sum + bias activation fn
  • 11. @martin_gorner With batch normalisation (sigmoid) sigmoid distribution of neuron output Batch norm
  • 12. @martin_gorner With batch normalisation (RELU) RELU My distribution of inputs
  • 13. @martin_gorner Batch normalisation done right Batch-norm α, β x = weighted sum + b activation fn biases : no longer useful when activation fn is RELU α is not useful It does not modify output distrib. Per neuron: relu sigmoid without BN bias bias With BN β α, β +You can go faster: use higher learning rate +BN also regularises: lower or remove dropout
  • 14. @martin_gorner Convolutional batch normalisation W1[4, 4, 3] W2[4, 4, 3] Each neuron or patch has a value: ● per image in the batch ● per x position ● per y position => compute avg and stdev across all batchsize x width x height values b1 α1 β1 b2 α2 β2 Still, one bias, scale or offset per neuron
  • 15. @martin_gorner Batch normalisation at test time Stats on what ? ● Last batch: no ● all images: yes (but not practical) ● => Exponential moving average during training
  • 16. @martin_gorner Batch normalisation with Tensorflow def batchnorm_layer(Ylogits, is_test, Offset, Scale, iteration, convolutional=False): exp_moving_avg = tf.train.ExponentialMovingAverage(0.9999, iteration) if convolutional: # avg across batch, width, height mean, variance = tf.nn.moments(Ylogits, [0, 1, 2]) else: mean, variance = tf.nn.moments(Ylogits, [0]) update_moving_averages = exp_moving_avg.apply([mean, variance]) m = tf.cond(is_test, lambda: exp_moving_avg.average(mean), lambda: mean) v = tf.cond(is_test, lambda: exp_moving_avg.average(variance), lambda: variance) Ybn = tf.nn.batch_normalization(Ylogits, m, v, Offset, Scale, variance_epsilon=1e-5) return Ybn, update_moving_averages Define one offset and/or scale per neuron apply activation fn on Ybn don’t forget to execute this (sess.run) The code is on GitHub: goo.gl/DEOe7Z
  • 17. Demo
  • 20. @martin_gorner Layers from tensorflow.contrib import layers # this Y = layers.relu(X, 200) # instead of this W = tf.Variable(tf.zeros([784, 200])) b = tf.Variable(tf.zeros([200])) Y = tf.nn.relu(tf.matmul(X,W) + b) Sample: goo.gl/y1SSFy
  • 21. @martin_gorner Model function from tensorflow.contrib import learn, layers, metrics def model_fn(X, Y_, mode): Yn = … # model layers prob = tf.nn.softmax(Yn) digi = tf.argmax(prob, 1) predictions = {"probabilities": prob, "digits": digi} #free-form evaluations = {'accuracy': metrics.accuracy(digi, Y_)} #free-form loss = tf.nn.softmax_cross_entropy_with_logits(…) train = layers.optimize_loss(loss,framework.get_global_step(), 0.003,"Adam") return learn.ModelFnOps(mode, predictions,loss,train,evaluations) “features” and “targets“ learning rate TRAIN, EVAL or INFER Sample: goo.gl/y1SSFy
  • 22. @martin_gorner Estimator estimator = learn.Estimator(model_fn=model_fn) estimator.fit(input_fn=… , steps=10000) estimator.evaluate(input_fn=…, steps=1) # => {'accuracy': … } estimator.predict(input_fn=…) # => {"probabilities":…, "digits":…} # input_fn: feeds in batches of features and targets Sample: goo.gl/y1SSFy
  • 23. @martin_gorner Convolutional network def conv_model(X, Y_, mode): XX = tf.reshape(X, [-1, 28, 28, 1]) Y1 = layers.conv2d(XX, num_outputs=6, kernel_size=[6, 6]) Y2 = layers.conv2d(Y1, num_outputs=12, kernel_size=[5, 5], stride=2) Y3 = layers.conv2d(Y2, num_outputs=24, kernel_size=[4, 4], stride=2) Y4 = layers.flatten(Y3) Y5 = layers.relu(Y4, 200) Ylogits = layers.linear(Y5, 10) prob = tf.nn.softmax(Ylogits) digi = tf.cast(tf.argmax(prob, 1), tf.uint8) predictions = {"probabilities": prob, "digits": digi} #free-form evaluations = {'accuracy': metrics.accuracy(digi, Y_)} #free-form loss = tf.nn.softmax_cross_entropy_with_logits(Ylogits, tf.one_hot(Y_, 10)) train = layers.optimize_loss(loss, framework.get_global_step(), 0.003, "Adam") return learn.ModelFnOps(mode, predictions, loss, train, evaluations) estimator = learn.Estimator(model_fn=conv_model) Sample: goo.gl/y1SSFy
  • 25. @martin_gorner RNN softmax tanh X: inputs Y: outputs H: internal state RNN cell H Xt Yt N: internal size
  • 26. @martin_gorner RNN X = Xt | Ht-1 Ht = tanh(X.WH + bH) Yt = softmax(Ht.W + b) concatenation RNN cell H Xt Yt
  • 29. @martin_gorner Michel C. was born in Paris, France. He is married and has three children. He received a M.S. in neurosciences from the University Pierre & Marie Curie and the Ecole Normale Supérieure in 1987, and and then spent most of his career in Switzerland, at the Ecole Polytechnique de Lausanne. He specialized in child and adolescent psychiatry and his first field of research was severe mood disorders in adolescent, topic of his PhD in neurosciences (2002). His mother tongue is ? ? ? ? ? Long term dependencies: a problem Short context English, German, Russian, French … Long context Problems… Hn … Michel C. was born in French … Hn-1
  • 30. @martin_gorner LSTM LSTM = Long Short Term Memory tanh tanh σ Xt Ht-1 Ht Yt Ct Ct-1 concatenation Element-wise operations tanh tanh Neural net. layers X = Xt | Ht-1 f = σ(X.Wf + bf) u = σ(X.Wu + bu) r = σ(X.Wr + br) X’ = tanh(X.Wc + bc) Ct = f * Ct-1 + u * X’ Ht = r * tanh(Ct) Yt = softmax(Ht.W + b) × + × × × σ σ σ
  • 31. @martin_gorner LSTM X = Xt | Ht-1 f = σ(X.Wf + bf) u = σ(X.Wu + bu) r = σ(X.Wr + br) X’ = tanh(X.Wc + bc) Ct = f * Ct-1 + u * X’ Ht = r * tanh(Ct) Yt = softmax(Ht.W + b) tanh tanh σ Xt Ht-1 Ht Yt Ct-1 × + × × σ σ Ct concatenate : forget gate : update gate : result gate : input : new C : new H : output : p+n n n n n n n vector sizes m
  • 32. Gru !
  • 33. @martin_gorner GRU X = Xt | Ht-1 z = σ(X.Wz + bz) r = σ(X.Wr + br) X’ = Xt | r * Ht-1 X” = tanh(X’.Wc + bc) Ht = (1-z) * Ht-1 + z * X” Yt = softmax(Ht.W + b) p+n n n p+n n n vector sizes m GRU = Gated Recurrent Unit GRU Ht Yt Xt Ht-1 2 gates instead of 3 => cheaper Ht
  • 34. @martin_gorner Language model in Tensorflow 0 H5 S t _ J o h t _ J o h n character- based Characters, one-hot encoded
  • 35. @martin_gorner Language model in Tensorflow 0 GRU H0 X0 H0 cells = [tf.nn.rnn_cell.GRUCell(CELLSIZE) for i in range(NLAYERS)] mcell = tf.nn.rnn_cell.MultiRNNCell(cells, state_is_tuple=False) Hr, H = tf.nn.dynamic_rnn(mcell, X, initial_state=Hin) GRU 0 H’0 H’0 GRU 0 H”0 H”0 GRU H1 X1 H0 GRU H’1 H’0 GRU H”1 H”1 GRU H2 X2 H0 GRU H’2 H’0 GRU H”2 H”2 GRU H3 X3 H0 GRU H’3 H’0 GRU H”3 H”3 GRU H5 X4 H0 GRU H’5 H’0 GRU H”5 H”5 GRU H6 X6 H0 GRU H’6 H’0 GRU H”6 H”6 GRU H7 X7 H0 GRU H’7 H’0 GRU H”7 H”7 GRU H8 X8 H0 GRU H’8 H’0 GRU H”8 H”8 H Hin ALPHASIZE = 98 CELLSIZE = 512 NLAYERS = 3 SEQLEN = 30 defines weights and biases internally
  • 36. @martin_gorner Softmax readout layer # Hr Hf = tf.reshape(Hr, [-1, CELLSIZE]) 0 H0 X0 H0 0 H’0 H’0 0 H”0 H”0 H1 X1 H0 H’1 H’0 H”1 H”1 H2 X2 H0 H’2 H’0 H”2 H”2 H3 X3 H0 H’3 H’0 H”3 H”3 H5 X4 H0 H’5 H’0 H”5 H”5 H6 X6 H0 H’6 H’0 H”6 H”6 H7 X7 H0 H’7 H’0 H”7 H”7 H8 X8 H0 H’8 H’0 H”8 H”8 Tip: handle sequence and batch elements the same loss = tf.nn.softmax_cross_entropy_with_logits(Ylogits, Y_) [ BATCHSIZE, SEQLEN, CELLSIZE ] [ BATCHSIZE x SEQLEN, CELLSIZE ] ALPHASIZE = 98 CELLSIZE = 512 NLAYERS = 3 SEQLEN = 30 Y0 Y1 Y2 Y3 Y4 Y5 Y6 Y7 Ylogits = tf.layers.dense(Hf, ALPHASIZE) Y = tf.nn.softmax(Ylogits) [ BATCHSIZE x SEQLEN, ALPHASIZE ] [ BATCHSIZE x SEQLEN, ALPHASIZE ]
  • 37. @martin_gorner Inputs and outputs 0 H0 X0 H0 0 H’0 H’0 0 H”0 H1 X1 H0 H’1 H’0 H”1 H2 X2 H0 H’2 H’0 H”2 H3 X3 H0 H’3 H’0 H”3 H5 X4 H0 H’5 H’0 H”5 H6 X6 H0 H’6 H’0 H”6 H7 X7 H0 H’7 H’0 H”7 H8 X8 H0 H’8 H’0 H”8 ALPHASIZE = 98 CELLSIZE = 512 NLAYERS = 3 SEQLEN = 30 Y0 Y1 Y2 Y3 Y4 Y5 Y6 Y7 S t _ A n d t _ A n d r e r e w [ BATCHSIZE, SEQLEN ] [ BATCHSIZE, SEQLEN, ALPHASIZE ] H: [ BATCHSIZE, CELLSIZE x NLAYERS ]
  • 38. @martin_gorner Placeholders, and the rest... ALPHASIZE = 98 CELLSIZE = 512 NLAYERS = 3 SEQLEN = 30 Xd = tf.placeholder(tf.uint8, [None, None]) X = tf.one_hot(X, ALPHASIZE, 1.0, 0.0) Yd_ = tf.placeholder(tf.uint8, [None, None]) Y_ = tf.one_hot(Y_, ALPHASIZE, 1.0, 0.0) Hin = tf.placeholder(tf.float32, [None, CELLSIZE*NLAYERS]) # Y, loss, Hout = my_model(X, Y_, Hin) predictions = tf.argmax(Y, 1) predictions = tf.reshape(predictions, [batchsize, -1]) train_step = tf.train.AdamOptimizer(1e-3).minimize(loss) [ BATCHSIZE, SEQLEN ] [ BATCHSIZE, SEQLEN, ALPHASIZE ] [ BATCHSIZE, SEQLEN ] [ BATCHSIZE, SEQLEN, ALPHASIZE ] [ BATCHSIZE, CELLSIZE x NLAYERS ] Y: [ BATCHSIZE x SEQLEN, ALPHASIZE ] [ BATCHSIZE x SEQLEN ] [ BATCHSIZE, SEQLEN ]
  • 39. @martin_gorner Bitchin’ batchin’ Ht Ht-1 The quic seventh Mr. Herm Batch 1 k brown heaven o ann Zapf Ht+1 Batch 2 fox jump f typogr was the Ht+ 2 Batch 3 ++ later ++++ later start for x, y_ in utils.rnn_minibatch_sequencer(codetext, BATCHSIZE, SEQLEN, nb_epochs=10):
  • 40. @martin_gorner Language model in Tensorflow ALPHASIZE = 98 CELLSIZE = 512 NLAYERS = 3 SEQLEN = 30 Xd = tf.placeholder(tf.uint8, [None, None]) X = tf.one_hot(Xd, ALPHASIZE, 1.0, 0.0) Yd_ = tf.placeholder(tf.uint8, [None, None]) Y_ = tf.one_hot(Yd_, ALPHASIZE, 1.0, 0.0) Hin = tf.placeholder(tf.float32, [None, CELLSIZE*NLAYERS]) # the model cell = [tf.nn.rnn_cell.GRUCell(CELLSIZE) for i in range(NLAYERS)] mcell = tf.nn.rnn_cell. MultiRNNCell([cell]*NLAYERS,state_is_tuple=False) Hr,H = tf.nn. dynamic_rnn(mcell, X, initial_state=Hin) # softmax output layer Hf = tf.reshape(Hr, [-1, CELLSIZE]) Ylogits = layers.linear(Hf, ALPHASIZE) Y = tf.nn.softmax(Ylogits) Yp = tf.argmax(Y, 1) Yp = tf.reshape(Yp, [batchsize, -1]) # loss and training step (optimizer) loss = tf.nn.softmax_cross_entropy_with_logits(Ylogits, Y_) train_step = tf.train.AdamOptimizer(1e-3).minimize(loss) # training loop for epoch in range(20): inH = np.zeros([BATCHSIZE, INTERNALSIZE*NLAYERS]) for x, y_ in utils.rnn_minibatch_sequencer(codetext, BATCHSIZE, SEQLEN, nb_epochs=30): dic = {X: x, Y_: y_, Hin:inH} _,y,outH = sess.run([train_step,Yp,H,], feed_dict=dic) inH = outH The code is on GitHub: github.com/martin-gorner/ tensorflow-rnn-shakespeare
  • 41. @martin_gorner ee o no nonnaoter s ee seih iae r t i r io i ro s sierota tsohoreroneo rsa esia anehereeo hensh rho etnrhhs iti saoitns t et rsearh tshseoeh ta oirhroren e eaetetnesnareeeoaraihss nshtano eter e oooaoaeee nonn is heh easren ieson httn nihensont t e n a ooe oerhi neaeehteriseat tiet i i ntsh orhi e ohhsiea e aht ohr er ra eeo oeeitrot hethisesaaei o saeii straieiteoeresorh e ooeri e ninesh sort a es h rs hattnteseato sonoanr sniaase s rshninsasi na sntennn oti r etnsnrse oh n r e tiathhnaeeano trrr hhohooon rrt eernre e rnoh Shakespeare 0.03 epochs C1
  • 42. @martin_gorner Shakespeare II WERENI Are I I wos the wheer boaer. Tin thim mh cals sate bauut site tar oue tinl an bsisonetoal yer an fimireeren. L[IO SI Hns oret bsllssts aaau ton hete me toer frurtor sheus aed trat A faler bis tote oadt tou than male, tel mou ce an cime. ais fauto ws cien whus yas. Ande fert te a ut wond aal sinr be at saar 0.1 epochs C3
  • 43. @martin_gorner BERENS Hall hat in she the hir meres. Perstr in ame not of heard, me thin hild of shear and ant on of mare. I lore wes lour. DOCHES The chaster'd on not fenst The laldoos more. [Ixeln thrish] And tho priines sith of hamdeling the san wind Shakespeare 0.2 epochs C5 Stage directions ?
  • 44. @martin_gorner KING LEAR Alas, I am not forsworn both to bod! And let the firm I have to'st trainoured. KING HENRY VIII I love not my father. PORDIA He tash you will have it. HENRY BLUTIUS Work, thou lovest my son here, thy father's fath! CLIOND Why, then, would say, the beasts are Shakespeare 1 epoch C6 Invented names !
  • 45. @martin_gorner Shakespeare, 30 epochs (B10):
    TITUS ANDRONICUS
    ACT I SCENE III An ante-chamber. The COUNT's palace.
    [Enter CLEOMENES, with the Lord SAY]
    Chamberlain Let me see your worshing in my hands.
    LUCETTA I am a sign of me, and sorrow sounds it.
  • 46. @martin_gorner Shakespeare, 30 epochs (B10):
    And sorrow far into the stars of men,
    Without a second tears to seek the best and bed,
    With a strange service, and the foul prince of Rome
    [Exeunt MARK ANTONY and LEPIDUS]
    Well said, my lord,--
    MENENIUS I do not say so. Well, I will not have no better ways;
  • 47. @martin_gorner Python code, 0.03 epochs (A1):
    diassts_= =tlns==eti.s=tessn_(( sie_s_nts_ens= dondtnenroe dnar taonte srst anttntoilonttiteaen detrtstinsenoaolsesnesoairt( arssserleeeerltrdlesssoeeslslrlslie(e drnnaleeretteaelreesioe niennoarens dssnstssaorns sreeoeslrteasntotnnai(ar dsopelntederlalesdanserl lts(sitae(e)
  • 48. @martin_gorner Python code, 0.1 epochs (A2) — Python keywords appear:
    with self.essors_sigeater(output_dits_allss, self._train. for sampated to than ubtexsormations.
    expeddions = np.randim(natched_collection, ranger, mang_ops, samplering)
    def assestErrorume_gens(assignex) as and(sampled_veases): eved.
  • 49. @martin_gorner Python code, 0.4 epochs (A3) — correct use of colons, hallucinated function names, wrong ([]) nesting:
    def testGiddenSelfBeShareMecress(self):
      with self.test_session() as sess:
        tat = tf.contrib.matrix.cast_column_variable([1, 1], [0, 1, 1], [1, 7]], [[1, 1, 1]].file(file, line_state_will_file))
      with self.test_session():
        self.assertAllEqual(1, l.ex6)
        self.assertEqual(output_graph_def is_output_tensors_op( tf.pro_context_name.sqrt(sess)
    def test_shape(self):
      res = values=value_rns[0].eval())
    def tempDimpleSeriesGredicsIothasedWouthAverageData(self):
      self._testDirector(self):
      self._test_inv3_size = 5
      with tf.train.ConvolutioBailLors_startswith("save_dir_context.PutIsprint().eval())
      return tf.contrib.learn.RUCISLCCS:
      # Check the orfloating so that the nimesting object mumputable othersifier.
      # dense_keys.tokens_prefix/statch_size of the input1 tensors.
    @property
  • 50. @martin_gorner Python code, 12 epochs (B10) — correct triple ([]) nesting, recites the Apache license, Tensorflow tips!
    # Copyright 2015 The TensorFlow Authors. All Rights Reserved.
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in [0.1, 2.0, 3.0]]
    def __init__(self, expected):
      return np.array([[0, 0, 0], [0, 0, 0]])
      self.assertAllEqual(tf.placeholder(tf.float32, shape=(3, 3)),(shape, prior.pack(), tf.float32))
      for keys in tensor_list:
        return np.array([[0, 0, 0]]).astype(np.float32)
    # Check that we have both scalar tensor for being invalid to a vector of 1 indicating
    # the total loss of the same shape as the shape of the tensor.
    sharded_weights = [[0.0, 1.0]]
    # Create the string op to apply gradient terms that also batch.
    # The original any operation as a code when we should alw infer to the session case.
  • 51. @martin_gorner ...and more. Credit to Andrej Karpathy’s blog: “The Unreasonable Effectiveness of Recurrent Neural Networks”
  • 52. @martin_gorner Tensorflow: save, restore

    saver = tf.train.Saver(keep_checkpoint_every_n_hours=0.1, max_to_keep=5)
    with tf.Session() as sess:
        # ... training loop ...
        saver.save(sess, 'file_', global_step=iteration)
    # => saves the variables in file_200 and the graph in file_200.meta
    #    (for global_step=200)

    with tf.Session() as sess:
        resto = tf.train.import_meta_graph('file_200.meta')
        resto.restore(sess, 'file_200')
    # => restores the graph and the variable values

    Must name variables explicitly!
    # when saving
    X = tf.placeholder(tf.uint8, name='X')
    Y = tf.nn.softmax(Ylogits, name='Y')
    # when using the restored graph
    y, h = sess.run(['Y:0', 'H:0'], feed_dict={'X:0': x})
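  A handy variant if you'd rather not hard-code the checkpoint name (a small sketch; tf.train.latest_checkpoint is standard TF 1.x):

    import tensorflow as tf

    with tf.Session() as sess:
        ckpt = tf.train.latest_checkpoint('.')            # e.g. returns 'file_200'
        resto = tf.train.import_meta_graph(ckpt + '.meta')
        resto.restore(sess, ckpt)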
  • 53. @martin_gorner Shakespeare generation — one char at a time

    with tf.Session() as sess:
        resto = tf.train.import_meta_graph('shake_200.meta')
        resto.restore(sess, 'shake_200')

        # initial values
        x = np.array([[0]])   # [BATCHSIZE, SEQLEN] with BATCHSIZE=1 and SEQLEN=1
        h = np.zeros([1, INTERNALSIZE * NLAYERS], dtype=np.float32)

        for i in range(100000):
            dic = {'X:0': x, 'Hin:0': h, 'batchsize:0': 1}
            y, h = sess.run(['Y:0', 'H:0'], feed_dict=dic)
            c = my_txtutils.sample_from_probabilities(y, topn=5)
            x = np.array([[c]])   # shape [BATCHSIZE, SEQLEN] with BATCHSIZE=1 and SEQLEN=1
            print(chr(my_txtutils.convert_to_ascii(c)), end="")
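  my_txtutils.sample_from_probabilities is part of the talk's repo; the idea — keep only the topn most likely characters, renormalise, then sample — can be sketched like this (my reconstruction, not the repo's exact code):

    import numpy as np

    def sample_from_probabilities(probabilities, topn=5):
        """Sample a character index from a softmax output, restricted to the topn most likely."""
        p = np.squeeze(probabilities)    # here: shape [ALPHASIZE] since BATCHSIZE=SEQLEN=1
        p[np.argsort(p)[:-topn]] = 0     # zero out everything but the topn best
        p = p / np.sum(p)                # renormalise
        return np.random.choice(len(p), 1, p=p)[0]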
  • 54. @martin_gorner Tensorboard

    summary_writer = tf.train.SummaryWriter("log/train_" + time)   # tip: use the time in the logdir name
    loss_summary = tf.scalar_summary("batch_loss", loss)

    # in the training loop:
    smm = sess.run(loss_summary, feed_dict=dic)
    summary_writer.add_summary(smm, iteration)

    Tip: use a second SummaryWriter for validation results
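  The second-writer tip, spelled out (a sketch with the same old TF 1.x summary API the slide uses; validation_writer and the two feed dicts are illustrative names):

    summary_writer = tf.train.SummaryWriter("log/train_" + time)
    validation_writer = tf.train.SummaryWriter("log/validation_" + time)
    loss_summary = tf.scalar_summary("batch_loss", loss)

    # training iterations go to one curve...
    smm = sess.run(loss_summary, feed_dict=train_dic)
    summary_writer.add_summary(smm, iteration)
    # ...validation runs to another; Tensorboard overlays both in one chart
    smm = sess.run(loss_summary, feed_dict=validation_dic)
    validation_writer.add_summary(smm, iteration)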
  • 55. @martin_gorner RNN shapes: character-based
    [diagram] An RNN unrolled over the input characters "S t _ J o h" (one-hot encoded), trained to predict the same text shifted by one character: "t _ J o h n". Initial state 0, final state H5.
  • 56. @martin_gorner RNN shapes: text classification
    [diagram] The words of "The USA and China have agreed …" go in one per time step (initial state 0); the final state feeds a classifier that outputs "geopolitics". Words are encoded as vectors: "embeddings".

    embeddings = tf.Variable(tf.random_uniform([vocab_size, embed_size]))
    X = tf.nn.embedding_lookup(embeddings, train_inputs)

    Or constant => see Word2Vec
    Tensorflow sample: goo.gl/m41mNp
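  The "or constant" variant — loading pre-trained Word2Vec-style vectors instead of learning them — might look like this (a sketch; the file name and the pretrained_vectors array of shape [vocab_size, embed_size] are hypothetical):

    import numpy as np
    import tensorflow as tf

    pretrained_vectors = np.load("word2vec_vectors.npy")             # hypothetical file
    embeddings = tf.constant(pretrained_vectors, dtype=tf.float32)   # frozen, not trained
    X = tf.nn.embedding_lookup(embeddings, train_inputs)             # [BATCHSIZE, SEQLEN, embed_size]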
  • 57. @martin_gorner Bitchin’ batchin’
    Variable-length sentences are padded with ∅ to a common length, and the true length of each one is passed separately:
        "China and the USA have agreed to a new round of talks"  → 12
        "The quick brown fox jumps over the lazy dog ."          → 10
        "Boys will be boys ."                                    → 5
        "Tom , get your coat . We are going out ."               → 11
        "Math rules the world . Men rule math ."                 → 9

    Hr, H = tf.nn.dynamic_rnn(mcell, X, initial_state=Hin, sequence_length=slen)

    With sequence_length, outputs past each sentence's end are zeroed and the returned state Hn is the state at the last real word — exactly what a classifier (e.g. "geopolitics") needs.
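  A minimal classification head on top of that (a sketch; NUM_CLASSES and the linear output layer are my assumptions, embeddings, mcell and train_inputs come from the previous slides, and layers is tf.contrib.layers as elsewhere in the deck):

    X = tf.nn.embedding_lookup(embeddings, train_inputs)   # [BATCHSIZE, MAXLEN, embed_size]
    slen = tf.placeholder(tf.int32, [None])                # true length of each padded sentence
    Hr, H = tf.nn.dynamic_rnn(mcell, X, sequence_length=slen, dtype=tf.float32)  # zero initial state
    # H is already the state at each sentence's last real word
    Ylogits = layers.linear(H, NUM_CLASSES)                # one logit per topic, e.g. "geopolitics"
    Y = tf.nn.softmax(Ylogits)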
  • 58. @martin_gorner RNN shapes: text translation
    [diagram] Encoder-decoder: the encoder reads "The red cat ate the mouse" (words encoded as vectors, initial state 0); its final state seeds the decoder, which reads "∅ Le chat rouge a mangé la souris" and is trained to emit "Le chat rouge a mangé la souris ∅" — each output word shifted one step from its input.
    A full softmax over the output vocabulary is slow; tf.nn.sampled_softmax_loss(…) is the fast alternative during training.
    Tensorflow sample: goo.gl/KyKLDv
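  Sampled softmax evaluates the loss on the true word plus a random handful of negatives instead of the whole vocabulary. A sketch of its use (keyword arguments from the TF 1.x API; the sizes, and target_word_ids / Hf, are illustrative):

    VOCABSIZE = 50000
    NUM_SAMPLED = 64   # negative classes drawn per example

    w = tf.Variable(tf.random_uniform([VOCABSIZE, CELLSIZE]))   # output projection: [classes, dim]
    b = tf.Variable(tf.zeros([VOCABSIZE]))

    # training: cheap loss over NUM_SAMPLED + 1 classes only
    loss = tf.nn.sampled_softmax_loss(weights=w, biases=b,
                                      labels=target_word_ids,   # [BATCHSIZE, 1], int64
                                      inputs=Hf,                # [BATCHSIZE, CELLSIZE]
                                      num_sampled=NUM_SAMPLED,
                                      num_classes=VOCABSIZE)
    # inference: full softmax as usual
    Ylogits = tf.matmul(Hf, w, transpose_b=True) + b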
  • 59. @martin_gorner RNN shapes: image captioning (simplified)
    [diagram] The image, encoded as a vector (for ex. the output of a convolutional network or auto-encoder), seeds the RNN state; the decoder reads "∅ A man on a beach flying a" and is trained to emit "A man on a beach flying a kite ∅", each generated word fed back as the next input.
    Google’s neural net for image captioning: goo.gl/VgZUQZ
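  One simple way to seed the state (a sketch under my own assumptions, not Google's actual captioning architecture): project the CNN feature vector to the size of the recurrent state with a learned linear layer.

    # image_features: [BATCHSIZE, FEATSIZE], e.g. from a pre-trained convolutional network
    # word_vectors:   [BATCHSIZE, SEQLEN, embed_size], the shifted caption words
    Hin = layers.linear(image_features, CELLSIZE * NLAYERS)   # learned projection to the RNN state size
    Hr, H = tf.nn.dynamic_rnn(mcell, word_vectors, initial_state=Hin)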
  • 60. @martin_gorner Image captioning
    [two example images] Generated captions: "A person riding a motorcycle on a dirt road." / "A herd of elephants walking across a dry grass field."
    Google’s neural net for image captioning: goo.gl/VgZUQZ
  • 61. @martin_gorner Image captioning
    [two example images] Generated captions: "A refrigerator filled with lots of food and drinks." / "A yellow school bus parked in a parking lot."
    Google’s neural net for image captioning: goo.gl/VgZUQZ
  • 63. @martin_gorner Data-parallel distributed training
    [diagram] Parameter servers hold the weights; model replicas each compute gradients on their own shard of the data and push updates W’ = W + ∆W back asynchronously. The asynchrony adds noise to training — I ♡ noise.
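  In TF 1.x, the canonical way to split variables onto parameter servers is tf.train.replica_device_setter; a sketch (the cluster spec, hosts and variable are illustrative):

    import tensorflow as tf

    cluster = tf.train.ClusterSpec({
        "ps": ["ps0:2222"],                              # parameter servers hold the weights
        "worker": ["worker0:2222", "worker1:2222"]})     # model replicas compute gradients

    with tf.device(tf.train.replica_device_setter(cluster=cluster)):
        W = tf.Variable(tf.zeros([784, 10]))   # automatically placed on a ps task
        # model ops defined here run on the local worker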
  • 64. @martin_gorner TF high level API

    from tensorflow.contrib import learn

    def model_fn(X, Y_, mode):   # "features" and "targets"
        Yn = …   # model layers
        predictions = {"probabilities": …, "digits": …}   # free-form
        evaluations = {'accuracy': metrics.accuracy(…)}   # free-form
        loss = …
        train = layers.optimize_loss(loss, …)
        return learn.ModelFnOps(mode, predictions, loss, train, evaluations)

    Samples: goo.gl/F3i3bf, goo.gl/CofxFM
  • 65. @martin_gorner Estimator, Experiment, learn_runner

    from tensorflow.contrib.learn.python.learn.utils import saved_model_export_utils

    def experiment_fn(job_dir):
        return learn.Experiment(
            estimator=learn.Estimator(model_fn, model_dir=job_dir,
                config=learn.RunConfig(save_checkpoints_secs=None,
                                       save_checkpoints_steps=1000)),
            train_input_fn=…,   # data feed
            eval_input_fn=…,    # data feed
            train_steps=10000,
            eval_steps=1,
            export_strategies=saved_model_export_utils.make_export_strategy(
                export_input_fn=serving_input_fn))

    def main(argv=None):
        job_dir = …   # parse the --job-dir argument
        learn_runner.run(experiment_fn, job_dir)

    if __name__ == '__main__':
        main()

    Free stuff!!! Tensorboard graphs, resume on fail, parallel data feeds, serving model export, distributed training.
    Samples: goo.gl/F3i3bf, goo.gl/CofxFM
  • 66. @martin_gorner Data queues for distributed training

    # dummy implementation for data that fits in memory
    def train_data_input_fn(mnist):
        images = tf.constant(mnist.train.images)
        labels = tf.constant(mnist.train.labels)
        return tf.train.shuffle_batch([images, labels],
                                      batch_size=100, capacity=1100,
                                      min_after_dequeue=1000, enqueue_many=True)

    # dummy implementation for data that fits in memory
    def eval_data_input_fn(mnist):
        return tf.constant(mnist.test.images), tf.constant(mnist.test.labels)

    tf.train.shuffle_batch inserts queue nodes into the TF graph.
    For practical data queuing use the TF Records format.
    Samples: goo.gl/F3i3bf, goo.gl/CofxFM
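  A TF Records input function, sketched with the same TF 1.x queue API (the feature names 'image' and 'label', the 28x28 layout and the file pattern are my assumptions):

    def tfrecords_input_fn():
        # queue of input files, reread indefinitely
        filename_queue = tf.train.string_input_producer(
            tf.train.match_filenames_once("data/train-*.tfrecords"))
        reader = tf.TFRecordReader()
        _, serialized = reader.read(filename_queue)
        features = tf.parse_single_example(serialized, {
            'image': tf.FixedLenFeature([28 * 28], tf.float32),   # assumed record layout
            'label': tf.FixedLenFeature([], tf.int64)})
        return tf.train.shuffle_batch([features['image'], features['label']],
                                      batch_size=100, capacity=1100,
                                      min_after_dequeue=1000)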
  • 67. @martin_gorner Serving input function

    # Online predictions on Cloud ML Engine
    def serving_input_fn():
        # Placeholder for data deserialised from JSON
        # (a batch of images, for MNIST)
        inputs = {'A': tf.placeholder(tf.uint8, [None, 28, 28])}
        # Transform the data as needed
        features = [tf.cast(inputs['A'], tf.float32)]
        return input_fn_utils.InputFnOps(features, None, inputs)

    Samples: goo.gl/F3i3bf, goo.gl/CofxFM
  • 68. @martin_gorner Run it

    gcloud ml-engine jobs submit training job22 \
        --job-dir=gs://mybucket/job22 \
        --package-path=trainer \
        --module-name=trainer.task \
        --config=config.yaml \
        -- \
        --<custom model arguments here>

    # config.yaml:
    trainingInput:
      scaleTier: STANDARD_1

    The job directory collects the model checkpoints and tensorboard summaries.
    Deploy the trained model to prod = click click click (autoscaled serving), then:

    gcloud ml-engine predict --model <model_name> --json-instances mydigits.json

    Samples: goo.gl/F3i3bf, goo.gl/CofxFM
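  For the predict call, each line of mydigits.json is one instance keyed by the serving input name defined above ('A' here). A hedged way to produce such a file (the blank stand-in image is illustrative):

    import json
    import numpy as np

    # one JSON object per line, keyed by the serving input name ('A' above)
    digit = np.zeros([28, 28], dtype=int).tolist()   # a blank 28x28 image as a stand-in
    with open("mydigits.json", "w") as f:
        f.write(json.dumps({"A": digit}) + "\n")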
  • 70. @martin_gorner That’s all folks...
    Cloud ML Engine: your TensorFlow models trained in Google’s cloud.
    Pre-trained models: Cloud Vision API, Cloud Speech API, Google Translate API, Natural Language API, Video Intelligence API, Cloud Jobs API (PRIVATE BETA).
    Cloud AutoML Vision (ALPHA): just bring your data.
    Cloud TPU (BETA): ML supercomputing.
    Videos, slides, code: github.com/GoogleCloudPlatform/tensorflow-without-a-phd
    Martin Görner, Google Developer relations, @martin_gorner
    Have fun!