9. @martin_gorner
Batch normalisation
Center and re-scale logits before the activation function
(decorrelate? no, too complex)
● Compute the average and variance on the mini-batch
● Add a learnable scale α and offset β for each logit, so as to restore expressiveness
“logit” = weighted sum + bias; one scale and one offset per neuron
Try α = stdev(x) and β = avg(x) and you have BN(x) = x
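In formula form, per neuron and over the mini-batch: BN(x) = α · (x − avg(x)) / stdev(x) + β. A minimal numpy sketch (not the talk's code) that also checks the identity claim above:

import numpy as np

def bn(x, alpha, beta, eps=1e-5):
    # x: [batch, neurons]; average and variance per neuron, over the mini-batch
    return alpha * (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps) + beta

x = np.random.randn(100, 10)
# with alpha = stdev(x) and beta = avg(x), BN(x) == x (up to eps)
assert np.allclose(bn(x, x.std(axis=0), x.mean(axis=0)), x, atol=1e-4)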
10. @martin_gorner
Batch normalisation
The mini-batch average and variance depend on the weights, the biases and the images; every image
in the mini-batch goes through the same, single set of weights and biases.
=> BN is differentiable with respect to the weights, biases, α and β.
It can be used as a layer in the network; gradient calculations will still work.
[Diagram: x = weighted sum + bias → batch norm (α, β) → activation fn]
13. @martin_gorner
Batch normalisation done right
[Diagram: x = weighted sum + b → batch norm (α, β) → activation fn]
Biases: no longer useful (the learned offset β plays their role).
When the activation fn is RELU, α is not useful either: it does not modify the output distribution.
Per neuron:   relu    sigmoid
without BN    bias    bias
with BN       β       α, β
+ You can go faster: use a higher learning rate
+ BN also regularises: lower or remove dropout
14. @martin_gorner
Convolutional batch normalisation
[Diagram: two convolutional filters W1[4, 4, 3] and W2[4, 4, 3], each with its own b1, α1, β1 and
b2, α2, β2]
Each neuron or patch has a value:
● per image in the batch
● per x position
● per y position
=> compute the avg and stdev across all batchsize × width × height values
Still, only one bias, scale or offset per neuron.
15. @martin_gorner
Batch normalisation at test time
Stats on what?
● The last batch: no
● All images: yes (but not practical)
● => keep an exponential moving average of the batch stats during training
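The moving average is the standard exponential update; a minimal sketch (the 0.9999 decay matches the TensorFlow code on the next slide):

def ema_update(ema, value, decay=0.9999):
    # shadow = decay * shadow + (1 - decay) * value, as in tf.train.ExponentialMovingAverage
    return decay * ema + (1 - decay) * value

# during training: ema_mean = ema_update(ema_mean, batch_mean)
# at test time: normalise with ema_mean and ema_variance instead of the batch stats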
16. @martin_gorner
Batch normalisation with Tensorflow
def batchnorm_layer(Ylogits, is_test, Offset, Scale, iteration, convolutional=False):
    exp_moving_avg = tf.train.ExponentialMovingAverage(0.9999, iteration)
    if convolutional:  # avg across batch, width and height
        mean, variance = tf.nn.moments(Ylogits, [0, 1, 2])
    else:  # avg across the batch only
        mean, variance = tf.nn.moments(Ylogits, [0])
    update_moving_averages = exp_moving_avg.apply([mean, variance])
    m = tf.cond(is_test, lambda: exp_moving_avg.average(mean), lambda: mean)
    v = tf.cond(is_test, lambda: exp_moving_avg.average(variance), lambda: variance)
    Ybn = tf.nn.batch_normalization(Ylogits, m, v, Offset, Scale, variance_epsilon=1e-5)
    return Ybn, update_moving_averages
● Define one offset and/or scale per neuron
● Apply the activation fn on Ybn
● Don’t forget to execute update_moving_averages (sess.run)
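A possible call site, assuming Ylogits is the pre-activation output of a 200-neuron dense layer (all names besides the function's own arguments are illustrative):

is_test = tf.placeholder(tf.bool)
iteration = tf.placeholder(tf.int32)
offset = tf.Variable(tf.zeros([200]))  # one offset per neuron; with RELU no scale is needed
Ybn, update_ema = batchnorm_layer(Ylogits, is_test, offset, None, iteration)
Y = tf.nn.relu(Ybn)                    # activation fn applied on Ybn
# in the training loop, also run the moving-average update:
# sess.run(update_ema, feed_dict={...})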
The code is on GitHub: goo.gl/DEOe7Z
20. @martin_gorner
Layers
from tensorflow.contrib import layers

# this
Y = layers.relu(X, 200)

# instead of this
W = tf.Variable(tf.zeros([784, 200]))
b = tf.Variable(tf.zeros([200]))
Y = tf.nn.relu(tf.matmul(X, W) + b)
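The one-liner stacks naturally; a sketch of a small network in the same style (layer sizes are illustrative):

Y1 = layers.relu(X, 200)
Y2 = layers.relu(Y1, 100)
Ylogits = layers.linear(Y2, 10)  # no activation on the last layer: the softmax is applied in the loss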
Sample: goo.gl/y1SSFy
21. @martin_gorner
Model function
from tensorflow.contrib import framework, learn, layers, metrics

def model_fn(X, Y_, mode):
    Yn = …  # model layers
    prob = tf.nn.softmax(Yn)
    digi = tf.argmax(prob, 1)
    predictions = {"probabilities": prob, "digits": digi}  # free-form
    evaluations = {'accuracy': metrics.accuracy(digi, Y_)}  # free-form
    loss = tf.nn.softmax_cross_entropy_with_logits(…)
    train = layers.optimize_loss(loss, framework.get_global_step(), 0.003, "Adam")
    return learn.ModelFnOps(mode, predictions, loss, train, evaluations)
X, Y_ are the “features” and “targets”; 0.003 is the learning rate; mode is TRAIN, EVAL or INFER.
Sample: goo.gl/y1SSFy
29. @martin_gorner
Michel C. was born in Paris, France. He is married and has three children. He received an M.S.
in neurosciences from the University Pierre & Marie Curie and the Ecole Normale Supérieure in 1987,
and then spent most of his career in Switzerland, at the Ecole Polytechnique de Lausanne. He
specialized in child and adolescent psychiatry and his first field of research was severe mood disorders
in adolescents, the topic of his PhD in neurosciences (2002). His mother tongue is ? ? ? ? ?
Long term dependencies: a problem
Short context: right after “born in”, a language (English, German, Russian, French …) is likely.
Long context: only the whole paragraph tells you the answer is “French”. Problems…
[Diagram: the RNN state Hn-1 → Hn must carry “Michel C. was born in” all the way to the blank]
30. @martin_gorner
LSTM
LSTM = Long Short Term Memory
[Diagram: LSTM cell with inputs Xt, Ht-1, Ct-1 and outputs Yt, Ht, Ct; legend: merging lines =
concatenation, ×, + = element-wise operations, σ, tanh = neural net layers]
X = Xt | Ht-1
f = σ(X.Wf + bf)
u = σ(X.Wu + bu)
r = σ(X.Wr + br)
X’ = tanh(X.Wc + bc)
Ct = f * Ct-1 + u * X’
Ht = r * tanh(Ct)
Yt = softmax(Ht.W + b)
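The equations map line for line onto code. A minimal numpy sketch of a single LSTM step (not the talk's code; weight shapes follow the vector sizes on the next slide):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # W maps 'f', 'u', 'r', 'c' to [(p+n), n] matrices; b maps them to [n] vectors
    X = np.concatenate([x_t, h_prev])       # X = Xt | Ht-1
    f = sigmoid(X @ W['f'] + b['f'])        # forget gate
    u = sigmoid(X @ W['u'] + b['u'])        # update gate
    r = sigmoid(X @ W['r'] + b['r'])        # result gate
    Xp = np.tanh(X @ W['c'] + b['c'])       # input
    c_t = f * c_prev + u * Xp               # new C
    h_t = r * np.tanh(c_t)                  # new H
    return h_t, c_t

p, n = 4, 8  # illustrative sizes
W = {k: 0.1 * np.random.randn(p + n, n) for k in 'furc'}
b = {k: np.zeros(n) for k in 'furc'}
h, c = lstm_step(np.random.randn(p), np.zeros(n), np.zeros(n), W, b)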
31. @martin_gorner
LSTM
concatenate :  X = Xt | Ht-1             vector size: p+n
forget gate :  f = σ(X.Wf + bf)          vector size: n
update gate :  u = σ(X.Wu + bu)          vector size: n
result gate :  r = σ(X.Wr + br)          vector size: n
input :        X’ = tanh(X.Wc + bc)      vector size: n
new C :        Ct = f * Ct-1 + u * X’    vector size: n
new H :        Ht = r * tanh(Ct)         vector size: n
output :       Yt = softmax(Ht.W + b)    vector size: m
[Diagram: the same LSTM cell as on the previous slide]
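These sizes give the parameter count directly: each of the four weight matrices Wf, Wu, Wr, Wc is (p+n)×n and each bias is n, so one LSTM cell holds 4·((p+n)·n + n) weights. With, for example, p = 128 inputs and n = 512 units, that is 4·((128+512)·512 + 512) = 1,312,768 parameters, plus n·m + m more for the softmax output layer.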
39. @martin_gorner
Bitchin’ batchin’
[Diagram: the training text is cut into parallel lines and batched so that each batch continues the
previous one: Batch 1 holds “The quic” / “seventh” / “Mr. Herm”, Batch 2 “k brown” / “heaven o” /
“ann Zapf”, Batch 3 “fox jump” / “f typogr” / “was the”; the state flows Ht-1 → Ht → Ht+1 → Ht+2
from one batch to the next as training progresses]
for x, y_ in utils.rnn_minibatch_sequencer(codetext, BATCHSIZE, SEQLEN, nb_epochs=10):
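utils.rnn_minibatch_sequencer comes with the talk's code; a minimal sketch of the batching scheme it implements (the real helper in the repo handles epoch boundaries more carefully):

import numpy as np

def rnn_minibatch_sequencer(raw_data, batch_size, sequence_size, nb_epochs):
    # cut the text into batch_size parallel streams, then yield consecutive
    # sequence_size slices: batch n+1 continues the lines of batch n, so the
    # output state of one batch is the correct input state for the next one
    data = np.array(raw_data)
    nb_batches = (len(data) - 1) // (batch_size * sequence_size)
    rounded = nb_batches * batch_size * sequence_size
    xdata = data[:rounded].reshape([batch_size, nb_batches * sequence_size])
    ydata = data[1:rounded + 1].reshape([batch_size, nb_batches * sequence_size])
    for epoch in range(nb_epochs):
        for batch in range(nb_batches):
            x = xdata[:, batch * sequence_size:(batch + 1) * sequence_size]
            y = ydata[:, batch * sequence_size:(batch + 1) * sequence_size]  # x shifted by one char
            yield x, y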
40. @martin_gorner
Language model in Tensorflow
ALPHASIZE = 98
CELLSIZE = 512
NLAYERS = 3
SEQLEN = 30
BATCHSIZE = 100  # illustrative value

Xd = tf.placeholder(tf.uint8, [None, None])    # [BATCHSIZE, SEQLEN] of char codes
X = tf.one_hot(Xd, ALPHASIZE, 1.0, 0.0)
Yd_ = tf.placeholder(tf.uint8, [None, None])   # the same sequences shifted by one char
Y_ = tf.one_hot(Yd_, ALPHASIZE, 1.0, 0.0)
Hin = tf.placeholder(tf.float32, [None, CELLSIZE * NLAYERS])
batchsize = tf.placeholder(tf.int32)

# the model
cells = [tf.nn.rnn_cell.GRUCell(CELLSIZE) for i in range(NLAYERS)]
mcell = tf.nn.rnn_cell.MultiRNNCell(cells, state_is_tuple=False)
Hr, H = tf.nn.dynamic_rnn(mcell, X, initial_state=Hin)

# softmax output layer
Hf = tf.reshape(Hr, [-1, CELLSIZE])
Ylogits = layers.linear(Hf, ALPHASIZE)
Y = tf.nn.softmax(Ylogits)
Yp = tf.argmax(Y, 1)
Yp = tf.reshape(Yp, [batchsize, -1])

# loss and training step (optimizer)
Yflat_ = tf.reshape(Y_, [-1, ALPHASIZE])  # align targets with the flattened logits
loss = tf.nn.softmax_cross_entropy_with_logits(logits=Ylogits, labels=Yflat_)
train_step = tf.train.AdamOptimizer(1e-3).minimize(loss)

# training loop (the sequencer iterates over the epochs; assumes an initialised Session as sess)
inH = np.zeros([BATCHSIZE, CELLSIZE * NLAYERS])
for x, y_ in utils.rnn_minibatch_sequencer(codetext, BATCHSIZE, SEQLEN, nb_epochs=30):
    dic = {Xd: x, Yd_: y_, Hin: inH, batchsize: BATCHSIZE}
    _, y, outH = sess.run([train_step, Yp, H], feed_dict=dic)
    inH = outH
The code is on GitHub:
github.com/martin-gorner/
tensorflow-rnn-shakespeare
41. @martin_gorner
ee o no nonnaoter s ee seih iae r t i r io i ro s
sierota tsohoreroneo rsa esia anehereeo hensh
rho etnrhhs iti saoitns t et rsearh tshseoeh ta
oirhroren e eaetetnesnareeeoaraihss nshtano eter
e oooaoaeee nonn is heh easren ieson httn nihensont
t e n a ooe oerhi neaeehteriseat tiet i i ntsh
orhi e ohhsiea e aht ohr er ra eeo oeeitrot
hethisesaaei o saeii straieiteoeresorh e ooeri
e ninesh sort a es h rs hattnteseato sonoanr sniaase
s rshninsasi na sntennn oti r etnsnrse oh n
r e tiathhnaeeano trrr hhohooon rrt eernre e rnoh
Shakespeare, 0.03 epochs (C1)
42. @martin_gorner
Shakespeare
II WERENI
Are I I wos the wheer boaer.
Tin thim mh cals sate bauut site tar oue tinl
an bsisonetoal yer an fimireeren.
L[IO SI Hns oret bsllssts aaau ton hete me toer
frurtor sheus aed trat
A faler bis tote oadt tou than male, tel mou ce
an cime. ais fauto ws cien whus yas. Ande fert te a
ut wond aal sinr be at saar
0.1 epochs (C3)
43. @martin_gorner
BERENS Hall hat in she the hir meres.
Perstr in ame not of heard, me thin hild of shear and
ant on of mare. I lore wes lour.
DOCHES The chaster'd on not fenst
The laldoos more.
[Ixeln thrish]
And tho priines sith of hamdeling the san wind
Shakespeare, 0.2 epochs (C5)
Stage directions?
44. @martin_gorner
KING LEAR Alas, I am not forsworn both to bod!
And let the firm I have to'st trainoured.
KING HENRY VIII I love not my father.
PORDIA He tash you will have it.
HENRY BLUTIUS Work, thou lovest my son here,
thy father's fath!
CLIOND Why, then, would say, the beasts are
Shakespeare, 1 epoch (C6)
Invented names!
46. @martin_gorner
Shakespeare, 30 epochs (B10)
And sorrow far into the stars of men,
Without a second tears to seek the best and
bed,
With a strange service, and the foul prince of
Rome
[Exeunt MARK ANTONY and LEPIDUS]
Well said, my lord,--
MENENIUS I do not say so.
Well, I will not have no better ways;
49. @martin_gorner
def testGiddenSelfBeShareMecress(self):
with self.test_session() as sess:
tat = tf.contrib.matrix.cast_column_variable([1, 1], [0, 1, 1], [1, 7]],
[[1, 1, 1]].file(file, line_state_will_file))
with self.test_session():
self.assertAllEqual(1, l.ex6)
self.assertEqual(output_graph_def is_output_tensors_op(
tf.pro_context_name.sqrt(sess)
def test_shape(self):
res = values=value_rns[0].eval())
def tempDimpleSeriesGredicsIothasedWouthAverageData(self):
self._testDirector(self):
self._test_inv3_size = 5
with tf.train.ConvolutioBailLors_startswith("save_dir_context.PutIsprint().eval())
return tf.contrib.learn.RUCISLCCS:
# Check the orfloating so that the nimesting object mumputable othersifier.
# dense_keys.tokens_prefix/statch_size of the input1 tensors.
@property
Python code, 0.4 epochs (A3)
● Wrong ([]) nesting
● Correct use of colons
● Hallucinated function names
50. @martin_gorner
# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in [0.1, 2.0, 3.0]]
def __init__(self, expected):
return np.array([[0, 0, 0], [0, 0, 0]])
self.assertAllEqual(tf.placeholder(tf.float32, shape=(3, 3)),(shape, prior.pack(),
tf.float32))
for keys in tensor_list:
return np.array([[0, 0, 0]]).astype(np.float32)
# Check that we have both scalar tensor for being invalid to a vector of 1 indicating
# the total loss of the same shape as the shape of the tensor.
sharded_weights = [[0.0, 1.0]]
# Create the string op to apply gradient terms that also batch.
# The original any operation as a code when we should alw infer to the session case.
Python code, 12 epochs (B10)
● Correct triple ([]) nesting
● Recites the Apache license
● TensorFlow tips!
52. @martin_gorner
Tensorflow: save, restore
saver = tf.train.Saver(keep_checkpoint_every_n_hours=0.1, max_to_keep=5)

with tf.Session() as sess:
    # ... training loop ...
    saver.save(sess, 'file_', global_step=iteration)
# => saves the variables in file_200 and the graph in file_200.meta

with tf.Session() as sess:
    resto = tf.train.import_meta_graph('file_200.meta')
    resto.restore(sess, 'file_200')
# => restores the graph and the variable values

Must name variables explicitly!!!
# when saving
X = tf.placeholder(tf.uint8, name='X')
Y = tf.nn.softmax(Ylogits, name='Y')
# when using the restored graph
y, h = sess.run(['Y:0', 'H:0'], feed_dict={'X:0': y})
53. @martin_gorner
Shakespeare generation
with tf.Session() as sess:
    resto = tf.train.import_meta_graph('shake_200.meta')
    resto.restore(sess, 'shake_200')
    # initial values
    x = np.array([[0]])  # [BATCHSIZE, SEQLEN] with BATCHSIZE=1 and SEQLEN=1
    h = np.zeros([1, INTERNALSIZE * NLAYERS], dtype=np.float32)
    for i in range(100000):
        dic = {'X:0': x, 'Hin:0': h, 'batchsize:0': 1}
        y, h = sess.run(['Y:0', 'H:0'], feed_dict=dic)
        c = my_txtutils.sample_from_probabilities(y, topn=5)
        x = np.array([[c]])  # shape [BATCHSIZE, SEQLEN] with BATCHSIZE=1 and SEQLEN=1
        print(chr(my_txtutils.convert_to_ascii(c)), end="")
[Diagram: generation runs one char at a time: X → network → Y; the sampled char is fed back as the
next X and the output state Ht replaces Ht-1 (the initial state H’0 is zero)]
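my_txtutils.sample_from_probabilities is a helper from the repo; a plausible minimal version keeps only the topn most likely characters and samples among them:

import numpy as np

def sample_from_probabilities(probabilities, topn=5):
    # probabilities: output of the softmax, shape [1, ALPHASIZE]
    p = np.squeeze(probabilities).astype(np.float64)
    p[np.argsort(p)[:-topn]] = 0   # zero out all but the topn largest
    p = p / np.sum(p)              # renormalise
    return np.random.choice(len(p), p=p)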
54. @martin_gorner
Tensorboard
summary_writer = tf.train.SummaryWriter("log/train_" + time)
loss_summary = tf.scalar_summary("batch_loss", loss)
summaries = tf.merge_all_summaries()  # collect all summary ops into one

# in the training loop:
smm = sess.run(summaries, feed_dict=dic)
summary_writer.add_summary(smm, iteration)

Tip: use the time in the logdir name.
Tip: use a second SummaryWriter for validation results.
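Following the second tip, a sketch with a separate writer (same pre-1.0 summary API as above; validation_dic stands for a feed of validation data):

validation_writer = tf.train.SummaryWriter("log/validation_" + time)
# every few iterations, run the same summary ops on validation data:
val_smm = sess.run(summaries, feed_dict=validation_dic)
validation_writer.add_summary(val_smm, iteration)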
56. @martin_gorner
RNN shapes
Text classification
[Diagram: word vectors for “The USA and China have agreed …” feed an RNN (initial state 0); the
final state Hn is classified, e.g. as “geopolitics”]
Words encoded as vectors: “embeddings”
embeddings = tf.Variable(tf.random_uniform([vocab_size, embed_size]))
X = tf.nn.embedding_lookup(embeddings, train_inputs)
Tensorflow sample: goo.gl/m41mNp
Or use constant (pre-trained) embeddings => see Word2Vec
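For concreteness, a shape-annotated sketch of the lookup (sizes are illustrative assumptions):

vocab_size, embed_size = 50000, 128
embeddings = tf.Variable(tf.random_uniform([vocab_size, embed_size], -1.0, 1.0))
train_inputs = tf.placeholder(tf.int32, [None, None])  # [batch, seqlen] word ids
X = tf.nn.embedding_lookup(embeddings, train_inputs)   # [batch, seqlen, embed_size]
# X can go straight into tf.nn.dynamic_rnn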
57. @martin_gorner
Bitchin’ batchin’
Sentences are padded to equal length with ∅; the true length of each goes into sequence_length:
China and the USA have agreed to a new round of talks    12
The quick brown fox jumps over the lazy dog . ∅ ∅        10
Boys will be boys . ∅ ∅ ∅ ∅ ∅ ∅ ∅                         5
Tom , get your coat . We are going out . ∅               11
Math rules the world . Men rule math . ∅ ∅ ∅              9
Hr, H = tf.nn.dynamic_rnn(mcell, X, initial_state=Hin, sequence_length=slen)
[Diagram: with sequence_length set, the state Hn used for the “geopolitics” classification is taken
at each sentence’s true last word, not at the padded end; initial state 0]
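A sketch of what gets fed, assuming the padded word-id batch above (ids and placeholder names are illustrative):

import numpy as np

# two padded sentences, 0 = ∅ padding
x = np.array([[4, 18, 25, 4, 3, 0, 0, 0, 0, 0, 0, 0],      # "Boys will be boys ." -> length 5
              [7, 9, 2, 11, 14, 20, 5, 6, 8, 12, 10, 13]])  # 12-word sentence     -> length 12
slen = np.array([5, 12])
# feed_dict = {X_ids: x, slen_t: slen}
# dynamic_rnn stops updating a row's state past its sequence_length,
# so Hn is the state at each sentence's true last word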
58. @martin_gorner
RNN shapes
Text translation
Words encoded as vectors
[Diagram: an encoder RNN reads “The red cat ate the mouse” (initial state 0); a decoder RNN then
emits “Le chat rouge a mangé la souris”, each output word fed back as the next decoder input, with
∅ as the start/end marker]
tf.nn.sampled_softmax_loss(…): a full softmax over the output vocabulary is slow, the sampled
version is fast
Tensorflow sample: goo.gl/KyKLDv
59. @martin_gorner
RNN shapes
Image captioning (simplified)
Images encoded as vectors, for ex. the output of a convolutional network or auto-encoder
[Diagram: the image vector initialises the RNN state (0 otherwise); the decoder emits “A man on a
beach flying a kite”, each output word fed back as the next input, with ∅ as the start/end marker]
Google’s neural net for image captioning: goo.gl/VgZUQZ
64. @martin_gorner
TF high level API
from tensorflow.contrib import learn

def model_fn(X, Y_, mode):  # X, Y_: the “features” and “targets”
    Yn = …  # model layers
    predictions = {"probabilities": …, "digits": …}  # free-form
    evaluations = {'accuracy': metrics.accuracy(…)}  # free-form
    loss = …
    train = layers.optimize_loss(loss, …)
    return learn.ModelFnOps(mode, predictions, loss, train, evaluations)
Samples: goo.gl/F3i3bf, goo.gl/CofxFM
65. @martin_gorner
Estimator, Experiment, learn_runner
from tensorflow.contrib.learn.python.learn import learn_runner
from tensorflow.contrib.learn.python.learn.utils import saved_model_export_utils

def experiment_fn(job_dir):
    return learn.Experiment(
        estimator=learn.Estimator(model_fn, model_dir=job_dir,
                                  config=learn.RunConfig(save_checkpoints_secs=None,
                                                         save_checkpoints_steps=1000)),
        train_input_fn=…,  # data feed
        eval_input_fn=…,   # data feed
        train_steps=10000,
        eval_steps=1,
        export_strategies=saved_model_export_utils.make_export_strategy(serving_input_fn))

def main(argv=None):
    job_dir = …  # parse argument --job-dir
    learn_runner.run(experiment_fn, job_dir)

if __name__ == '__main__':
    main()
Free stuff!!!
● Tensorboard graphs
● Resume on fail
● Parallel data feeds
● Serving model export
● Distributed training
Samples: goo.gl/F3i3bf, goo.gl/CofxFM
66. @martin_gorner
Data queues for distributed training
# dummy implementation for data that fits in memory
def train_data_input_fn(mnist):
    images = tf.constant(mnist.train.images)
    labels = tf.constant(mnist.train.labels)
    return tf.train.shuffle_batch([images, labels],
                                  batch_size=100, capacity=1100,
                                  min_after_dequeue=1000, enqueue_many=True)

# dummy implementation for data that fits in memory
def eval_data_input_fn(mnist):
    return (tf.constant(mnist.test.images),
            tf.constant(mnist.test.labels))
shuffle_batch inserts queue nodes into the TF graph (100 above is the batch size).
For practical data queuing, use the TFRecords format.
Samples: goo.gl/F3i3bf, goo.gl/CofxFM
67. @martin_gorner
Serving input function
# Online predictions on Cloud ML Engine
def serving_input_fn():
    # Placeholder for data deserialised from JSON: a batch of 28x28 images for MNIST
    inputs = {'A': tf.placeholder(tf.uint8, [None, 28, 28])}
    # Transform the data as needed
    features = [tf.cast(inputs['A'], tf.float32)]
    return input_fn_utils.InputFnOps(features, None, inputs)
Samples: goo.gl/F3i3bf, goo.gl/CofxFM
68. @martin_gorner
Run it
gcloud ml-engine jobs submit training job22 \
  --job-dir=gs://mybucket/job22 \
  --package-path=trainer \
  --module-name=trainer.task \
  --config=config.yaml \
  -- \
  --<custom model arguments here>

Model checkpoints and tensorboard summaries are written to the --job-dir.

config.yaml:
trainingInput:
  scaleTier: STANDARD_1

Deploy the trained model to prod = click click click. Then you get autoscaled serving:

gcloud ml-engine predict \
  --model <model_name> \
  --json-instances mydigits.json
Samples: goo.gl/F3i3bf, goo.gl/CofxFM
70. @martin_gorner
Cloud ML Engine: your TensorFlow models trained in Google’s cloud.
Pre-trained models:
● Cloud Vision API
● Cloud Speech API
● Google Translate API
● Natural Language API
● Video Intelligence API
● Cloud Jobs API (PRIVATE BETA)
Cloud AutoML Vision (ALPHA): just bring your data
Cloud TPU (BETA): ML supercomputing
That’s all folks...
Martin Görner
Google Developer relations
@martin_gorner
Videos, slides, code:
github.com/GoogleCloudPlatform/tensorflow-without-a-phd
Have fun!