TensorFlow and Deep Learning
without a PhD
Lucio Floretta
CODEMOTION MILAN - SPECIAL EDITION
10 – 11 NOVEMBER 2017
Hello World: handwritten digits classification - MNIST
MNIST = Mixed National Institute of Standards and Technology - download the dataset at http://yann.lecun.com/exdb/mnist/
Very simple model: softmax classification
A 28x28 pixel image (784 pixels) feeds a single layer of 10 softmax neurons, one per digit 0 1 2 ... 9.
Logit := weighted sum of all pixels + bias; the softmax of the logits gives the neuron outputs.
In matrix notation, 100 images at a time:
L[100, 10] = X[100, 784] x W[784, 10] + b[10]
X: 100 images, one per line, each flattened to 784 pixels
W: weight matrix, 784 lines x 10 columns (w0,0 ... w783,9)
b: the same 10 biases (b0 ... b9), broadcast on all lines
L: logits, one line of 10 per image (L0,0 ... L99,9)
Softmax, on a batch of images (tensor shapes in [ ]):
Predictions Y[100, 10] = softmax( Images X[100, 784] x Weights W[784, 10] + Biases b[10] )
matrix multiply; biases broadcast on all lines; softmax applied line by line
Now in TensorFlow (Python)
Y = tf.nn.softmax(tf.matmul(X, W) + b)
tensor shapes: X[100, 784] W[784, 10] b[10]
matrix multiply
broadcast
on all lines
Success?
Cross entropy:
cross_entropy = - Σ ( Y_ * log(Y) ), summed over the 10 digits
Y := computed probabilities, e.g. 0.02 0.01 0.01 0.11 0.02 0.01 0.78 0.01 0.01 0.01 for digits 0-9
Y_ := actual probabilities, “one-hot” encoded, e.g. 0 0 0 0 0 0 1 0 0 0 for digits 0-9 (this is a “6”)
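A worked example (my arithmetic, not on the slide): because Y_ is one-hot, only the term for the correct digit survives the sum, so the loss for this image is simply the negative log of the probability assigned to “6”:

import math
p_correct = 0.78              # probability the model gave to "6"
loss = -math.log(p_correct)   # natural log, as in tf.log
print(round(loss, 3))         # 0.248; a perfect prediction (p=1.0) would give 0.0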
Demo
92%
TensorFlow - initialisation
import tensorflow as tf
X = tf.placeholder(tf.float32, [None, 28, 28, 1])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
init = tf.initialize_all_variables()
this will become the batch size
28 x 28 grayscale images
Training = computing variables W and b
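One note not on the slide: tf.initialize_all_variables() was deprecated later in the TF 1.x series; on a newer 1.x release the equivalent call is:

init = tf.global_variables_initializer()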
TensorFlow - success metrics
# model (flattening images)
Y = tf.nn.softmax(tf.matmul(tf.reshape(X, [-1, 784]), W) + b)
# placeholder for correct answers (“one-hot” encoded)
Y_ = tf.placeholder(tf.float32, [None, 10])
# loss function
cross_entropy = -tf.reduce_sum(Y_ * tf.log(Y))
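The accuracy computation itself did not survive extraction; a minimal sketch of the usual “% of correct answers” metric, assuming Y and Y_ as defined above:

# % of correct answers found in a batch
is_correct = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))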
TensorFlow - training
optimizer = tf.train.GradientDescentOptimizer(0.005)   # 0.005 = learning rate
train_step = optimizer.minimize(cross_entropy)          # cross_entropy = loss function
TensorFlow - run!
running a TensorFlow computation, feeding placeholders

sess = tf.Session()
sess.run(init)

for i in range(1000):
    # load batch of images and correct answers
    batch_X, batch_Y = mnist.train.next_batch(100)
    train_data = {X: batch_X, Y_: batch_Y}
    # train
    sess.run(train_step, feed_dict=train_data)
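mnist.train.next_batch comes from the TF 1.x tutorial helpers; a minimal loading sketch (assuming the standard input_data module; reshape=False keeps the [28, 28, 1] image shape that the X placeholder expects):

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("data", one_hot=True, reshape=False)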
import tensorflow as tf
X = tf.placeholder(tf.float32, [None, 28, 28, 1])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
init = tf.initialize_all_variables()
# model
Y=tf.nn.softmax(tf.matmul(tf.reshape(X,[-1, 784]), W) + b)
# placeholder for correct answers
Y_ = tf.placeholder(tf.float32, [None, 10])
# loss function
cross_entropy = -tf.reduce_sum(Y_ * tf.log(Y))
TensorFlow - full python code
optimizer = tf.train.GradientDescentOptimizer(0.005)
train_step = optimizer.minimize(cross_entropy)
sess = tf.Session()
sess.run(init)
for i in range(10000):
    # load batch of images and correct answers
    batch_X, batch_Y = mnist.train.next_batch(100)
    train_data = {X: batch_X, Y_: batch_Y}
    # train
    sess.run(train_step, feed_dict=train_data)
initialisation → model → success metrics → training step → run
92%
Go deep!
Let’s try 5 fully-connected layers! (overkill ;-))
ReLU = REctified Linear Unit on the hidden layers; softmax on the output layer (digits 0 1 2 ... 9)
layer sizes: 784 → 200 → 100 → 60 → 30 → 10
TensorFlow - initialisation
K = 200
L = 100
M = 60
N = 30
W1 = tf.Variable(tf.truncated_normal([28*28, K] ,stddev=0.1))
B1 = tf.Variable(tf.zeros([K]))
W2 = tf.Variable(tf.truncated_normal([K, L], stddev=0.1))
B2 = tf.Variable(tf.zeros([L]))
W3 = tf.Variable(tf.truncated_normal([L, M], stddev=0.1))
B3 = tf.Variable(tf.zeros([M]))
W4 = tf.Variable(tf.truncated_normal([M, N], stddev=0.1))
B4 = tf.Variable(tf.zeros([N]))
W5 = tf.Variable(tf.truncated_normal([N, 10], stddev=0.1))
B5 = tf.Variable(tf.zeros([10]))
weights initialised
with random values
TensorFlow - the model
X = tf.reshape(X, [-1, 28*28])
Y1 = tf.nn.relu(tf.matmul(X, W1) + B1)
Y2 = tf.nn.relu(tf.matmul(Y1, W2) + B2)
Y3 = tf.nn.relu(tf.matmul(Y2, W3) + B3)
Y4 = tf.nn.relu(tf.matmul(Y3, W4) + B4)
Y = tf.nn.softmax(tf.matmul(Y4, W5) + B5)
weights and biases
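One detail the slide glosses over: with deeper networks, tf.log(Y) can blow up numerically once a softmax output saturates at 0. A common, numerically safer variant (a sketch, computing the cross entropy from the raw logits instead of the probabilities):

Ylogits = tf.matmul(Y4, W5) + B5
Y = tf.nn.softmax(Ylogits)
# softmax + cross entropy in one numerically stable op
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=Ylogits, labels=Y_)
cross_entropy = tf.reduce_mean(cross_entropy) * 100   # *100 keeps the same scale as reduce_sum over a 100-image batch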
98%
Noisy accuracy curve? Yuck!
Slow down... learning rate decay
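The decay schedule is not in the extracted text; a minimal sketch of one way to decay the learning rate from 0.005 down to 0.0001 (the range quoted on the “party tricks” slide), fed through a placeholder at each step; decay_speed is an assumption, tune it:

import math
lr = tf.placeholder(tf.float32)
train_step = tf.train.GradientDescentOptimizer(lr).minimize(cross_entropy)

# inside the training loop:
max_lr, min_lr, decay_speed = 0.005, 0.0001, 2000.0
learning_rate = min_lr + (max_lr - min_lr) * math.exp(-i / decay_speed)
sess.run(train_step, feed_dict={X: batch_X, Y_: batch_Y, lr: learning_rate})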
98%
Overfitting → Dropout
TRAINING: pkeep=0.75        EVALUATION: pkeep=1
pkeep = tf.placeholder(tf.float32)
Yf = tf.nn.relu(tf.matmul(X, W) + B)
Y = tf.nn.dropout(Yf, pkeep)
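Feeding pkeep, a sketch of the usual pattern matching the values above (accuracy, test_X and test_Y are the hypothetical names from the earlier sketches, not from the slides):

# training: drop 25% of activations
sess.run(train_step, feed_dict={X: batch_X, Y_: batch_Y, pkeep: 0.75})
# evaluation: keep everything
sess.run(accuracy, feed_dict={X: test_X, Y_: test_Y, pkeep: 1.0})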
98%
All the party tricks (accuracy curves compared):
Learning rate 0.005
Decaying learning rate 0.005 -> 0.0001
Decaying learning rate 0.005 -> 0.0001 and dropout 0.75
Overfitting ?!?
Possible causes: too many neurons, not enough DATA, or a BAD network
Convolutional layer
Filter weights W[4, 4, 3, 2] = [filter size, filter size, input channels, output channels]
(here: two 4x4 filters over 3 input channels, i.e. W1[4, 4, 3] and W2[4, 4, 3], applied with padding; the stride sets the subsampling)
convolutional + subsampling stages are stacked
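A quick shape check (my arithmetic, assuming the 'SAME' padding used in the code below, where the output size is the input size divided by the stride):

28x28, stride 1 → 28x28
28x28, stride 2 → 14x14
14x14, stride 2 → 7x7

That is exactly the 28x28 → 14x14 → 7x7 progression of the network on the next slide.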
Hacker’s tip: ALL convolutional
Convolutional neural network
input 28x28x1
convolutional layer, 4 channels: W1[5, 5, 1, 4] stride 1 → 28x28x4
convolutional layer, 8 channels: W2[4, 4, 4, 8] stride 2 → 14x14x8
convolutional layer, 12 channels: W3[4, 4, 8, 12] stride 2 → 7x7x12
fully connected layer W4[7x7x12, 200] → 200
softmax readout layer W5[200, 10] → 10
+ biases on all layers
TensorFlow - initialisation
W1 = tf.Variable(tf.truncated_normal([5, 5, 1, 4] ,stddev=0.1))
B1 = tf.Variable(tf.ones([4])/10)
W2 = tf.Variable(tf.truncated_normal([5, 5, 4, 8] ,stddev=0.1))
B2 = tf.Variable(tf.ones([8])/10)
W3 = tf.Variable(tf.truncated_normal([4, 4, 8, 12] ,stddev=0.1))
B3 = tf.Variable(tf.ones([12])/10)
W4 = tf.Variable(tf.truncated_normal([7*7*12, 200] ,stddev=0.1))
B4 = tf.Variable(tf.ones([200])/10)
W5 = tf.Variable(tf.truncated_normal([200, 10] ,stddev=0.1))
B5 = tf.Variable(tf.zeros([10])/10)
W shapes: [filter size, filter size, input channels, output channels]
weights initialised with random values
TensorFlow - the model
Y1 = tf.nn.relu(tf.nn.conv2d(X, W1, strides=[1, 1, 1, 1], padding='SAME') + B1)
Y2 = tf.nn.relu(tf.nn.conv2d(Y1, W2, strides=[1, 2, 2, 1], padding='SAME') + B2)
Y3 = tf.nn.relu(tf.nn.conv2d(Y2, W3, strides=[1, 2, 2, 1], padding='SAME') + B3)
YY = tf.reshape(Y3, shape=[-1, 7 * 7 * 12])
Y4 = tf.nn.relu(tf.matmul(YY, W4) + B4)
Y = tf.nn.softmax(tf.matmul(Y4, W5) + B5)
weights, biases and strides as above
input image batch X[100, 28, 28, 1]; Y3[100, 7, 7, 12]; YY[100, 7x7x12] (flatten all values for the fully connected layer)
99.1%
WTFH ???
Bigger convolutional network + dropout
input 28x28x1
convolutional layer, 6 channels: W1[6, 6, 1, 6] stride 1 → 28x28x6
convolutional layer, 12 channels: W2[5, 5, 6, 12] stride 2 → 14x14x12
convolutional layer, 24 channels: W3[4, 4, 12, 24] stride 2 → 7x7x24
fully connected layer W4[7x7x24, 200] → 200
softmax readout layer W5[200, 10] → 10
+ biases on all layers
+ DROPOUT p=0.75
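A minimal sketch of this bigger model in code (my assembly, reusing the conv2d/reshape pattern shown earlier; layer shapes come from the slide above, and placing the dropout on the fully connected layer is an assumption):

W1 = tf.Variable(tf.truncated_normal([6, 6, 1, 6], stddev=0.1));   B1 = tf.Variable(tf.ones([6])/10)
W2 = tf.Variable(tf.truncated_normal([5, 5, 6, 12], stddev=0.1));  B2 = tf.Variable(tf.ones([12])/10)
W3 = tf.Variable(tf.truncated_normal([4, 4, 12, 24], stddev=0.1)); B3 = tf.Variable(tf.ones([24])/10)
W4 = tf.Variable(tf.truncated_normal([7*7*24, 200], stddev=0.1));  B4 = tf.Variable(tf.ones([200])/10)
W5 = tf.Variable(tf.truncated_normal([200, 10], stddev=0.1));      B5 = tf.Variable(tf.zeros([10]))

Y1 = tf.nn.relu(tf.nn.conv2d(X, W1, strides=[1, 1, 1, 1], padding='SAME') + B1)
Y2 = tf.nn.relu(tf.nn.conv2d(Y1, W2, strides=[1, 2, 2, 1], padding='SAME') + B2)
Y3 = tf.nn.relu(tf.nn.conv2d(Y2, W3, strides=[1, 2, 2, 1], padding='SAME') + B3)
YY = tf.reshape(Y3, shape=[-1, 7 * 7 * 24])
Y4 = tf.nn.relu(tf.matmul(YY, W4) + B4)
Y4d = tf.nn.dropout(Y4, pkeep)   # dropout on the fully connected layer
Y = tf.nn.softmax(tf.matmul(Y4d, W5) + B5)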
99.3%
YEAH !
with dropout
Martin Görner
Google Developer relations
@martin_gorner
plus.google.com/+MartinGorner
goo.gl/pHeXe7
youtu.be/qyvlt7kiQoI
goo.gl/mVZloU
github.com/martin-gorner/
tensorflow-mnist-tutorial
goo.gl/UuN41S
youtu.be/vq2nnJ4g6N0
github.com/martin-gorner/
tensorflow-rnn-shakespeare
Where to go next
Cartoon images copyright:
alexpokusay / 123RF stock photos
TensorFlow: ML for Everyone
It scales from research to production. It supports many platforms.
Internal launch
Search, Gmail, Translate, Maps, Android,
Photos, Speech, YouTube, Play and many others
+
100s of research projects and papers
iOS
Android
TPU
GPU
CPU Compute Engine
Cloud Machine
Learning Engine
Cloud Vision API
Cloud Speech API
Natural Language API
Google Translate API
Video Intelligence API (PRIVATE BETA)
Cloud Jobs API (PRIVATE ALPHA)
Thank You.
