TensorFlow and Deep Learning
without a PhD
Lucio Floretta
CODEMOTION MILAN - SPECIAL EDITION
10 – 11 NOVEMBER 2017
Hello World: handwritten digits classification - MNIST
MNIST = Mixed National Institute of Standards and Technology - download the dataset at http://yann.lecun.com/exdb/mnist/
Very simple model: softmax classification
A 28x28 pixel image (784 pixels) feeds a single layer of 10 softmax neurons, one per digit 0 1 2 ... 9.
Logit := weighted sum of all pixels + bias; the softmax of the logits gives the neuron outputs.
In matrix notation, 100 images at a time:
L[100, 10] = X[100, 784] x W[784, 10] + b[10]
X: 100 images, one per line, each flattened to 784 pixels
W: weight matrix, 784 lines x 10 columns (w0,0 ... w783,9)
b: the same 10 biases (b0 ... b9), broadcast on all lines
L: logits, one line of 10 per image (L0,0 ... L99,9)
Softmax, on a batch of images (tensor shapes in [ ]):
Predictions Y[100, 10] = softmax( Images X[100, 784] x Weights W[784, 10] + Biases b[10] )
matrix multiply; biases broadcast on all lines; softmax applied line by line
Now in TensorFlow (Python)
Y = tf.nn.softmax(tf.matmul(X, W) + b)
tensor shapes: X[100, 784] W[784, 10] b[10]
matrix multiply
broadcast
on all lines
Success?
Cross entropy:
cross_entropy = - Σ ( Y_ * log(Y) ), summed over the 10 digits
Y := computed probabilities, e.g. 0.02 0.01 0.01 0.11 0.02 0.01 0.78 0.01 0.01 0.01 for digits 0-9
Y_ := actual probabilities, “one-hot” encoded, e.g. 0 0 0 0 0 0 1 0 0 0 for digits 0-9 (this is a “6”)
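A worked example (my arithmetic, not on the slide): because Y_ is one-hot, only the term for the correct digit survives the sum, so the loss for this image is simply the negative log of the probability assigned to “6”:

import math
p_correct = 0.78              # probability the model gave to "6"
loss = -math.log(p_correct)   # natural log, as in tf.log
print(round(loss, 3))         # 0.248; a perfect prediction (p=1.0) would give 0.0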
Demo
92%
TensorFlow - initialisation
import tensorflow as tf
X = tf.placeholder(tf.float32, [None, 28, 28, 1])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
init = tf.initialize_all_variables()
this will become the batch size
28 x 28 grayscale images
Training = computing variables W and b
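One note not on the slide: tf.initialize_all_variables() was deprecated later in the TF 1.x series; on a newer 1.x release the equivalent call is:

init = tf.global_variables_initializer()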
TensorFlow - success metrics
# model (flattening images)
Y = tf.nn.softmax(tf.matmul(tf.reshape(X, [-1, 784]), W) + b)
# placeholder for correct answers (“one-hot” encoded)
Y_ = tf.placeholder(tf.float32, [None, 10])
# loss function
cross_entropy = -tf.reduce_sum(Y_ * tf.log(Y))
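The accuracy computation itself did not survive extraction; a minimal sketch of the usual “% of correct answers” metric, assuming Y and Y_ as defined above:

# % of correct answers found in a batch
is_correct = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))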
TensorFlow - training
optimizer = tf.train.GradientDescentOptimizer(0.005)   # 0.005 = learning rate
train_step = optimizer.minimize(cross_entropy)          # cross_entropy = loss function
TensorFlow - run!
running a TensorFlow computation, feeding placeholders

sess = tf.Session()
sess.run(init)

for i in range(1000):
    # load batch of images and correct answers
    batch_X, batch_Y = mnist.train.next_batch(100)
    train_data = {X: batch_X, Y_: batch_Y}
    # train
    sess.run(train_step, feed_dict=train_data)
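mnist.train.next_batch comes from the TF 1.x tutorial helpers; a minimal loading sketch (assuming the standard input_data module; reshape=False keeps the [28, 28, 1] image shape that the X placeholder expects):

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("data", one_hot=True, reshape=False)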
import tensorflow as tf
X = tf.placeholder(tf.float32, [None, 28, 28, 1])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
init = tf.initialize_all_variables()
# model
Y=tf.nn.softmax(tf.matmul(tf.reshape(X,[-1, 784]), W) + b)
# placeholder for correct answers
Y_ = tf.placeholder(tf.float32, [None, 10])
# loss function
cross_entropy = -tf.reduce_sum(Y_ * tf.log(Y))
TensorFlow - full python code
optimizer = tf.train.GradientDescentOptimizer(0.005)
train_step = optimizer.minimize(cross_entropy)
sess = tf.Session()
sess.run(init)
for i in range(10000):
    # load batch of images and correct answers
    batch_X, batch_Y = mnist.train.next_batch(100)
    train_data = {X: batch_X, Y_: batch_Y}
    # train
    sess.run(train_step, feed_dict=train_data)
initialisation → model → success metrics → training step → run
92%
Go deep!
Let’s try 5 fully-connected layers! (overkill ;-))
ReLU = REctified Linear Unit on the hidden layers; softmax on the output layer (digits 0 1 2 ... 9)
layer sizes: 784 → 200 → 100 → 60 → 30 → 10
TensorFlow - initialisation
K = 200
L = 100
M = 60
N = 30
W1 = tf.Variable(tf.truncated_normal([28*28, K] ,stddev=0.1))
B1 = tf.Variable(tf.zeros([K]))
W2 = tf.Variable(tf.truncated_normal([K, L], stddev=0.1))
B2 = tf.Variable(tf.zeros([L]))
W3 = tf.Variable(tf.truncated_normal([L, M], stddev=0.1))
B3 = tf.Variable(tf.zeros([M]))
W4 = tf.Variable(tf.truncated_normal([M, N], stddev=0.1))
B4 = tf.Variable(tf.zeros([N]))
W5 = tf.Variable(tf.truncated_normal([N, 10], stddev=0.1))
B5 = tf.Variable(tf.zeros([10]))
weights initialised
with random values
TensorFlow - the model
X = tf.reshape(X, [-1, 28*28])
Y1 = tf.nn.relu(tf.matmul(X, W1) + B1)
Y2 = tf.nn.relu(tf.matmul(Y1, W2) + B2)
Y3 = tf.nn.relu(tf.matmul(Y2, W3) + B3)
Y4 = tf.nn.relu(tf.matmul(Y3, W4) + B4)
Y = tf.nn.softmax(tf.matmul(Y4, W5) + B5)
weights and biases
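One detail the slide glosses over: with deeper networks, tf.log(Y) can blow up numerically once a softmax output saturates at 0. A common, numerically safer variant (a sketch, computing the cross entropy from the raw logits instead of the probabilities):

Ylogits = tf.matmul(Y4, W5) + B5
Y = tf.nn.softmax(Ylogits)
# softmax + cross entropy in one numerically stable op
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=Ylogits, labels=Y_)
cross_entropy = tf.reduce_mean(cross_entropy) * 100   # *100 keeps the same scale as reduce_sum over a 100-image batch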
98%
Noisy accuracy curve? Yuck!
Slow down... learning rate decay
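The decay schedule is not in the extracted text; a minimal sketch of one way to decay the learning rate from 0.005 down to 0.0001 (the range quoted on the “party tricks” slide), fed through a placeholder at each step; decay_speed is an assumption, tune it:

import math
lr = tf.placeholder(tf.float32)
train_step = tf.train.GradientDescentOptimizer(lr).minimize(cross_entropy)

# inside the training loop:
max_lr, min_lr, decay_speed = 0.005, 0.0001, 2000.0
learning_rate = min_lr + (max_lr - min_lr) * math.exp(-i / decay_speed)
sess.run(train_step, feed_dict={X: batch_X, Y_: batch_Y, lr: learning_rate})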
98%
Overfitting → Dropout
TRAINING: pkeep=0.75        EVALUATION: pkeep=1
pkeep = tf.placeholder(tf.float32)
Yf = tf.nn.relu(tf.matmul(X, W) + B)
Y = tf.nn.dropout(Yf, pkeep)
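Feeding pkeep, a sketch of the usual pattern matching the values above (accuracy, test_X and test_Y are the hypothetical names from the earlier sketches, not from the slides):

# training: drop 25% of activations
sess.run(train_step, feed_dict={X: batch_X, Y_: batch_Y, pkeep: 0.75})
# evaluation: keep everything
sess.run(accuracy, feed_dict={X: test_X, Y_: test_Y, pkeep: 1.0})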
98%
All the party tricks (accuracy curves compared):
Learning rate 0.005
Decaying learning rate 0.005 -> 0.0001
Decaying learning rate 0.005 -> 0.0001 and dropout 0.75
Overfitting ?!?
Possible causes: too many neurons, not enough DATA, or a BAD network
Convolutional layer
Filter weights W[4, 4, 3, 2] = [filter size, filter size, input channels, output channels]
(here: two 4x4 filters over 3 input channels, i.e. W1[4, 4, 3] and W2[4, 4, 3], applied with padding; the stride sets the subsampling)
convolutional + subsampling stages are stacked
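A quick shape check (my arithmetic, assuming the 'SAME' padding used in the code below, where the output size is the input size divided by the stride):

28x28, stride 1 → 28x28
28x28, stride 2 → 14x14
14x14, stride 2 → 7x7

That is exactly the 28x28 → 14x14 → 7x7 progression of the network on the next slide.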
Hacker’s tip: ALL convolutional
Convolutional neural network
input 28x28x1
convolutional layer, 4 channels: W1[5, 5, 1, 4] stride 1 → 28x28x4
convolutional layer, 8 channels: W2[4, 4, 4, 8] stride 2 → 14x14x8
convolutional layer, 12 channels: W3[4, 4, 8, 12] stride 2 → 7x7x12
fully connected layer W4[7x7x12, 200] → 200
softmax readout layer W5[200, 10] → 10
+ biases on all layers
TensorFlow - initialisation
W1 = tf.Variable(tf.truncated_normal([5, 5, 1, 4] ,stddev=0.1))
B1 = tf.Variable(tf.ones([4])/10)
W2 = tf.Variable(tf.truncated_normal([5, 5, 4, 8] ,stddev=0.1))
B2 = tf.Variable(tf.ones([8])/10)
W3 = tf.Variable(tf.truncated_normal([4, 4, 8, 12] ,stddev=0.1))
B3 = tf.Variable(tf.ones([12])/10)
W4 = tf.Variable(tf.truncated_normal([7*7*12, 200] ,stddev=0.1))
B4 = tf.Variable(tf.ones([200])/10)
W5 = tf.Variable(tf.truncated_normal([200, 10] ,stddev=0.1))
B5 = tf.Variable(tf.zeros([10])/10)
W shapes: [filter size, filter size, input channels, output channels]
weights initialised with random values
TensorFlow - the model
Y1 = tf.nn.relu(tf.nn.conv2d(X, W1, strides=[1, 1, 1, 1], padding='SAME') + B1)
Y2 = tf.nn.relu(tf.nn.conv2d(Y1, W2, strides=[1, 2, 2, 1], padding='SAME') + B2)
Y3 = tf.nn.relu(tf.nn.conv2d(Y2, W3, strides=[1, 2, 2, 1], padding='SAME') + B3)
YY = tf.reshape(Y3, shape=[-1, 7 * 7 * 12])
Y4 = tf.nn.relu(tf.matmul(YY, W4) + B4)
Y = tf.nn.softmax(tf.matmul(Y4, W5) + B5)
weights, biases and strides as above
input image batch X[100, 28, 28, 1]; Y3[100, 7, 7, 12]; YY[100, 7x7x12] (flatten all values for the fully connected layer)
99.1%
WTFH ???
Bigger convolutional network + dropout
input 28x28x1
convolutional layer, 6 channels: W1[6, 6, 1, 6] stride 1 → 28x28x6
convolutional layer, 12 channels: W2[5, 5, 6, 12] stride 2 → 14x14x12
convolutional layer, 24 channels: W3[4, 4, 12, 24] stride 2 → 7x7x24
fully connected layer W4[7x7x24, 200] → 200
softmax readout layer W5[200, 10] → 10
+ biases on all layers
+ DROPOUT p=0.75
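A minimal sketch of this bigger model in code (my assembly, reusing the conv2d/reshape pattern shown earlier; layer shapes come from the slide above, and placing the dropout on the fully connected layer is an assumption):

W1 = tf.Variable(tf.truncated_normal([6, 6, 1, 6], stddev=0.1));   B1 = tf.Variable(tf.ones([6])/10)
W2 = tf.Variable(tf.truncated_normal([5, 5, 6, 12], stddev=0.1));  B2 = tf.Variable(tf.ones([12])/10)
W3 = tf.Variable(tf.truncated_normal([4, 4, 12, 24], stddev=0.1)); B3 = tf.Variable(tf.ones([24])/10)
W4 = tf.Variable(tf.truncated_normal([7*7*24, 200], stddev=0.1));  B4 = tf.Variable(tf.ones([200])/10)
W5 = tf.Variable(tf.truncated_normal([200, 10], stddev=0.1));      B5 = tf.Variable(tf.zeros([10]))

Y1 = tf.nn.relu(tf.nn.conv2d(X, W1, strides=[1, 1, 1, 1], padding='SAME') + B1)
Y2 = tf.nn.relu(tf.nn.conv2d(Y1, W2, strides=[1, 2, 2, 1], padding='SAME') + B2)
Y3 = tf.nn.relu(tf.nn.conv2d(Y2, W3, strides=[1, 2, 2, 1], padding='SAME') + B3)
YY = tf.reshape(Y3, shape=[-1, 7 * 7 * 24])
Y4 = tf.nn.relu(tf.matmul(YY, W4) + B4)
Y4d = tf.nn.dropout(Y4, pkeep)   # dropout on the fully connected layer
Y = tf.nn.softmax(tf.matmul(Y4d, W5) + B5)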
99.3%
YEAH !
with dropout
Martin Görner
Google Developer relations
@martin_gorner
plus.google.com/+MartinGorner
goo.gl/pHeXe7
youtu.be/qyvlt7kiQoI
goo.gl/mVZloU
github.com/martin-gorner/
tensorflow-mnist-tutorial
goo.gl/UuN41S
youtu.be/vq2nnJ4g6N0
github.com/martin-gorner/
tensorflow-rnn-shakespeare
Where to go next
Cartoon images copyright:
alexpokusay / 123RF stock photos
TensorFlow: ML for Everyone
It scales from research to production. It supports many platforms.
Internal launch
Search, Gmail, Translate, Maps, Android,
Photos, Speech, YouTube, Play and many others
+
100s of research projects and papers
iOS
Android
TPU
GPU
CPU Compute Engine
Cloud Machine
Learning Engine
Cloud Vision API
Cloud Speech API
Natural Language API
Google Translate API
Video Intelligence API (PRIVATE BETA)
Cloud Jobs API (PRIVATE ALPHA)
Thank You.
