This document provides an agenda and introduction for a deep learning workshop. It includes an introduction to the instructor, ground rules for the workshop, and a tentative agenda that will cover topics like neural networks, convolutional neural networks, and RNNs. Hands-on exercises are planned for each section. The workshop will be held in a Jupyter notebook environment on Google Colab, allowing participants to run Python code directly in their browser.
2. Polyglot Ninja
Memphis, TN area
Co-Director of MemPy
Data Science, Machine Learning, Mobile Apps
Wannabe Game Designer
Pluralsight Author
@poweredbyaltnet
Douglas Starnes
3. Ground Rules
• I’m going to throw a lot of information at you
• We have 4 hours together (that’s 240 minutes)
• There are 3 breaks for a total of 30 minutes
• There are 89 slides (~ 45 min)
• There are 10 exercises (~17 minutes each)
• There will be questions
• There will be problems logging in
• Google could go on vacation
• But the wifi will rock!
• The agenda is a guide.
• I’m here to focus on what you are interested in.
• If I ask a question, it’s OK to answer.
• Like .. out loud
• I honestly welcome your feedback.
• I’ve made changes to this workshop based on comments (both good and bad) from my last workshop two weeks ago.
4. “AGENDA”
(verrrrrrrry tentative, take with a grain of salt, or 10-20 grains)
• Intro (5 mins) – SLIDES
• Jupyter notebook / Google Colab (10 mins) – SLIDES
• Python primer (20 mins) – Hands on
• PyData (10 mins) – hands on
• Break (10 mins)
• Neural Network Concepts (30 mins) – slides
• Neural Network Classification (15 mins) – hands on
• Neural Network Regression (15 mins) – hands on
• Break (10 mins)
• Convolutional Neural Network Concepts (20 mins) - slides
• Convolutional Neural Network MNIST (20 mins) – hands on
• Convolutional Neural Network CIFAR-10 (15 mins) – hands on
• Break (10 mins)
• RNN Concepts (10 mins) – slides
• RNN IMDB (10 mins) – hands on
• Tensorboard – and other cool stuff? (25 mins) – hands on
22. Classification
• Cats
• have 4 legs
• have fur
• have long tails
• have short noses
• have small upright ears
• Dogs
• have 4 legs
• have fur
• have long tails
• have long noses
• have large floppy ears
• Pigs
• have 4 legs
• have no/little fur
• have short tails
• have short noses
• have large upright ears
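Trait lists like these become feature vectors, the representation a classifier actually consumes. A minimal plain-Python sketch (the binary encoding and the match-counting rule are illustrative, not anything from the workshop notebooks):

```python
# Illustrative: encode each animal's traits from the slide as a binary
# feature vector (1 = has the trait, 0 = does not).
# Features: [4 legs, fur, long tail, long nose, large ears]
animals = {
    "cat": [1, 1, 1, 0, 0],
    "dog": [1, 1, 1, 1, 1],
    "pig": [1, 0, 0, 0, 1],
}

def classify(observation):
    """Pick the animal whose feature vector matches the most traits."""
    def matches(known):
        return sum(o == k for o, k in zip(observation, known))
    return max(animals, key=lambda name: matches(animals[name]))

# An unknown animal: 4 legs, fur, long tail, short nose, small ears
print(classify([1, 1, 1, 0, 0]))  # cat
```

A neural network plays the same role as `classify` here, except that it learns the matching rule from data instead of having it hand-written.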
28.
Neural networks are loosely inspired by how neuroscientists today think the brain might work
(assuming they are on the right track)
31.
Sepal Length Sepal Width Petal Length Petal Width Species
5.1 3.5 1.4 0.2 setosa
… … … … …
7.0 3.2 4.7 1.4 versicolor
… … … … …
5.7 2.8 4.1 1.3 virginica
rows -> observations
columns -> features
feature vector / independent variables -> the observed values
target / dependent variable(s) -> what we want to predict
(the first four columns form the feature vector; Species is the target)
Train a model that, given a feature vector, will (hopefully accurately) assign a target: given values for Sepal Length, Sepal Width, Petal Length, and Petal Width, assign one value from the set {setosa, versicolor, virginica}.
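The mapping from table to arrays can be sketched with NumPy, using just the three rows shown above:

```python
import numpy as np

# The three observations shown on the slide, one row per flower.
# Columns: sepal length, sepal width, petal length, petal width.
features = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [5.7, 2.8, 4.1, 1.3],
])
targets = ["setosa", "versicolor", "virginica"]

# One row is one feature vector; its target is the species we want to predict.
row = features[0]  # feature vector for the first observation
print(row, "->", targets[0])
```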
35.
During training, we are not concerned with making a correct prediction, but rather making a prediction that is as close to correct as possible.
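"As close as possible" is made concrete by a loss function. A sketch using mean squared error (one common choice; the workshop's notebooks may use a different loss):

```python
# "Close to correct" is measured by a loss function. With mean squared
# error, a smaller value means the predictions are closer to the truth.
def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [1.0, 0.0, 1.0]
close  = [0.9, 0.1, 0.8]   # nearly right
far    = [0.1, 0.9, 0.2]   # nearly wrong

print(mse(y_true, close))  # small loss
print(mse(y_true, far))    # large loss
```

Training nudges the model's weights in whichever direction shrinks this number.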
41.
Learning rate
If the learning rate is too large, we will skip over the minimum.
If the learning rate is too small, we will waste time.
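Both failure modes are easy to see on a toy function. A sketch minimizing f(x) = x² (minimum at 0, gradient 2x) with gradient descent; the specific rates are illustrative:

```python
# Gradient descent on f(x) = x**2 (gradient 2x), starting from x = 1.0.
def descend(lr, steps=20):
    x = 1.0
    for _ in range(steps):
        x -= lr * 2 * x  # step downhill, scaled by the learning rate
    return x

print(abs(descend(0.4)))    # good rate: lands essentially at the minimum
print(abs(descend(0.001)))  # too small: barely moves in 20 steps (wastes time)
print(abs(descend(1.1)))    # too large: overshoots the minimum and diverges
```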
49.
Meet MNIST
Scans of handwritten digits (0-9)
Each image is 28x28 pixels
The feature vector is the entire image (784 pixels)
The target is the digit represented (0-9)
Each pixel value is 0 (black) to 255 (white)
Each pixel (for now) is a feature
Interesting variations exist (e.g. Fashion MNIST)
51. Why Activation Layers
• Without activation layers, we can only approximate linear functions
(e.g. linear regression)
• Neural networks are universal function approximators
• Activation layers introduce non-linearity
• Non-linear functions are more complex
• The real world is often very complex
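The first bullet can be checked directly: stacked linear layers with no activation between them collapse into one linear layer, so depth buys nothing. A NumPy sketch (random weights stand in for trained ones):

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(5, 4))   # "layer 1" weights
W2 = rng.normal(size=(3, 5))   # "layer 2" weights
x = rng.normal(size=4)

# Two stacked linear layers with no activation in between...
two_layers = W2 @ (W1 @ x)
# ...compute exactly one linear layer with combined weights W2 @ W1.
one_layer = (W2 @ W1) @ x
print(np.allclose(two_layers, one_layer))  # True

# Inserting a non-linearity (here ReLU) between the layers breaks that
# collapse, which is what lets depth model non-linear functions.
relu = lambda z: np.maximum(z, 0)
with_activation = W2 @ relu(W1 @ x)
```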
64.
Goal: consider fewer, more relevant, features
How to determine which features are relevant?
Let the network decide!
Convolutional Neural Network (CNN)
Three new layers
Convolutional layers – detect features
Pooling layers – summarize features
Features are then used in dense layers
Dropout – prevent overfitting
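A sketch of what the first two of those layers compute, in plain NumPy rather than Keras (the hand-picked edge-detector kernel and the tiny image are made up for illustration; in a real CNN the kernels are learned):

```python
import numpy as np

# A tiny vertical-edge detector: the kind of feature a convolutional
# layer might learn on its own.
kernel = np.array([[1, -1],
                   [1, -1]], dtype=float)

# A 5x5 "image": bright left region, dark right region (a vertical edge).
image = np.array([[1, 1, 1, 0, 0]] * 5, dtype=float)

def conv2d(img, k):
    """Valid 2D convolution (really cross-correlation, as in most DL libraries)."""
    h = img.shape[0] - k.shape[0] + 1
    w = img.shape[1] - k.shape[1] + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out

def max_pool(fmap, size=2):
    """Summarize each size x size patch by its maximum."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap.reshape(h, size, w, size).max(axis=(1, 3))

fmap = conv2d(image, kernel)  # strong response only where the edge is
print(max_pool(fmap))         # pooled summary of the detected feature
```

The convolution lights up only at the edge location (detecting the feature); max pooling shrinks the map while keeping that strong response (summarizing it).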
87. Embedding Layers
• Map words into vectors
• Used as input to the RNN
• model = Sequential()
  model.add(Embedding(2000, 16, input_length=200))
  model.add(SimpleRNN(50))
  model.add(Dense(1, activation='sigmoid'))
  model.compile(loss='binary_crossentropy', ...)
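Conceptually, `Embedding(2000, 16)` is a 2000 x 16 lookup table: one trainable 16-dimensional vector per word index. The lookup can be sketched in NumPy (the random table stands in for trained weights):

```python
import numpy as np

# A stand-in for the Embedding(2000, 16) weight matrix: one row per word.
vocab_size, embed_dim = 2000, 16
rng = np.random.default_rng(42)
table = rng.normal(size=(vocab_size, embed_dim))

# A review as a sequence of word indices, padded/truncated to length 200
# (matching input_length=200 above).
review = rng.integers(0, vocab_size, size=200)

# The embedding layer just indexes the table, one row per word.
embedded = table[review]
print(embedded.shape)  # (200, 16): fed to the SimpleRNN one step per word
```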