Slides from my talk on deep learning for computer vision at PyConZA on 2017/10/06.
Description:
The state-of-the-art in image classification has skyrocketed thanks to the development of deep convolutional neural networks and increases in the amount of data and computing power available to train them. The top-5 error rate in the ImageNet competition to predict which of 1000 classes an image belongs to has plummeted from 28% error in 2010 to just 2.25% in 2017 (human-level error is around 5%).
In addition to being able to classify objects in images (including not hotdogs), deep learning can be used to automatically generate captions for images, convert photos into paintings, detect cancer in pathology slide images, and help self-driving cars ‘see’.
The talk will give an overview of the cutting edge and some of the core mathematical concepts, and will also include a short code-first tutorial showing how easy it is to get started using deep learning for computer vision in Python.
1. Deep Learning for Computer Vision
Executive-ML 2017/09/21
Neither Proprietary nor Confidential – Please Distribute ;)
Alex Conway
alex @ numberboost.com
@alxcnwy
PyConZA’17
3. Check out the Deep Learning Indaba videos & practicals!
http://www.deeplearningindaba.com/videos.html
http://www.deeplearningindaba.com/practicals.html
20. Deep learning is Magic… Deep learning is Magic… Deep learning is EASY!
21. 1. What is a neural network?
2. What is a convolutional neural network?
3. How to use a convolutional neural network
4. More advanced methods
5. Case studies & applications
23. What is a neuron?
• 3 inputs [x1, x2, x3]
• 3 weights [w1, w2, w3]
• Element-wise multiply and sum
• Apply activation function f
• Often add a bias too (weight of 1) – not shown
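The neuron computation above fits in a few lines of Python (a minimal illustration; the ReLU activation and the example input/weight values are my own, not from the slides):

```python
import numpy as np

def neuron(x, w, b=0.0, f=lambda z: max(0.0, z)):
    """Element-wise multiply inputs by weights, sum, add bias,
    then apply the activation function f (ReLU by default)."""
    z = float(np.dot(x, w)) + b
    return f(z)

# 3 inputs, 3 weights: relu(0.5 - 2.0 + 0.75) = relu(-0.75) = 0.0
print(neuron([1.0, 2.0, 3.0], [0.5, -1.0, 0.25]))
```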
24. What is an Activation Function?
Sigmoid, Tanh, ReLU
Nonlinearities … “squashing functions” … transform a neuron’s output
NB: sigmoid output is in [0, 1]
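The three squashing functions named above are one-liners in NumPy (a minimal sketch):

```python
import numpy as np

def sigmoid(z):
    """Squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Squashes any real number into (-1, 1)."""
    return np.tanh(z)

def relu(z):
    """Zero for negative inputs, identity for positive inputs."""
    return np.maximum(0.0, z)
```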
25. What is a (Deep) Neural Network?
[Diagram: inputs → hidden layer 1 → hidden layer 2 → hidden layer 3 → outputs]
Outputs of one layer are inputs into the next layer
26. How does a neural network learn?
• We need labelled examples: “training data”
• We initialize the network weights randomly, so we initially get random predictions
• For each labelled training data point, we calculate the error between the network’s predictions and the ground-truth labels
• Use ‘backpropagation’ (the chain rule) to update the network parameters (weights + convolutional filters) in the opposite direction to the gradient of the error
27. How does a neural network learn?
new weight = old weight – learning rate × (gradient of error with respect to weight)
“How much the error increases when we increase this weight”
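The update rule above is a one-line gradient-descent step (a minimal sketch; names and example values are illustrative):

```python
def sgd_step(weight, grad, learning_rate=0.01):
    """One gradient-descent update: move the weight in the
    opposite direction to dError/dweight."""
    return weight - learning_rate * grad

# if increasing the weight increases the error (grad > 0), the weight shrinks
print(sgd_step(1.0, 2.0, learning_rate=0.1))
```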
30. What is a Neural Network?
For much more detail, see:
1. Michael Nielsen’s Neural Networks & Deep Learning free online book
http://neuralnetworksanddeeplearning.com/chap1.html
2. Andrej Karpathy’s CS231n notes
http://cs231n.github.io/
32. What is a Convolutional Neural Network?
“Like an ordinary neural network but with special types of layers that work well on images”
(math works on numbers)
• Pixel = 3 colour channels (R, G, B)
• Pixel intensity ∈ [0, 255]
• Image has width w and height h
• Therefore an image is w × h × 3 numbers
33. Example Architecture
This is VGGNet – don’t panic, we’ll break it down piece by piece
37. New Layer Type: Convolutional Layer
• 2-D weighted sum as the kernel is multiplied over patches of pixels
• We slide the kernel over all pixels of the image (handling borders)
• The kernel starts off with “random” values and the network updates (learns) the kernel values (using backpropagation) to try to minimize the loss
• Kernels are shared across the whole image (parameter sharing)
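A naive version of this sliding-kernel operation (a sketch ignoring border padding, i.e. “valid” mode; real frameworks use much faster implementations):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over every pixel patch of a 2-D image:
    element-wise multiply the patch by the kernel and sum."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out
```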
38. Many Kernels = Many “Activation Maps” = Volume
http://cs231n.github.io/convolutional-networks/
46. New Layer Type: Max Pooling
• Reduces dimensionality from one layer to the next
• …by replacing each N×N sub-area with its max value
• Makes the network “look” at larger areas of the image at a time, e.g. instead of identifying fur, identify a cat
• Reduces overfitting, since losing information helps the network generalize
http://cs231n.github.io/convolutional-networks/
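The N×N-max idea above can be sketched with a reshape trick (a minimal illustration for 2-D inputs; deep learning frameworks provide this as a built-in layer):

```python
import numpy as np

def max_pool(x, n=2):
    """Replace each non-overlapping n x n patch with its maximum value."""
    h, w = x.shape
    x = x[:h - h % n, :w - w % n]  # crop so both dims divide evenly by n
    return x.reshape(h // n, n, w // n, n).max(axis=(1, 3))
```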
48. Stack Conv + Pooling Layers and Go Deep
Convolution + max pooling + fully connected + softmax
52. Flatten the Final “Bottleneck” Layer
Flatten the final 7 x 7 x 512 max pooling layer
Add a fully-connected dense layer on top
53. Bringing it all together
Convolution + max pooling + fully connected + softmax
54. Softmax
Convert scores ∈ ℝ to probabilities ∈ [0, 1]
Final output prediction = highest-probability class
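The score-to-probability conversion is a few lines of NumPy (a minimal sketch; the max-subtraction is a standard numerical-stability trick, and the example scores are made up):

```python
import numpy as np

def softmax(scores):
    """Convert real-valued scores into probabilities that sum to 1."""
    scores = np.asarray(scores, dtype=float)
    e = np.exp(scores - scores.max())  # subtract max for numerical stability
    return e / e.sum()

probs = softmax([2.0, 1.0, 0.1])
print(probs.argmax())  # index of the highest-probability class
```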
62. Using a Pre-Trained ImageNet-Winning CNN
• We’ve been looking at “VGGNet”
• Oxford Visual Geometry Group (VGG)
• ImageNet 2014 runner-up
• Network is 16 layers (deep!)
• Easy to fine-tune
https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
67. Fine-tuning a CNN to Solve a New Problem
• Cut off the last layer of a pre-trained ImageNet-winning CNN
• Keep the learned network (convolutions) but replace the final layer
• Can learn to predict new (completely different) classes
• Fine-tuning is re-training the new final layer to learn the new task
71. Fine-tuning a CNN to Solve a New Problem
• Fix the weights in the convolutional layers (set trainable=False)
• Remove the final dense layer that predicts the 1000 ImageNet classes
• Replace it with a new dense layer to predict 9 categories
88% accuracy in under 2 minutes for classifying products into categories
Fine-tuning is awesome!
Insert obligatory brain analogy
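The three steps above map directly onto Keras (a sketch in the spirit of the Keras blog post linked earlier, not the exact code from the talk; the 224×224 input size, Adam optimizer, and function name are my assumptions, while the 9 categories come from the slide):

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

def build_finetune_model(num_classes, weights="imagenet"):
    """Frozen VGG16 convolutional base + new trainable softmax head."""
    base = VGG16(weights=weights, include_top=False,
                 input_shape=(224, 224, 3))
    for layer in base.layers:          # 1. fix the convolutional weights
        layer.trainable = False
    x = Flatten()(base.output)         # 2. flatten the 7 x 7 x 512 bottleneck
    out = Dense(num_classes,           # 3. new final layer for our classes
                activation="softmax")(x)
    model = Model(base.input, out)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# model = build_finetune_model(9)  # downloads ImageNet weights on first use
# model.fit(train_images, train_labels, ...)
```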
72.
73. Visual Similarity
• Chop off the last 2 VGG layers
• Use the dense layer with 4096 activations
• Compute nearest neighbours in the space of these activations
https://memeburn.com/2017/06/spree-image-search/
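The nearest-neighbour step can be sketched as follows (a minimal illustration; the slides don’t specify a distance metric, so cosine similarity is my assumption, and the 4096-d vectors would come from the chopped VGG model):

```python
import numpy as np

def nearest_neighbours(query_vec, catalogue_vecs, k=5):
    """Return indices of the k catalogue embeddings most similar
    (by cosine similarity) to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    c = catalogue_vecs / np.linalg.norm(catalogue_vecs, axis=1, keepdims=True)
    sims = c @ q                     # cosine similarity to each catalogue item
    return np.argsort(-sims)[:k]     # highest similarity first
```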
76. Final Convolutional Layer = Semantic Vector
• The final convolutional layer encodes everything the network needs to make predictions
• The dense layer added on top and the softmax layer both have lower dimensionality
80. Semantic Segmentation
Long et al., ‘Fully Convolutional Networks for Semantic Segmentation’, CVPR 2015
Noh et al., ‘Learning Deconvolution Network for Semantic Segmentation’, ICCV 2015
91. CNN + Word2Vec = AWESOME
king + woman – man ≈ queen
Frome et al. (2013) ‘DeViSE: A Deep Visual-Semantic Embedding Model’, Advances in Neural Information Processing Systems, pp. 2121–2129
92. DeViSE: A Deep Visual-Semantic Embedding Model
Before: encode labels as one-hot vectors
93. DeViSE: A Deep Visual-Semantic Embedding Model
After: encode labels as word2vec vectors (from a separate model)
Can look these up for all the nouns in ImageNet: 300-d word2vec vectors
94. DeViSE: A Deep Visual-Semantic Embedding Model
wv* = ( wv(fish) + wv(net) ) / 2
…then get nearest neighbours to wv*
https://www.youtube.com/watch?v=uv0gmrXSXVg
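The averaging-and-lookup trick above can be sketched with toy vectors (the 3-d “word vectors” below are entirely made up for illustration; real word2vec vectors are 300-d and come from a trained model):

```python
import numpy as np

# toy 3-d "word vectors" — values invented for this example only
wv = {
    "fish":    np.array([0.9, 0.1, 0.0]),
    "net":     np.array([0.1, 0.9, 0.0]),
    "fishnet": np.array([0.5, 0.5, 0.1]),
    "car":     np.array([0.0, 0.0, 1.0]),
}

wv_star = (wv["fish"] + wv["net"]) / 2   # average the two word vectors

def nearest(vec, vocab):
    """Return the word whose vector has the highest cosine similarity to vec."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(vocab, key=lambda word: cos(vec, vocab[word]))

print(nearest(wv_star, wv))  # with these toy vectors: "fishnet"
```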
96. Estimating Accident Repair Cost from Photos
Prototype for a large SA insurer
• Detect car make & model from the registration disk
• Predict repair cost using a learned model
97. Image & Video Moderation
Large international gay dating app with tens of millions of users uploading hundreds of thousands of photos per day