Deep Learning
for Computer Vision
Executive-ML 2017/09/21
Neither Proprietary nor Confidential – Please Distribute ;)
Alex Conway
alex @ numberboost.com
@alxcnwy
Hands up!
Check out the
Deep Learning Indaba
videos & practicals!
http://www.deeplearningindaba.com/videos.html
http://www.deeplearningindaba.com/practicals.html
Deep Learning is Sexy (for a reason!)
4
Image Classification
5
http://yann.lecun.com/exdb/mnist/
https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py
(99.25% test accuracy in 192 seconds and 70 lines of code)
Image Classification
6
Image Classification
7
https://research.googleblog.com/2017/06/supercharge-
your-computer-vision-models.html
Object detection
https://www.youtube.com/watch?v=VOC3huqHrss
Object detection
Image Captioning & Visual Attention
XXX
10
https://einstein.ai/research/knowing-when-to-look-adaptive-attention-
via-a-visual-sentinel-for-image-captioning
Image Q&A
11
https://arxiv.org/pdf/1612.00837.pdf
Video Q&A
XXX
12
https://www.youtube.com/watch?v=UeheTiBJ0Io
Pix2Pix
https://affinelayer.com/pix2pix/
https://github.com/affinelayer/pix2pix-tensorflow
13
Pix2Pix
https://medium.com/towards-data-science/face2face-a-pix2pix-demo-that-
mimics-the-facial-expression-of-the-german-chancellor-b6771d65bf66
14
15
Original input
Rear Window (1954)
Pix2pix output
Fully Automated
Remastered
Painstakingly by Hand
https://hackernoon.com
/remastering-classic-
films-in-tensorflow-with-
pix2pix-f4d551fa0503
Style Transfer
https://github.com/junyanz/CycleGAN 16
Style Transfer MAGIC
https://github.com/junyanz/CycleGAN
17
1. What is a neural network?
2. What is a convolutionalneural network?
3. How to use a convolutionalneural network
4. More advanced Methods
5. Case studies & applications
18
Big Shout Outs
Jeremy Howard & Rachel Thomas
http://course.fast.ai
Andrej Karpathy
http://cs231n.github.io
François Chollet (Keras lead dev)
https://keras.io/
19
1.What is a neural network?
What is a neuron?
21
• 3 inputs[x1,x2,x3]
• 3 weights[w1,w2,w3]
• Element-wise multiply and sum
• Apply activationfunction f
• Often add a bias too (weightof 1) – not shown
What is an Activation Function?
22
Sigmoid Tanh ReLU
Nonlinearities … “squashing functions” … transformneuron’s output
NB: sigmoid output in[0,1]
What is a (Deep) Neural Network?
23
Inputs outputs
hidden
layer 1
hidden
layer 2
hidden
layer 3
Outputs of one layer are inputs into the next layer
How does a neural network learn?
24
• You need labelled examples “trainingdata”
• Initially, the network makes random predictions (weights initializedrandomly)
• For each training data point, we calculate the error between the network’s
predictions and the ground-truth labels (aka “loss function”)
• Use ‘backpropagation’ (really just the chain rule), to update the network
parameters (weights) in the opposite direction to the error
How does a neural network learn?
25
New
weight = Old
weight
Learning
rate-
Gradient of
weight with
respect to Error( )x
“How much
error increases
when we increase
this weight”
Gradient Descent Interpretation
26
http://scs.ryerson.ca/~aharley/neural-networks/
http://playground.tensorflow.org
What is a Neural Network?
For much more detail, see:
1. Michael Nielson’s Neural Networks & Deep
Learning free online book
http://neuralnetworksanddeeplearning.com/chap1.html
2. Anrej Karpathy’s CS231n Notes
http://neuralnetworksanddeeplearning.com/chap1.html
28
2. What is a convolutional
neural network?
What is a Convolutional Neural Network?
30
“like a ordinary neural network but with special
types of layers that work well on images”
(math works on numbers)
• Pixel = 3 colour channels (R, G, B)
• Pixel intensity ∈[0,255]
• Image has width w and height h
• Therefore image is w x h x 3 numbers
31
This is VGGNet – don’t panic, we’ll break it down piece by piece
Example Architecture
32
This is VGGNet – don’t panic, we’ll break it down piece by piece
Example Architecture
Convolutions
33
http://setosa.io/ev/image-kernels/
Convolutions
34
http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html
New Layer Type: ConvolutionalLayer
35
• 2-d weighted average when multiply kernel over pixel patches
• We slide the kernel over all pixels of the image (handle borders)
• Kernel starts off with “random” values and network updates (learns)
the kernel values (using backpropagation) to try minimize loss
• Kernels shared across the whole image (parameter sharing)
Many Kernels = Many “Activation Maps” = Volume
36http://cs231n.github.io/convolutional-networks/
New Layer Type: ConvolutionalLayer
37
Convolutions
38
https://github.com/fchollet/keras/blob/master/examples/conv_filter_visualization.py
Convolutions
39
Convolutions
40
Convolutions
41
Convolution Learn Heirarchial Features
42
Great vid
43
https://www.youtube.com/watch?v=AgkfIQ4IGaM
New Layer Type: Max Pooling
44
New Layer Type: Max Pooling
• Reduces dimensionality from one layer to next
• …by replacing NxN sub-area with max value
• Makes network “look” at larger areas of the image at a time
• e.g. Instead of identifying fur, identify cat
• Reduces overfittingsince losing information helps the network generalize
45
New Layer Type: Max Pooling
46
Softmax
• Convert scores ∈ ℝ to probabilities∈ [0,1]
• Then predict the class with highest probability
47
Bringing it all together
48
Convolution+ max pooling+ fully connected+ softmax
Bringing it all together
49
Convolution+ max pooling+ fully connected+ softmax
Bringing it all together
50
Convolution+ max pooling+ fully connected+ softmax
Bringing it all together
51
Convolution+ max pooling+ fully connected+ softmax
Bringing it all together
52
Convolution+ max pooling+ fully connected+ softmax
Bringing it all together
53
Convolution+ max pooling+ fully connected+ softmax
We need labelled training data!
ImageNet
55
http://image-net.org/explore
1000 object categories
1.2 million training images
ImageNet
56
ImageNet
57
ImageNet Top 5 Error Rate
58
Traditional
Image Processing
Methods
AlexNet
8 Layers
ZFNet
8 Layers
GoogLeNet
22 Layers ResNet
152 Layers SENet
Ensamble
TSNet
Ensamble
3. How to use a
convolutional neural
network
Using a Pre-Trained ImageNet-Winning CNN
60
• We’ve been looking at “VGGNet”
• Oxford Visual Geometry Group (VGG)
• ImageNet 2014 Runner-up
• Network is 16 layers (deep!)
• Easy to fine-tune
https://blog.keras.io/building-powerful-image-classification-models-using-
very-little-data.html
Example: Classifying Product Images
61
https://github.com/alexcnwy/CTDL_CNN_TALK_20170620
Classifying
products into
9 categories
62
https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
Start with Pre-Trained ImageNet Model
“Transfer Learning”
is a game changer
Fine-tuning A CNN To Solve A New Problem
• Cut off last layer of pre-trained Imagenet winning CNN
• Keep learned network (convolutions) but replace final layer
• Can learn to predict new (completely different) classes
• Fine-tuning is re-training new final layer - learn for new task
64
Fine-tuning A CNN To Solve A New Problem
65
66
Before Fine-Tuning
67
After Fine-Tuning
Fine-tuning A CNN To Solve A New Problem
• Fix weights in convolutional layers (set trainable=False)
• Remove final dense layer that predicts 1000 ImageNet classes
• Replace with new dense layer to predict 9 categories
68
88% accuracy in under 2 minutes for
classifying products into categories
Fine-tuning is awesome!
Insert obligatory brain analogy
Visual Similarity
69
• Chop off last 2 VGG layers
• Use dense layer with 4096 activations
• Compute nearest neighbours in the space of these activations
https://memeburn.com/2017/06/spree-image-search/
70
https://github.com/alexcnwy/CTDL_CNN_TALK_20170620
71
Input Image
not seen by model
Results
Top 10 most
“visually similar”
4. More Advanced Methods
Use a Better Architecture (or all of them!)
73
“Ensambles win”
learn a weighted average of many models’ predictions
cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf
There are MANY Computer Vision Tasks
> Long,Shelhamer,and Darrell,“Fully Convolutional Networks for SemanticSegmentation”CVPR2015
> Noh et al, “LearningDeconvolution Networkfor SemanticSegmentation”CVPR2015
Semantic Segmentation
http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf
Object detection
Object detection
https://www.youtube.com/watch?v=VOC3huqHrss
Object detection
http://blog.romanofoti.com/style_transfer/
https://github.com/fchollet/keras/blob/master/examples/neural_style_transfer.py
Style Transfer
Loss = content_loss + style_loss
Content loss = convolutions from pre-trained network
Style loss = gram matrix from style image convolutions
Video Q&A
XXX
80
https://www.youtube.com/watch?v=UeheTiBJ0Io
Video Q&A
XXX
81
https://www.youtube.com/watch?v=UeheTiBJ0Io
5. Case Studies
Image & Video Moderation
TODO
83
Large internationalgay datingapp with tens of millions of users
uploadinghundreds-of-thousandsofphotosperday
Estimating Accident Repair Cost from Photos
TODO
84
Prototype for
large SA insurer
Detect car make
& model from
registrationdisk
Predict repair
cost using
learnedmodel
Optical Sorting
TODO
85
https://www.youtube.com/watch?v=Xf7jaxwnyso
Segmenting Medical Images
TODO
86
m
Counting People
TODO
Countshoppers,segment on age & gender
facial recognition loyalty is next
Counting Cars
TODO
Wine A.I.
Detecting Potholes
GET IN TOUCH!
Alex Conway
alex @ numberboost.com
@alxcnwy
http://blog.kaggle.com/2016/02/04/noaa-right-whale-recognition-
winners-interview-2nd-place-felix-lau/
Computer Vision Pipelines
https://flyyufelix.github.io/2017/04/16/kaggle-nature-
conservancy.html
https://www.autoblog.com/2017/08/04/self-driving-car-sign-hack-stickers/
Practical Tips
• use a GPU – AWS p2 instances(use spot!)
• when overfitting (validation_accuracy<<<training_accuracy)
– Increase dropout
– Early stopping
• when underfitting (low training_accuracy)
1. Add moredata
2. Use data augmentation
– Flipping/stretching / rotating
– Slightly change hues / brightness
3. Use more complex model
– Resnet / Inception /Densenet
– Ensamble (average<<< weightedaveragewith learnedweights)
94

Deep Learning for Computer Vision - ExecutiveML