Deep Learning for Computer Vision - ExecutiveML

Deep Learning
for Computer Vision
Executive-ML 2017/09/21
Neither Proprietary nor Confidential – Please Distribute ;)
Alex Conway
alex @ numberboost.com
@alxcnwy

Check out the
Deep Learning Indaba
videos & practicals!
http://www.deeplearningindaba.com/videos.html
http://www.deeplearningindaba.com/practicals.html

Deep Learning is Sexy (for a reason!)
4

Image Classification
5
http://yann.lecun.com/exdb/mnist/
https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py
(99.25% test accuracy in 192 seconds and 70 lines of code)

https://research.googleblog.com/2017/06/supercharge-
your-computer-vision-models.html
Object detection

https://www.youtube.com/watch?v=VOC3huqHrss
Object detection

Image Captioning & Visual Attention
XXX
10
https://einstein.ai/research/knowing-when-to-look-adaptive-attention-
via-a-visual-sentinel-for-image-captioning

Image Q&A
11
https://arxiv.org/pdf/1612.00837.pdf

Video Q&A
XXX
12
https://www.youtube.com/watch?v=UeheTiBJ0Io

Pix2Pix
https://affinelayer.com/pix2pix/
https://github.com/affinelayer/pix2pix-tensorflow
13

Pix2Pix
https://medium.com/towards-data-science/face2face-a-pix2pix-demo-that-
mimics-the-facial-expression-of-the-german-chancellor-b6771d65bf66
14

15
Original input
Rear Window (1954)
Pix2pix output
Fully Automated
Remastered
Painstakingly by Hand
https://hackernoon.com
/remastering-classic-
films-in-tensorflow-with-
pix2pix-f4d551fa0503

Style Transfer
https://github.com/junyanz/CycleGAN 16

Style Transfer MAGIC
https://github.com/junyanz/CycleGAN
17

1. What is a neural network?
2. What is a convolutionalneural network?
3. How to use a convolutionalneural network
4. More advanced Methods
5. Case studies & applications
18

Big Shout Outs
Jeremy Howard & Rachel Thomas
http://course.fast.ai
Andrej Karpathy
http://cs231n.github.io
François Chollet (Keras lead dev)
https://keras.io/
19

What is a neuron?
21
• 3 inputs[x1,x2,x3]
• 3 weights[w1,w2,w3]
• Element-wise multiply and sum
• Apply activationfunction f
• Often add a bias too (weightof 1) – not shown

What is an Activation Function?
22
Sigmoid Tanh ReLU
Nonlinearities … “squashing functions” … transformneuron’s output
NB: sigmoid output in[0,1]

What is a (Deep) Neural Network?
23
Inputs outputs
hidden
layer 1
hidden
layer 2
hidden
layer 3
Outputs of one layer are inputs into the next layer

How does a neural network learn?
24
• You need labelled examples “trainingdata”
• Initially, the network makes random predictions (weights initializedrandomly)
• For each training data point, we calculate the error between the network’s
predictions and the ground-truth labels (aka “loss function”)
• Use ‘backpropagation’ (really just the chain rule), to update the network
parameters (weights) in the opposite direction to the error

How does a neural network learn?
25
New
weight = Old
weight
Learning
rate-
Gradient of
weight with
respect to Error( )x
“How much
error increases
when we increase
this weight”

Gradient Descent Interpretation
26
http://scs.ryerson.ca/~aharley/neural-networks/

http://playground.tensorflow.org

What is a Neural Network?
For much more detail, see:
1. Michael Nielson’s Neural Networks & Deep
Learning free online book
http://neuralnetworksanddeeplearning.com/chap1.html
2. Anrej Karpathy’s CS231n Notes
http://neuralnetworksanddeeplearning.com/chap1.html
28

2. What is a convolutional
neural network?

What is a Convolutional Neural Network?
30
“like a ordinary neural network but with special
types of layers that work well on images”
(math works on numbers)
• Pixel = 3 colour channels (R, G, B)
• Pixel intensity ∈[0,255]
• Image has width w and height h
• Therefore image is w x h x 3 numbers

31
This is VGGNet – don’t panic, we’ll break it down piece by piece
Example Architecture

32
This is VGGNet – don’t panic, we’ll break it down piece by piece
Example Architecture

Convolutions
33
http://setosa.io/ev/image-kernels/

Convolutions
34
http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html

New Layer Type: ConvolutionalLayer
35
• 2-d weighted average when multiply kernel over pixel patches
• We slide the kernel over all pixels of the image (handle borders)
• Kernel starts off with “random” values and network updates (learns)
the kernel values (using backpropagation) to try minimize loss
• Kernels shared across the whole image (parameter sharing)

Many Kernels = Many “Activation Maps” = Volume
36http://cs231n.github.io/convolutional-networks/

New Layer Type: ConvolutionalLayer
37

Convolutions
38
https://github.com/fchollet/keras/blob/master/examples/conv_filter_visualization.py

Convolution Learn Heirarchial Features
42

Great vid
43
https://www.youtube.com/watch?v=AgkfIQ4IGaM

New Layer Type: Max Pooling
44

• Reduces dimensionality from one layer to next
• …by replacing NxN sub-area with max value
• Makes network “look” at larger areas of the image at a time
• e.g. Instead of identifying fur, identify cat
• Reduces overfittingsince losing information helps the network generalize
45

46

Softmax
• Convert scores ∈ ℝ to probabilities∈ [0,1]
• Then predict the class with highest probability
47

Bringing it all together
48
Convolution+ max pooling+ fully connected+ softmax

49

50

51

52

53

We need labelled training data!

ImageNet
55
http://image-net.org/explore
1000 object categories
1.2 million training images

ImageNet Top 5 Error Rate
58
Traditional
Image Processing
Methods
AlexNet
8 Layers
ZFNet
8 Layers
GoogLeNet
22 Layers ResNet
152 Layers SENet
Ensamble
TSNet
Ensamble

3. How to use a
convolutional neural
network

Using a Pre-Trained ImageNet-Winning CNN
60
• We’ve been looking at “VGGNet”
• Oxford Visual Geometry Group (VGG)
• ImageNet 2014 Runner-up
• Network is 16 layers (deep!)
• Easy to fine-tune
https://blog.keras.io/building-powerful-image-classification-models-using-
very-little-data.html

Example: Classifying Product Images
61
https://github.com/alexcnwy/CTDL_CNN_TALK_20170620
Classifying
products into
9 categories

62
https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
Start with Pre-Trained ImageNet Model

“Transfer Learning”
is a game changer

Fine-tuning A CNN To Solve A New Problem
• Cut off last layer of pre-trained Imagenet winning CNN
• Keep learned network (convolutions) but replace final layer
• Can learn to predict new (completely different) classes
• Fine-tuning is re-training new final layer - learn for new task
64

65

• Fix weights in convolutional layers (set trainable=False)
• Remove final dense layer that predicts 1000 ImageNet classes
• Replace with new dense layer to predict 9 categories
68
88% accuracy in under 2 minutes for
classifying products into categories
Fine-tuning is awesome!
Insert obligatory brain analogy

Visual Similarity
69
• Chop off last 2 VGG layers
• Use dense layer with 4096 activations
• Compute nearest neighbours in the space of these activations
https://memeburn.com/2017/06/spree-image-search/

70
https://github.com/alexcnwy/CTDL_CNN_TALK_20170620

71
Input Image
not seen by model
Results
Top 10 most
“visually similar”

Use a Better Architecture (or all of them!)
73
“Ensambles win”
learn a weighted average of many models’ predictions

cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf
There are MANY Computer Vision Tasks

> Long,Shelhamer,and Darrell,“Fully Convolutional Networks for SemanticSegmentation”CVPR2015
> Noh et al, “LearningDeconvolution Networkfor SemanticSegmentation”CVPR2015
Semantic Segmentation

http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf
Object detection

http://blog.romanofoti.com/style_transfer/
https://github.com/fchollet/keras/blob/master/examples/neural_style_transfer.py
Style Transfer
Loss = content_loss + style_loss
Content loss = convolutions from pre-trained network
Style loss = gram matrix from style image convolutions

Video Q&A
XXX
80

Video Q&A
XXX
81

Image & Video Moderation
TODO
83
Large internationalgay datingapp with tens of millions of users
uploadinghundreds-of-thousandsofphotosperday

Estimating Accident Repair Cost from Photos
TODO
84
Prototype for
large SA insurer
Detect car make
& model from
registrationdisk
Predict repair
cost using
learnedmodel

Optical Sorting
TODO
85
https://www.youtube.com/watch?v=Xf7jaxwnyso

Segmenting Medical Images
TODO
86

m
Counting People
TODO
Countshoppers,segment on age & gender
facial recognition loyalty is next

GET IN TOUCH!
Alex Conway
alex @ numberboost.com
@alxcnwy

http://blog.kaggle.com/2016/02/04/noaa-right-whale-recognition-
winners-interview-2nd-place-felix-lau/
Computer Vision Pipelines
https://flyyufelix.github.io/2017/04/16/kaggle-nature-
conservancy.html

https://www.autoblog.com/2017/08/04/self-driving-car-sign-hack-stickers/

Practical Tips
• use a GPU – AWS p2 instances(use spot!)
• when overfitting (validation_accuracy<<<training_accuracy)
– Increase dropout
– Early stopping
• when underfitting (low training_accuracy)
1. Add moredata
2. Use data augmentation
– Flipping/stretching / rotating
– Slightly change hues / brightness
3. Use more complex model
– Resnet / Inception /Densenet
– Ensamble (average<<< weightedaveragewith learnedweights)
94

Deep Learning for Computer Vision - ExecutiveML

More Related Content

What's hot

Similar to Deep Learning for Computer Vision - ExecutiveML

Recently uploaded

Deep Learning for Computer Vision - ExecutiveML