Convolutional Neural Networks
for Computer Vision Applications
& JHB Deep Learning Hackathon Info :)
Alex Conway
alex @ numberboost.com
Hands up!
Check out the
Deep Learning Indaba
videos & practicals!
http://www.deeplearningindaba.com/videos.html
http://www.deeplearningindaba.com/practicals.html
Image Classification
4
http://yann.lecun.com/exdb/mnist/
https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py
(99.25% test accuracy in 192 seconds and 70 lines of code)
Image Classification
5
Image Classification
6
https://research.googleblog.com/2017/06/supercharge-your-computer-vision-models.html
Object detection
https://www.youtube.com/watch?v=VOC3huqHrss
Object detection
Image Captioning & Visual Attention
9
https://einstein.ai/research/knowing-when-to-look-adaptive-attention-via-a-visual-sentinel-for-image-captioning
Image Q&A
10
https://arxiv.org/pdf/1612.00837.pdf
Video Q&A
11
https://www.youtube.com/watch?v=UeheTiBJ0Io
Pix2Pix
https://affinelayer.com/pix2pix/
https://github.com/affinelayer/pix2pix-tensorflow
Pix2Pix
https://medium.com/towards-data-science/face2face-a-pix2pix-demo-that-mimics-the-facial-expression-of-the-german-chancellor-b6771d65bf66
14
Original input
Pix2pix output
Remastered
https://hackernoon.com/remastering-classic-films-in-tensorflow-with-pix2pix-f4d551fa0503
Style Transfer
https://github.com/junyanz/CycleGAN
15
Style Transfer
https://github.com/junyanz/CycleGAN
16
1. What is a neural network?
2. What is a convolutional neural network?
3. How to use a convolutional neural network
4. More advanced methods
5. Practical tips
6. Hackathon Challenge Info
Big Shout Outs
Jeremy Howard & Rachel Thomas
http://course.fast.ai
Andrej Karpathy
http://cs231n.github.io
François Chollet (Keras lead dev)
https://keras.io/
18
1. What is a neural network?
What is a single neuron?
20
• 3 inputs [x1,x2,x3]
• 3 weights [w1,w2,w3]
• Element-wise multiply and sum
• Apply activation function f
• Often add a bias too (an extra input fixed at 1 with its own learned weight) – not shown above (see the sketch below)
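A minimal NumPy sketch of the computation above; the input values, weights, and the choice of sigmoid are just illustrative:

```python
import numpy as np

def neuron(x, w, b, f):
    """Single neuron: element-wise multiply inputs by weights, sum, add bias, apply activation f."""
    return f(np.dot(x, w) + b)

# toy example with 3 inputs and 3 weights
x = np.array([0.5, -1.0, 2.0])   # inputs [x1, x2, x3]
w = np.array([0.1, 0.4, -0.2])   # weights [w1, w2, w3]
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
print(neuron(x, w, b=0.0, f=sigmoid))
```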
What is an Activation Function?
21
Sigmoid Tanh ReLU
Nonlinearities … “squashing functions” … transform the neuron’s output
NB: sigmoid output in [0,1]
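A quick NumPy sketch of the three activations shown:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes to (0, 1)

def tanh(z):
    return np.tanh(z)                 # squashes to (-1, 1)

def relu(z):
    return np.maximum(0.0, z)         # zero for negatives, identity for positives

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), tanh(z), relu(z))
```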
What is a Deep Neural Network?
22
inputs → hidden layer 1 → hidden layer 2 → hidden layer 3 → outputs
Outputs of one layer are inputs into the next layer
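A minimal Keras sketch of a network with three hidden layers feeding into each other; the layer sizes, the 10-dimensional input and the 2 output classes are made up for illustration:

```python
from keras.models import Sequential
from keras.layers import Dense

# each layer's outputs feed into the next layer as inputs
model = Sequential([
    Dense(64, activation='relu', input_shape=(10,)),  # hidden layer 1 (10 inputs assumed)
    Dense(64, activation='relu'),                     # hidden layer 2
    Dense(64, activation='relu'),                     # hidden layer 3
    Dense(2, activation='softmax'),                   # outputs
])
model.summary()
```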
How does a neural network learn?
23
• You need labelled examples (“training data”)
• Initially, the network makes random predictions (weights initialized
randomly)
• For each training data point, we calculate the error between the
network’s predictions and the ground-truth labels (“loss function”
e.g. mean squared error)
• Using a method called ‘backpropagation’ (really just the chain rule
from calculus 1), we use the error to update the weights (using
‘gradient descent’) so that the error is a little bit smaller next time (it
learns from past errors)
http://playground.tensorflow.org
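A toy illustration of this loop (random weight initialisation, mean squared error, gradient descent update) for a single weight fitting y = 2x in NumPy; this is just a sketch of the idea, not the training code for any model in this talk:

```python
import numpy as np

# toy data: y = 2x (the network must learn the weight 2)
X = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * X

w = np.random.randn()        # weight initialised randomly
lr = 0.01                    # learning rate for gradient descent

for step in range(100):
    pred = w * X                      # the network's predictions
    error = pred - y
    loss = np.mean(error ** 2)        # mean squared error ("loss function")
    grad = np.mean(2 * error * X)     # dLoss/dw via the chain rule (backpropagation)
    w -= lr * grad                    # update so the error is a bit smaller next time

print(w)   # close to 2.0
```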
What is a Neural Network?
For much more detail, see:
1. Michael Nielson’s Neural Networks &
Deep Learning free online book
http://neuralnetworksanddeeplearning.com/chap1.html
2. Andrej Karpathy’s CS231n Notes
http://cs231n.github.io
25
2. What is a convolutional neural network?
What is a Convolutional Neural
Network?
27
“like a simple neural network but with
special types of layers that work well on
images”
(math works on numbers)
• Pixel = 3 colour channels (R, G, B)
• Pixel intensity = number in [0,255]
• Image has width w and height h
• Therefore image is w x h x 3 numbers
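A small sketch of what “an image is w x h x 3 numbers” looks like in code, assuming Pillow and a hypothetical file cat.jpg:

```python
import numpy as np
from PIL import Image

img = Image.open('cat.jpg')    # hypothetical image file
pixels = np.array(img)         # shape: (height, width, 3) — one number per colour channel
print(pixels.shape, pixels.min(), pixels.max())   # pixel intensities in [0, 255]
```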
28
Example Architecture
This is VGGNet – don’t panic, we’ll break it down piece by piece
Convolutions
29
http://setosa.io/ev/image-kernels/
Convolutions
30
http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html
New Layer Type: Convolutional Layer
31
• 2-D weighted sum: element-wise multiply the kernel with each patch of pixels and sum
• We slide the kernel over all pixels of the image (with some handling of the borders)
• The kernel starts off with “random” values and the network updates (learns) the kernel values (using backpropagation) to try to minimize the loss
• Kernels are shared across the whole image (“parameter sharing”) – see the sketch below
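A bare-bones NumPy sketch of sliding a kernel over pixel patches (no padding, so the borders are simply dropped); the outline/edge-detection kernel is a common example of the kind shown in the setosa.io demo linked above:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a kernel over an image, taking a weighted sum of each pixel patch (no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i+kh, j:j+kw]
            out[i, j] = np.sum(patch * kernel)   # element-wise multiply and sum
    return out

image = np.random.rand(5, 5)
edge_kernel = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])           # outline / edge-detection kernel
print(conv2d(image, edge_kernel).shape)          # (3, 3)
```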
Many Kernels = Many “Activation Maps” = Volume
http://cs231n.github.io/convolutional-networks/
Convolutions
33
https://github.com/fchollet/keras/blob/master/examples/conv_filter_visualization.py
Convolutions
34
Convolutions
35
Convolutions
36
Great vid
37
https://www.youtube.com/watch?v=AgkfIQ4IGaM
New Layer Type: Max Pooling
38
New Layer Type: Max Pooling
• Reduces dimensionality from one layer to the next
• By replacing each NxN sub-area with its max value (see the sketch below)
• Makes the network “look” at larger areas of the image at a time, e.g. instead of identifying fur, identify a cat
• Reduces computational load
• Reduces overfitting, since losing information helps the network generalize
39
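A minimal NumPy sketch of 2x2 max pooling as described above:

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Replace each size x size sub-area with its max value (stride = size)."""
    h, w = feature_map.shape
    out = np.zeros((h // size, w // size))
    for i in range(0, h - h % size, size):
        for j in range(0, w - w % size, size):
            out[i // size, j // size] = feature_map[i:i+size, j:j+size].max()
    return out

fm = np.arange(16).reshape(4, 4).astype(float)
print(max_pool(fm))   # 4x4 activation map reduced to 2x2
```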
Softmax
• Convert scores ∈ ℝ to probabilities ∈ [0,1]
• Then predict the class with highest
probability
40
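Softmax in a few lines of NumPy (the max is subtracted for numerical stability; the scores are illustrative):

```python
import numpy as np

def softmax(scores):
    """Convert arbitrary real-valued scores into probabilities that sum to 1."""
    exp = np.exp(scores - np.max(scores))   # subtract max for numerical stability
    return exp / exp.sum()

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs, probs.argmax())   # predict the class with the highest probability
```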
Bringing it all together
41
Convolution + max pooling + fully connected + softmax
Bringing it all together
42
Convolution + max pooling + fully connected + softmax
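A minimal Keras sketch of the whole stack (convolution + max pooling + fully connected + softmax), in the spirit of the mnist_cnn.py example linked earlier; the filter counts and layer sizes are illustrative, not the exact VGGNet architecture:

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),  # convolution
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),                                   # max pooling
    Flatten(),
    Dense(128, activation='relu'),                                    # fully connected
    Dense(10, activation='softmax'),                                  # softmax over 10 classes
])
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
```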
We need labelled training data!
ImageNet
44
http://image-net.org/explore
ImageNet
45
ImageNet
46
ImageNet
47
3. How to use a convolutional neural network
Using a Pre-Trained ImageNet-Winning
CNN
49
• We’ve been looking at “VGGNet”
• Oxford Visual Geometry Group (VGG)
• The runner-up in ILSVRC 2014
• Network is 16 layers (deep!)
• Its main contribution was in showing that the depth of
the network is a critical component for good
performance.
• Only 3x3 (stride 1 pad 1) convolutions and 2x2 max
pooling
• Easy to fine-tune
https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
Example: Classifying Product Images
50
https://github.com/alexcnwy/CTDL_CNN_TALK_20170620
51
https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
Fine-tuning A CNN To Solve A New Problem
• Cut off the last layer of a pre-trained ImageNet-winning CNN
• Keep the learned network but replace the final layer
• Can learn to predict different classes
• Fine-tuning is re-training the new final layer
52
53
Before
54
After
Fine-tuning A CNN To Solve A New Problem
• Fix weights in convolutional layers (set trainable=False)
• Remove the final dense layer that predicts the 1000 ImageNet classes
• Replace it with a new dense layer to predict 9 categories (see the sketch below)
55
Gets 88% accuracy in classifying products into categories
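A sketch of this fine-tuning recipe in Keras, in the spirit of the “very little data” blog post linked above: freeze the convolutional layers, drop the 1000-class ImageNet head and add a new 9-class softmax. The 256-unit dense layer and the data pipeline are assumptions:

```python
from keras.applications.vgg16 import VGG16
from keras.models import Model
from keras.layers import Flatten, Dense

base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False                       # fix weights in the convolutional layers

x = Flatten()(base.output)
x = Dense(256, activation='relu')(x)              # new dense layer (size is illustrative)
predictions = Dense(9, activation='softmax')(x)   # 9 product categories instead of 1000

model = Model(inputs=base.input, outputs=predictions)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(...) on the labelled product images
```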
Visual Similarity
56
• Chop off last 2 VGG layers
• Use dense layer with 4096 activations
• Compute nearest neighbours in the space of these
activations
https://memeburn.com/2017/06/spree-image-search/
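A sketch of the visual-similarity idea, assuming the 4096-dimensional 'fc1' dense layer of Keras’ VGG16 as the feature space and scikit-learn for nearest neighbours; product_images is placeholder data here and would normally be preprocessed real product photos:

```python
import numpy as np
from keras.applications.vgg16 import VGG16
from keras.models import Model
from sklearn.neighbors import NearestNeighbors

vgg = VGG16(weights='imagenet', include_top=True)
# chop off the last layers and keep the 4096-dim 'fc1' dense activations
feature_model = Model(inputs=vgg.input, outputs=vgg.get_layer('fc1').output)

product_images = np.random.rand(10, 224, 224, 3)   # placeholder (n, 224, 224, 3) batch
features = feature_model.predict(product_images)    # shape (n, 4096)

nn = NearestNeighbors(n_neighbors=5).fit(features)
distances, indices = nn.kneighbors(features[:1])     # most visually similar items to image 0
print(indices)
```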
57
https://github.com/alexcnwy/CTDL_CNN_TALK_20170620
58
4. More advanced topics
cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf
Lots of Computer Vision Tasks
Long, Shelhamer, and Darrell, “Fully Convolutional Networks for Semantic Segmentation” CVPR
2015
Noh et al, “Learning Deconvolution Network for Semantic Segmentation” CVPR 2015
Semantic Segmentation
http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf
Object detection
Object detection
https://www.youtube.com/watch?v=VOC3huqHrss
Object detection
http://blog.romanofoti.com/style_transfer/
https://github.com/fchollet/keras/blob/master/examples/neural_style_transfer.py
Style Transfer
Loss = content_loss + style_loss
Content loss = difference between the convolutional activations (from a pre-trained network) of the content image and the generated image
Style loss = difference between the Gram matrices of the style image’s and the generated image’s convolutional activations
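A NumPy sketch of the Gram matrix used in the style loss (for the full implementation, see the neural_style_transfer.py Keras example linked above); the activation shape here is made up:

```python
import numpy as np

def gram_matrix(activations):
    """Gram matrix of a (height, width, channels) activation volume:
    channel-by-channel correlations that capture texture/style."""
    h, w, c = activations.shape
    features = activations.reshape(h * w, c)
    return features.T @ features              # (channels, channels)

style_acts = np.random.rand(32, 32, 64)       # placeholder convolutional activations
print(gram_matrix(style_acts).shape)          # (64, 64)
```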
5. Practical tips
http://blog.kaggle.com/2016/02/04/noaa-right-whale-recognition-winners-interview-2nd-place-felix-lau/
Computer Vision Pipelines
https://flyyufelix.github.io/2017/04/16/kaggle-nature-conservancy.html
Practical Tips
• use a GPU – AWS p2 instances (use spot!)
• when overfitting (validation_accuracy <<< training_accuracy)
– Increase dropout
– Early stopping
• when underfitting (low training_accuracy)
1. Add more data
2. Use data augmentation
– Flipping / stretching / rotating
– Slightly change hues / brightness
3. Use a more complex model
– ResNet / Inception / DenseNet
– Ensemble (average <<< weighted average with learned weights)
70
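A sketch of data augmentation with Keras’ ImageDataGenerator (random flips, shifts, rotations); the specific ranges and directory layout are illustrative:

```python
from keras.preprocessing.image import ImageDataGenerator

# randomly flip / shift / rotate / zoom training images on the fly
datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
)
# train_generator = datagen.flow_from_directory('data/train/', target_size=(224, 224), batch_size=32)
# model.fit_generator(train_generator, steps_per_epoch=100, epochs=10)
```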
Hackathon Challenge
Info
POTHOLES!
POTHOLES!
Hackathon Details
Challenge: Detect potholes in images
Binary classification problem
Given ±5,000 training images with labels
Given number of potholes + bounding box annotations
Given convolutional output for each image (in train & test)
Submit predictions on held-out test images
MOST ACCURATE MODEL WINS!
74
Hackathon Details
Can form teams of up to 4 people
You need to present your approach (2 min max)
Prizes for best presentation / most innovative approach
(even if not most accurate)
Don’t worry if you don’t know any of this stuff
– it’s a great opportunity to learn!
Hackathons are fun!!!!
75
Why this is an important problem
Potholes cause road deaths:
“The Automobile Association (AA) said that if there was proper
maintenance of our roads, then there could be an immediate
decrease in about 5% of road deaths … South Africa has
more than 700,000 accidents occurring annually.”
http://www.roadcover.co.za/potholes-how-they-worsen-our-roads/
35,000 lives!!
2 levels:
• Alert the road authorities where they are to fix quickly
• Real time pothole avoidance (much harder!)
76
Big thanks to Dr. Steve Kroon for suggesting and helping with the
dataset
See you tomorrow!
Same venue (THE DIZ 111 Smit Street)
9am – 8:30pm
QUESTIONS?
Email me :)
Alex Conway
alex @ numberboost.com
