UvA Deep Learning Course – Efstratios Gavves
Lecture 4: Convolutional Neural Networks for Computer Vision
Deep Learning @ UvA
Previous lecture
o How to define our model and optimize it in practice
o Data preprocessing and normalization
o Optimization methods
o Regularizations
o Architectures and architectural hyper-parameters
o Learning rate
o Weight initializations
o Good practices
Lecture overview
o What are Convolutional Neural Networks?
o Why are they important in Computer Vision?
o Differences from standard Neural Networks
o How to train a Convolutional Neural Network?
Convolutional Neural Networks
What makes images different?
An image is a volume with spatial structure: Height × Width × Depth (the color channels).
A single 1920x1080 RGB frame already amounts to 1920x1080x3 = 6,220,800 input variables.
Two shots of the same scene can differ only slightly: the image has shifted a bit up and to the left, yet it still depicts the same thing!
o An image has spatial structure
o Huge dimensionality
◦ A 256x256 RGB image amounts to ~200K input variables
◦ A 1-layer NN with 1,000 neurons → 200 million parameters (see the arithmetic below)
o Images are stationary signals → they share features
◦ Even after variations, images are still meaningful
◦ Small visual changes (often invisible to the naked eye) → big changes to the input vector
◦ Still, the semantics remain
◦ Basic natural image statistics are the same
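The arithmetic behind those counts, as a quick check:

inputs = 256 * 256 * 3     # 196,608 input variables (~200K)
weights = inputs * 1_000   # fully connecting them to 1,000 neurons
print(inputs, weights)     # 196608 196608000 (~200 million parameters)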
Input dimensions are correlated
Traditional task: predict my salary! The input is a feature vector: Level of education: “Higher”, Age: 28, Years of experience: 6, Previous job: Researcher, Nationality: Spain. Shift the values by 1 dimension and every value lands under the wrong feature; the example no longer makes sense.
Vision task: predict the picture! Shift the image by 1 dimension (pixel) and the first 5x5 values change, yet the picture remains the same picture.
Convolutional Neural Networks
o Question: Spatial structure?
◦ Answer: Convolutional filters
o Question: Huge input dimensionalities?
◦ Answer: Parameters are shared between filters
o Question: Local variances?
◦ Answer: Pooling
Preserving spatial structure
Why spatial?
o Images are 2-D
◦ k-D if you also count the extra channels
◦ RGB, hyperspectral, etc.
o What does a 2-D input really mean?
◦ Neighboring variables are locally correlated
Example filter when K=1
e.g. the Sobel 2-D filter:
−1 0 1
−2 0 2
−1 0 1
Learnable filters
o Computer vision has several handcrafted filters
◦ Canny, Sobel, Gaussian blur, smoothing, low-level segmentation, morphological filters, Gabor filters
o Are they optimal for recognition?
o Can we learn them from our data?
o Are they going to resemble the handcrafted filters?
Instead of fixed coefficients, treat the filter entries as learnable parameters:
𝜃1 𝜃2 𝜃3
𝜃4 𝜃5 𝜃6
𝜃7 𝜃8 𝜃9
2-D Filters (Parameters)
o If images are 2-D, parameters should also be organized in 2-D
◦ That way they can learn the local correlations between input variables
◦ That way they can “exploit” the spatial nature of images
K-D Filters (Parameters)
o Similarly, if images are k-D, parameters should also be k-D
◦ Grayscale: 2 dims, RGB: 3 dims, multiple channels: k dims
What would a k-D filter look like?
Recall the Sobel 2-D filter:
−1 0 1
−2 0 2
−1 0 1
Think in space
For an RGB input (3 dims), a 7 × 7 filter (neuron) extends over all 3 channels, i.e. it is a 7 × 7 × 3 volume.
How many weights for this neuron? 7 ∙ 7 ∙ 3 = 147
Think in space
With 5 such filters (depth = 5 dims), how many weights for these 5 neurons? 5 ∙ 7 ∙ 7 ∙ 3 = 735
The activations of a hidden layer form a volume of neurons, not a 1-D “chain”. This volume has a depth of 5, as we have 5 filters.
Local connectivity
o The weight connections are surface-wise local!
◦ Local connectivity
o The weight connections are depth-wise global
o Standard neurons have no local connectivity
◦ Everything is connected to everything
Filters vs Convolutions
[Figure: the same k-d filter slides over the input volume to produce the output]
Again, think in space
A single column of 5 filters, each 7 × 7 × 3, looking at one location: 735 weights.
What about covering the full image with filters?
Assume the image is 30x30x3, with 1 filter at every pixel (stride = 1) and a depth of 5.
How many parameters in total?
24 filter positions along the 𝑥 axis × 24 along the 𝑦 axis × 5 filters × (7 ∗ 7 ∗ 3 parameters per filter) ≈ 423𝐾 parameters in total.
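The same count, spelled out as plain arithmetic:

positions  = 24 * 24    # filter locations along x and y: (30 - 7)/1 + 1 = 24
per_filter = 7 * 7 * 3  # weights in one 7x7x3 filter
depth      = 5          # number of filters per location
print(positions * depth * per_filter)  # 423360, i.e. ~423K parameters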
Problem!
o Clearly, too many parameters
o With only a 30 × 30 pixel image and a single hidden layer of depth 5 we would need 423K parameters (~85K per filter)
◦ With a 256 × 256 image we would need 46 ∙ 10⁶ parameters
o Problem 1: Fitting a model with that many parameters is not easy
o Problem 2: Finding the data for such a model is not easy
o Problem 3: Are all these weights necessary?
Hypothesis
o Imagine
◦ With the right amount of data …
◦ … and if we connect all inputs of layer 𝑙 with all outputs of layer 𝑙 + 1, …
◦ … and if we would visualize the (2-D) filters (local connectivity → 2-D) …
◦ … we would see very similar filters no matter their location
o Why?
◦ Natural images are stationary
◦ Visual features are common to different parts of one or multiple images
Solution? Share!
o So, if we are anyway going to compute the same filters, why not share?
◦ Sharing is caring
Assume the image is 30x30x3 and 1 column of filters is shared across the whole image (depth of 5).
How many parameters in total? 7 ∗ 7 ∗ 3 parameters per filter × 5 filters = 735 parameters in total.
Shared 2-D filters → Convolutions
Original image (5 × 5):
1 1 1 0 0
0 1 1 1 0
0 0 1 1 1
0 0 1 1 0
0 1 1 0 0
Convolutional filter 1 (3 × 3):
1 0 1
0 1 0
1 0 1
Slide the filter over the image; at each location, take the inner product of the filter with the 3 × 3 patch underneath:
(𝐼 ∗ ℎ)(𝑥, 𝑦) = Σ_{𝑖=−𝑎}^{𝑎} Σ_{𝑗=−𝑏}^{𝑏} 𝐼(𝑥 − 𝑖, 𝑦 − 𝑗) ⋅ ℎ(𝑖, 𝑗)
Convolving the image, the result (3 × 3) is:
4 3 4
2 4 3
2 3 4
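A minimal NumPy sketch of this valid 2-D convolution (the helper name conv2d_valid is ours, not from the slides). With the symmetric filter above, flipping the kernel changes nothing, and the code reproduces the 3 × 3 result:

import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Valid 2-D convolution, as in the slide formula (the kernel is flipped)."""
    k = np.flip(kernel)  # convolution flips the kernel; cross-correlation would not
    h_out = (image.shape[0] - k.shape[0]) // stride + 1
    w_out = (image.shape[1] - k.shape[1]) // stride + 1
    out = np.zeros((h_out, w_out))
    for r in range(h_out):
        for c in range(w_out):
            patch = image[r*stride:r*stride + k.shape[0], c*stride:c*stride + k.shape[1]]
            out[r, c] = np.sum(patch * k)  # inner product of filter and patch
    return out

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
print(conv2d_valid(image, kernel))  # [[4. 3. 4.] [2. 4. 3.] [2. 3. 4.]]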
Output dimensions?
For an input of size ℎ_in × 𝑤_in × 𝑑_in, convolved with 𝑛_f filters of size ℎ_f × 𝑤_f × 𝑑_f at stride 𝑠:
ℎ_out = (ℎ_in − ℎ_f) / 𝑠 + 1
𝑤_out = (𝑤_in − 𝑤_f) / 𝑠 + 1
𝑑_out = 𝑛_f, with 𝑑_f = 𝑑_in
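A tiny helper (our own naming, a sketch rather than library code) that computes these output dimensions and complains when the stride does not fit:

def conv_output_size(h_in, w_in, h_f, w_f, stride):
    """Output height/width of a valid convolution; raises if the stride does not tile the input."""
    h_out = (h_in - h_f) / stride + 1
    w_out = (w_in - w_f) / stride + 1
    if not (h_out.is_integer() and w_out.is_integer()):
        raise ValueError(f"filter/stride do not tile the input: {h_out} x {w_out}")
    return int(h_out), int(w_out)

print(conv_output_size(5, 5, 3, 3, stride=1))  # (3, 3), as in the worked example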
Why call them convolutions?
Definition: the convolution of two functions 𝑓 and 𝑔, denoted by ∗, is the integral of the product of the two functions after one is reversed and shifted:
(𝑓 ∗ 𝑔)(𝑡) ≝ ∫_{−∞}^{∞} 𝑓(𝜏) 𝑔(𝑡 − 𝜏) 𝑑𝜏 = ∫_{−∞}^{∞} 𝑓(𝑡 − 𝜏) 𝑔(𝜏) 𝑑𝜏
Problem, again :S
o Our images get smaller and smaller
o Not too deep architectures
o Details are lost
In the running example, the 5 × 5 image shrinks to the 3 × 3 map after conv 1 and to a single value (18) after conv 2.
Solution? Zero-padding!
o For 𝑠 = 1, surround the image with (ℎ_f − 1)/2 and (𝑤_f − 1)/2 layers of 0
Padding the 5 × 5 image with one border of zeros makes it 7 × 7; convolving the padded image with the 3 × 3 filter then returns a 5 × 5 output, the same size as the input.
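Zero-padding in NumPy, reusing the image, kernel, and conv2d_valid names from the earlier sketch (assumed to be in scope):

import numpy as np

# one layer of zeros on every side, since (h_f - 1)/2 = 1 for a 3x3 filter
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)                        # (7, 7)
print(conv2d_valid(padded, kernel).shape)  # (5, 5): output matches the input size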
Convolutional module (New module!!!)
o Activation function:
𝑎_rc = Σ_{𝑖=−𝑎}^{𝑎} Σ_{𝑗=−𝑏}^{𝑏} 𝑥_{𝑟−𝑖,𝑐−𝑗} ⋅ 𝜃_{𝑖𝑗}
o Essentially a dot product, similar to the linear layer: 𝑎_rc ∼ 𝑥_region^𝑇 ⋅ 𝜃
o Gradient w.r.t. the parameters (accumulated over all output positions, since 𝜃_{𝑖𝑗} is shared):
∂𝑎_rc/∂𝜃_{𝑖𝑗} = Σ_{𝑟=0}^{𝑁−2𝑎} Σ_{𝑐=0}^{𝑁−2𝑏} 𝑥_{𝑟−𝑖,𝑐−𝑗}
Convolutional module in Tensorflow
import tensorflow as tf
tf.nn.conv2d(input, filter, strides, padding)
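A minimal runnable call (the shapes are our own illustrative choice; tf.nn.conv2d expects NHWC inputs and HWIO filters by default):

import tensorflow as tf

# a batch of 1 RGB image of 30x30x3 (NHWC) and 5 filters of 7x7x3 (HWIO)
images  = tf.random.normal([1, 30, 30, 3])
filters = tf.random.normal([7, 7, 3, 5])

out = tf.nn.conv2d(images, filters, strides=1, padding="VALID")
print(out.shape)  # (1, 24, 24, 5): 24 = (30 - 7)/1 + 1, and d_out = n_f = 5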
Good practice
o Resize the image to have a size that is a power of 2
o Use stride 𝑠 = 1
o A filter of [ℎ_f × 𝑤_f] = [3 × 3] works quite well with deep architectures
o Add 1 layer of zero-padding
o In general, avoid combinations of hyper-parameters that do not click
◦ E.g. 𝑠 = 2,
◦ [ℎ_f × 𝑤_f] = [3 × 3] and
◦ image size [ℎ_in × 𝑤_in] = [6 × 6], which give
◦ [ℎ_out × 𝑤_out] = [2.5 × 2.5]
◦ Programmatically worse, and worse accuracy because borders are ignored
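Using the conv_output_size helper sketched earlier (assumed in scope), the mismatched combination above fails loudly instead of silently dropping borders:

# stride 2 with a 3x3 filter on a 6x6 image: (6 - 3)/2 + 1 = 2.5, not an integer
try:
    conv_output_size(6, 6, 3, 3, stride=2)
except ValueError as err:
    print(err)  # filter/stride do not tile the input: 2.5 x 2.5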
P.S. Sometimes convolutional filters are not preferred
o When images are registered and each pixel has a particular significance
◦ E.g. after face alignment, specific pixels hold specific types of inputs, like eyes, nose, etc.
o In these cases it may be better for every spatial filter to have its own parameters
◦ The network then learns particular weights for particular image locations: eye location filters, nose location filters, mouth location filters [Taigman2014]
Pooling
o Aggregate multiple values into a single value (e.g. a 2 × 2 patch of a, b, c, d pools into a single value z)
◦ Invariance to small transformations
◦ Reduces the size of the layer output/input to the next layer → faster computations
◦ Keeps the most important information for the next layer
o Max pooling
o Average pooling
Max pooling (New module!)
o Run a sliding window of size [ℎ_f, 𝑤_f]
o At each location keep the maximum value
o Activation function: (𝑖_max, 𝑗_max) = arg max_{𝑖,𝑗 ∈ Ω(𝑟,𝑐)} 𝑥_{𝑖𝑗}, so that 𝑎_rc = 𝑥_{𝑖_max, 𝑗_max}
o Gradient w.r.t. the input: ∂𝑎_rc/∂𝑥_{𝑖𝑗} = 1 if (𝑖, 𝑗) = (𝑖_max, 𝑗_max), 0 otherwise
o The preferred choice of pooling
Example, 2 × 2 max pooling with stride 2:
Input (4 × 4):
1 4 3 6
2 1 0 9
2 2 7 7
5 3 3 6
Output (2 × 2):
4 9
5 7
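A small NumPy sketch of 2 × 2 pooling with stride 2 (our own helper, not library code); it reproduces the max-pooling output above and the average-pooling output on the next slide:

import numpy as np

def pool2x2(x, mode="max"):
    """2x2 pooling with stride 2 via reshaping into non-overlapping blocks."""
    h, w = x.shape
    blocks = x.reshape(h // 2, 2, w // 2, 2)  # (rows, 2, cols, 2)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

x_max = np.array([[1, 4, 3, 6], [2, 1, 0, 9], [2, 2, 7, 7], [5, 3, 3, 6]])
print(pool2x2(x_max))               # [[4 9] [5 7]]

x_avg = np.array([[1, 4, 1, 6], [2, 3, 0, 9], [1, 2, 7, 1], [4, 1, 0, 2]])
print(pool2x2(x_avg, mode="mean"))  # [[2.5 4. ] [2.  2.5]]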
Average pooling (New module!)
o Run a sliding window of size [ℎ_f, 𝑤_f]
o At each location keep the average value
o Activation function: 𝑎_rc = (1/|Ω(𝑟,𝑐)|) Σ_{𝑖,𝑗 ∈ Ω(𝑟,𝑐)} 𝑥_{𝑖𝑗}
o Gradient w.r.t. the input: ∂𝑎_rc/∂𝑥_{𝑖𝑗} = 1/|Ω(𝑟,𝑐)| for (𝑖, 𝑗) inside the window
Example, 2 × 2 average pooling with stride 2:
Input (4 × 4):
1 4 1 6
2 3 0 9
1 2 7 1
4 1 0 2
Output (2 × 2):
2.5 4
2 2.5
Convnets for Object Recognition
This is the Grumpy Cat!
Standard Neural Network vs Convnets
Neural Network: Layer 1 → Layer 2 → Layer 3 → Layer 4 → Layer 5 → Loss
Convolutional Neural Network: Conv. layer 1 → ReLU layer 1 → Max pool. 1 → Conv. layer 2 → ReLU layer 2 → Max pool. 2 → Layer 5 → Layer 6 → Loss
Convnets in practice
o Several convolutional layers
◦ 5 or more
o After the convolutional layers non-linearities are added
◦ The most popular one is the ReLU
o After the ReLU usually some pooling
◦ Most often max pooling
o After 5 rounds of cascading, vectorize the last convolutional layer and connect it to a fully connected layer
o Then proceed as in a usual neural network (see the sketch below)
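A minimal sketch of such an architecture in Keras (the layer sizes are our own illustrative assumptions, not the slide's exact network):

import tensorflow as tf
from tensorflow.keras import layers

# conv -> ReLU -> max-pool blocks, then flatten ("vectorize") and fully connected
model = tf.keras.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),                      # vectorize the last conv layer
    layers.Dense(128, activation="relu"),  # proceed as in a usual neural network
    layers.Dense(10, activation="softmax"),
])
model.summary()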
CNN Case Study I: Alexnet
Architectural details
Input 227 × 227. Five convolutional layers with filters of 11 × 11, 5 × 5, 3 × 3, 3 × 3, 3 × 3, interleaved with max pooling, followed by two fully connected layers (Layers 6 and 7) of 4,096 units each, both with Dropout, and the loss.
18.2% error on Imagenet
Removing layer 7
1.1% drop in performance, 16 million fewer parameters
Removing layers 6, 7
5.7% drop in performance, 50 million fewer parameters
Removing layers 3, 4
3.0% drop in performance, 1 million fewer parameters. Why?
Removing layers 3, 4, 6, 7
33.5% drop in performance. Conclusion? Depth!
Translation invariance
Credit: R. Fergus slides in Deep Learning Summer School 2016
Scale invariance
Credit: R. Fergus slides in Deep Learning Summer School 2016
Rotation invariance
Credit: R. Fergus slides in Deep Learning Summer School 2016
CNN Case Study II: VGGnet
o Much more accurate
◦ 6.8% vs 18.2% top-5 error
o About twice as many layers
◦ 16 vs 7 layers
o Filters are much smaller
◦ 3x3 vs 7x7 filters
o Harder/slower to train
Differences from Alexnet
CNN Case Study III: ResNet [He2016]
How does ResNet work?
o Instead of modelling 𝐻(𝑥) directly, model the residual 𝐻(𝑥) − 𝑥
o Adding identity layers should lead to larger networks whose training error is at least as low
o If it is not, maybe the optimizer cannot approximate identity mappings well
o When modelling the residual, the optimizer can simply set the weights to 0 to recover the identity
Take CNN A and build CNN B by stacking an identity layer on top: why would Error B > Error A, when the identity layer just copies the previous layer's activations? Shortcut/skip connections are what make such identity mappings easy for the optimizer to realize.
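A sketch of one residual block in Keras, under the simplifying assumption that we omit the batch normalization the actual ResNet blocks include:

import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, channels):
    """y = F(x) + x, so the conv layers only have to learn the residual F(x)."""
    f = layers.Conv2D(channels, 3, padding="same", activation="relu")(x)
    f = layers.Conv2D(channels, 3, padding="same")(f)
    # the shortcut: if the optimizer drives F(x) to 0, the block is the identity
    return layers.ReLU()(layers.Add()([f, x]))

inputs  = tf.keras.Input(shape=(32, 32, 64))
outputs = residual_block(inputs, 64)
model   = tf.keras.Model(inputs, outputs)
model.summary()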
o Without residual connections, deeper networks are untrainable
Other ConvNets
o Google Inception module
o Two-stream network
◦ Moving images (videos)
o Network in Network
o Deep Fried Network
More cases
o Two-stream network
◦ Moving images (videos)
o Network in Network
o Deep Fried Network
o Resnet
◦ Winner of ILSVRC 2015
Summary
o What are Convolutional Neural Networks?
o Why are they so important for Computer Vision?
o How do they differ from standard Neural Networks?
o How can we train a Convolutional Neural Network?
Reading material & references
o http://www.deeplearningbook.org/
◦ Part II: Chapter 9
[He2016] He, Zhang, Ren, Sun. Deep Residual Learning for Image Recognition, CVPR, 2016
[Simonyan2014] Simonyan, Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition, arXiv, 2014
[Taigman2014] Taigman, Yang, Ranzato, Wolf. DeepFace: Closing the Gap to Human-Level Performance in Face Verification, CVPR, 2014
[Zeiler2014] Zeiler, Fergus. Visualizing and Understanding Convolutional Networks, ECCV, 2014
[Krizhevsky2012] Krizhevsky, Sutskever, Hinton. ImageNet Classification with Deep Convolutional Neural Networks, NIPS, 2012
[LeCun1998] LeCun, Bottou, Bengio, Haffner. Gradient-Based Learning Applied to Document Recognition, IEEE, 1998
Next lecture
o What do convolutions look like?
o Build on the visual intuition behind Convnets
o Deep Learning Feature maps
o Transfer Learning