Convolutional Neural Networks
- A breakthrough in Computer Vision
by
Paresh Kamble (PhD Student) & Rucha Gole (MTech Student)
Department of Electronics and Communication Engineering,
VNIT, Nagpur, Maharashtra, India.
Introduction
• Pre-requisites: i) Linear Algebra, ii) Neural Networks
• TED talk: Fei-Fei Li, "How we teach computers to understand pictures"
• Buzz about Deep Learning and CNNs
• Applications of CNNs: image recognition, video analysis, medical imaging, games, etc.
• Companies that use CNNs: Google, Facebook, Twitter, Instagram, IBM, Intel, etc.
• Disclaimer: this presentation is loosely based on the Coursera Deep Learning Specialization and Stanford University's CS231n course.
Common problems in Computer Vision
Traditional Neural Network vs. Convolutional Neural Network
A 32 x 32 x 3 colour image gives 32 x 32 x 3 = 3072 input values. Fully connecting these to a 1000-unit hidden layer requires a weight matrix W[1] of dimension 1000 x 3072.
Traditional Neural Network vs. Convolutional Neural Network
A 1000 x 1000 x 3 image gives 1000 x 1000 x 3 = 3,000,000 input values. Fully connecting these to a 1000-unit hidden layer requires a weight matrix W[1] of dimension 1000 x 3,000,000, that is, 3,000,000,000 parameters!
Traditional Neural Network vs. Convolutional Neural Network
Salient points:
• A plain NN does not consider the spatial relations between features in an image.
• A NN needs huge amounts of data to prevent overfitting!
• A NN is not feasible for large images!
• To perform Computer Vision operations on large images, the convolution operation plays an important role.
• Thus, CNNs are fundamentally important.
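The fully connected parameter explosion described above is easy to verify with a few lines of Python (a quick sketch using the layer sizes from these slides):

```python
# Weight-matrix size of a fully connected layer: every input unit
# connects to every hidden unit, so weights = inputs * hidden units.
def fc_weight_count(height, width, channels, hidden_units):
    inputs = height * width * channels
    return inputs * hidden_units

small = fc_weight_count(32, 32, 3, 1000)      # 32 x 32 RGB image
large = fc_weight_count(1000, 1000, 3, 1000)  # 1-megapixel RGB image

print(small)  # 3072000
print(large)  # 3000000000
```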
Stages of feature extraction by CNN
Low-level features: edges, curves and colour
Mid-level features: parts of objects
High-level features: complete objects
Foundation of Convolutional Neural Networks
• Edge Detection: detecting vertical edges and horizontal edges in an image
Foundation of Convolutional Neural Networks
• Vertical Edge Detection
6 x 6 input image:
3 0 1 2 7 4
1 5 8 9 3 1
2 7 2 5 1 3
0 1 3 1 7 8
4 2 1 6 2 8
2 4 5 2 3 9
3 x 3 vertical edge filter:
-1 0 1
-1 0 1
-1 0 1
Sliding the filter over the image (stride 1, no padding), each output value is the sum of the element-wise products of the filter with the 3 x 3 patch it covers. For example, the top-left output is (-1)(3 + 1 + 2) + (0)(0 + 5 + 7) + (1)(1 + 8 + 2) = 5.
4 x 4 output:
5 4 0 -8
10 2 2 -3
0 2 4 7
3 2 3 16
Foundation of Convolutional Neural Networks
• Vertical Edge Detection
Input (6 x 6, bright left half, dark right half):
10 10 10 0 0 0
10 10 10 0 0 0
10 10 10 0 0 0
10 10 10 0 0 0
10 10 10 0 0 0
10 10 10 0 0 0
Filter (3 x 3):
1 0 -1
1 0 -1
1 0 -1
Output (4 x 4): the band of 30s marks the light-to-dark vertical edge.
0 30 30 0
0 30 30 0
0 30 30 0
0 30 30 0
Foundation of Convolutional Neural Networks
• Vertical Edge Detection
Input (6 x 6, dark left half, bright right half):
0 0 0 10 10 10
0 0 0 10 10 10
0 0 0 10 10 10
0 0 0 10 10 10
0 0 0 10 10 10
0 0 0 10 10 10
Filter (3 x 3):
1 0 -1
1 0 -1
1 0 -1
Output (4 x 4): the same edge with reversed polarity gives negative responses.
0 -30 -30 0
0 -30 -30 0
0 -30 -30 0
0 -30 -30 0
Foundation of Convolutional Neural Networks
• Vertical & Horizontal Edge Detection
Vertical edge filter:
1 0 -1
1 0 -1
1 0 -1
Horizontal edge filter:
1 1 1
0 0 0
-1 -1 -1
Sobel vertical edge filter:
1 0 -1
2 0 -2
1 0 -1
Foundation of Convolutional Neural Networks
• Learning the filter weights!
Rather than hand-designing filters (vertical, horizontal, Sobel, etc.), a CNN treats the nine entries of the filter as learnable parameters:
w1 w2 w3
w4 w5 w6
w7 w8 w9
Backpropagation then learns the weight values that best detect the features relevant to the task.
Foundation of Convolutional Neural Networks
• Padding
Image * Filter = Output Image
6 x 6 * 3 x 3 = 4 x 4
(n x n) * (f x f) = (n_out x n_out), using n_out = n − f + 1
Shortcomings of this technique:
1) The output keeps shrinking as the number of layers increases.
2) Information from the boundary of the image remains under-used.
The solution is zero-padding around the edges of the image!
Foundation of Convolutional Neural Networks
• Padding
If the image is 6 x 6:
n_out = n − f + 1 = 6 − 3 + 1 = 4
Thus, Output Image = 4 x 4
Foundation of Convolutional Neural Networks
• Padding
(The 6 x 6 image surrounded by a one-pixel border of zeros.)
If p = 1:
n_out = n + 2p − f + 1 = 6 + (2 × 1) − 3 + 1 = 6
Thus, Output Image = 6 x 6
Foundation of Convolutional Neural Networks
• Padding
(The 6 x 6 image surrounded by a two-pixel border of zeros.)
If p = 2:
n_out = n + 2p − f + 1 = 6 + (2 × 2) − 3 + 1 = 8
Thus, Output Image = 8 x 8
Foundation of Convolutional Neural Networks
• Padding
How much to pad?
1) "Valid" = no padding.
Follows the formula n_out = n − f + 1.
2) "Same" = the output dimension is the same as the input.
Follows the formula n_out = n + 2p − f + 1.
To keep the output size equal to the input size: n = n + 2p − f + 1, giving p = (f − 1) / 2.
Thus, filters are generally of odd size.
Foundation of Convolutional Neural Networks
• Strided Convolution: shifting the filter by s pixels during convolution. Here, s = 2.
7 x 7 input image:
3 0 1 2 7 4 2
1 5 8 9 3 1 1
2 7 2 5 1 3 5
0 1 3 1 7 8 4
4 2 1 6 2 8 3
2 4 5 2 3 9 8
2 3 6 4 2 0 1
3 x 3 filter:
-1 0 1
-1 0 1
-1 0 1
3 x 3 output:
5 0 -3
0 4 2
4 -5 5
Here, stride s = 2. Thus:
n_out = (n + 2p − f) / s + 1 = (7 + 2 × 0 − 3) / 2 + 1 = 4 / 2 + 1 = 3
If the input image is 6 x 6 with all other factors constant, (6 − 3) / 2 + 1 = 2.5; the fractional position is dropped (round down), so the output is 2 x 2.
Foundation of Convolutional Neural Networks
• Summary of Convolution
For an n x n image and an f x f filter with padding p and a stride of s pixels, the output image dimension is given by:
(⌊(n + 2p − f) / s⌋ + 1) x (⌊(n + 2p − f) / s⌋ + 1)
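The formula can be wrapped in a small helper (a sketch; the floor division handles stride settings where the filter does not fit an exact number of times):

```python
def conv_output_size(n, f, p=0, s=1):
    """Output side length for an n x n image, f x f filter, padding p, stride s."""
    return (n + 2 * p - f) // s + 1

print(conv_output_size(6, 3))        # 4  (valid convolution)
print(conv_output_size(6, 3, p=1))   # 6  ("same" padding)
print(conv_output_size(7, 3, s=2))   # 3  (strided convolution)
print(conv_output_size(6, 3, s=2))   # 2  (fraction rounded down)
```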
Convolution vs. Cross-correlation
Convolution involves: flip the filter on both axes → multiply → sum
Cross-correlation involves: multiply → sum
In CNNs, the operation actually performed is cross-correlation, but by convention it is called convolution.
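The relationship is easy to demonstrate: true convolution equals cross-correlation with the kernel flipped on both axes. For the Sobel vertical kernel, flipping negates it, so the two operations give exactly opposite results (a quick NumPy check; the small loop is the same sliding-window computation as earlier):

```python
import numpy as np

def cross_correlate(image, kernel):
    """multiply -> sum (what CNN layers actually compute)."""
    n, f = image.shape[0], kernel.shape[0]
    out = np.zeros((n - f + 1, n - f + 1), dtype=int)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+f, j:j+f] * kernel)
    return out

def convolve(image, kernel):
    """flip on both axes -> multiply -> sum (textbook convolution)."""
    return cross_correlate(image, np.flip(kernel))

image = np.arange(36).reshape(6, 6)
sobel = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]])

# Flipping the Sobel kernel on both axes negates it, so true convolution
# returns exactly the negative of cross-correlation for this kernel:
assert np.array_equal(convolve(image, sobel), -cross_correlate(image, sobel))
```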
Convolution on Colour images
Image * Filter = Output
6 x 6 x 3 * 3 x 3 x 3 = 4 x 4 x 1
The filter must have the same number of channels as the image. At each position, all 3 x 3 x 3 = 27 products are summed into a single number, so the output has only one channel.
Quiz 1
Filter 1: detect vertical edges in the R-plane only.
R-channel:
1 0 -1
1 0 -1
1 0 -1
G-channel and B-channel: all zeros.
Filter 2: detect vertical edges in all planes.
R-, G- and B-channels:
1 0 -1
1 0 -1
1 0 -1
Multiple Filters
Image * Filter 1 = Output 1 (horizontal edges): 6 x 6 x 3 * 3 x 3 x 3 = 4 x 4 x 1
Image * Filter 2 = Output 2 (vertical edges): 6 x 6 x 3 * 3 x 3 x 3 = 4 x 4 x 1
Stacking the two outputs gives a 4 x 4 x 2 volume.
Summary of Convolution with Multiple filters
For an n x n x c image and N filters, each f x f x c, with padding p and a stride of s pixels, the output image dimension is given by:
(⌊(n + 2p − f) / s⌋ + 1) x (⌊(n + 2p − f) / s⌋ + 1) x N
Single Convolution Layer
Each filter is convolved with the input a[0], a per-filter bias is added, and the result passes through a non-linearity:
ReLU(a[0] * filter 1 + b1), ReLU(a[0] * filter 2 + b2), ...
In layer notation: z[1] = W[1] a[0] + b[1], and a[1] = g(z[1]).
Learnt filters and corresponding activations
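One convolution layer can thus be sketched as: convolve the input volume with each filter, add that filter's bias, apply ReLU, and stack the results. The shapes below match the slides (a 6 x 6 x 3 input and two 3 x 3 x 3 filters); this is an illustrative loop, not an optimized implementation:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def conv_layer(a_prev, filters, biases, stride=1):
    """a_prev: (n, n, c); filters: (N, f, f, c); biases: (N,). Returns (n_out, n_out, N)."""
    n, f, N = a_prev.shape[0], filters.shape[1], filters.shape[0]
    n_out = (n - f) // stride + 1
    z = np.zeros((n_out, n_out, N))
    for k in range(N):
        for i in range(n_out):
            for j in range(n_out):
                patch = a_prev[i*stride:i*stride+f, j*stride:j*stride+f, :]
                z[i, j, k] = np.sum(patch * filters[k]) + biases[k]
    return relu(z)

rng = np.random.default_rng(0)
a0 = rng.standard_normal((6, 6, 3))     # 6 x 6 x 3 input volume
W1 = rng.standard_normal((2, 3, 3, 3))  # two 3 x 3 x 3 filters
b1 = np.zeros(2)
a1 = conv_layer(a0, W1, b1)
print(a1.shape)  # (4, 4, 2)
```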
Quiz 2
• No. of trainable parameters (weights + biases) in one convolution layer?
No. of filters: 10
Volume of each filter: 3 x 3 x 3
Input volume: 32 x 32 x 3
Solution: the input volume is not needed; it is a catch.
Total trainable parameters = (3 x 3 x 3 weights + 1 bias) per filter x 10 filters
= (27 + 1) x 10
= 280.
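The same counting rule in code (a sketch; one bias per filter, and the input volume plays no role):

```python
def conv_params(num_filters, f, channels):
    # Each filter has f * f * channels weights plus one bias.
    return num_filters * (f * f * channels + 1)

print(conv_params(10, 3, 3))  # 280
```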
CNN Example # 1 (Basic)
Input a[0]: 39 x 39 x 3 (n_H[0] x n_W[0] x n_C[0])
Layer 1: f[1] = 3, s[1] = 1, p[1] = 0, 10 filters
n_out = (39 + 2 × 0 − 3) / 1 + 1 = 37, so a[1]: 37 x 37 x 10
Layer 2: f[2] = 5, s[2] = 2, p[2] = 0, 20 filters
n_out = (37 + 2 × 0 − 5) / 2 + 1 = 17, so a[2]: 17 x 17 x 20
Layer 3: f[3] = 5, s[3] = 2, p[3] = 0, 40 filters
n_out = (17 + 2 × 0 − 5) / 2 + 1 = 7, so a[3]: 7 x 7 x 40
Vectorize: 7 x 7 x 40 = 1960 units, fed to Logistic Regression / Softmax to produce ŷ.
Non-linear activation
• Sigmoid
• ReLU (Rectified Linear Unit)
Pooling Layer
Purpose:
1) To reduce the size of the layer to speed up computation.
2) To retain robust features.
Types:
1) Max Pooling
2) Average Pooling
Pooling Layer
4 x 4 input:
1 3 2 1
3 6 8 9
8 3 3 9
4 6 8 2
Max pooling (f = 2, s = 2):
6 9
8 9
Average pooling (f = 2, s = 2):
3.25 5
5.25 5.5
Pooling Layer
Salient features:
1) Follows the sliding process of convolution, with filter size f, stride s and padding p (not used much) as the hyperparameters.
2) The filters have no weights, so there are no trainable parameters.
3) Pooling is carried out channel-wise; thus, the number of channels remains unchanged.
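Max and average pooling on the 4 x 4 example above can be sketched as (f = 2, s = 2, single channel):

```python
import numpy as np

def pool2d(x, f=2, s=2, mode="max"):
    """2-D pooling over a single channel; mode is 'max' or 'avg'."""
    reduce = np.max if mode == "max" else np.mean
    n_out = (x.shape[0] - f) // s + 1
    out = np.zeros((n_out, n_out))
    for i in range(n_out):
        for j in range(n_out):
            out[i, j] = reduce(x[i*s:i*s+f, j*s:j*s+f])
    return out

x = np.array([[1, 3, 2, 1],
              [3, 6, 8, 9],
              [8, 3, 3, 9],
              [4, 6, 8, 2]])
print(pool2d(x, mode="max"))  # values: [[6, 9], [8, 9]]
print(pool2d(x, mode="avg"))  # values: [[3.25, 5.0], [5.25, 5.5]]
```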
CNN Example # 2 (Fully Loaded)
Input: 32 x 32 x 3
Layer 1:
Conv1: f = 5, s = 1, p = 0, 6 filters; (32 + 2 × 0 − 5) / 1 + 1 = 28, so 28 x 28 x 6
Pool1: f = 2, s = 2; (28 + 2 × 0 − 2) / 2 + 1 = 14, so 14 x 14 x 6
Layer 2:
Conv2: f = 5, s = 1, 16 filters; (14 + 2 × 0 − 5) / 1 + 1 = 10, so 10 x 10 x 16
Pool2: f = 2, s = 2; (10 + 2 × 0 − 2) / 2 + 1 = 5, so 5 x 5 x 16
Vectorize: 5 x 5 x 16 = 400
Layer 3 (FC3): 120 units, W3: (120, 400), b3: (120)
Layer 4 (FC4): 84 units, W4: (84, 120), b4: (84)
Softmax output: 10 units, W5: (10, 84), b5: (10), producing ŷ.
Computation of parameters
 Fully connected layer vs. Convolution layer
No | Layer Type     | Input data dim. | Filter dim. (if any) | Output data dim. | Activation size | Weights | Biases | Learnable params
1  | Input          | -               | -                    | 32*32*3          | 3072            | 0       | 0      | 0
2  | Conv Layer 1   | 32*32*3         | 6 filters * 5*5*3    | 28*28*6          | 4704            | 450     | 6      | 456
3  | ReLU 1         | 28*28*6         | -                    | 28*28*6          | 4704            | 0       | 0      | 0
4  | Maxpool 1      | 28*28*6         | 2*2*6                | 14*14*6          | 1176            | 0       | 0      | 0
5  | Conv Layer 2   | 14*14*6         | 16 filters * 5*5*6   | 10*10*16         | 1600            | 2400    | 16     | 2416
6  | ReLU 2         | 10*10*16        | -                    | 10*10*16         | 1600            | 0       | 0      | 0
7  | Maxpool 2      | 10*10*16        | 2*2*16               | 5*5*16           | 400             | 0       | 0      | 0
8  | FC Layer 3     | 5*5*16 = 400    | -                    | 120              | 120             | 48000   | 120    | 48120
9  | ReLU 3         | 120             | -                    | 120              | 120             | 0       | 0      | 0
10 | FC Layer 4     | 120             | -                    | 84               | 84              | 10080   | 84     | 10164
11 | ReLU 4         | 84              | -                    | 84               | 84              | 0       | 0      | 0
12 | Softmax Output | 84              | -                    | 10               | 10              | 840     | 10     | 850
Total learnable parameters: 62,006 (each fully connected layer has one bias per output unit).
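The per-layer totals can be recomputed in a few lines (a sketch; note that a fully connected layer has one bias per output unit, so FC3, FC4 and the softmax output carry 120, 84 and 10 biases respectively):

```python
def conv_params(num_filters, f, channels):
    return num_filters * (f * f * channels + 1)  # weights + one bias per filter

def fc_params(n_in, n_out):
    return n_in * n_out + n_out                  # weights + one bias per output unit

layers = [
    conv_params(6, 5, 3),    # Conv1: 456
    conv_params(16, 5, 6),   # Conv2: 2416
    fc_params(400, 120),     # FC3:   48120
    fc_params(120, 84),      # FC4:   10164
    fc_params(84, 10),       # Softmax output: 850
]
print(sum(layers))  # 62006
```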
Why Convolutional ?
FC Neural Network: a 32 x 32 x 3 input (3072 values) fully connected to a 28 x 28 x 6 output (4704 values) needs 3072 x 4704 = 14,450,688 parameters.
Conv. Neural Network: f = 5, s = 1, p = 0, 6 filters; 6 filters x 5 x 5 x 3 = 450 weights + 6 biases = 456 parameters.
Why Convolutional ?
(The same 3 x 3 vertical edge filter slides over the whole 6 x 6 image of 10s and 0s, producing the 4 x 4 output of 0s and 30s shown earlier.)
Parameter Sharing: a feature detector that is useful in one part of the image may also be useful in another part of the same image.
Sparsity of connections: in each layer, every output value depends only on a small number of inputs.
Putting it altogether
Training set: (x(1), y(1)), ..., (x(m), y(m)).
Cost J = (1/m) Σ_{i=1}^{m} L(ŷ(i), y(i))
Use gradient descent (or other optimization algorithms) to optimize the parameters and reduce J.
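A single gradient-descent update can be sketched as follows (schematic only: the toy gradient 2w stands in for the gradients that backpropagation would compute for a real network):

```python
import numpy as np

def gradient_descent_step(params, grads, learning_rate=0.01):
    """Move every parameter array a small step against its gradient."""
    return [p - learning_rate * g for p, g in zip(params, grads)]

# Toy example: minimise J(w) = w^2, whose gradient is 2w.
w = np.array([4.0])
for _ in range(100):
    (w,) = gradient_descent_step([w], [2 * w], learning_rate=0.1)
print(w)  # very close to 0
```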
A few important terms
• Forward pass: the process of passing data from the input layer to the output layer.
• Cost Function: the difference between the predicted and true outputs of a NN.
• Backpropagation: the process of updating the parameters of a network based on the cost function (using optimization algorithms viz., Gradient Descent, SGDM, ADAGrad, etc.) to minimize the cost.
• Mini-batch: the number of images passing through the network at once.
• Learning Rate: the rate at which the parameters are updated.
• Iteration: one mini-batch performing a forward and a backward pass through the network.
• Epoch: one forward and one backward pass over the complete dataset.
Transfer Learning
1) Train from scratch: all layers are trained.
2) Transfer Learning: freeze the pretrained weights of the early layers and train only the last few layers.
Pretrained models such as AlexNet, VGG-16, VGG-19, GoogLeNet, ZFNet, etc., trained on (say) the ImageNet dataset.
Transfer Learning
Parameter | Training from scratch | Transfer Learning
Method | Build the CNN from scratch | Only the last few layers need to be trained
Tuning | A large number of hyperparameters must be tuned | Only a few hyperparameters need to be tuned
Computation | Large computation power required (multiple GPUs) | Less computation power needed (can even work with CPUs)
Dataset | A huge dataset is needed to avoid overfitting | A small dataset is enough
Training time | May take weeks or even months | May take hours to train
Popular Languages for Deep Learning
• Python
• Java
• R
• C++
• C
• Javascript
• Scala
• Julia
Popular Deep Learning Libraries
• TensorFlow
• PyTorch
• Caffe
• Keras
• Theano
• Deep Learning Toolbox – Matlab
• MatConvNet
Popular Software / IDE for Deep Learning
• PyCharm
• Spyder
• Jupyter Notebook
• Matlab (Deep Learning Toolbox) R2018b
• Anaconda (contains Spyder, Jupyter Notebook, numpy, scipy, etc.)
• Google Colab (similar to Jupyter Notebook; gives a K80 GPU for 12 hrs at a time)
Resources
• http://cs231n.github.io/convolutional-networks/
• https://www.coursera.org/learn/convolutional-neural-networks
• https://itunes.apple.com/us/app/flower/id1279174518?mt=8
• https://fossbytes.com/popular-top-programming-languages-machine-learning-data-science/
Thank you!
Any Queries?
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
 
[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf
[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf
[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf
 
International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...
 
basic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdfbasic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdf
 
Technical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prismsTechnical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prisms
 
Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...
 

Understanding CNN

  • 1. Convolutional Neural Networks - A breakthrough in Computer Vision by Paresh Kamble & Rucha Gole PhD Student MTech Student Department of Electronics and Communication Engineering, VNIT, Nagpur, Maharashtra, India.
  • 2. Introduction  Pre-requisites: i) Linear Algebra, ii) Neural Networks  TED talk - Fei Fei Li – How we teach computers to understand pictures  Buzz about Deep Learning and CNN  Applications of CNN – Image recognition, video analysis, medical imaging, games, etc.  Companies that use CNN – Google, Facebook, Twitter, Instagram, IBM, Intel, etc.  Disclaimer – The presentation is loosely based on the Coursera Deep Learning Specialization and Stanford University's CS231n course.
  • 3. Common problems in Computer Vision
  • 4. Traditional Neural Network vs. Convolutional Neural Network 32 32 32 x 32 x 3 = 3072 1000 W12 1000 x 3072 dimensional matrix
  • 5. Traditional Neural Network vs. Convolutional Neural Network 1000 1000 1000 x 1000 x 3 = 3,000,000 1000 W12 1000 x 3,000,000 dimensional matrix!!! = 3,000,000,000 features
  • 6. Traditional Neural Network vs. Convolutional Neural Network Salient points: • The spatial relation between features in an image is not considered by an NN. • An NN needs huge amounts of data to prevent overfitting! • An NN is not feasible for large images! • To perform Computer Vision operations on large images, the convolution operation plays an important role. • Thus, CNNs are fundamentally important.
  • 7. Stages of feature extraction by CNN Low level features Mid level features High level features Edges, curves and colour Parts of objects Complete objects
  • 8. Foundation of Convolutional Neural Networks  Edge Detection
  • 9. Foundation of Convolutional Neural Networks  Edge Detection Vertical edges
  • 10. Foundation of Convolutional Neural Networks  Edge Detection Vertical edges Horizontal edges
  • 11. Foundation of Convolutional Neural Networks  Vertical Edge Detection * = 3 0 1 2 7 4 1 5 8 9 3 1 2 7 2 5 1 3 0 1 3 1 7 8 4 2 1 6 2 8 2 4 5 2 3 9 -1 0 1 -1 0 1 -1 0 1
  • 12. Foundation of Convolutional Neural Networks  Vertical Edge Detection * = 3 0 1 2 7 4 1 5 8 9 3 1 2 7 2 5 1 3 0 1 3 1 7 8 4 2 1 6 2 8 2 4 5 2 3 9 -1 0 1 -1 0 1 -1 0 1 5 -1 0 1 -1 0 1 -1 0 1
  • 13. Foundation of Convolutional Neural Networks  Vertical Edge Detection * = 3 0 1 2 7 4 1 5 8 9 3 1 2 7 2 5 1 3 0 1 3 1 7 8 4 2 1 6 2 8 2 4 5 2 3 9 -1 0 1 -1 0 1 -1 0 1 5 4 -1 0 1 -1 0 1 -1 0 1
  • 14. Foundation of Convolutional Neural Networks  Vertical Edge Detection * = 3 0 1 2 7 4 1 5 8 9 3 1 2 7 2 5 1 3 0 1 3 1 7 8 4 2 1 6 2 8 2 4 5 2 3 9 -1 0 1 -1 0 1 -1 0 1 5 4 0 -1 0 1 -1 0 1 -1 0 1
  • 15. Foundation of Convolutional Neural Networks  Vertical Edge Detection * = 3 0 1 2 7 4 1 5 8 9 3 1 2 7 2 5 1 3 0 1 3 1 7 8 4 2 1 6 2 8 2 4 5 2 3 9 -1 0 1 -1 0 1 -1 0 1 5 4 0 -8 -1 0 1 -1 0 1 -1 0 1
  • 16. Foundation of Convolutional Neural Networks  Vertical Edge Detection * = 3 0 1 2 7 4 1 5 8 9 3 1 2 7 2 5 1 3 0 1 3 1 7 8 4 2 1 6 2 8 2 4 5 2 3 9 -1 0 1 -1 0 1 -1 0 1 5 4 0 -8 10 -1 0 1 -1 0 1 -1 0 1
  • 17. Foundation of Convolutional Neural Networks  Vertical Edge Detection * = 3 0 1 2 7 4 1 5 8 9 3 1 2 7 2 5 1 3 0 1 3 1 7 8 4 2 1 6 2 8 2 4 5 2 3 9 -1 0 1 -1 0 1 -1 0 1 5 4 0 -8 10 2 -1 0 1 -1 0 1 -1 0 1
  • 18. Foundation of Convolutional Neural Networks  Vertical Edge Detection * = 3 0 1 2 7 4 1 5 8 9 3 1 2 7 2 5 1 3 0 1 3 1 7 8 4 2 1 6 2 8 2 4 5 2 3 9 -1 0 1 -1 0 1 -1 0 1 5 4 0 -8 10 2 -2 -1 0 1 -1 0 1 -1 0 1
  • 19. Foundation of Convolutional Neural Networks  Vertical Edge Detection * = 3 0 1 2 7 4 1 5 8 9 3 1 2 7 2 5 1 3 0 1 3 1 7 8 4 2 1 6 2 8 2 4 5 2 3 9 -1 0 1 -1 0 1 -1 0 1 5 4 0 -8 10 2 -2 -3 -1 0 1 -1 0 1 -1 0 1
  • 20. Foundation of Convolutional Neural Networks  Vertical Edge Detection * = 3 0 1 2 7 4 1 5 8 9 3 1 2 7 2 5 1 3 0 1 3 1 7 8 4 2 1 6 2 8 2 4 5 2 3 9 -1 0 1 -1 0 1 -1 0 1 5 4 0 -8 10 2 -2 -3 0 -1 0 1 -1 0 1 -1 0 1
  • 21. Foundation of Convolutional Neural Networks  Vertical Edge Detection * = 3 0 1 2 7 4 1 5 8 9 3 1 2 7 2 5 1 3 0 1 3 1 7 8 4 2 1 6 2 8 2 4 5 2 3 9 -1 0 1 -1 0 1 -1 0 1 5 4 0 -8 10 2 -2 -3 0 2 -1 0 1 -1 0 1 -1 0 1
  • 22. Foundation of Convolutional Neural Networks  Vertical Edge Detection * = 3 0 1 2 7 4 1 5 8 9 3 1 2 7 2 5 1 3 0 1 3 1 7 8 4 2 1 6 2 8 2 4 5 2 3 9 -1 0 1 -1 0 1 -1 0 1 5 4 0 -8 10 2 -2 -3 0 2 4 -1 0 1 -1 0 1 -1 0 1
  • 23. Foundation of Convolutional Neural Networks  Vertical Edge Detection * = 3 0 1 2 7 4 1 5 8 9 3 1 2 7 2 5 1 3 0 1 3 1 7 8 4 2 1 6 2 8 2 4 5 2 3 9 -1 0 1 -1 0 1 -1 0 1 5 4 0 -8 10 2 -2 -3 0 2 4 7 -1 0 1 -1 0 1 -1 0 1
  • 24. Foundation of Convolutional Neural Networks  Vertical Edge Detection * = 3 0 1 2 7 4 1 5 8 9 3 1 2 7 2 5 1 3 0 1 3 1 7 8 4 2 1 6 2 8 2 4 5 2 3 9 -1 0 1 -1 0 1 -1 0 1 5 4 0 -8 10 2 -2 -3 0 2 4 7 3 -1 0 1 -1 0 1 -1 0 1
  • 25. Foundation of Convolutional Neural Networks  Vertical Edge Detection * = 3 0 1 2 7 4 1 5 8 9 3 1 2 7 2 5 1 3 0 1 3 1 7 8 4 2 1 6 2 8 2 4 5 2 3 9 -1 0 1 -1 0 1 -1 0 1 5 4 0 -8 10 2 -2 -3 0 2 4 7 3 2 -1 0 1 -1 0 1 -1 0 1
  • 26. Foundation of Convolutional Neural Networks  Vertical Edge Detection * = 3 0 1 2 7 4 1 5 8 9 3 1 2 7 2 5 1 3 0 1 3 1 7 8 4 2 1 6 2 8 2 4 5 2 3 9 -1 0 1 -1 0 1 -1 0 1 5 4 0 -8 10 2 -2 -3 0 2 4 7 3 2 3 -1 0 1 -1 0 1 -1 0 1
  • 27. Foundation of Convolutional Neural Networks  Vertical Edge Detection * = 3 0 1 2 7 4 1 5 8 9 3 1 2 7 2 5 1 3 0 1 3 1 7 8 4 2 1 6 2 8 2 4 5 2 3 9 -1 0 1 -1 0 1 -1 0 1 5 4 0 -8 10 2 -2 -3 0 2 4 7 3 2 3 16 -1 0 1 -1 0 1 -1 0 1
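The sliding-window computation animated on slides 11–27 can be reproduced in a few lines of plain Python (a minimal sketch; the function name `conv2d_valid` is ours, not from the slides, and only square inputs are handled):

```python
def conv2d_valid(image, kernel, stride=1):
    """'Valid' cross-correlation (what CNNs call convolution):
    slide the kernel over the image, multiply elementwise, sum."""
    n, f = len(image), len(kernel)
    out_dim = (n - f) // stride + 1
    out = []
    for i in range(out_dim):
        row = []
        for j in range(out_dim):
            acc = 0
            for a in range(f):
                for b in range(f):
                    acc += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(acc)
        out.append(row)
    return out

# The 6x6 image and 3x3 vertical-edge filter from the slides
image = [[3, 0, 1, 2, 7, 4],
         [1, 5, 8, 9, 3, 1],
         [2, 7, 2, 5, 1, 3],
         [0, 1, 3, 1, 7, 8],
         [4, 2, 1, 6, 2, 8],
         [2, 4, 5, 2, 3, 9]]
vertical = [[-1, 0, 1],
            [-1, 0, 1],
            [-1, 0, 1]]
print(conv2d_valid(image, vertical))
# [[5, 4, 0, -8], [10, 2, -2, -3], [0, 2, 4, 7], [3, 2, 3, 16]]
```

Running it confirms the 4x4 result built up over the animation frame by frame.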
  • 28. Foundation of Convolutional Neural Networks  Vertical Edge Detection * = 10 10 10 0 0 0 10 10 10 0 0 0 10 10 10 0 0 0 10 10 10 0 0 0 10 10 10 0 0 0 10 10 10 0 0 0 1 0 -1 1 0 -1 1 0 -1 0 30 30 0 0 30 30 0 0 30 30 0 0 30 30 0
  • 29. Foundation of Convolutional Neural Networks  Vertical Edge Detection * = 0 0 0 10 10 10 0 0 0 10 10 10 0 0 0 10 10 10 0 0 0 10 10 10 0 0 0 10 10 10 0 0 0 10 10 10 1 0 -1 1 0 -1 1 0 -1 0 -30 -30 0 0 -30 -30 0 0 -30 -30 0 0 -30 -30 0
  • 30. Foundation of Convolutional Neural Networks  Vertical & Horizontal Edge Detection 1 0 -1 1 0 -1 1 0 -1 1 1 1 0 0 0 -1 -1 -1 1 0 -1 2 0 -2 1 0 -1
  • 31. Foundation of Convolutional Neural Networks  Vertical & Horizontal Edge Detection * = 1 0 -1 1 0 -1 1 0 -1 1 1 1 0 0 0 -1 -1 -1 1 0 -1 2 0 -2 1 0 -1 3 0 1 2 7 4 1 5 8 9 3 1 2 7 2 5 1 3 0 1 3 1 7 8 4 2 1 6 2 8 2 4 5 2 3 9 w1 w2 w3 w4 w5 w6 w7 w8 w9 Learning the filter weights!
  • 32. Foundation of Convolutional Neural Networks  Padding Image * Filter = Output Image 6 x 6 3 x 3 4 x 4 𝑛 ∗ 𝑛 𝑓 ∗ 𝑓 𝑛𝑜𝑢𝑡 ∗ 𝑛𝑜𝑢𝑡 using 𝑛𝑜𝑢𝑡 = 𝑛 − 𝑓 + 1 Shortcoming of this technique: 1) Output goes on shrinking as the number of layers increase. 2) Information from boundary of the image remains unused. Solution is Zero padding around the edges of the image!
  • 33. Foundation of Convolutional Neural Networks  Padding If Image is 6 x 6 𝑛𝑜𝑢𝑡 = 𝑛 − 𝑓 + 1 = 6 − 3 + 1 = 4 Thus, Output Image = 4 x 4
  • 34. Foundation of Convolutional Neural Networks  Padding 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 If 𝑝 =1 𝑛𝑜𝑢𝑡 = 𝑛 + 2𝑝 − 𝑓 + 1 = 6 + (2 ∗ 1) − 3 + 1 = 6 Thus, Output Image = 6 x 6
  • 35. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Foundation of Convolutional Neural Networks  Padding 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 If 𝑝 =2 𝑛𝑜𝑢𝑡 = 𝑛 + 2𝑝 − 𝑓 + 1 = 6 + (2 ∗ 2) − 3 + 1 = 6 + 4 − 3 + 1 = 8 Thus, Output Image = 8 x 8
  • 36. Foundation of Convolutional Neural Networks  Padding How much to pad? 1) Valid = no padding. Follows the formula 𝑛𝑜𝑢𝑡 = 𝑛 − 𝑓 + 1. 2) Same = output dimension is the same as the input. Follows the formula 𝑛𝑜𝑢𝑡 = 𝑛 + 2𝑝 − 𝑓 + 1. To keep the output size the same as the input size: 𝑛 = 𝑛 + 2𝑝 − 𝑓 + 1, so 𝑝 = (𝑓 − 1)/2. Thus, filter sizes are generally odd (3, 5, 7, ...).
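The zero-padding described on slides 32–36 is just a border of zeros around the image; a minimal sketch (the function name `zero_pad` is ours):

```python
def zero_pad(image, p):
    """Surround an n x n image with a border of p zeros on every side."""
    n = len(image)
    padded = [[0] * (n + 2 * p) for _ in range(n + 2 * p)]
    for i in range(n):
        for j in range(n):
            padded[i + p][j + p] = image[i][j]
    return padded

image = [[1] * 6 for _ in range(6)]   # any 6x6 image
same = zero_pad(image, p=1)           # p = (f - 1)/2 = 1 for a 3x3 filter
print(len(same), len(same[0]))        # 8 8
```

Convolving the padded 8x8 image with a 3x3 filter gives 8 − 3 + 1 = 6, i.e. the original 6x6 size, which is why p = (f − 1)/2 is called "same" padding.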
  • 37. Foundation of Convolutional Neural Networks  Strided Convolution: Shifting of filter by 𝑠 pixels during convolution. Here,𝑠 = 2. * = 3 0 1 2 7 4 2 1 5 8 9 3 1 1 2 7 2 5 1 3 5 0 1 3 1 7 8 4 4 2 1 6 2 8 3 2 4 5 2 3 9 8 2 3 6 4 2 0 1 -1 0 1 -1 0 1 -1 0 1 5 -1 0 1 -1 0 1 -1 0 1
  • 38. Foundation of Convolutional Neural Networks  Strided Convolution * = 3 0 1 2 7 4 2 1 5 8 9 3 1 1 2 7 2 5 1 3 5 0 1 3 1 7 8 4 4 2 1 6 2 8 3 2 4 5 2 3 9 8 2 3 6 4 2 0 1 -1 0 1 -1 0 1 -1 0 1 5 0 -1 0 1 -1 0 1 -1 0 1
  • 39. Foundation of Convolutional Neural Networks  Strided Convolution * = 3 0 1 2 7 4 2 1 5 8 9 3 1 1 2 7 2 5 1 3 5 0 1 3 1 7 8 4 4 2 1 6 2 8 3 2 4 5 2 3 9 8 2 3 6 4 2 0 1 -1 0 1 -1 0 1 -1 0 1 5 0 -3 -1 0 1 -1 0 1 -1 0 1
  • 40. Foundation of Convolutional Neural Networks  Strided Convolution * = 3 0 1 2 7 4 2 1 5 8 9 3 1 1 2 7 2 5 1 3 5 0 1 3 1 7 8 4 4 2 1 6 2 8 3 2 4 5 2 3 9 8 2 3 6 4 2 0 1 -1 0 1 -1 0 1 -1 0 1 5 0 -3 0-1 0 1 -1 0 1 -1 0 1
  • 41. Foundation of Convolutional Neural Networks  Strided Convolution * = 3 0 1 2 7 4 2 1 5 8 9 3 1 1 2 7 2 5 1 3 5 0 1 3 1 7 8 4 4 2 1 6 2 8 3 2 4 5 2 3 9 8 2 3 6 4 2 0 1 -1 0 1 -1 0 1 -1 0 1 5 0 -3 0 4-1 0 1 -1 0 1 -1 0 1
  • 42. Foundation of Convolutional Neural Networks  Strided Convolution * = 3 0 1 2 7 4 2 1 5 8 9 3 1 1 2 7 2 5 1 3 5 0 1 3 1 7 8 4 4 2 1 6 2 8 3 2 4 5 2 3 9 8 2 3 6 4 2 0 1 -1 0 1 -1 0 1 -1 0 1 5 0 -3 0 4 2-1 0 1 -1 0 1 -1 0 1
  • 43. Foundation of Convolutional Neural Networks  Strided Convolution * = 3 0 1 2 7 4 2 1 5 8 9 3 1 1 2 7 2 5 1 3 5 0 1 3 1 7 8 4 4 2 1 6 2 8 3 2 4 5 2 3 9 8 2 3 6 4 2 0 1 -1 0 1 -1 0 1 -1 0 1 5 0 -3 0 4 2 4 -1 0 1 -1 0 1 -1 0 1
  • 44. Foundation of Convolutional Neural Networks  Strided Convolution * = 3 0 1 2 7 4 2 1 5 8 9 3 1 1 2 7 2 5 1 3 5 0 1 3 1 7 8 4 4 2 1 6 2 8 3 2 4 5 2 3 9 8 2 3 6 4 2 0 1 -1 0 1 -1 0 1 -1 0 1 5 0 -3 0 4 2 4 -5 -1 0 1 -1 0 1 -1 0 1
  • 45. Foundation of Convolutional Neural Networks  Strided Convolution * = 3 0 1 2 7 4 2 1 5 8 9 3 1 1 2 7 2 5 1 3 5 0 1 3 1 7 8 4 4 2 1 6 2 8 3 2 4 5 2 3 9 8 2 3 6 4 2 0 1 -1 0 1 -1 0 1 -1 0 1 5 0 -3 0 4 2 4 -5 5 -1 0 1 -1 0 1 -1 0 1
  • 46. Foundation of Convolutional Neural Networks  Strided Convolution * = 3 0 1 2 7 4 2 1 5 8 9 3 1 1 2 7 2 5 1 3 5 0 1 3 1 7 8 4 4 2 1 6 2 8 3 2 4 5 2 3 9 8 2 3 6 4 2 0 1 -1 0 1 -1 0 1 -1 0 1 5 0 -3 0 4 2 4 -5 5 -1 0 1 -1 0 1 -1 0 1 Here, stride 𝑠 = 2. Thus, 𝑛𝑜𝑢𝑡 = ⌊(𝑛 + 2𝑝 − 𝑓)/𝑠⌋ + 1 = ⌊(7 + 2∗0 − 3)/2⌋ + 1 = ⌊4/2⌋ + 1 = 3. If the input image is 6x6, what will the output image be?
  • 47. Foundation of Convolutional Neural Networks  Strided Convolution * = 3 0 1 2 7 4 2 1 5 8 9 3 1 1 2 7 2 5 1 3 5 0 1 3 1 7 8 4 4 2 1 6 2 8 3 2 4 5 2 3 9 8 2 3 6 4 2 0 1 -1 0 1 -1 0 1 -1 0 1 5 0 -3 0 4 2 4 -5 5 -1 0 1 -1 0 1 -1 0 1 Here, stride 𝑠 = 2 gives 𝑛𝑜𝑢𝑡 = ⌊(7 + 2∗0 − 3)/2⌋ + 1 = 3 for the 7x7 input. If the input image is 6x6, then 𝑛𝑜𝑢𝑡 = ⌊(6 + 2∗0 − 3)/2⌋ + 1 = ⌊3/2⌋ + 1 = 2, i.e. a 2x2 output, all other factors kept constant.
  • 48. Foundation of Convolutional Neural Networks  Summary of Convolution For an 𝑛 ∗ 𝑛 image and an 𝑓 ∗ 𝑓 filter with padding 𝑝 and a stride of 𝑠 pixels, the output image dimension is given by: (⌊(𝑛 + 2𝑝 − 𝑓)/𝑠⌋ + 1) ∗ (⌊(𝑛 + 2𝑝 − 𝑓)/𝑠⌋ + 1)
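The summary formula above is worth encoding once and reusing; a minimal sketch (the function name `conv_output_dim` is ours) that reproduces the worked cases from the previous slides:

```python
from math import floor

def conv_output_dim(n, f, p=0, s=1):
    """Spatial size of a convolution output: floor((n + 2p - f)/s) + 1."""
    return floor((n + 2 * p - f) / s) + 1

print(conv_output_dim(6, 3))        # 4  (valid convolution, 6x6 image)
print(conv_output_dim(6, 3, p=1))   # 6  ('same' padding with p = 1)
print(conv_output_dim(7, 3, s=2))   # 3  (strided convolution, 7x7 image)
print(conv_output_dim(6, 3, s=2))   # 2  (the 6x6 quiz case)
```

The floor matters only when the stride does not divide evenly, as in the last case: (6 − 3)/2 = 1.5 rounds down to 1, giving a 2x2 output.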
  • 49. Convolution vs. Cross-correlation Convolution involves: flip the kernel on both axes -> multiply -> sum Cross-correlation involves: multiply -> sum In CNNs, the process is cross-correlation but is referred to as convolution by convention.
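The only difference between the two operations is the kernel flip, which takes one line of Python (a minimal sketch; `flip_kernel` is our name):

```python
def flip_kernel(k):
    """True convolution = cross-correlation with the kernel
    flipped on both axes (rows reversed, then each row reversed)."""
    return [row[::-1] for row in k[::-1]]

k = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
print(flip_kernel(k))  # [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
```

Since CNN filters are learned rather than hand-designed, the network simply learns the flipped filter if needed, which is why the flip is dropped in practice.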
  • 50. Convolution on Colour images Image Filter Output * = ? 6x6x3 3x3
  • 51. Convolution on Colour images Image Filter Output * = 6x6x3 3x3x3 4x4x1
  • 52. Image Filter Output * = 6x6x3 3x3x3 4x4x1 Convolution on Colour images
  • 53. Convolution on Colour images Image Filter Output * = 6x6x3 3x3x3 4x4x1
  • 54. Convolution on Colour images Image Filter Output * = 6x6x3 3x3x3 4x4x1
  • 55. Convolution on Colour images Image Filter Output * = 6x6x3 3x3x3 4x4x1
  • 56. Convolution on Colour images Image Filter Output * = 6x6x3 3x3x3 4x4x1
  • 57. Convolution on Colour images Image Filter Output * = 6x6x3 3x3x3 4x4x1
  • 58. Convolution on Colour images Image Filter Output * = 6x6x3 3x3x3 4x4x1
  • 59. Convolution on Colour images Image Filter Output * = 6x6x3 3x3x3 4x4x1
  • 60. Convolution on Colour images Image Filter Output * = 6x6x3 3x3x3 4x4x1
  • 61. Convolution on Colour images Image Filter Output * = 6x6x3 3x3x3 4x4x1
  • 62. Convolution on Colour images Image Filter Output * = 6x6x3 3x3x3 4x4x1
  • 63. Convolution on Colour images Image Filter Output * = 6x6x3 3x3x3 4x4x1
  • 64. Convolution on Colour images Image Filter Output * = 6x6x3 3x3x3 4x4x1
  • 65. Convolution on Colour images Image Filter Output * = 6x6x3 3x3x3 4x4x1
  • 66. Convolution on Colour images Image Filter Output * = 6x6x3 3x3x3 4x4x1
  • 67. Convolution on Colour images Image Filter Output * = 6x6x3 3x3x3 4x4x1
  • 68. Convolution on Colour images Image Filter Output * = 6x6x3 3x3x3 4x4x1
  • 69. Quiz 1 Filter 1 Detect vertical edges in R-plane 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 -1 1 0 -1 1 0 -1
  • 70. Quiz 1 Filter 1 Detect vertical edges in R-plane 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 -1 1 0 -1 1 0 -1
  • 71. Quiz 1 Filter 1 Detect vertical edges in R-plane Filter 2 Detect vertical edges in all-plane 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 -1 1 0 -1 1 0 -1 1 0 -1 1 0 -1 1 0 -1 1 0 -1 1 0 -1 1 0 -1 1 0 -1 1 0 -1 1 0 -1
  • 72. Quiz 1 Filter 1 Detect vertical edges in R-plane Filter 2 Detect vertical edges in all-plane 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 -1 1 0 -1 1 0 -1 1 0 -1 1 0 -1 1 0 -1 1 0 -1 1 0 -1 1 0 -1 1 0 -1 1 0 -1 1 0 -1
  • 73. Multiple Filters = * 6x6x3 3x3x3 4x4x1 Horizontal edge
  • 74. Multiple Filters = * = 6x6x3 3x3x3 4x4x1 Horizontal edge Vertical edge
  • 75. Multiple Filters = * = 6x6x3 3x3x3 4x4x1 4x4x2 Horizontal edge Vertical edge
  • 76. Summary of Convolution with Multiple filters For an 𝑛 ∗ 𝑛 ∗ 𝑐 image and 𝑁 filters, each 𝑓 ∗ 𝑓 ∗ 𝑐, with padding 𝑝 and a stride of 𝑠 pixels, the output image dimension is given by: (⌊(𝑛 + 2𝑝 − 𝑓)/𝑠⌋ + 1) ∗ (⌊(𝑛 + 2𝑝 − 𝑓)/𝑠⌋ + 1) ∗ 𝑁
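The multi-filter shape rule can be checked directly; a minimal sketch (the helper name `conv_volume_shape` is ours) verifying the 6x6x3 example with two 3x3x3 filters from slide 75:

```python
def conv_volume_shape(n, c, f, num_filters, p=0, s=1):
    """Output volume of a conv layer: each f x f x c filter
    produces one 2-D channel, so the depth equals num_filters."""
    dim = (n + 2 * p - f) // s + 1
    return (dim, dim, num_filters)

# 6x6x3 image, two 3x3x3 filters (horizontal + vertical edge)
print(conv_volume_shape(6, 3, 3, 2))  # (4, 4, 2)
```

Note that the filter depth must match the input depth c, and c itself disappears from the output shape: the channels are summed inside each filter.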
  • 77. Single Convolution Layer 𝑧[1] = 𝑊[1] 𝑎[0] + 𝑏[1], 𝑎[1] = 𝑔(𝑧[1]) Each filter is convolved with the input 𝑎[0], a per-filter bias (𝑏1, 𝑏2, ...) is added, and the result is passed through ReLU. The filters play the role of 𝑊[1], and ReLU is the activation 𝑔.
  • 78. Learnt filters and corresponding activations
  • 79. Quiz 2  No. of trainable parameters (weights + biases) in one convolution layer? No. of filters: 10 Volume of each filter: 3x3x3 Input volume: 32x32x3
  • 80. Quiz 2  No. of trainable parameters (weights + biases) in one convolution layer? No. of filters: 10 Volume of each filter: 3x3x3 Input volume: 32x32x3 Solution: The input volume is not needed; it is a distractor. Total trainable parameters = (3x3x3 weights + 1 bias) per filter x 10 filters = (27 + 1) x 10 = 28 x 10 = 280.
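The parameter count generalizes to any conv layer; a one-line sketch (the function name `conv_layer_params` is ours) that reproduces the quiz answer:

```python
def conv_layer_params(f, c_in, num_filters):
    """Trainable parameters of a conv layer:
    each filter has f*f*c_in weights plus one bias."""
    return (f * f * c_in + 1) * num_filters

print(conv_layer_params(3, 3, 10))  # 280
```

The key observation from the quiz holds in general: the count is independent of the input's spatial size, which is exactly the parameter sharing discussed later in the deck.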
 
  • 81. CNN Example # 1 (Basic) Input 𝑎[0]: 𝑛𝐻[0] ∗ 𝑛𝑊[0] ∗ 𝑛𝐶[0] = 39 ∗ 39 ∗ 3. Layer 1: 𝑓[1] = 3, 𝑠[1] = 1, 𝑝[1] = 0, 10 filters; ⌊(39 + 2∗0 − 3)/1⌋ + 1 = 37, so 𝑎[1] is 37 ∗ 37 ∗ 10. Layer 2: 𝑓[2] = 5, 𝑠[2] = 2, 𝑝[2] = 0, 20 filters; ⌊(37 + 2∗0 − 5)/2⌋ + 1 = 17, so 𝑎[2] is 17 ∗ 17 ∗ 20. Layer 3: 𝑓[3] = 5, 𝑠[3] = 2, 𝑝[3] = 0, 40 filters; ⌊(17 + 2∗0 − 5)/2⌋ + 1 = 7, so 𝑎[3] is 7 ∗ 7 ∗ 40. Vectorize: 7 ∗ 7 ∗ 40 = 1960 features  Logistic Regression / Softmax  𝑦
  • 82. Non-linear activation  Sigmoid  ReLU (Rectified Linear Unit)
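Both activations named on this slide are one-liners; a minimal sketch:

```python
from math import exp

def sigmoid(z):
    """Squashes any real z into (0, 1); saturates for large |z|."""
    return 1.0 / (1.0 + exp(-z))

def relu(z):
    """Rectified Linear Unit: passes positives through, zeroes negatives."""
    return max(0.0, z)

print(sigmoid(0))                          # 0.5
print([relu(z) for z in (-8.0, 0.0, 10.0)])  # [0.0, 0.0, 10.0]
```

ReLU is the default in conv layers largely because its gradient does not vanish for positive inputs, whereas sigmoid saturates at both ends.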
  • 83. Pooling Layer Purpose: 1) To reduce the size of the layer to speed up computation. 2) To retain robust features. Types: 1) Max Pooling 2) Average Pooling
  • 84. Pooling Layer 1 3 2 1 3 6 8 9 8 3 3 9 4 6 8 2 6 9 8 9 Max
  • 85. Pooling Layer 1 3 2 1 3 6 8 9 8 3 3 9 4 6 8 2 6 9 8 9 3.25 5 5.25 5.5 Average Max
  • 86. Pooling Layer Salient features: 1) Follows the process of convolution, with filter size 𝑓, stride 𝑠 and padding 𝑝 (rarely used) as the hyperparameters. 2) Filters have no weights, so there are no trainable parameters. 3) The pooling operation is carried out channel-wise; thus, the number of channels remains unchanged. 1 3 2 1 3 6 8 9 8 3 3 9 4 6 8 2 6 9 8 9 3.25 5 5.25 5.5 Average Max
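The max- and average-pooling outputs on slides 84–85 can be reproduced with the same sliding-window pattern as convolution, only without weights (a minimal sketch; the function name `pool2d` is ours):

```python
def pool2d(image, f=2, s=2, mode="max"):
    """Channel-wise pooling over f x f windows with stride s."""
    n = len(image)
    out_dim = (n - f) // s + 1
    out = []
    for i in range(out_dim):
        row = []
        for j in range(out_dim):
            window = [image[i * s + a][j * s + b]
                      for a in range(f) for b in range(f)]
            row.append(max(window) if mode == "max"
                       else sum(window) / len(window))
        out.append(row)
    return out

# The 4x4 example from the slides
x = [[1, 3, 2, 1],
     [3, 6, 8, 9],
     [8, 3, 3, 9],
     [4, 6, 8, 2]]
print(pool2d(x))                   # [[6, 9], [8, 9]]
print(pool2d(x, mode="average"))   # [[3.25, 5.0], [5.25, 5.5]]
```

Because there are no weights, backpropagation through a pooling layer only routes gradients (to the max element, or evenly for average pooling); nothing in the layer is learned.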
  • 87. CNN Example # 2 (Fully Loaded) Input: 32 ∗ 32 ∗ 3. Layer 1: Conv1 with 𝑓 = 5, 𝑠 = 1, 𝑝 = 0, 6 filters; (32 + 2∗0 − 5)/1 + 1 = 28  28 ∗ 28 ∗ 6; Pool1 with 𝑓 = 2, 𝑠 = 2; (28 + 2∗0 − 2)/2 + 1 = 14  14 ∗ 14 ∗ 6. Layer 2: Conv2 with 𝑓 = 5, 𝑠 = 1, 16 filters; (14 + 2∗0 − 5)/1 + 1 = 10  10 ∗ 10 ∗ 16; Pool2 with 𝑓 = 2, 𝑠 = 2; (10 + 2∗0 − 2)/2 + 1 = 5  5 ∗ 5 ∗ 16. Vectorize: 5 ∗ 5 ∗ 16 = 400. Layer 3: 𝐹𝐶3 with 120 units, 𝑊3 (120, 400), 𝑏3 (120). Layer 4: 𝐹𝐶4 with 84 units, 𝑊4 (84, 120), 𝑏4 (84). Softmax output with 10 units, 𝑊5 (10, 84), 𝑏5 (10)  𝑦
  • 88. Computation of parameters  Fully connected layer vs. Convolution layer No Layer Type Input data dim. Filter dim. (if any) Output data dim. Activation size Weights Biases No. of learnable parameters 1 Input - - 32*32*3 3072 2 Conv Layer 1 32*32*3 6 filters * 5*5*3 28*28*6 4704 450 6 456 3 ReLU 1 28*28*6 - 28*28*6 4704 4 Maxpool 1 28*28*6 2*2*6 14*14*6 1176 5 Conv Layer 2 14*14*6 16 filters * 5*5*6 10*10*16 1600 2400 16 2416 6 ReLU 2 10*10*16 - 10*10*16 1600 7 Maxpool 2 10*10*16 2*2*16 5*5*16 400 8 FC Layer 3 5*5*16 = 400 - 120 120 48000 1 48001 9 ReLU 3 120 - 120 120 10 FC Layer 4 120 - 84 84 10080 1 10081 11 ReLU 4 84 - 84 84 12 Softmax Output 84 - 10 10 840 1 841 Total Learnable Parameters: 61,795
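The table's total can be checked layer by layer. A minimal sketch reproducing its numbers; note that the table counts a single bias per FC/softmax layer (a per-unit bias, 120, 84 and 10 respectively, would give a slightly larger total):

```python
# Layer name, trainable parameters (weights + biases), as in the table
layers = [
    ("Conv1",   5 * 5 * 3 * 6 + 6),    # 456
    ("Conv2",   5 * 5 * 6 * 16 + 16),  # 2416
    ("FC3",     400 * 120 + 1),        # 48001
    ("FC4",     120 * 84 + 1),         # 10081
    ("Softmax", 84 * 10 + 1),          # 841
]
total = sum(p for _, p in layers)
print(total)  # 61795
```

ReLU and pooling layers contribute nothing: they have activations but no weights or biases, which is why they appear in the table with empty parameter columns.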
  • 89. Why Convolutional ? 32 ∗ 32 ∗ 3 𝑓 = 5 𝑠 = 1 𝑝 = 0 6 𝑓𝑖𝑙𝑡𝑒𝑟𝑠 28 ∗ 28 ∗ 6 32 ∗ 32 ∗ 3 = 3072 28 ∗ 28 ∗ 6 = 4704 3072 ∗ 4704 = 14,450,688 parameters 6 filters * 5*5*3 = 450 weights + 6 biases = 456 parameters 𝑓 = 5 𝑠 = 1 𝑝 = 0 6 𝑓𝑖𝑙𝑡𝑒𝑟𝑠 FC Neural Network Conv. Neural Network
  • 90. Why Convolutional ? * = 1 0 -1 1 0 -1 1 0 -1 10 10 10 0 0 0 10 10 10 0 0 0 10 10 10 0 0 0 10 10 10 0 0 0 10 10 10 0 0 0 10 10 10 0 0 0 0
  • 91. Why Convolutional ? * = 1 0 -1 1 0 -1 1 0 -1 10 10 10 0 0 0 10 10 10 0 0 0 10 10 10 0 0 0 10 10 10 0 0 0 10 10 10 0 0 0 10 10 10 0 0 0 0 30 Parameter Sharing: A feature detector that is useful in one part of the image may also be useful on another part of the same image.
  • 92. Why Convolutional ? * = 1 0 -1 1 0 -1 1 0 -1 10 10 10 0 0 0 10 10 10 0 0 0 10 10 10 0 0 0 10 10 10 0 0 0 10 10 10 0 0 0 10 10 10 0 0 0 0 30 30 0 0 30 30 0 0 30 30 0 0 30 30 0 Parameter Sharing: A feature detector that is useful in one part of the image may also be useful in another part of the same image. Sparsity of connections: In each layer, each output depends only on a small number of inputs.
  • 93. Putting it all together Training set: (𝑥(1), 𝑦(1)), … , (𝑥(𝑚), 𝑦(𝑚)). Cost 𝐽 = (1/𝑚) Σ𝑖=1..𝑚 𝓛(𝑦̂(𝑖), 𝑦(𝑖)) Use gradient descent (or other optimization algorithms) to optimize the parameters and reduce 𝐽.
  • 94. A few important terms  Forward pass: The process of passing data from the input layer to the output layer.  Cost Function: The difference between the predicted and true output of a NN.  Backpropagation: The process of updating the parameters of a network based on the cost function (using optimization algorithms viz., Gradient Descent, SGDM, ADAGrad, etc.) to minimize the cost.  Mini-batch: The number of images passing through the network at once.  Learning Rate: The rate at which the parameters are updated.  Iteration: A mini-batch performing one forward and one backward pass through the network is an iteration.  Epoch: When the complete dataset undergoes a forward and a backward pass, an epoch is completed.
  • 95. Transfer Learning 1) Train from scratch Train
  • 96. Transfer Learning 2) Transfer Learning Freeze the trained weights Train Pretrained models of AlexNet, VGG-16, VGG-19, GoogLeNet, ZFNet, etc., trained on (say) the ImageNet dataset.
  • 97. Transfer Learning Parameters Training from scratch Transfer Learning Method Build the CNN from scratch Only the last few layers need to be trained Tuning A large number of hyperparameters need to be tuned Only a few hyperparameters need to be tuned Computation Large computation power is required (multiple GPUs) Less computation power is needed (can even work with CPUs) Dataset A huge dataset is needed to avoid overfitting A small dataset is enough Training time May take weeks or even months May take hours
  • 98. Popular Languages for Deep Learning  Python  Java  R  C++  C  Javascript  Scala  Julia
  • 99. Popular Deep Learning Libraries  TensorFlow  Pytorch  Caffe  Keras  Theano  Deep Learning Toolbox – Matlab  MatConvNet
  • 100. Popular Software / IDE for Deep Learning  PyCharm  Spyder  Jupyter Notebook  Matlab (Deep Learning toolbox) R2018b  Anaconda (contains Spyder, Jupyter Notebook, numpy, scipy, etc.)  Google CoLab (similar to Jupyter Notebook; gives a K80 GPU for 12 hrs at a time.)
  • 101. Resources  http://cs231n.github.io/convolutional-networks/  https://www.coursera.org/learn/convolutional-neural-networks  https://itunes.apple.com/us/app/flower/id1279174518?mt=8  https://fossbytes.com/popular-top-programming-languages-machine- learning-data-science/