Introduction to Convolutional Neural Network
Dr. Anindya Halder
Associate Professor, Cotton University
Introduction
• In the previous slides we learned the basics of deep neural networks, their types, and their
use cases.
• In this section we will learn about one particular type: the Convolutional Neural Network
(CNN) architecture.
What is a CNN?
• A Convolutional Neural Network, also known as a CNN or ConvNet, is a type of feed-forward
neural network that specializes in processing data with a grid-like topology, such as an image.
• A digital image is a representation of visual data: a grid of pixels, each holding one or more
pixel values.
• Because images have this grid representation, CNNs are widely used for image classification.
• The architecture of a CNN is designed to take advantage of the 2D structure of an input image.
• A basic CNN consists of one or more convolutional layers (often each followed by a pooling
step), followed by one or more fully connected layers as in a standard multilayer neural network.
Motivation behind CNN
• Consider an image of size 200x200x3 (200 wide, 200 high, 3 color channels).
A single fully connected neuron in the first hidden layer of a regular neural network would have
200 x 200 x 3 = 120,000 weights.
A hidden layer contains many such neurons, so this full connectivity is wasteful, and the huge
number of parameters would quickly lead to overfitting.
• In a CNN, however, each neuron in a layer is connected only to a small region of
the layer before it (discussed later), instead of to all of its neurons in a fully connected manner.
The final output layer has dimensions 1x1xN, because by the end of the CNN
architecture the full image has been reduced to a single vector of class scores (for N classes),
arranged along the depth dimension.
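The weight count above can be checked with a few lines of arithmetic. This is only a sketch: the hidden-layer width of 1000 neurons and the 3x3 filter size are illustrative assumptions, not values from the slides.

```python
# Weights needed by the first hidden layer for a 200x200x3 input image.
width, height, channels = 200, 200, 3

# Every fully connected neuron sees every input value.
weights_per_fc_neuron = width * height * channels

# Hypothetical layer width, chosen only for illustration.
num_hidden_neurons = 1000
fc_layer_weights = weights_per_fc_neuron * num_hidden_neurons

# A convolutional filter sees only a small local region (assumed 3x3 here)
# across all channels, and reuses the same weights at every position.
filter_size = 3
weights_per_conv_filter = filter_size * filter_size * channels

print(weights_per_fc_neuron)    # 120000
print(fc_layer_weights)         # 120000000
print(weights_per_conv_filter)  # 27
```

The gap between 120,000 weights per neuron and 27 weights per filter is what makes local connectivity attractive for images.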
MLP vs CNN
Multi-layered perceptron: all layers are fully connected.
Convolutional Neural Network: the convolution layer is only partially (locally) connected.
Multi-layered perceptron: a regular 3-layer neural network.
Convolutional Neural Network: arranges its neurons in 3 dimensions (width, height, depth), as
visualized in the figure.
Because of this 3-D arrangement of neurons, a CNN is well adapted to the properties of images:
• Pixel position and neighborhood have semantic meaning
• Elements of interest can appear anywhere in the image
How CNN works – What the computer sees
• For example, a CNN can take an image and classify it as an ‘X’ or an ‘O’.
• In the simple case, an ‘X’ would look like this: [figure]
• But what about a trickier case? [figure]
• Since the pattern does not match exactly, a computer comparing the images pixel by pixel will not
be able to classify this as an ‘X’. A CNN overcomes this issue by matching smaller patterns
(features) instead of whole images.
CNN layers
• A CNN consists of four basic types of layer:
• Convolutional layer (CONV) computes the output of neurons that are connected to local
regions of the input; each neuron computes a dot product between its weights and the small
region of the input volume it is connected to.
• ReLU layer (already discussed in ANN) applies an elementwise activation function, such
as max(0, x), which thresholds at zero. This leaves the size of the volume unchanged and
introduces non-linearity into the network.
• Pooling layer (POOL) performs a down-sampling operation along the spatial dimensions
(width, height). DROPOUT is sometimes added as well, but it is a regularization step
(covered later), not a down-sampling one.
• Fully connected layer (FC) computes the class scores, resulting in a volume of size
[1x1xN], where each of the N numbers corresponds to the score of one of the N categories.
Convolutional Layer
• The convolution layer (CONV) uses filters that perform convolution operations while scanning
the input I along its dimensions. Its hyperparameters include the filter size F and the stride
S. The resulting output O is called a feature map or activation map.
• The convolution layer identifies patterns (features) rather than individual pixels.
• The role of the ConvNet is to reduce images to a form that is easier to process, without
losing the features that are critical for a good prediction.
What is the Convolution operation?
• Mathematically, the convolution used in a CNN is the sum of the element-wise product of
two matrices (a patch of the input image and the filter).
• Consider an image ‘X’ and a filter ‘Y’ (more about filters later). Both X and Y are
matrices (the image X is expressed as pixel values). When we convolve the image ‘X’ with
the filter ‘Y’, we first multiply the two element by element, which gives a matrix, say ‘Z’.
• Finally, we compute the sum of all the elements of ‘Z’ to get a single scalar number.
[Figure: convolution operation on image X with kernel Y]
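The two steps (element-wise product, then sum) can be sketched in NumPy. The values of X and Y below are made up for illustration, and X stands for a single image patch of the same size as the filter.

```python
import numpy as np

# Illustrative values only; X is an image patch, Y is a filter of the same size.
X = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1]])
Y = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1]])

Z = X * Y          # element-wise product
value = Z.sum()    # sum of all elements of Z -> one scalar

print(Z)
print(value)  # 5
```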
Convolutional Layer - Filters/Kernels
• A filter measures how closely a patch or region of the input resembles a feature. A
feature may be any prominent aspect: a vertical edge, a horizontal edge, an arch, a diagonal, etc.
• A filter acts as a single template or pattern which, when convolved across the input, finds
similarities between the stored template and different locations/regions of the input image.
• To perform the convolution operation, slide the filter over the width and height of the input
image and, at each position, compute the sum of the element-wise product.
• If the input image size is ‘n x n’ and the filter size is ‘f x f’:
• Output size = (n – f + 1) x (n – f + 1)
• For example, for a 5x5 image and a 3x3 filter: Output size = (5-3+1) x (5-3+1) = 3x3
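A minimal sketch of sliding a filter over an image, assuming stride 1 and no padding. The function name `conv2d_valid`, the input values, and the vertical-edge filter are illustrative choices, not taken from the slides.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1)."""
    n_h, n_w = image.shape
    f_h, f_w = kernel.shape
    out = np.zeros((n_h - f_h + 1, n_w - f_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + f_h, j:j + f_w]
            out[i, j] = np.sum(patch * kernel)   # sum of element-wise product
    return out

image = np.arange(25).reshape(5, 5)   # illustrative 5x5 input
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])       # a common vertical-edge template
feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (3, 3)
```

The 5x5 input and 3x3 filter reproduce the 3x3 output size computed above.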
Filter hyperparameters - Padding
• Sometimes it is convenient to pad the input volume with zeros around the border.
• Zero padding allows us to preserve the spatial size of the output volume.
• Why do we pad?
• Every time we apply a convolution operator, the image shrinks, so we lose
information, which is one of the downsides of convolution.
• To fix this problem, we can ‘pad’ the image.
[Figure: one-pixel zero padding on a 5x5 image]
• Let p be the padding. In this example, p = 1
because we padded the input image all around
with an extra border of 1 pixel.
• Output size = (n + 2p – f + 1) x (n + 2p – f + 1),
where n is the image dimension, p is the
padding and f is the filter size.
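Zero padding and the resulting output size can be sketched with `np.pad`. The 3x3 filter size is an assumption made only to evaluate the formula.

```python
import numpy as np

image = np.ones((5, 5), dtype=int)   # illustrative 5x5 image
p, f = 1, 3                          # padding of 1 pixel, assumed 3x3 filter

# Add a border of p zeros on every side.
padded = np.pad(image, pad_width=p, mode="constant", constant_values=0)

n = image.shape[0]
output_size = n + 2 * p - f + 1

print(padded.shape)   # (7, 7)
print(output_size)    # 5
```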
Types of Padding
• There are two common choices for padding: ‘valid’ convolutions and ‘same’ convolutions.
a) Valid convolutions - This means no padding. An (n x n) image convolved with an (f x f)
filter gives an (n-f+1) x (n-f+1) output.
b) Same convolutions - Here the padding is chosen so that the output size equals the
input image size. When we pad by ‘p’ pixels, the size of the input image changes
from (n x n) to (n+2p) x (n+2p), and the output after convolution is (n+2p–f+1) x (n+2p–f+1).
The amount of padding should be such that the output image after convolution
matches the size of the input image.
Let n x n = original input image size, p = padding
(n+2p) x (n+2p) = size of padded input image
(n+2p–f+1) x (n+2p–f+1) = size of output image after convolving the padded image
To avoid shrinking the original input image, we solve for the padding size ‘p’ by setting the
output size equal to the original input size:
n + 2p – f + 1 = n, which gives p = (f – 1) / 2
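Setting the output size equal to the input size, n + 2p – f + 1 = n, gives p = (f – 1) / 2. A small sketch of that calculation follows; the helper names are hypothetical and it assumes stride 1 and an odd filter size.

```python
def same_padding(f):
    """Padding p that keeps output size equal to input size (stride 1, odd f)."""
    if f % 2 == 0:
        raise ValueError("this sketch assumes an odd filter size")
    return (f - 1) // 2

def output_size(n, f, p=0):
    """Output side length for an n x n input, f x f filter, padding p, stride 1."""
    return n + 2 * p - f + 1

n = 10
for f in (3, 5, 7):
    p = same_padding(f)
    print(f, p, output_size(n, f, p))   # the output size stays at n = 10
```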
How is the Filter Size Decided?
• By convention, the filter size ‘f’ is usually odd in computer vision. There are two likely
reasons:
• If ‘f’ is even, we would need asymmetric padding (see the padding equation on the previous
slide). Say the filter size ‘f’ is 6. The padding equation then gives a padding size of 2.5,
which is not a whole number of pixels.
Let n x n = 10 x 10 = original input image size, p = padding and f = 6
Output image = (10+2p–6+1) x (10+2p–6+1) = 10 x 10,
because we want the output image to be the same size as the input.
Solving 10 + 2p – 5 = 10 gives p = 2.5, which does not make sense as a padding size.
• The second reason for choosing an odd filter size such as 3×3 or 5×5 is that the filter then
has a central position, and it is often useful to have such a distinguished center pixel.
Filter hyperparameters - Stride
• For a convolutional or a pooling operation, the stride S denotes the number of pixels by
which the window moves after each operation.
• In simple words, the stride is the step size with which the filter moves horizontally and
vertically over the pixels of the input image during convolution.
• Let n x n = original input image size, p = padding, f = filter (kernel) size and s = stride
Output image size = [{(n + 2p - f) / s} + 1] x [{(n + 2p - f) / s} + 1]
If (n + 2p - f) is not divisible by s, the quotient is rounded down.
[Figure: stride during convolution; convolution operation with stride length = 2]
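The output-size formula with stride can be written as a small helper. The function name and the example sizes are illustrative; integer division implements the usual rounding down when the filter does not fit an exact number of steps.

```python
def conv_output_size(n, f, p=0, s=1):
    """Side length of the output for an n x n input, f x f filter,
    padding p and stride s (fractional results are rounded down)."""
    return (n + 2 * p - f) // s + 1

print(conv_output_size(n=7, f=3, p=0, s=2))  # 3
print(conv_output_size(n=5, f=3, p=1, s=1))  # 5
print(conv_output_size(n=6, f=3, p=0, s=2))  # 2
```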
Convolutions over RGB images
• Consider an RGB image of size 6×6. Since it is an RGB image, its dimensions are 6x6x3, where
the 3 corresponds to the three color channels: red, green and blue. We can think of it
as a 3-D volume made of a stack of three 6x6 images.
• For 3-D images we need 3-D filters, i.e., the filter itself also has three layers,
corresponding to the red, green and blue channels of the input RGB image.
Convolution over volume
• We first place the 3x3x3 filter in the upper-left
position, the same as in 2-D. This filter has 27
parameters (9 in each channel).
• We take each of these 27 numbers and multiply
it by the corresponding number from the
image’s red, green and blue channels.
• Then we add up all those products, which gives
us the first number in the output image.
[Figure: how convolution over RGB images works]
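Convolution over a volume can be sketched as follows, assuming stride 1 and no padding. The function name `conv_volume` is hypothetical and the pixel and filter values are random, for illustration only.

```python
import numpy as np

def conv_volume(image, kernel):
    """Convolve an (n, n, c) image with one (f, f, c) filter; stride 1, no padding."""
    n, _, c = image.shape
    f = kernel.shape[0]
    assert kernel.shape == (f, f, c), "filter depth must match image depth"
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Multiply all f*f*c numbers with the matching pixels and add them up.
            out[i, j] = np.sum(image[i:i + f, j:j + f, :] * kernel)
    return out

rng = np.random.default_rng(0)        # random values, for illustration only
image = rng.random((6, 6, 3))         # 6x6 RGB image
kernel = rng.random((3, 3, 3))        # 3x3x3 filter: 27 parameters

out = conv_volume(image, kernel)
print(kernel.size)  # 27
print(out.shape)    # (4, 4)
```

Note that one 3-D filter produces a single 2-D output, because the products from all three channels are summed into one number per position.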
Multiple Filters for Multiple Features
• We can use multiple filters to detect various features simultaneously.
• Consider the following example, in which we look for a vertical edge and a curve in the input
RGB image.
• We have to use two different filters for this task, so the output will have two
feature maps.
[Figure: convolution using multiple filters]
• Let us understand the dimensions mathematically: an n x n x c image convolved with k filters
of size f x f x c gives an output of size (n-f+1) x (n-f+1) x k. Here, a 6x6x3 image with two
3x3x3 filters gives a 4x4x2 output.
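The output dimensions with several filters can be checked numerically. This sketch assumes stride 1 and no padding, uses random values for the image and the two filters, and relies on NumPy's `sliding_window_view` (available from NumPy 1.20) to collect all patches at once.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)          # random values, for illustration only
image = rng.random((6, 6, 3))           # n x n x c input
filters = rng.random((2, 3, 3, 3))      # k = 2 filters, each f x f x c

# All 3x3x3 patches of the image: shape (4, 4, 1, 3, 3, 3); drop the size-1 axis.
patches = sliding_window_view(image, (3, 3, 3))[:, :, 0]

# For every patch (i, j) and every filter k: sum of the element-wise product.
feature_maps = np.einsum("ijabc,kabc->ijk", patches, filters)

print(feature_maps.shape)  # (4, 4, 2)
```

Each filter contributes one 4x4 feature map, and the maps are stacked along the depth dimension.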
Some important concepts
• The filters are learned during training (i.e., via backpropagation). Hence, the individual
values of the filters are often called the weights of the CNN.
• A neuron is a filter whose weights are learned during training. E.g., a (3,3,3) filter (or neuron)
has 27 weights. Each neuron looks at a particular region of the input (i.e., its ‘receptive field’).
• A feature map is a collection of multiple neurons, each looking at a different input region with
the same weights.
• All neurons in a feature map extract the same feature (but from different input regions). It is
called a ‘feature map’ because it maps where a particular feature is found in the image.
ReLU Layer
• ReLU is a piecewise linear function that outputs the input
directly if it is positive; otherwise, it outputs zero.
• The key point is that ReLU does not activate all the neurons at the
same time: a neuron whose input is negative outputs zero.
• Mathematically it can be represented as: f(x) = max(0, x)
• The derivative of the function is: f'(x) = 1 if x > 0, and 0 if x < 0 (it is undefined at
x = 0, where 0 is commonly used in practice).
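ReLU and its derivative are short enough to sketch directly in NumPy. Taking the derivative at exactly x = 0 to be 0 is a convention assumed here.

```python
import numpy as np

def relu(x):
    """Elementwise max(0, x)."""
    return np.maximum(0, x)

def relu_derivative(x):
    # Convention assumed here: the derivative at x == 0 is taken as 0.
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x).tolist())             # [0.0, 0.0, 0.0, 1.5, 3.0]
print(relu_derivative(x).tolist())  # [0.0, 0.0, 0.0, 1.0, 1.0]
```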
Pooling Layer
• A pooling layer is another essential building block of a CNN. It tries to figure out whether a
particular region of the image contains the feature we are interested in.
• The pooling layer (POOL) is a down-sampling operation, typically applied after a convolution
layer, which provides some spatial invariance.
• The two most popular aggregate functions used in pooling are ‘max’ and ‘average’:
a) Max pooling – If any patch says something firmly about the presence of a particular feature,
the pooling layer counts that feature as ‘detected’. It preserves detected features and is the type
mostly used.
b) Average pooling – If one patch says something very firmly but the others disagree, average
pooling takes the average of them. It down-samples the feature map and is used in LeNet.
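Both pooling types can be sketched with a reshape trick. This assumes non-overlapping windows (stride equal to the window size) and a feature map whose sides are exact multiples of the window size; the function name and the input values are illustrative.

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling with a size x size window (stride = size)."""
    h, w = feature_map.shape
    assert h % size == 0 and w % size == 0, "sketch assumes an exact fit"
    # Split into (h/size, size, w/size, size) blocks, then reduce each block.
    blocks = feature_map.reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

fm = np.array([[1, 3, 2, 1],
               [4, 6, 5, 0],
               [7, 2, 9, 8],
               [1, 0, 3, 4]])

print(pool2d(fm, mode="max").tolist())      # [[6, 5], [7, 9]]
print(pool2d(fm, mode="average").tolist())  # [[3.5, 2.0], [2.5, 6.0]]
```

The 4x4 feature map becomes 2x2 in both cases; only the way each 2x2 block is summarized differs.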
Pooling Layer – Advantages and Disadvantage
• Advantages
• Pooling makes the representation more compact by reducing the spatial size of the
feature maps, thereby reducing the number of parameters to be learnt in the layers that follow.
• Pooling reduces only the height and width of the feature map, not the number of channels.
• Disadvantage
• Pooling also discards a lot of information, which is often considered a potential disadvantage.
Dropout Layer
• Large neural nets trained on relatively small datasets can overfit the training data, which
results in poor performance when the model is evaluated on new data.
• Dropout is a regularization method that approximates training a large number of neural
networks with different architectures in parallel.
• During training, some of the layer outputs are randomly ignored, or “dropped out”.
• Dropout has the effect of making the training
process noisy, forcing nodes within a layer to
probabilistically take on more or less responsibility
for the inputs.
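A minimal sketch of dropout applied to a layer's outputs follows. The rescaling by 1 / keep_prob ("inverted dropout") is a common convention assumed here and is not stated in the slides; the function name and the drop probability are also illustrative.

```python
import numpy as np

def dropout(activations, drop_prob, rng, training=True):
    """Randomly zero out a fraction `drop_prob` of the activations during training."""
    if not training or drop_prob == 0.0:
        return activations
    keep_prob = 1.0 - drop_prob
    mask = rng.random(activations.shape) < keep_prob
    # "Inverted dropout" scaling (an assumption, not from the slides):
    # keeps the expected activation unchanged, so nothing changes at test time.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
a = np.ones((4, 5))
out = dropout(a, drop_prob=0.5, rng=rng)
print(out)   # each entry is either 0.0 (dropped) or 2.0 (kept and rescaled)
```

At evaluation time the function is called with `training=False` and returns the activations untouched.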
Fully connected Layer
• Fully connected layers are the normal flat
feed-forward neural network layers.
• These layers may use a non-linear activation
function, and the last one usually uses a softmax
activation in order to predict the output classes.
• To compute the output, we first arrange all
the output 2-D matrices into a single
1-D array (flattening).
• The sum of the products of inputs and weights at each output node determines the final
prediction, the same as in a feed-forward network.
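Flattening followed by a fully connected layer with softmax can be sketched as below. The feature-map shape, the number of classes, and the random weights are all hypothetical; in a trained network the weights would come from training.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())        # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)             # random values, for illustration only
feature_maps = rng.random((4, 4, 2))       # e.g. output of the last CONV/POOL stage
num_classes = 3                            # hypothetical N

x = feature_maps.reshape(-1)               # flatten to a 1-D array of 32 values
W = rng.standard_normal((num_classes, x.size)) * 0.1   # one weight row per output node
b = np.zeros(num_classes)

scores = W @ x + b                         # sum of products of inputs and weights
probs = softmax(scores)                    # class scores turned into probabilities

print(x.shape)              # (32,)
print(int(probs.argmax()))  # index of the predicted class
```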
Understanding the complexity of the CNN
• To assess the complexity of a model, it is often useful to determine the number of
parameters in its architecture. For a given layer of a convolutional neural network, this
is done as follows:
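The usual per-layer counts can be sketched as small helpers. The function names are hypothetical, and including one bias per filter or per output node is an assumption, since the slide's own table is not available here.

```python
def conv_params(f, c_in, k, bias=True):
    """k filters of size f x f x c_in, plus one bias per filter."""
    return (f * f * c_in + (1 if bias else 0)) * k

def pool_params():
    """Pooling has no learnable parameters."""
    return 0

def fc_params(n_in, n_out, bias=True):
    """n_in inputs fully connected to n_out outputs, plus one bias per output."""
    return (n_in + (1 if bias else 0)) * n_out

print(conv_params(f=3, c_in=3, k=2))     # 56
print(pool_params())                     # 0
print(fc_params(n_in=32, n_out=3))       # 99
```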
How does image recognition work with a CNN?
• So far we have seen the different components of a CNN. Now let us see how these components
work together in a CNN to identify an image of a bird.
Different CNN Architectures
• Various CNN architectures have been key in building the algorithms that power, and will
continue to power, AI for the foreseeable future. Some of them are
listed below:
• In this section we learned:
• Basics of CNN
• How a CNN differs from other ML algorithms
• The layers of a CNN
• How a CNN classifies/recognizes images
• Different CNN architectures
• In the next section we will learn about Recurrent Neural Networks
Thank You

  • 1. INTRODUCTION TO CONVOLUTIONAL NEURAL NETWORK Dr. Anindya Halder Associate Professor, Cotton University
  • 2. Introduction • In the previous slides we learned the basics of Deep neural network and its types and use cases. • In this section we will learn one of its kind which is Convolutional Neural Network (CNN) Architecture
  • 3. What is CNN ? • A Convolutional Neural Network, also known as CNN or ConvNet, is a type of feed- forward neural networks that specializes in processing data that has a grid-like topology, such as an image. • A digital image is representation of visual data. It contains a series of pixels arranged in a grid-like fashion that contains pixel values. • Because of this kind of representation CNN is used for image classification. • The architecture of CNN is designed to take advantage of the 2D structure of an input image. • The basic CNN is comprised of one or more convolution layer (often with a pooling step) and then followed by one or more fully connected layers as in a standard multilayer neural network.
  • 4. Motivation behind CNN ? • Consider an image of size 200x200x3 (200 wide, 200 high, 3 color channels) A single fully-connected neuron in a first hidden layer of a regular Neural Network would have 200x200x3 = 120000 weights Due to the presence of several such neurons, this full connectivity is wasteful, and the huge number of parameters would quickly lead to overfitting. • However, in a CNN the neurons in a layer will only be connected to a small region of the layer before it (will discuss later) instead of all the neurons in a fully connected manner. The final output layer would have dimensions 1x1xN, because by the end of the CNN architecture we will reduce the full image into a single vector of class scores (for N classes), arranged along the depth dimension.
  • 5. MLP vs CNN ? Multi-layered perceptron: all layers are fully connected Convolutional Neural Network with partially connected Convolution layer
  • 6. MLP vs CNN ? Multi-layered perceptron: a regular 3-layer neural network Convolutional Neural Network arranges its neuron in 3 dimensions as visualized in figure. Because of this 3-D distribution of neurons CNN is intelligently adapted to the properties of images: • Pixel position and neighborhood have semantic meanings • Elements of interest can appear anywhere in the image
  • 7. How CNN works – What computer sees • For example, a CNN can take an image which can be classified a ‘X’ or ‘O’ • In simple case ‘X’ would look like • But what about trickier case • Since pattern does not match exactly, the computer will not be able to classify this as ‘X’. Using CNN, we can overcome this issue by taking some measures.
  • 8. CNN layers • CNN consist of four basic layers • Convolutional layer (CONV) will compute the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and a small region they are connected to in the input volume. • RELU (already discussed in ANN) layer will apply an elementwise activation function, such as the max(0,x) thresholding at zero. This leaves the size of the volume unchanged. Which removes no-linearity from data. • Pooling (POOL) layer will perform a down sampling operation along the spatial dimensions (width, height). Sometimes we also use DROPOUT for down sampling. • Fully-connected layer (FC) will compute the class scores, resulting in volume of size [1x1xN], where each of the N numbers correspond to a class score, such as among the N categories.
  • 9. Convolutional Layer • The convolution layer (CONV) uses filters that perform convolution operations as they scan the input I along its dimensions. Its hyperparameters include the filter size F and the stride S. The resulting output O is called a feature map or activation map. • The convolution layer identifies patterns (features) instead of individual pixels. • The role of the ConvNet is to reduce the images into a form that is easier to process, without losing the features that are critical for a good prediction.
  • 10. What is the convolution operation? • Mathematically, convolution is the sum of the element-wise product of 2 matrices (a region of the input image and a filter of the same size). • Consider an image ‘X’ and a filter ‘Y’ (filters are covered in more detail later). Both X and Y are matrices (image X is expressed as pixel values). When we multiply the image ‘X’ element-wise by the filter ‘Y’, we get an output matrix, say ‘Z’. • Finally, we compute the sum of all the elements in ‘Z’ to get a single scalar. (Figure: image X, kernel Y, convolution operation)
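The single step described on this slide can be sketched as follows. The 3x3 values of X and Y are made-up illustrative numbers, and `convolve_patch` is a hypothetical helper name. As on the slide, the filter is applied without flipping it.

```python
def convolve_patch(patch, kernel):
    """Return the element-wise product Z of two equally sized matrices and its sum."""
    z = [[p * k for p, k in zip(patch_row, kernel_row)]
         for patch_row, kernel_row in zip(patch, kernel)]
    return z, sum(sum(row) for row in z)

# Illustrative 3x3 image region X and filter Y (assumed values).
X = [[1, 0, 1],
     [0, 1, 0],
     [1, 0, 1]]
Y = [[1, 0, 1],
     [0, 1, 0],
     [1, 0, 1]]

Z, total = convolve_patch(X, Y)
print(Z)      # [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
print(total)  # 5
```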
  • 11. Convolutional Layer - Filters/Kernels • A filter measures how closely a patch or region of the input resembles a feature. A feature may be any prominent aspect: a vertical edge, a horizontal edge, an arch, a diagonal, etc. • A filter acts as a single template or pattern which, when convolved across the input, finds similarities between the stored template and different locations/regions in the input image. • To perform the convolution operation, slide the filter over the width and height of the input image and, at each position, sum the element-wise product. • If the input image size is ‘n x n’ and the filter size is ‘f x f’: • Output size = (n – f + 1) x (n – f + 1) • For a 5x5 image and a 3x3 filter: Output size = (5-3+1) x (5-3+1) = 3x3
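A minimal sketch of sliding a filter over an image, assuming a square image, a square filter, no padding and a stride of 1. The pixel and filter values are illustrative, and `conv2d_valid` is a hypothetical helper name rather than a library function.

```python
def conv2d_valid(image, kernel):
    """Slide an f x f kernel over an n x n image; output is (n-f+1) x (n-f+1)."""
    n, f = len(image), len(kernel)
    out = n - f + 1
    result = []
    for i in range(out):
        row = []
        for j in range(out):
            # Sum of the element-wise product at window position (i, j).
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(f) for b in range(f)))
        result.append(row)
    return result

# Illustrative 5x5 binary image and 3x3 filter (assumed values).
image = [[1, 1, 1, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 0],
         [0, 1, 1, 0, 0]]
kernel = [[1, 0, 1],
          [0, 1, 0],
          [1, 0, 1]]

feature_map = conv2d_valid(image, kernel)
for row in feature_map:
    print(row)
# [4, 3, 4]
# [2, 4, 3]
# [2, 3, 4]
```

The output is 3x3, matching (5-3+1) x (5-3+1).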
  • 12. Filter hyperparameters - Padding • Sometimes it is convenient to pad the input volume with zeros around the border. • Zero padding allows us to preserve the spatial size of the output volumes. • Why do we pad? • Every time we apply a convolution operator, the image shrinks, so we lose information. This is one of the downsides of convolution. • To fix this, we can ‘pad’ the image. (Figure: one-pixel zero padding on a 5x5 image) • Let p be the padding. In this example, p = 1 because we padded all around the input image with an extra border of 1 pixel. • Output size = (n + 2p – f + 1) x (n + 2p – f + 1), where n is the image dimension, p is the padding and f is the filter size
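The padding step and the output-size formula can be sketched like this. `zero_pad` and `output_size` are hypothetical helper names, and the all-ones 5x5 image is only a placeholder input.

```python
def zero_pad(image, p):
    """Surround a square image (list of rows) with a border of p zeros."""
    n = len(image[0])
    blank = [0] * (n + 2 * p)
    padded = [blank[:] for _ in range(p)]            # top border
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)  # left and right borders
    padded.extend(blank[:] for _ in range(p))         # bottom border
    return padded

def output_size(n, f, p=0):
    """Side length after convolving an n x n image with an f x f filter."""
    return n + 2 * p - f + 1

image = [[1] * 5 for _ in range(5)]   # placeholder 5x5 image
padded = zero_pad(image, 1)

print(len(padded), len(padded[0]))    # 7 7
print(output_size(5, 3))              # 3  (no padding: the image shrinks)
print(output_size(5, 3, p=1))         # 5  (p = 1 keeps the 5x5 size)
```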
  • 13. Types of Padding • There are two common choices for padding: valid convolutions and same convolutions. a) Valid convolutions - This means no padding. In this case, an (n x n) image convolved with an (f x f) filter gives an (n-f+1) x (n-f+1) output. b) Same convolutions - In this case, the padding is chosen so that the output size is the same as the input image size. When we pad by ‘p’ pixels, the size of the input image changes from (n x n) to (n+2p) x (n+2p). The amount of padding should be such that the output image after convolution matches the size of the input image. Let n x n = original input image size and p = padding. Then (n+2p) x (n+2p) = size of the padded input image, and (n+2p–f+1) x (n+2p–f+1) = size of the output image after convolving the padded image. To avoid shrinking the original input image, we choose ‘p’ so that the output size after convolving the padded image equals the original input image size: n+2p–f+1 = n, which gives p = (f–1)/2.
  • 14. How is the Filter Size Decided? • By convention, the filter size ‘f’ is usually odd in computer vision. There are 2 likely reasons: • If ‘f’ is even, we would need asymmetric padding (according to the previous slide). Say the filter size ‘f’ is 6. Then the padding equation gives a padding size of 2.5, which does not make sense. Let n x n = 10 x 10 = original input image size, p = padding and f = 6. Output image = (10+2p–6+1) x (10+2p–6+1) = 10x10, because we want the output image to be the same size as the input. Solving gives p = 2.5, which is not a whole number of pixels. • The 2nd reason for choosing an odd-size filter such as 3×3 or 5×5 is that it has a central position, and at times it is useful to have such a distinguished pixel.
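The padding needed for a same convolution follows from solving n + 2p - f + 1 = n for p, which gives p = (f - 1) / 2. The sketch below evaluates that expression exactly for a few filter sizes; `same_padding` is a hypothetical helper name.

```python
from fractions import Fraction

def same_padding(f):
    """Padding p that solves n + 2p - f + 1 = n, i.e. p = (f - 1) / 2."""
    return Fraction(f - 1, 2)

for f in (3, 5, 6):
    p = same_padding(f)
    kind = "whole number of pixels" if p.denominator == 1 else "not a whole number"
    print(f, p, kind)
# 3 1 whole number of pixels
# 5 2 whole number of pixels
# 6 5/2 not a whole number
```

Odd filter sizes give a whole-pixel border, while f = 6 gives 5/2 = 2.5, which is the case the slide rules out.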
  • 15. Filter hyperparameters - Stride • For a convolutional or a pooling operation, the stride S denotes the number of pixels by which the window moves after each operation. • In simple words, the stride is the step by which the filter moves horizontally and vertically over the pixels of the input image during convolution. • Let n x n = original input image size, p = padding, f = filter size and s = stride. Output image size = [{(n + 2p - f) / s} + 1] x [{(n + 2p - f) / s} + 1], where the division is rounded down when (n + 2p - f) is not a multiple of s. (Figures: convolution operation with stride length = 2; stride during convolution)
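The general output-size formula can be written as a small function. The rounding-down (floor) behaviour for strides that do not divide evenly is the usual convention and is an assumption here, as are the example sizes; `conv_output_size` is a hypothetical helper name.

```python
def conv_output_size(n, f, p=0, s=1):
    """Side length of the output: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

print(conv_output_size(7, 3, s=2))       # 3
print(conv_output_size(5, 3, p=1, s=1))  # 5
print(conv_output_size(6, 3, s=2))       # 2  (3 / 2 is rounded down to 1)
```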
  • 16. Convolutions over RGB images • Consider an RGB image of size 6×6. Since it is an RGB image, its dimension is 6x6x3, where the 3 corresponds to the three color channels: red, green and blue. We can imagine this as a 3-D image made of a stack of three 6x6 images. • For 3-D images, we need 3-D filters, i.e., the filter itself also has three layers corresponding to the red, green and blue channels, like the input RGB image. (Figure: convolution over volume) • We first place the 3x3x3 filter in the upper-left-most position, the same as in 2-D. This filter has 27 numbers (9 parameters in each channel). • We take each of these 27 numbers and multiply it by the corresponding number from the image’s red, green and blue channels. • Then we add up all those products, and this gives us the first number in the output image.
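A sketch of convolution over a volume, assuming the image and the filter are stored channel-first as nested lists (`image[channel][row][col]`), with no padding and stride 1. The all-ones values are placeholders chosen so the result is easy to check, and `conv_volume` is a hypothetical helper name.

```python
def conv_volume(image, kernel):
    """Convolve a C x n x n image with a C x f x f filter; returns one 2-D map."""
    channels, n, f = len(image), len(image[0]), len(kernel[0])
    out = n - f + 1
    return [[sum(image[c][i + a][j + b] * kernel[c][a][b]
                 for c in range(channels)
                 for a in range(f)
                 for b in range(f))
             for j in range(out)]
            for i in range(out)]

# Placeholder 6x6x3 image and 3x3x3 filter, all ones.
rgb = [[[1] * 6 for _ in range(6)] for _ in range(3)]
filt = [[[1] * 3 for _ in range(3)] for _ in range(3)]

out = conv_volume(rgb, filt)
print(len(out), len(out[0]))  # 4 4
print(out[0][0])              # 27 (all 27 products are 1)
```

Each output number is the sum of 27 products, and the three channels collapse into a single 2-D feature map.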
  • 17. How Convolutions over RGB images works
  • 18. Multiple Filters for Multiple Features • We can use multiple filters to detect various features simultaneously. • Consider the following example, in which we look for a vertical edge and a curve in the input RGB image. • We have to use two different filters for this task, so the output will have two feature maps. (Figure: convolution using multiple filters) • Let us understand the dimensions mathematically
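The dimensions can be worked out with a short sketch, assuming no padding and stride 1. Each filter spans every input channel and produces one feature map, so the output depth equals the number of filters. `output_volume_shape` is a hypothetical helper name; the 6x6x3 image, 3x3x3 filters and two filters are taken from the surrounding slides.

```python
def output_volume_shape(n, channels, f, filter_channels, num_filters):
    """Shape of the output when num_filters filters are applied to an n x n x channels input."""
    if filter_channels != channels:
        raise ValueError("filter depth must equal the number of input channels")
    side = n - f + 1
    return (side, side, num_filters)

shape = output_volume_shape(n=6, channels=3, f=3, filter_channels=3, num_filters=2)
print(shape)  # (4, 4, 2)
```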
  • 19. Some important concepts • The filters are learned during training (i.e., during backpropagation). Hence, the individual values of the filters are often called the weights of the CNN. • A neuron applies a filter whose weights are learned during training. E.g., a (3,3,3) filter (or neuron) has 27 weights. Each neuron looks at a particular region of the input (i.e., its ‘receptive field’). • A feature map is a collection of multiple neurons, each looking at a different input region with the same weights. • All neurons in a feature map extract the same feature (but from different input regions). It is called a ‘feature map’ because it maps where a particular feature is found in the image.
  • 20. ReLU Layer • ReLU is a piecewise linear function that outputs the input directly if it is positive; otherwise, it outputs zero. • A key property is that the ReLU function does not activate all the neurons at the same time: neurons with a negative input output zero. • Mathematically it can be represented as: f(x) = max(0, x) • The derivative of the function is: f'(x) = 1 if x > 0 and f'(x) = 0 if x < 0 (it is undefined at x = 0)
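ReLU and its derivative are short enough to sketch directly. Returning 0 for the derivative at exactly x = 0 is a common convention and an assumption here, since the derivative is not defined at that point.

```python
def relu(x):
    """Output x if it is positive, otherwise 0."""
    return max(0.0, x)

def relu_grad(x):
    """Derivative of ReLU; the value at x == 0 is set to 0 by convention (assumption)."""
    return 1.0 if x > 0 else 0.0

inputs = [-2.0, -0.5, 0.0, 0.5, 2.0]
print([relu(x) for x in inputs])       # [0.0, 0.0, 0.0, 0.5, 2.0]
print([relu_grad(x) for x in inputs])  # [0.0, 0.0, 0.0, 1.0, 1.0]
```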
  • 21. Pooling Layer • A pooling layer is another essential building block of a CNN. It summarizes whether a particular region in the image has the feature we are interested in. • The pooling layer (POOL) is a down-sampling operation, typically applied after a convolution layer, which provides some spatial invariance. • The two most popular aggregate functions used in pooling are ‘max’ and ‘average’: a) Max pooling – If any of the patches says something firmly about the presence of a particular feature, the pooling layer counts that feature as ‘detected’. It preserves detected features and is the most commonly used. b) Average pooling – If one patch says something very firmly but the others disagree, average pooling takes the average of all of them. It down-samples the feature map and is used in LeNet.
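Both pooling variants can be sketched with one function, assuming a square feature map and a 2x2 window with stride 2 (a common choice, used here as an assumption). The feature-map values are illustrative, and `pool2d` is a hypothetical helper name.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Down-sample a square feature map with max or average pooling."""
    n = len(feature_map)
    out = (n - size) // stride + 1
    aggregate = max if mode == "max" else (lambda values: sum(values) / len(values))
    result = []
    for i in range(out):
        row = []
        for j in range(out):
            window = [feature_map[i * stride + a][j * stride + b]
                      for a in range(size) for b in range(size)]
            row.append(aggregate(window))
        result.append(row)
    return result

fm = [[1, 3, 2, 4],
      [5, 6, 7, 8],
      [3, 2, 1, 0],
      [1, 2, 3, 4]]

max_pooled = pool2d(fm, mode="max")
avg_pooled = pool2d(fm, mode="average")
print(max_pooled)  # [[6, 8], [3, 4]]
print(avg_pooled)  # [[3.75, 5.25], [2.0, 2.0]]
```

The 4x4 map becomes 2x2 in both cases: max pooling keeps the strongest response in each window, while average pooling smooths it.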
  • 22. Pooling Layer – Advantage and Disadvantage • Advantages • Pooling makes the representation more compact by reducing the spatial size of the feature maps, thereby reducing the number of parameters to be learnt in the layers that follow. • Pooling reduces only the height and width of the feature map, not the number of channels. • Disadvantage • Pooling also discards a lot of information, which is often considered a potential disadvantage.
  • 23. Dropout Layer • Large neural nets trained on relatively small datasets can overfit the training data, which results in poor performance when the model is evaluated on new data. • Dropout is a regularization method that approximates training a large number of neural networks with different architectures in parallel. • During training, some of the layer outputs are randomly ignored or “dropped out” in this layer. • Dropout has the effect of making the training process noisy, forcing nodes within a layer to probabilistically take on more or less responsibility for the inputs.
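A minimal sketch of dropout on a list of layer outputs. It uses the "inverted dropout" variant, in which the kept values are scaled by 1 / (1 - rate) so the expected output is unchanged. That scaling, the 0.5 rate and the fixed seed are assumptions for illustration and are not stated on the slide.

```python
import random

def dropout(values, rate, rng, training=True):
    """Randomly zero each value with probability `rate` during training."""
    if not training or rate == 0:
        return list(values)          # dropout is switched off at evaluation time
    keep = 1.0 - rate
    # Inverted dropout (assumption): scale the kept values by 1 / keep.
    return [v / keep if rng.random() < keep else 0.0 for v in values]

rng = random.Random(0)               # fixed seed so the run is repeatable
activations = [1.0] * 1000
dropped = dropout(activations, rate=0.5, rng=rng)

zeros = sum(1 for v in dropped if v == 0.0)
print(zeros)                         # roughly half of the 1000 outputs
print(dropout([1.0, 2.0], 0.5, rng, training=False))  # [1.0, 2.0]
```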
  • 24. Fully connected Layer • Fully connected layers are the normal flat feed-forward neural network layers. • These layers may use a non-linear activation function; the last one usually uses the softmax activation function in order to predict classes. • To compute the output, we first flatten all the 2-D output matrices into a single 1-D array.
  • 25. Fully connected Layer • A sum of products of inputs and weights at each output node determines the final prediction, the same as in a feed-forward network.
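The flatten, weighted-sum and softmax steps can be sketched together. The two 2x2 feature maps, the weights and the zero biases are made-up illustrative values, and `flatten`, `dense` and `softmax` are hypothetical helper names.

```python
import math

def flatten(feature_maps):
    """Arrange all 2-D feature maps as one 1-D list."""
    return [v for fm in feature_maps for row in fm for v in row]

def dense(inputs, weights, biases):
    """One fully connected layer: a sum of products per output node, plus a bias."""
    return [sum(w * x for w, x in zip(w_row, inputs)) + b
            for w_row, b in zip(weights, biases)]

def softmax(scores):
    """Turn class scores into probabilities that sum to 1."""
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

feature_maps = [[[1.0, 0.0], [0.0, 1.0]],
                [[0.0, 1.0], [1.0, 0.0]]]
x = flatten(feature_maps)                 # 8 inputs
weights = [[1, 0, 0, 1, 0, 0, 0, 0],      # output node for class 0
           [0, 1, 1, 0, 0, 0, 0, 0]]      # output node for class 1
biases = [0.0, 0.0]

scores = dense(x, weights, biases)
probs = softmax(scores)
print(scores)  # [2.0, 0.0]
print(probs)   # class 0 gets the larger probability
```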
  • 26. Understanding the complexity of the CNN • In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, it is done as follows:
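The table from this slide did not survive in the transcript, so the sketch below uses the usual counting convention as an assumption: one bias per filter in a convolution layer, one bias per output node in a fully connected layer, and no learnable parameters in a pooling layer. The function names are hypothetical.

```python
def conv_params(f, c_in, num_filters):
    """Each filter has f*f*c_in weights plus one bias (assumed convention)."""
    return (f * f * c_in + 1) * num_filters

def pool_params():
    """Pooling only aggregates values; it has nothing to learn."""
    return 0

def fc_params(n_in, n_out):
    """Each output node has n_in weights plus one bias (assumed convention)."""
    return (n_in + 1) * n_out

print(conv_params(f=3, c_in=3, num_filters=2))  # 56
print(pool_params())                            # 0
print(fc_params(n_in=120000, n_out=1))          # 120001
```

Two 3x3x3 filters need 56 parameters, while a single fully connected neuron on a 200x200x3 image needs 120001.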
  • 27. How image recognition works with CNN? • So far we have seen the different components of a CNN. Now let us see how these components work together in a CNN to identify an image of a bird.
  • 34. Different CNN Architectures • Various CNN architectures are available, and they have been key to building the algorithms that power, and will continue to power, AI for the foreseeable future. Some of them are listed below:
  • 35. Summary • In this section we learned • The basics of CNN • How CNN is different from other ML algorithms • The layers of a CNN • How a CNN classifies/recognizes images • Different CNN architectures • In the next section we will learn about Recurrent Neural Networks