Convolutional Neural Network (CNN)
In the name of God
Mehrnaz Faraz
Faculty of Electrical Engineering
K. N. Toosi University of Technology
Milad Abbasi
Faculty of Electrical Engineering
Sharif University of Technology
CNN
• A supervised deep learning algorithm
• Not fully connected neural network
• Suitable for big data and tensors
– Tensor: Multidimensional array
• Uses relatively little pre-processing compared to other
algorithms
Using CNN
• Computer vision
– Face recognition
– Scene labelling
– Image classification
– Action recognition
– Human pose estimation
– Document analysis
• Natural Language Processing
– Speech recognition
Using CNN
[Figure: example images for face recognition, scene labelling, human pose estimation, and document analysis]
Using CNN
• Classification
• Object detection
• Segmentation
CNN
• Using convolutional layers
• Using pooling layers
• Using multiple filters in a layer
– Creates different outputs in a layer
• Suitable for image data
Convolutional Layer
• An example input volume in red (e.g. a 32x32x3 image)
– Color image: Height, Width, Depth (Channels)
– Each pixel has 3 channels (R,G and B)
Input image: 32x32x3 (height x width x depth)
Filter: 5x5x3 (height x width x depth)
Convolutional Layer
• Convolving input with a filter
– Convolution: Sum of element-wise multiplications
– Example:
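As a minimal sketch of "sum of element-wise multiplications", the following NumPy code slides one filter over a small input with stride 1 and no padding. The function name `conv2d_single` and the toy 4x4x1 input are illustrative assumptions, not part of the slides. As is usual in deep learning, the filter is not flipped, so strictly speaking this is cross-correlation.

```python
import numpy as np

def conv2d_single(x, w, b=0.0):
    """Slide one filter w over input x (stride 1, no padding).

    x: input volume of shape (height, width, depth)
    w: filter of shape (k_h, k_w, depth)
    Returns a 2-D feature map.
    """
    height, width, depth = x.shape
    k_h, k_w, k_d = w.shape
    assert depth == k_d, "filter depth must match input depth"
    out = np.zeros((height - k_h + 1, width - k_w + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            patch = x[r:r + k_h, c:c + k_w, :]
            # Convolution at one position: sum of element-wise products (+ bias)
            out[r, c] = np.sum(patch * w) + b
    return out

# Toy example: 4x4 single-channel input holding 0..15, 3x3 filter of ones
x = np.arange(16, dtype=float).reshape(4, 4, 1)
w = np.ones((3, 3, 1))
feature_map = conv2d_single(x, w)
print(feature_map)  # [[45. 54.] [81. 90.]]
```

Each output value is the sum of the 3x3 window it covers, for example 0+1+2+4+5+6+8+9+10 = 45 for the top-left position.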
Convolutional Layer
[Figure: an input (x) convolved with a filter (w) gives a feature map; 10 different filters give a stacked feature map]
• A neuron (a single number in the feature map): $w^T x + b$
Convolutional Layer
• Stacked feature map:
[Figure: input, two filters, and the resulting feature map]
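The stacking idea can be sketched as follows, assuming a 32x32x3 input and 10 filters of size 5x5x3 (the sizes used in the slides), stride 1, and no padding. The random values, the zero biases, and the name `conv_layer` are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32, 3))          # input volume
filters = rng.standard_normal((10, 5, 5, 3))  # 10 filters of size 5x5x3
biases = np.zeros(10)                         # one bias per filter

def conv_layer(x, filters, biases):
    """Apply every filter to x and stack the feature maps along depth."""
    n_filters, k_h, k_w, _ = filters.shape
    out_h = x.shape[0] - k_h + 1
    out_w = x.shape[1] - k_w + 1
    out = np.zeros((out_h, out_w, n_filters))
    for r in range(out_h):
        for c in range(out_w):
            patch = x[r:r + k_h, c:c + k_w, :]
            # w^T x + b for all filters at this position
            out[r, c, :] = np.tensordot(filters, patch, axes=3) + biases
    return out

stacked = conv_layer(x, filters, biases)
print(stacked.shape)  # (28, 28, 10)
```

Each of the 10 filters contributes one 28x28 feature map, so the output depth equals the number of filters.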
Convolutional Layer
• Convolutional layer is NOT fully connected
– Each neuron is connected only to a local region in the input
volume spatially
Convolutional Layer
• Increasing the number of neurons increases the number of parameters and the computational burden
• Parameter sharing
– All neurons in a particular feature map share the same weights
– Reduces the number of parameters
• Local connectivity
– Each neuron is connected only to a subset of the input image
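Number of Parameters
Input: 256x256x3
• One neuron connected to the whole input: 256*256*3+1 = 196,609 parameters (weights plus one bias)
• Kernel of 128x128x3 with parameter sharing: 128*128*3+1 = 49,153 parameters (weights plus one bias)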
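The arithmetic above can be checked with a short sketch. The helper names are hypothetical; the 5x5x3 case reuses the filter size from the earlier convolutional-layer slide to show how small a typical shared filter is.

```python
def dense_neuron_params(height, width, depth):
    """One neuron connected to every input value: one weight each, plus a bias."""
    return height * width * depth + 1

def conv_filter_params(k_h, k_w, depth):
    """One shared filter: k_h * k_w * depth weights, plus a bias."""
    return k_h * k_w * depth + 1

full = dense_neuron_params(256, 256, 3)   # 196609
shared = conv_filter_params(128, 128, 3)  # 49153
small = conv_filter_params(5, 5, 3)       # 76
print(full, shared, small)
```

Because every neuron in a feature map reuses the same filter, the count depends only on the filter size, not on how many positions the filter visits.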
Stride
• Specifies how much we move the convolution filter at
each step
Padding
• Without padding, the feature map is smaller than the input
• To maintain the same dimensionality
– Use padding to surround the input with zeros
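A minimal sketch of zero padding with NumPy's `np.pad`, assuming a toy 3x3 single-channel input and a padding width of 1 (both chosen only for illustration):

```python
import numpy as np

x = np.arange(1, 10).reshape(3, 3)  # toy 3x3 input
# Surround the input with a border of zeros, one pixel wide on every side
padded = np.pad(x, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)  # (5, 5)
print(padded)
```

A 3x3 filter moved with stride 1 over the 5x5 padded input gives a 3x3 feature map, the same size as the original input.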
Example
[Figure: four convolution examples with (P=0, S=1), (P=2, S=1), (P=1, S=2), and (P=1, S=2)]
Example
• Size of feature map:
– i: size of input
– k: size of kernel
– p: padding
– s: stride
– o: size of feature map

$$o = \left\lfloor \frac{i + 2p - k}{s} \right\rfloor + 1$$
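The symbols above map directly onto a small helper. This is a sketch assuming a square input and kernel; the function name `feature_map_size` and the example sizes are illustrative choices.

```python
def feature_map_size(i, k, p=0, s=1):
    """Feature map size o = floor((i + 2p - k) / s) + 1."""
    return (i + 2 * p - k) // s + 1

# 32x32 input, 5x5 kernel
print(feature_map_size(32, 5, p=0, s=1))  # 28: output shrinks without padding
print(feature_map_size(32, 5, p=2, s=1))  # 32: padding of 2 keeps the size
# 5x5 input, 3x3 kernel, padding 1, stride 2
print(feature_map_size(5, 3, p=1, s=2))   # 3
```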
Non-linearity
• Add a ReLU after each convolutional layer
• Introduces nonlinearity into a system that has so far computed only linear operations in the conv layers
• ReLU does not saturate for positive inputs
[Figure: input image, convolutional layer (stacked feature map), and resulting feature maps]
Non-linearity
• Convolution + ReLU
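ReLU itself is a one-line operation, max(0, z), applied to every value of a feature map. A minimal sketch, with a made-up 2x2 feature map as input:

```python
import numpy as np

def relu(z):
    """Element-wise ReLU: negative values become 0, positive values pass through."""
    return np.maximum(z, 0.0)

feature_map = np.array([[-2.0, 0.5],
                        [3.0, -0.1]])
activated = relu(feature_map)
print(activated)  # [[0.  0.5] [3.  0. ]]
```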
Pooling Layer
• Also called a subsampling layer
• Inserted periodically between conv layers in a ConvNet
• Reduces the number of parameters, the size of the data, and the computation in the network
• Helps control overfitting
• Types of pooling:
– Stride
– Mean pooling
– Max pooling
– Sum pooling
Pooling Layer
• Mean pooling
• Max pooling
[Figure: mean pooling and max pooling with stride 2]
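Mean, max, and sum pooling differ only in the reduction applied to each window. The sketch below assumes a 2x2 window with stride 2 on a single 4x4 feature map; the function name `pool2d` and the input values are illustrative.

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    """Pool a 2-D feature map with the chosen reduction ('max', 'mean', or 'sum')."""
    reduce = {"max": np.max, "mean": np.mean, "sum": np.sum}[mode]
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            window = x[r * stride:r * stride + size,
                       c * stride:c * stride + size]
            out[r, c] = reduce(window)
    return out

x = np.array([[1, 3, 2, 4],
              [5, 7, 6, 8],
              [9, 11, 10, 12],
              [13, 15, 14, 16]], dtype=float)

max_pooled = pool2d(x, mode="max")    # [[ 7.  8.] [15. 16.]]
mean_pooled = pool2d(x, mode="mean")  # [[ 4.  5.] [12. 13.]]
print(max_pooled)
print(mean_pooled)
```

With stride 2 the 4x4 map shrinks to 2x2, so later layers process a quarter of the values.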
CNN Overview
• CNNs have two components:
– The Hidden layers/Feature extraction part
• Perform a series of convolutions and pooling operations
• The convolution is performed on the input data with the
use of a filter or kernel to then produce a feature map
– The Classification part
• Assigns a probability that the object in the image is what the algorithm predicts it is
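The slides do not say how the classification part produces probabilities. A common choice, assumed here, is a softmax over the class scores from the final fully connected layer. The scores below are made up for illustration.

```python
import numpy as np

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp)

# Hypothetical scores for three classes from the last fully connected layer
scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
predicted = int(np.argmax(probs))  # index of the most probable class
print(probs, predicted)
```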
CNN Overview
[Figure: overview of a CNN]
CNN Example
[Figure: example CNN]
Training
• Feed forward
• Back propagation
Common Architectures in CNN
• Classic network architectures:
– LeNet-5
– AlexNet
– VGG16
• Modern network architectures:
– Inception (GoogLeNet)
– ResNet
– ResNeXt
– DenseNet
LeNet-5
– 7 layers
– 3 convolutional layers (C1, C3 and C5)
– 2 sub-sampling (mean pooling) layers (S2 and S4)
– 1 fully connected layer (F6)
– About 60,000 parameters
– Introduced by LeCun et al. in 1998
AlexNet
– The general architecture is quite similar to LeNet-5
– This model is considerably larger than LeNet-5
– Opened the way for deep learning in computer vision tasks
– About 60 million parameters
– Introduced by Alex Krizhevsky et al. in 2012
VGG16
– Offers a deeper yet simpler variant of the convolutional structures
– About 138 million parameters
– Introduced in 2014
GoogLeNet
– Built from a basic unit referred to as an "Inception cell"
– Introduced in 2014 by researchers at Google
[Figure: Inception]
