This document discusses deep learning and convolutional neural networks. It provides an example of using a CNN for face detection and recognition. The CNN architecture includes convolution and subsampling layers to extract features from images. Backpropagation is used to minimize error and adjust weights. The example detects faces in images with approximately 80% accuracy for faces and 57.33% for non-faces. Iterative search with a CNN is also used for object recognition in full images.
2. Shallow Learning
• SVM
• Linear & Kernel Regression
• Hidden Markov Models (HMM)
• Gaussian Mixture Models (GMM)
• Single hidden layer MLP
Limitations
• Cannot make use of unlabeled data
3. Supervised vs Unsupervised Learning
• Supervised Learning
1. Output has to be produced according to a target vector.
2. Input + target vector = training pair.
3. Labelled data.
• Unsupervised Learning (self-organising)
1. Network receives input patterns and forms clusters.
2. When a new input pattern is applied, the output gives the class the input pattern belongs to.
3. Unlabelled data.
4. Neural Networks
• Machine Learning
• Knowledge from high dimensional data
• Classification
• Input: features of data
• supervised vs unsupervised
• labeled data
• Neurons
5. What is it used for?
• Classification
• Regression
---- Prediction
---- Fitting Curve
7. Back Propagation
• Minimize error of calculated output
• Adjust weights
• Gradient Descent
• Procedure
• Forward phase
• Backpropagation of errors
• For each sample, multiple epochs
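The procedure above — forward phase, error on the calculated output, weight adjustment by gradient descent over multiple epochs — can be sketched for a single sigmoid neuron. This is a minimal illustration, not the slides' actual network; the data, learning rate, and logistic-loss update rule are assumptions.

```python
import numpy as np

# Minimal backpropagation sketch: one sigmoid neuron trained by
# gradient descent (illustrative stand-in for the slides' procedure).
rng = np.random.default_rng(0)

X = np.array([[0.0], [1.0]])   # inputs
t = np.array([0.0, 1.0])       # target vector (labelled data)
w = rng.normal(size=1)         # start weights from a random position
b = 0.0
lr = 1.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(500):       # multiple epochs over the samples
    y = sigmoid(X @ w + b)     # forward phase
    e = y - t                  # error of the calculated output
    w -= lr * X.T @ e          # backpropagate error: adjust weights
    b -= lr * e.sum()          # ...and bias by gradient descent

mse = float(np.mean((sigmoid(X @ w + b) - t) ** 2))
```

After training, the mean squared error is small, showing the error-minimization loop at work.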
8. Problems with Backpropagation
• Multiple hidden layers
• Gets stuck in local optima
• Weights start from random positions
• Uses only labeled data
• Most data is unlabeled
9. Deep Learning Means Feature Learning
• Deep Learning is about Learning Hierarchical Features.
11. CNN contd.
• Detects the same feature at different positions in the
input image in the C layer.
12. CNN Contd.
Shared weights: all neurons in a feature map share the
same weights (but not the biases).
In this way, all neurons detect the same feature at
different positions in the input image.
This reduces the number of free parameters.
If a neuron in the feature map fires, this corresponds to a match with
the template.
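The parameter saving from weight sharing can be made concrete with the sizes used later in these slides (72x72 input, 3x3 kernels). This is a sketch under those assumptions; the "valid" sliding shown here is for brevity, while the slides' layers keep the 72x72 size, which implies padding.

```python
import numpy as np

# Weight sharing vs. a fully connected layer, using the slides' sizes.
H = W = 72
k = 3

# Fully connected: every output neuron carries its own 72x72 weight set.
fc_params = (H * W) * (H * W)    # 5184 * 5184 weights

# Shared-weight feature map: one 3x3 template reused at every position.
shared_params = k * k            # 9 weights per feature map

# The same kernel applied at different positions produces one feature
# map ("valid" mode here, so the map shrinks by k-1 in each dimension).
img = np.zeros((H, W))
kernel = np.ones((k, k)) / (k * k)
fmap = np.array([[np.sum(img[i:i+k, j:j+k] * kernel)
                  for j in range(W - k + 1)] for i in range(H - k + 1)])
```

A high response anywhere in `fmap` would signal a match with the shared template at that position.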
13. CNN Contd.
S Layer
The subsampling layers reduce the spatial resolution of each feature
map.
By reducing the spatial resolution of the feature map, a certain degree
of shift and distortion invariance is achieved.
14. Contd. S layer
Weight sharing is also applied in the subsampling layers.
This reduces the effect of noise, shift, and distortion.
16. Few Insights Gathered From Papers
• Used a CBIR method for feature extraction in the convolutional layer.
• Applied filters for feature extraction.
• Used a fixed patch size to work on.
• The CNN method was used throughout.
• 3D convolution – time added as the third dimension.
• Feature extraction observed so far:
1. Gradient filters in the X and Y directions.
17. Object Detection
Architecture:
Dataset: MIT face dataset, 1104 faces.
Training – 200 images.
Test – 200 images.
The Convolutional Neural Network consists of two parts:
1) the convolution layers and max-pooling layers;
2) the fully connected layers and the output layer.
The input layer consists of 72x72 histogram-equalized images, and the
output is a set of different face images, each of size 18x18.
The networks used for face detection and face recognition contain two
convolutional layers and two sub-sampling layers.
19. Convolutional Layer
A total of 5 kernels of size 3x3 is used for the convolution operation,
giving 5 different feature maps:
• gray
• gradient-x
• gradient-y
• the last two kernels capture information below the eye area.
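The slides do not give the actual 3x3 kernel values, so the sketch below uses simple Prewitt-style difference kernels as plausible stand-ins for the gradient-x and gradient-y maps, with a zero-padded "same" cross-correlation that preserves the 72x72 size as described.

```python
import numpy as np

# Assumed Prewitt-style kernels; the slides' exact kernels are not given.
kx = np.array([[-1.0, 0.0, 1.0]] * 3)   # gradient in x
ky = kx.T                               # gradient in y

def conv_same(img, k):
    """3x3 'same' cross-correlation with zero padding (size preserved)."""
    p = np.pad(img, 1)
    return np.array([[np.sum(p[i:i+3, j:j+3] * k)
                      for j in range(img.shape[1])]
                     for i in range(img.shape[0])])

img = np.tile(np.arange(8, dtype=float), (8, 1))  # intensity ramp along x
gx = conv_same(img, kx)   # responds to the horizontal ramp
gy = conv_same(img, ky)   # flat along y, so interior response is zero
```

On the ramp image, the gradient-x map responds strongly in the interior while the gradient-y map stays at zero, which is the intended division of labour between the two kernels.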
20. Sampling Layer
• A 2x2 mean filter is applied to the image.
• Alternate rows and alternate columns of the image are then sampled out.
[Architecture diagram: Input Layer (72x72) → Conv Layer-1 (3x3 kernels, 5 feature maps, 72x72) → Samp Layer-1 (5 feature maps, 36x36) → Conv Layer-2 (36x36) → Samp Layer-2 (18x18) → fully connected layer → 20 faces]
22. Error Propagation.
• The error matrix e is obtained as the difference between the values of
the neurons in the output layer and the fully connected layer.
• As there are 5 kernels in the convolutional layers, each face has 5
different feature maps. So, in the fully connected layer of the object
recognition CNN, the total number of neurons is 18 x 18 x 5 (feature maps).
• The mean error M(i=1:5) of each map is calculated.
• EM, the mean of M(i=1:5), is calculated.
• The errors {M(i=1:5)} are used for back-propagation.
• {EM} is used as a threshold: any value below the threshold is
considered a success, i.e. a face match.
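The error terms above can be sketched directly: per-map mean errors M(i=1:5) over the 18x18x5 fully connected neurons, and their mean EM used as the match threshold. The random error values here are stand-ins for the real output-minus-fully-connected differences.

```python
import numpy as np

# Stand-in error matrix e for 5 feature maps of 18x18 neurons each.
rng = np.random.default_rng(0)
e = rng.normal(size=(5, 18, 18))

M = e.reshape(5, -1).mean(axis=1)   # mean error M(i=1:5) of each map
EM = float(M.mean())                # EM: mean of the five map errors

def is_face_match(err_value, threshold=EM):
    """Test-time rule from the slides: below the threshold = face match."""
    return err_value < threshold
```

Since all five maps have the same size, EM equals the overall mean of e; the per-map values M are what feed back-propagation.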
23. Implementation
1) Input of 200 images, each of size 72x72, is presented to the network
one by one for training.
2) In the first C layer, the convolution operation is performed using the
aforementioned kernels of size 3x3. The resultant output is of size
72 x 72 x 5.
3) In the first S layer, the image is sampled using a mean filter of size
2x2 and by sampling alternate rows and columns. The output is of size
36 x 36 x 5.
24. 4) In the second C layer, after the convolution operation we get output
of size 36 x 36 x 5.
5) In the second S layer, after the sampling operation we get output of
size 200 x 18 x 18 x 5 (18 x 18 x 5 per image over the 200 training images).
6) The fully connected layer is obtained after the S layer; it is of
size 18 x 18 x 5, and each neuron is connected to the output layer.
7) The error is propagated using the method described in the error
propagation section above.
8) Error propagation takes place for a fixed number of epochs during
training.
9) For testing, {EM} obtained above is used as the threshold to find a
face match.
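Steps 1–6 above can be traced at the shape level. The mean kernels below are stand-ins (the slides' actual kernels are the gray/gradient set), but the sizes follow the pipeline exactly: 72x72 → 72x72x5 → 36x36x5 → 36x36x5 → 18x18x5 → fully connected.

```python
import numpy as np

def conv_same(img, k):
    """3x3 'same' cross-correlation with zero padding."""
    p = np.pad(img, 1)
    return np.array([[np.sum(p[i:i+3, j:j+3] * k)
                      for j in range(img.shape[1])]
                     for i in range(img.shape[0])])

def subsample(fmap):
    """2x2 mean filter followed by keeping alternate rows and columns."""
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

img = np.zeros((72, 72))                 # one 72x72 input image
kernels = [np.ones((3, 3)) / 9.0] * 5    # stand-ins for the 5 kernels

c1 = np.stack([conv_same(img, k) for k in kernels])          # 72x72 x5
s1 = np.stack([subsample(m) for m in c1])                    # 36x36 x5
c2 = np.stack([conv_same(m, kernels[i])
               for i, m in enumerate(s1)])                   # 36x36 x5
s2 = np.stack([subsample(m) for m in c2])                    # 18x18 x5
fc = s2.reshape(-1)                      # fully connected: 18*18*5 neurons
```

The fully connected layer ends up with 1620 neurons per image, matching the 18 x 18 x 5 size stated in step 6.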
27. Object Detection.
• Input – Face images of size 72 x 72
Non Face images of size 72 x 72
• Output – 1 or 0
28. Error Propagation
• The error matrix e is obtained as the difference between the values of
the neurons in the output layer and the fully connected layer.
• The error of each neuron is propagated backwards, and the weights are
updated accordingly.
• Backpropagation halts when the error < 0.0003 or after 64 training
epochs.
29. Object Detection ( Face/Non face)
1) Input of 50 images (30 faces + 20 non-faces), each of size 72x72, is
presented to the network one by one for training.
2) In the first C layer, the convolution operation is performed using the
aforementioned kernels of size 3x3. The resultant output is of size
72 x 72 x 5.
3) In the first S layer, the image is sampled using a mean filter of size
2x2 and by sampling alternate rows and columns. The output is of size
36 x 36 x 5.
30. 4) In the second C layer, after the convolution operation we get output
of size 36 x 36 x 5.
5) In the second S layer, after the sampling operation we get output of
size 18 x 18 x 5.
6) The fully connected layer is obtained after the S layer; it is of
size 18 x 18 x 5, and each neuron is connected to the output layer.
7) The error is propagated using the method described in the error
propagation section above.
8) Error propagation takes place for a fixed number of epochs during
training.
9) For testing, 200 images were used: 125 faces and 75 non-faces.
31. Result:
Confusion matrix     Face    Non-face
Face (test)          100     25
Non-face (test)      32      43
Accuracy: 80% approx. for faces.
Accuracy: 57.33% approx. for non-faces.
Time to detect a face: 25.35 secs approx.
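The reported accuracies follow directly from the confusion matrix: row-wise, each true class has 125 face and 75 non-face test images.

```python
# Per-class accuracy from the confusion matrix above.
face_correct, face_missed = 100, 25        # 125 face test images
nonface_wrong, nonface_correct = 32, 43    # 75 non-face test images

face_acc = face_correct / (face_correct + face_missed)            # 100/125
nonface_acc = nonface_correct / (nonface_wrong + nonface_correct)  # 43/75
```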
33. Implementation: iterative search
1) An input image is presented to the network.
2) A 72x72 patch is created and presented to the network.
3) In the first C layer, the convolution operation is performed using the
aforementioned kernels of size 3x3. The resultant output is of size
72 x 72 x 5.
4) In the first S layer, the image is sampled using a mean filter of size
2x2 and by sampling alternate rows and columns. The output is of size
36 x 36 x 5.
34. 5) In the second C layer, after the convolution operation we get output
of size 36 x 36 x 5.
6) In the second S layer, after the sampling operation we get output of
size 18 x 18 x 5.
7) The fully connected layer is obtained after the S layer; it is of
size 18 x 18 x 5, and each neuron is connected to the output layer.
8) The error is propagated using the method described in the error
propagation section above.
9) Error propagation takes place for a fixed number of epochs during
training.
10) When a face is found, Count = Count + 1.
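The patch-and-count loop above can be sketched as a sliding window over the full image. The stride and the classifier are assumptions (the slides specify neither); `classify` stands in for the trained face/non-face network.

```python
import numpy as np

def iterative_search(image, classify, patch=72, stride=36):
    """Slide a 72x72 patch over the image; count patches the
    (hypothetical) network accepts as faces. Stride is an assumption."""
    count = 0
    h, w = image.shape
    for i in range(0, h - patch + 1, stride):
        for j in range(0, w - patch + 1, stride):
            if classify(image[i:i+patch, j:j+patch]):  # network says "face"
                count += 1                             # Count = Count + 1
    return count

# Toy run: a 144x144 image with a stand-in classifier that accepts
# every dark patch, so all 3x3 window positions are counted.
img = np.zeros((144, 144))
n = iterative_search(img, classify=lambda p: p.mean() < 0.5)
```

In the slides' setup, `classify` would be the face/non-face CNN applied to each patch with the {EM} threshold.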