Deep Learning
Presented By:
Srishty Saha
IIIT-DELHI
Shallow Learning
• SVM
• Linear & Kernel Regression
• Hidden Markov Models (HMM)
• Gaussian Mixture Models (GMM)
• Single hidden layer MLP
Limitations
Cannot make use of unlabeled data
Supervised vs Unsupervised Learning
• Supervised Learning
1. The output has to be produced according to a target vector.
2. Input + target vector = training pair
3. Labelled data
• Unsupervised Learning (self-organising)
1. The network receives input patterns and groups them into clusters.
2. When a new input pattern is applied, the output gives the class (cluster) the input pattern belongs to.
3. Unlabelled data
Neural Networks
• Machine Learning
• Knowledge from high dimensional data
• Classification
• Input: features of data
• supervised vs unsupervised
• labeled data
• Neurons
What is it used for?
• Classification
• Regression
---- Prediction
---- Fitting Curve
Multi Layer Perceptron
• Multiple Layers
• Feed Forward
• Connected Weights
• 1-of-N Output
hidden
output
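The 1-of-N output listed above can be illustrated with a small sketch: each class is encoded as a target vector with a single 1, and the predicted class is the index of the largest output neuron. The function names `one_hot` and `decode` are illustrative and do not come from the slides.

```python
def one_hot(label, num_classes):
    """Encode a class index as a 1-of-N target vector."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def decode(outputs):
    """Return the index of the largest output, i.e. the predicted class."""
    return max(range(len(outputs)), key=lambda i: outputs[i])


target = one_hot(2, 4)                     # class 2 out of 4 classes
predicted = decode([0.1, 0.2, 0.6, 0.1])   # index of the strongest output
```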
Back Propagation
• Minimize error of calculated output
• Adjust weights
• Gradient Descent
• Procedure
• Forward Phase
• Backpropagation of errors
• For each sample, multiple epochs
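As a minimal sketch of the procedure above, the simplest possible case is a single linear neuron trained by gradient descent on the squared error. With no hidden layer, the backward phase reduces to one gradient step per sample. The learning rate, epoch count, and sample data are assumptions chosen for illustration.

```python
def train_linear_neuron(samples, lr=0.05, epochs=500):
    """Fit y = w * x + b by per-sample gradient descent on 0.5 * (y - t)^2."""
    w, b = 0.0, 0.0
    for _ in range(epochs):              # multiple epochs
        for x, t in samples:             # for each sample
            y = w * x + b                # forward phase
            err = y - t                  # error of the calculated output
            w -= lr * err * x            # adjust weight along the negative gradient
            b -= lr * err                # adjust bias along the negative gradient
    return w, b


# Toy data generated from t = 2 * x + 1 (an assumption for this sketch).
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train_linear_neuron(samples)
```

In a multi-layer network the same update is applied to every layer, with the error passed backwards through the layers by the chain rule.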
Problems with Backpropagation
• Multiple hidden Layers
• Get stuck in local optima
• start weights from random positions
• Only use labeled data
• most data is unlabeled
Deep Learning Means Feature Learning
• Deep Learning is about Learning Hierarchical Features.
Convolutional Neural Network
• Feature extraction layer or Convolution layer
• Shift and distortion invariance or Subsampling layer
CNN contd.
• The C layer detects the same feature at different positions in the input image.
CNN Contd.
Shared weights: all neurons in a feature map share the same weights (but not the biases).
In this way all neurons detect the same feature at different positions in the input image.
This reduces the number of free parameters.
If a neuron in the feature map fires, this corresponds to a match with the template.
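The reduction in free parameters can be made concrete with a short calculation. The sketch below compares a feature map whose neurons each own a private kernel with one whose neurons share a single kernel. It assumes one neuron per pixel of a 72 x 72 input and a 3 x 3 kernel (the sizes used later in the deck), and it counts weights only, not biases.

```python
def weight_counts(height, width, kernel=3):
    """Return (unshared, shared) weight counts for one feature map."""
    per_neuron = kernel * kernel
    unshared = height * width * per_neuron   # every neuron has its own kernel
    shared = per_neuron                      # one kernel reused at every position
    return unshared, shared


unshared, shared = weight_counts(72, 72)
print(unshared, shared)   # 46656 9
```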
CNN Contd.
S Layer
The subsampling layers reduce the spatial resolution of each feature map.
By reducing the spatial resolution of the feature map, a certain degree of shift and distortion invariance is achieved.
Contd. S layer
Weight sharing is also applied in the subsampling layers.
This reduces the effect of noise and of shift or distortion.
Applications
• Speech recognition
• Object detection (computer vision)
• Web search: text analysis
Few Insights Gathered From Papers.
• A CBIR (content-based image retrieval) method was used for feature extraction in the convolutional layer.
• Filters were applied for feature extraction.
• A fixed patch size was used.
• The CNN method was used throughout.
• 3D convolution: time is added as the third dimension.
• Feature extraction observed so far:
1. Gradient filters in the X and Y directions.
Object Detection
Architecture:
Dataset: MIT Face dataset, 1104 faces.
Training: 200 images.
Test: 200 images.
The Convolutional Neural Network consists of two parts:
1) the convolution layers and max-pooling layers
2) the fully connected layers and the output layers.
The input layer consists of histogram-equalized images of size 72x72, and the
output is the set of different face images, each of size 18x18.
The networks used for face detection and face recognition contain two
convolutional layers and two sub-sampling layers.
[Figure: network architecture. Input Layer (72 x 72) -> Conv Layer-1 (kernels of 3x3; 5 feature maps of 72x72) -> Samp Layer-1 (5 feature maps of 36x36) -> Conv Layer-2 (36x36) -> Samp Layer-2 (18 x 18: 12no) -> Output Layer (20 faces)]
Convolutional Layer
A total of 5 kernels of size 3x3 are used for the convolution operation.
5 different feature maps:
• gray
• gradient-x
• gradient-y
• the last two kernels give information about the area below the eyes.
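A minimal sketch of this layer follows, using plain Python lists to stay dependency-free. The slides do not give the kernel values, so the three kernels below (an identity kernel for "gray" and Sobel-style kernels for the two gradients) are assumptions, and the last two kernels are omitted because their values are unknown. The filter is applied as a sliding-window correlation with zero padding, so a 72 x 72 input yields a 72 x 72 feature map.

```python
# Assumed kernel values; the slides only name the feature maps.
GRAY = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
GRAD_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GRAD_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def filter_same(image, kernel):
    """Apply a 3x3 kernel with zero padding; output has the input's size."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:   # pixels outside count as 0
                        acc += image[rr][cc] * kernel[dr + 1][dc + 1]
            out[r][c] = acc
    return out


# Synthetic 72 x 72 test image: brightness increases from left to right.
image = [[float(c) for c in range(72)] for _ in range(72)]
feature_maps = [filter_same(image, k) for k in (GRAY, GRAD_X, GRAD_Y)]
```

On this ramp image the gradient-x map is constant in the interior, while the gradient-y map is zero there, which is the expected behaviour for a purely horizontal intensity change.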
Sampling Layer
• A 2x2 mean filter is applied to the image.
• Alternate rows and alternate columns of the image are then sampled, halving each dimension.
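One way to read the two steps above is as non-overlapping 2 x 2 average pooling: averaging each 2 x 2 block and keeping one value per block is equivalent to mean filtering followed by sampling every other row and column. The sketch below follows that reading, which is an interpretation rather than something the slides state explicitly.

```python
def mean_subsample(feature_map):
    """Average each non-overlapping 2x2 block, halving both dimensions."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [
            (feature_map[r][c] + feature_map[r][c + 1]
             + feature_map[r + 1][c] + feature_map[r + 1][c + 1]) / 4.0
            for c in range(0, w - 1, 2)
        ]
        for r in range(0, h - 1, 2)
    ]


small = [
    [1.0, 3.0, 5.0, 7.0],
    [1.0, 3.0, 5.0, 7.0],
    [2.0, 2.0, 2.0, 2.0],
    [4.0, 4.0, 4.0, 4.0],
]
pooled = mean_subsample(small)   # [[2.0, 6.0], [3.0, 3.0]]

# A 72 x 72 feature map becomes 36 x 36.
big = mean_subsample([[0.0] * 72 for _ in range(72)])
```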
[Figure: network architecture. Input Layer (72 x 72) -> Conv Layer-1 (kernels of 3x3; 5 feature maps of 72x72) -> Samp Layer-1 (5 feature maps of 36x36) -> Conv Layer-2 (36x36) -> Samp Layer-2 (18 x 18: 12no) -> Fully connected layer (20 faces)]
Fully Connected And Output Layer
• Output layer : 70 images.
[Figure: network architecture. Input Layer (72 x 72) -> Conv Layer-1 (kernels of 3x3; 5 feature maps of 72x72) -> Samp Layer-1 (5 feature maps of 36x36) -> Conv Layer-2 (36x36) -> Samp Layer-2 (18 x 18: 12no) -> Fully connected layer (20 faces)]
Error Propagation.
• The error matrix e is obtained by taking the difference between the values of the neurons in the output layer and those in the fully connected layer.
• Because there are 5 kernels in the convolutional layers, each face has 5 different feature maps. The fully connected layer of the Object Recognition CNN therefore has 18 x 18 x 5 neurons in total.
• The mean error M(i), i = 1:5, of each map is calculated.
• EM, the mean of M(i) over i = 1:5, is calculated.
• The errors {M(i), i = 1:5} are used for back-propagation.
• {EM} is used as a threshold: any value below the threshold is considered a success, i.e. a face match.
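The quantities above can be sketched in a few lines. The slides do not say whether the "mean error" is signed, absolute, or squared, so this sketch assumes the mean absolute difference. The function names are illustrative, and the toy error maps are 2 x 2 rather than 18 x 18 to keep the example short.

```python
def mean_abs(matrix):
    """Mean absolute value of a 2-D error map (assumed error measure)."""
    values = [abs(v) for row in matrix for v in row]
    return sum(values) / len(values)


def error_threshold(error_maps):
    """Return the per-map mean errors M(i) and their mean EM."""
    m = [mean_abs(fmap) for fmap in error_maps]
    em = sum(m) / len(m)
    return m, em


def is_face_match(test_error, em):
    """A test error below the threshold EM counts as a face match."""
    return test_error < em


# Five toy error maps with constant errors 1.0 .. 5.0.
error_maps = [[[float(i)] * 2 for _ in range(2)] for i in range(1, 6)]
m, em = error_threshold(error_maps)   # m = [1.0, 2.0, 3.0, 4.0, 5.0], em = 3.0
```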
Implementation
1) 200 input images, each of size 72 x 72, are presented to the network one
by one for training.
2) In the first layer, the convolution operation is performed using the
aforementioned kernels of size 3 x 3. The resulting output is of size
72 x 72 x 5.
3) In the first S layer, the image is sampled using a mean filter of size 2 x 2
and by sampling alternate rows and columns. The output of this layer is of size
36 x 36 x 5.
4) In the second C layer, the convolution operation gives an output
of size 36 x 36 x 5.
5) In the second S layer, the sampling operation gives an output of size
200 x 18 x 18 x 5.
6) The fully connected layer is obtained after the S layer. It is of
size 18 x 18 x 5, and each of its neurons is connected to the output layer.
7) The error is propagated using the method described in the
back-propagation (Error Propagation) section of Object Recognition.
8) Error propagation takes place for a fixed number of epochs during
training.
9) For testing, the {EM} obtained above is used as the threshold to find a face
match.
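The layer sizes quoted in the steps above can be checked with a small shape trace for a single image. It assumes that each C layer preserves the spatial size and each S layer halves it, as the steps describe. The function name `trace_shapes` is illustrative.

```python
def trace_shapes(size=72, maps=5):
    """List the output shape of each stage for one input image."""
    shapes = [("input", (size, size))]
    for stage in (1, 2):
        shapes.append((f"C{stage}", (size, size, maps)))   # size preserved
        size //= 2                                          # S layer halves it
        shapes.append((f"S{stage}", (size, size, maps)))
    shapes.append(("fully connected", (size * size * maps,)))
    return shapes


for name, shape in trace_shapes():
    print(name, shape)
```

The final entry shows that the 18 x 18 x 5 fully connected layer has 1620 neurons.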
Results – Error vs Epochs
Results – Accuracy vs epochs
Object Detection.
• Input: face images of size 72 x 72 and non-face images of size 72 x 72
• Output: 1 or 0
Error Propagation
• The error matrix e is obtained by taking the difference between the values of the neurons in the output layer and those in the fully connected layer.
• The error of each neuron is propagated backwards, and the weights are updated accordingly.
• Backpropagation halts when the error is < 0.0003 or the number
of epochs reaches 64 during training.
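The stopping rule above can be sketched as a loop that ends on whichever condition is met first. The `run_epoch` callback is hypothetical: it stands in for one full pass of forward and backward propagation and returns that epoch's error.

```python
def train_until(run_epoch, max_epochs=64, tol=0.0003):
    """Run epochs until the error drops below tol or max_epochs is reached."""
    error = float("inf")
    for epoch in range(1, max_epochs + 1):
        error = run_epoch(epoch)      # hypothetical: one training pass
        if error < tol:
            return epoch, error       # stopped by the error criterion
    return max_epochs, error          # stopped by the epoch limit


# Toy error curve that halves every epoch (an assumption for illustration).
stopped_at, final_error = train_until(lambda epoch: 0.5 ** epoch)
```

With this toy curve the error first falls below 0.0003 at epoch 12, so training stops well before the 64-epoch limit.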
Object Detection ( Face/Non face)
1) 50 input images (30 faces + 20 non-faces), each of size 72 x 72, are
presented to the network one by one for training.
2) In the first layer, the convolution operation is performed using the
aforementioned kernels of size 3 x 3. The resulting output is of size
72 x 72 x 5.
3) In the first S layer, the image is sampled using a mean filter of size 2 x 2
and by sampling alternate rows and columns. The output of this layer is of size
36 x 36 x 5.
4) In the second C layer, the convolution operation gives an output
of size 36 x 36 x 5.
5) In the second S layer, the sampling operation gives an output of size
18 x 18 x 5.
6) The fully connected layer is obtained after the S layer. It is of
size 18 x 18 x 5, and each of its neurons is connected to the output layer.
7) The error is propagated using the method described in the
back-propagation (Error Propagation) section of Object Recognition.
8) Error propagation takes place for a fixed number of epochs during
training.
9) For testing, 200 images were used: 125 faces and 75 non-faces.
Result:
Confusion Matrix    Face    Non face
Face (test)         100     25
Non Face (test)     32      43
Accuracy: approx. 80% for faces.
Accuracy: approx. 57.33% for non-faces.
Time to detect a face: approx. 25.35 s.
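The per-class accuracies above follow directly from the confusion matrix, as the short calculation below shows. It assumes that rows are the true test classes and columns are the predicted classes. The overall accuracy is not stated on the slide but follows from the same four counts.

```python
# Rows: true class of the test image; columns: class predicted by the network.
confusion = {
    "face": {"face": 100, "non_face": 25},
    "non_face": {"face": 32, "non_face": 43},
}


def class_accuracy(matrix, label):
    """Percentage of test images of one class that were classified correctly."""
    row = matrix[label]
    return round(100.0 * row[label] / sum(row.values()), 2)


def overall_accuracy(matrix):
    """Percentage of all test images that were classified correctly."""
    correct = sum(matrix[label][label] for label in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return round(100.0 * correct / total, 2)


print(class_accuracy(confusion, "face"))       # 80.0
print(class_accuracy(confusion, "non_face"))   # 57.33
print(overall_accuracy(confusion))             # 71.5
```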
Object Recognition in an image.
Implementation: iterative search
1) An input image is presented to the network.
2) A 72 x 72 patch is created and presented to the network.
3) In the first layer, the convolution operation is performed using the
aforementioned kernels of size 3 x 3. The resulting output is of size
72 x 72 x 5.
4) In the first S layer, the image is sampled using a mean filter of size 2 x 2
and by sampling alternate rows and columns. The output of this layer is of size
36 x 36 x 5.
5) In the second C layer, the convolution operation gives an output
of size 36 x 36 x 5.
6) In the second S layer, the sampling operation gives an output of size
200 x 18 x 18 x 5.
7) The fully connected layer is obtained after the S layer. It is of
size 18 x 18 x 5, and each of its neurons is connected to the output layer.
8) The error is propagated using the method described in the
back-propagation (Error Propagation) section of Object Recognition.
9) Error propagation takes place for a fixed number of epochs during
training.
10) Each time a face is found, the count is incremented: count = count + 1.
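The iterative search can be sketched as a sliding window that visits 72 x 72 patches and increments a counter for each patch classified as a face. The slides do not give the step between patches, so the stride of 36 pixels is an assumption. The `is_face` callback is hypothetical and stands in for the trained network.

```python
def iter_patches(height, width, patch=72, stride=36):
    """Yield the (top, left) corner of every patch that fits in the image."""
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            yield top, left


def count_faces(height, width, is_face, patch=72, stride=36):
    """Count the patches that the (hypothetical) classifier marks as faces."""
    count = 0
    for top, left in iter_patches(height, width, patch, stride):
        if is_face(top, left):
            count = count + 1
    return count


corners = list(iter_patches(144, 144))          # 3 x 3 = 9 patch positions
# Toy classifier: pretend every patch in the top row contains a face.
found = count_faces(144, 144, lambda top, left: top == 0)
```

A smaller stride gives finer localisation at the cost of more patches to classify, which matters given the per-face detection time reported earlier.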
Results: Error vs. the i-th iteration at which a face is found.
Result : Accuracy
To do.
• Face detection in a set of images.
• Improve the accuracy.
• Implement it for videos.

Deep learning and its application
