GRADIENT BASED LEARNING
APPLIED TO DOCUMENT
RECOGNITION
Submitted by- Gyanendra Awasthi
(Roll no. -201315)
INSTRUCTOR- Prof. Nishchal K. Verma
Introduction
 Authored by-
• Yann LeCun
• Léon Bottou
• Yoshua Bengio
• Patrick Haffner
 Published in the Proceedings of the IEEE, November 1998
 Yann LeCun and his co-authors introduced the LeNet-5 architecture in this paper
 The main message of the paper is that better pattern recognition systems can be built by relying more on automatic learning than on hand-designed heuristics
Introduction
 Traditional Approach
• Feature Extraction Module
• Trainable Classifier Module
 Limitations of this approach
• Feature extraction is hand-crafted
• Classifiers are restricted to low-dimensional feature spaces
 Learning in high-dimensional spaces has become practical due to
• Availability of low-cost machines with fast numerical processing
• Availability of large databases
• Availability of powerful machine learning techniques
Introduction
 Why gradient-based learning?
• It is easier to minimize a smooth, continuous loss function than a discrete one
• Minimal processing requirements
 Gradient back-propagation
• Combined with sigmoidal units, it can solve complicated learning tasks when applied to multi-layer neural networks (a minimal sketch follows)
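To make the back-propagation idea concrete, here is a minimal sketch (an illustration, not code from the paper) of one gradient-descent update on a tiny two-layer sigmoid network; the layer sizes, learning rate, and squared-error loss are assumptions chosen for brevity.

```python
import numpy as np

# Minimal sketch: one gradient-descent step on a tiny two-layer
# sigmoid network (illustrative sizes, not from the paper).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 1))          # input vector
y = np.array([[1.0]])                # target

W1 = rng.normal(scale=0.1, size=(3, 4))   # first-layer weights
W2 = rng.normal(scale=0.1, size=(1, 3))   # second-layer weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass
h = sigmoid(W1 @ x)                  # hidden activations
y_hat = sigmoid(W2 @ h)              # network output
loss = 0.5 * np.sum((y_hat - y) ** 2)

# Backward pass: gradient back-propagation via the chain rule
d_out = (y_hat - y) * y_hat * (1 - y_hat)   # error at the output pre-activation
grad_W2 = d_out @ h.T
d_hidden = (W2.T @ d_out) * h * (1 - h)     # error propagated to the hidden layer
grad_W1 = d_hidden @ x.T

# Gradient-descent update (learning rate is illustrative)
lr = 0.5
W2 -= lr * grad_W2
W1 -= lr * grad_W1
```

Iterating this update is exactly the smooth-function minimization referred to above; the same chain-rule pattern scales to the multi-layer networks in the paper.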
Convolutional Neural Network (CNN)
 CNNs are the standard form of neural network architecture for image recognition
 CNN architectures are among the most popular deep learning frameworks
 Applications include marketing, healthcare, retail, and automotive
 Characteristics of CNN architectures (illustrated in the sketch after the figure) are
• Local Receptive Fields
• Sub-sampling
• Weight Sharing
Fig. A typical architecture of CNN
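As a minimal sketch of how these three characteristics look in code (assuming TensorFlow/Keras; the layer sizes are illustrative, not prescribed by the paper):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Each Conv2D unit sees only a 5x5 patch of its input (local receptive
# field), and all units in a feature map reuse the same 5x5 kernel
# (weight sharing); average pooling performs the sub-sampling.
model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(6, kernel_size=5, activation="tanh"),  # local receptive fields, shared weights
    layers.AveragePooling2D(pool_size=2),                # sub-sampling
])
model.summary()
```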
LeNet-5 Convolutional Neural Network
 The LeNet-5 architecture comprises 7 layers (not counting the input layer)
• 3 Convolutional Layers (Cx)
• 2 Subsampling Layers (Sx)
• 2 Fully Connected Layers (Fx)
where x denotes the layer’s index.
LeNet-5 Architecture
 First Layer (C1)
• Convolutional layer with 6 feature maps of size 28×28
• Each unit in each feature map is connected to a 5×5 neighborhood in the input
• Contains 156 trainable parameters and 122,304 connections
 Second Layer (S2)
• A subsampling layer with 6 feature maps of size 14×14
• Each unit in each feature map is connected to a 2×2 neighborhood in the corresponding feature map in C1
• Contains 12 trainable parameters and 5,880 connections
 Third Layer (C3)
• A convolutional layer with 16 feature maps of size 10×10
• Each unit in each feature map is connected to several 5×5 neighborhoods at identical locations in a subset of S2's feature maps
• Contains 1,516 trainable parameters and 151,600 connections
LeNet-5 Architecture
 Fourth Layer (S4)
• A subsampling layer with 16 feature maps of size 5×5
• Each unit in each feature map is connected to a 2×2 neighborhood in the corresponding feature map in C3
• Contains 32 trainable parameters and 2,000 connections
 Fifth Layer (C5)
• A convolutional layer with 120 feature maps of size 1×1
• Each unit in each feature map is connected to a 5×5 neighborhood on all 16 of S4's feature maps
• Has 48,120 trainable connections
 Sixth Layer (F6)
• A fully connected layer with 84 units, fully connected to C5
• Has 10,164 trainable parameters
 Output Layer
• Composed of Euclidean RBF units, one per class, each connected to all 84 units of F6
A Keras sketch of this architecture follows.
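The layer sizes above translate directly into code. The following is a minimal Keras sketch of LeNet-5, not the paper's original implementation: it substitutes a softmax output for the paper's RBF output layer, plain tanh for the original scaled squashing function, and full S2-to-C3 connectivity for the paper's sparse connection table.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sketch of LeNet-5 (modernized: softmax output instead of RBF units,
# full S2->C3 connectivity instead of the paper's sparse table).
lenet5 = keras.Sequential([
    layers.Input(shape=(32, 32, 1)),
    layers.Conv2D(6, 5, activation="tanh"),    # C1: 6 maps, 28x28
    layers.AveragePooling2D(2),                # S2: 6 maps, 14x14
    layers.Conv2D(16, 5, activation="tanh"),   # C3: 16 maps, 10x10
    layers.AveragePooling2D(2),                # S4: 16 maps, 5x5
    layers.Conv2D(120, 5, activation="tanh"),  # C5: 120 maps, 1x1
    layers.Flatten(),
    layers.Dense(84, activation="tanh"),       # F6: 84 units
    layers.Dense(10, activation="softmax"),    # output: 10 digit classes
])
lenet5.summary()
```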
Fig. LeNet-5 Architecture
LeNet-1 Architecture
 Consists of
 2 convolutional layers
 2 subsampling (average pooling) layers
 a fully connected output layer
 Number of parameters is about 3,000
 The architecture is as follows (see the sketch after the figure)
 28×28 input image
 Four 24×24 feature maps from a convolutional layer (5×5 kernels)
 Average pooling layer (2×2)
 Eight 12×12 feature maps from a convolutional layer (5×5 kernels)
 Average pooling layer (2×2)
 Directly fully connected to the output
Fig. LeNet-1 Architecture
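A minimal Keras sketch of LeNet-1 as listed above (an illustration, not the original implementation; padding="same" on the second convolution is an assumption made so that it keeps 12×12 feature maps, as stated on the slide):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sketch of LeNet-1 as listed above (padding="same" is an assumption
# so the second convolution keeps 12x12 feature maps).
lenet1 = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(4, 5, activation="tanh"),                  # four 24x24 maps
    layers.AveragePooling2D(2),                              # sub-sample to 12x12
    layers.Conv2D(8, 5, padding="same", activation="tanh"),  # eight 12x12 maps
    layers.AveragePooling2D(2),                              # sub-sample to 6x6
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),                  # directly connected output
])
```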
LeNet-4 Architecture
 Consists of:
 3 convolutional layers
 2 subsampling layers
 1 fully connected layer
 Contains about 260,000 connections and 17,000 free parameters
 In LeNet-4, the input is a 32×32 layer in which 20×20 images (not deslanted) are centered by their center of mass, as sketched below
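As a minimal sketch of the input preprocessing just described (the helper name and the use of scipy are assumptions of this write-up, not the paper's code):

```python
import numpy as np
from scipy import ndimage

def center_by_mass(digit20, field=32):
    """Place a 20x20 digit image in a field x field canvas so that the
    digit's center of mass lands at the canvas center (illustrative)."""
    canvas = np.zeros((field, field), dtype=digit20.dtype)
    cy, cx = ndimage.center_of_mass(digit20)            # mass center of the digit
    top = int(round(field / 2 - cy))                    # shift mass center to canvas center
    left = int(round(field / 2 - cx))
    top = max(0, min(top, field - digit20.shape[0]))    # keep the digit inside the canvas
    left = max(0, min(left, field - digit20.shape[1]))
    canvas[top:top + digit20.shape[0], left:left + digit20.shape[1]] = digit20
    return canvas
```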
Database: Modified NIST (MNIST) Dataset
 NIST: National Institute of Standards and Technology database
 MNIST database
• Consists of handwritten images of the digits 0 to 9
• A subset of the famous NIST dataset
 Images are centered in a 28×28 pixel field (grayscale)
 Dataset of 70,000 images, of which 60,000 form the training set and the remaining 10,000 the test set (loaded in the sketch below)
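The dataset can be loaded directly in Keras (a convenience of the modern library, not part of the original paper):

```python
from tensorflow import keras

# MNIST ships with Keras: 60,000 training and 10,000 test images,
# each a 28x28 grayscale array with labels 0-9.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
print(x_train.shape, x_test.shape)   # (60000, 28, 28) (10000, 28, 28)
```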
Fig. MNIST dataset split: training set 60,000 images; test set 10,000 images
Some Training Images
Results: LeNet-1
• The MNIST database is used, with 10% of the training data held out for validation and the rest used for training (a sketch of this setup follows the bullets).
• Test loss: 0.0600
• Test accuracy: 98.12%
• As the number of epochs increases, the training and validation losses decrease, while the training and validation accuracies increase.
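A minimal sketch of this training and evaluation setup, reusing the lenet1 model and the MNIST arrays from the earlier sketches; the optimizer, loss, batch size, and epoch count are assumptions, since the slides do not state them.

```python
# Scale pixels to [0, 1] and add a channel axis for Conv2D.
x_train_n = x_train[..., None] / 255.0
x_test_n = x_test[..., None] / 255.0

# Hyperparameters below are illustrative assumptions.
lenet1.compile(optimizer="adam",
               loss="sparse_categorical_crossentropy",
               metrics=["accuracy"])
history = lenet1.fit(x_train_n, y_train,
                     validation_split=0.1,   # hold out 10% for validation
                     epochs=10, batch_size=128)

# Evaluate on the 10,000-image test set.
test_loss, test_acc = lenet1.evaluate(x_test_n, y_test)
print(f"Test loss: {test_loss:.4f}, test accuracy: {test_acc:.4f}")
```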
Results: LeNet-4
• Test loss: 0.0528
• Test accuracy: 98.32%
• As the number of epochs increases, the training and validation losses decrease, while the training and validation accuracies increase.
• The test accuracy of LeNet-4 is higher than that of LeNet-1, and its test loss is lower.
Results: LeNet-5
• Test loss: 0.0387
• Test accuracy: 98.67%
• As the number of epochs increases, the training and validation losses decrease, while the training and validation accuracies increase (plotted in the sketch below).
• The test accuracy of LeNet-5 is higher than that of both LeNet-1 and LeNet-4, and its test loss is lower than both.
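The epoch-wise trends described on these result slides can be read off the history object from the training sketch above (matplotlib usage is an assumption of this write-up):

```python
import matplotlib.pyplot as plt

# Plot the trends described above: loss falls and accuracy rises for
# both training and validation as the number of epochs increases.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(history.history["loss"], label="training loss")
ax1.plot(history.history["val_loss"], label="validation loss")
ax1.set_xlabel("epoch"); ax1.legend()
ax2.plot(history.history["accuracy"], label="training accuracy")
ax2.plot(history.history["val_accuracy"], label="validation accuracy")
ax2.set_xlabel("epoch"); ax2.legend()
plt.show()
```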
Results: Comparison of various classifiers on the MNIST dataset