GRADIENT BASED LEARNING
APPLIED TO DOCUMENT
RECOGNITION
Submitted by- Gyanendra Awasthi
(Roll no. -201315)
INSTRUCTOR- Prof. Nishchal K. Verma
Introduction
 Authored by-
• Yann LeCun
• Léon Bottou
• Yoshua Bengio
• Patrick Haffner
 Published in the Proceedings of the IEEE, November 1998
 Yann LeCun and his co-authors introduced the LeNet-5 architecture in this paper
 The main message of the paper is that better pattern recognition systems can be built by relying more on automatic learning than on hand-designed heuristics
Introduction
 Traditional Approach
• Feature Extraction Module
• Trainable Classifier Module
 Limitations of this approach
• Feature extraction is hand-crafted
• Classifiers are restricted to low-dimensional feature spaces
 Learning in high-dimensional spaces has become practical due to
• Availability of low-cost machines with fast numerical processing
• Availability of large databases
• Availability of powerful machine learning techniques
Introduction
 Why gradient-based learning?
• It is easier to minimize a smooth, continuous loss function than a discrete one
• Minimal processing requirements
 Gradient back-propagation
• Combined with sigmoidal units, it can solve complicated learning tasks when applied to multi-layer neural networks (a minimal sketch follows)
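To make the back-propagation idea concrete, here is a minimal sketch (an illustration, not code from the paper) of one gradient-descent update on a tiny two-layer sigmoid network; the layer sizes, learning rate, and squared-error loss are assumptions chosen for brevity.

```python
import numpy as np

# Minimal sketch: one gradient-descent step on a tiny two-layer
# sigmoid network (illustrative sizes, not from the paper).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 1))          # input vector
y = np.array([[1.0]])                # target

W1 = rng.normal(scale=0.1, size=(3, 4))   # first-layer weights
W2 = rng.normal(scale=0.1, size=(1, 3))   # second-layer weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass
h = sigmoid(W1 @ x)                  # hidden activations
y_hat = sigmoid(W2 @ h)              # network output
loss = 0.5 * np.sum((y_hat - y) ** 2)

# Backward pass: gradient back-propagation via the chain rule
d_out = (y_hat - y) * y_hat * (1 - y_hat)   # error at the output pre-activation
grad_W2 = d_out @ h.T
d_hidden = (W2.T @ d_out) * h * (1 - h)     # error propagated to the hidden layer
grad_W1 = d_hidden @ x.T

# Gradient-descent update (learning rate is illustrative)
lr = 0.5
W2 -= lr * grad_W2
W1 -= lr * grad_W1
```

Iterating this update is exactly the smooth-function minimization referred to above; the same chain-rule pattern scales to the multi-layer networks in the paper.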
Convolutional Neural Network (CNN)
 CNNs are the standard form of neural network architecture for image recognition
 CNN architectures are among the most popular deep learning frameworks
 Applications include marketing, healthcare, retail, and automotive
 Characteristics of CNN architectures (illustrated in the sketch after the figure) are
• Local Receptive Fields
• Sub-sampling
• Weight Sharing
Fig. A typical architecture of CNN
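As a minimal sketch of how these three characteristics look in code (assuming TensorFlow/Keras; the layer sizes are illustrative, not prescribed by the paper):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Each Conv2D unit sees only a 5x5 patch of its input (local receptive
# field), and all units in a feature map reuse the same 5x5 kernel
# (weight sharing); average pooling performs the sub-sampling.
model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(6, kernel_size=5, activation="tanh"),  # local receptive fields, shared weights
    layers.AveragePooling2D(pool_size=2),                # sub-sampling
])
model.summary()
```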
LeNet-5 Convolutional Neural Network
 The LeNet-5 architecture comprises 7 layers (not counting the input layer)
• 3 Convolutional Layers (Cx)
• 2 Subsampling Layers (Sx)
• 2 Fully Connected Layers (Fx)
where x denotes the layer’s index.
LeNet-5 Architecture
 First Layer (C1)
• Convolutional layer with 6 feature maps of size 28×28
• Each unit in each feature map is connected to a 5×5 neighborhood in the input
• Contains 156 trainable parameters and 122,304 connections
 Second Layer (S2)
• A subsampling layer with 6 feature maps of size 14×14
• Each unit in each feature map is connected to a 2×2 neighborhood in the corresponding feature map in C1
• Contains 12 trainable parameters and 5,880 connections
 Third Layer (C3)
• A convolutional layer with 16 feature maps of size 10×10
• Each unit in each feature map is connected to several 5×5 neighborhoods at identical locations in a subset of S2's feature maps
• Contains 1,516 trainable parameters and 151,600 connections
LeNet-5 Architecture
 Fourth Layer (S4)
• A subsampling layer with 16 feature maps of size 5×5
• Each unit in each feature map is connected to a 2×2 neighborhood in the corresponding feature map in C3
• Contains 32 trainable parameters and 2,000 connections
 Fifth Layer (C5)
• A convolutional layer with 120 feature maps of size 1×1
• Each unit in each feature map is connected to a 5×5 neighborhood on all 16 of S4's feature maps
• Has 48,120 trainable connections
 Sixth Layer (F6)
• A fully connected layer with 84 units, fully connected to C5
• Has 10,164 trainable parameters
 Output Layer
• Composed of Euclidean RBF units, one per class, each connected to all 84 units of F6
A Keras sketch of this architecture follows.
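The layer sizes above translate directly into code. The following is a minimal Keras sketch of LeNet-5, not the paper's original implementation: it substitutes a softmax output for the paper's RBF output layer, plain tanh for the original scaled squashing function, and full S2-to-C3 connectivity for the paper's sparse connection table.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sketch of LeNet-5 (modernized: softmax output instead of RBF units,
# full S2->C3 connectivity instead of the paper's sparse table).
lenet5 = keras.Sequential([
    layers.Input(shape=(32, 32, 1)),
    layers.Conv2D(6, 5, activation="tanh"),    # C1: 6 maps, 28x28
    layers.AveragePooling2D(2),                # S2: 6 maps, 14x14
    layers.Conv2D(16, 5, activation="tanh"),   # C3: 16 maps, 10x10
    layers.AveragePooling2D(2),                # S4: 16 maps, 5x5
    layers.Conv2D(120, 5, activation="tanh"),  # C5: 120 maps, 1x1
    layers.Flatten(),
    layers.Dense(84, activation="tanh"),       # F6: 84 units
    layers.Dense(10, activation="softmax"),    # output: 10 digit classes
])
lenet5.summary()
```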
Fig. LeNet-5 Architecture
LeNet-1 Architecture
 Consists of
 2 convolutional layers
 2 subsampling (average pooling) layers
 a fully connected output layer
 Number of parameters is about 3,000
 The architecture is as follows (see the sketch after the figure)
 28×28 input image
 Four 24×24 feature maps from a convolutional layer (5×5 kernels)
 Average pooling layer (2×2)
 Eight 12×12 feature maps from a convolutional layer (5×5 kernels)
 Average pooling layer (2×2)
 Directly fully connected to the output
Fig. LeNet-1 Architecture
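A minimal Keras sketch of LeNet-1 as listed above (an illustration, not the original implementation; padding="same" on the second convolution is an assumption made so that it keeps 12×12 feature maps, as stated on the slide):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sketch of LeNet-1 as listed above (padding="same" is an assumption
# so the second convolution keeps 12x12 feature maps).
lenet1 = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(4, 5, activation="tanh"),                  # four 24x24 maps
    layers.AveragePooling2D(2),                              # sub-sample to 12x12
    layers.Conv2D(8, 5, padding="same", activation="tanh"),  # eight 12x12 maps
    layers.AveragePooling2D(2),                              # sub-sample to 6x6
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),                  # directly connected output
])
```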
LeNet-4 Architecture
 Consists of:
 3 convolutional layers
 2 subsampling layers
 1 fully connected layer
 Contains about 260,000 connections and 17,000 free parameters
 In LeNet-4, the input is a 32×32 layer in which 20×20 images (not deslanted) are centered by their center of mass, as sketched below
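As a minimal sketch of the input preprocessing just described (the helper name and the use of scipy are assumptions of this write-up, not the paper's code):

```python
import numpy as np
from scipy import ndimage

def center_by_mass(digit20, field=32):
    """Place a 20x20 digit image in a field x field canvas so that the
    digit's center of mass lands at the canvas center (illustrative)."""
    canvas = np.zeros((field, field), dtype=digit20.dtype)
    cy, cx = ndimage.center_of_mass(digit20)            # mass center of the digit
    top = int(round(field / 2 - cy))                    # shift mass center to canvas center
    left = int(round(field / 2 - cx))
    top = max(0, min(top, field - digit20.shape[0]))    # keep the digit inside the canvas
    left = max(0, min(left, field - digit20.shape[1]))
    canvas[top:top + digit20.shape[0], left:left + digit20.shape[1]] = digit20
    return canvas
```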
Database: Modified NIST (MNIST) Dataset
 NIST: National Institute of Standards and Technology database
 MNIST database
• Consists of handwritten images of the digits 0 to 9
• A subset of the famous NIST dataset
 Images are centered in a 28×28 pixel field (grayscale)
 Dataset of 70,000 images, of which 60,000 form the training set and the remaining 10,000 the test set (loaded in the sketch below)
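The dataset can be loaded directly in Keras (a convenience of the modern library, not part of the original paper):

```python
from tensorflow import keras

# MNIST ships with Keras: 60,000 training and 10,000 test images,
# each a 28x28 grayscale array with labels 0-9.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
print(x_train.shape, x_test.shape)   # (60000, 28, 28) (10000, 28, 28)
```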
Fig. MNIST dataset split: training set 60,000 images; test set 10,000 images
Some Training Images
Results: LeNet-1
• The MNIST database is used, with 10% of the training data held out for validation and the rest used for training (a sketch of this setup follows the bullets).
• Test loss: 0.0600
• Test accuracy: 98.12%
• As the number of epochs increases, the training and validation losses decrease, while the training and validation accuracies increase.
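A minimal sketch of this training and evaluation setup, reusing the lenet1 model and the MNIST arrays from the earlier sketches; the optimizer, loss, batch size, and epoch count are assumptions, since the slides do not state them.

```python
# Scale pixels to [0, 1] and add a channel axis for Conv2D.
x_train_n = x_train[..., None] / 255.0
x_test_n = x_test[..., None] / 255.0

# Hyperparameters below are illustrative assumptions.
lenet1.compile(optimizer="adam",
               loss="sparse_categorical_crossentropy",
               metrics=["accuracy"])
history = lenet1.fit(x_train_n, y_train,
                     validation_split=0.1,   # hold out 10% for validation
                     epochs=10, batch_size=128)

# Evaluate on the 10,000-image test set.
test_loss, test_acc = lenet1.evaluate(x_test_n, y_test)
print(f"Test loss: {test_loss:.4f}, test accuracy: {test_acc:.4f}")
```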
Results: LeNet-4
• Test loss: 0.0528
• Test accuracy: 98.32%
• As the number of epochs increases, the training and validation losses decrease, while the training and validation accuracies increase.
• The test accuracy of LeNet-4 is higher than that of LeNet-1, and its test loss is lower.
Results: LeNet-5
• Test loss: 0.0387
• Test accuracy: 98.67%
• As the number of epochs increases, the training and validation losses decrease, while the training and validation accuracies increase (plotted in the sketch below).
• The test accuracy of LeNet-5 is higher than that of both LeNet-1 and LeNet-4, and its test loss is lower than both.
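The epoch-wise trends described on these result slides can be read off the history object from the training sketch above (matplotlib usage is an assumption of this write-up):

```python
import matplotlib.pyplot as plt

# Plot the trends described above: loss falls and accuracy rises for
# both training and validation as the number of epochs increases.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(history.history["loss"], label="training loss")
ax1.plot(history.history["val_loss"], label="validation loss")
ax1.set_xlabel("epoch"); ax1.legend()
ax2.plot(history.history["accuracy"], label="training accuracy")
ax2.plot(history.history["val_accuracy"], label="validation accuracy")
ax2.set_xlabel("epoch"); ax2.legend()
plt.show()
```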
Results: Comparison of various classifiers on the MNIST dataset