Improving Isolated Bangla Compound
Character Recognition Through
Feature-map Alignment
Pinaki Ranjan Sarkar (a), Deepak Mishra (a) and Gorthi R.K.S.S. Manyam (b)
a: Indian Institute of Space Science and Technology, Trivandrum
b: Indian Institute of Technology, Tirupati
9th International Conference on Advances in Pattern Recognition, ISI-Bangalore
December 29, 2017
Outline
1 Introduction
Motivation
Difficulties
Objective
2 Background Theories
Deep Learning
3 Proposed Approach
Preprocessing
Recognition Network
Spatial Transformer Network
4 Results
5 Conclusion
P. R. Sarkar December 29, 2017 2/29
Motivation
Although tremendous strides have been made in character
recognition, it is still considered a difficult problem
when the data is rotated and non-uniform in scale.
Very few works on Indian languages have used a deep
learning framework.
In this work, we address the problem of improving the
recognition of handwritten Bangla compound
characters using feature-map alignment.
Difficulties
The handwritten characters in the database are neither
uniform in scale nor centred.
Figure 1: The same characters with differences in scale and orientation, panels (a) and (b)
What should be the objectives then?
Objectives
Correctly recognize characters that are rotated and highly
non-uniform in scale.
Maximize recognition accuracy by using a spatial
transformer that aligns the characters.
Create a highly accurate classifier with few false positives.
Deep Learning
“Deep Learning is an algorithm which has no theoretical limitations of
what it can learn; the more data you give and the more computational
time you provide, the better it is.”
- Geoffrey Hinton, Google
Deep learning may be loosely defined as an attempt to train a
hierarchy of feature detectors, with each layer learning a higher-level
representation of the preceding layer.
Deep learning discovers intricate structure in large data sets by
using the backpropagation algorithm to indicate how a machine
should change its internal parameters that are used to compute the
representation in each layer from the representation in the previous
layer [1].
[1] LeCun, Y., Bengio, Y. and Hinton, G., 2015. Deep learning. Nature, 521(7553), pp. 436-444.
Successful Architectures in DL
Many variants of deep learning architectures have been proposed,
and some of them have proved successful, such as:
Convolutional Neural Network (CNN) [2]
Deep Boltzmann Machine (DBM) [3]
Deep Belief Networks (DBN) [4]
Stacked Denoising Auto-encoders (SDAE) [5]
Recently, Hinton et al. published another breakthrough paper
in the field of deep learning, titled “Dynamic routing
between capsules”.
[2] A. Krizhevsky, “Imagenet classification with deep convolutional neural networks”.
[3] R. Salakhutdinov and G. E. Hinton, “Deep boltzmann machines”, in AISTATS, vol. 1, p. 3, 2009.
[4] G. E. Hinton, “Deep belief networks”, Scholarpedia, vol. 4, no. 5, p. 5947, 2009.
[5] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion”.
Convolutional Neural Network
Figure 2: The CNN architecture is composed of hierarchical units, and each unit extracts
a different level of features. Combining more units produces a deeper network
with more semantic features. [6]
[6] P. R. Sarkar, Deepak Mishra and Gorthi R.K.S.S. Manyam, “Classification of Breast Masses Using Convolutional Neural Network as Feature Extractor and Classifier”, International Conference on Computer Vision and Image Processing (CVIP), 2017.
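To make the idea of a "unit" concrete, here is a minimal sketch of one convolution + ReLU + max-pooling stage in plain Python. It is a generic illustration and not the architecture from Figure 2; the function names, the toy image and the hand-written edge kernel are all assumptions (in a trained CNN the kernel weights are learned).

```python
def conv2d_valid(image, kernel):
    """Valid (no padding) 2-D cross-correlation of a 2-D list with a 2-D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap):
    """Element-wise rectification: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h, out_w = len(fmap) // size, len(fmap[0]) // size
    return [
        [
            max(fmap[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def cnn_unit(fmap, kernel, pool_size=2):
    """One hierarchical unit: convolution, non-linearity, then downsampling."""
    return max_pool(relu(conv2d_valid(fmap, kernel)), pool_size)


# Toy 6x6 image with a vertical edge between columns 2 and 3.
image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(6)]
# Hand-written kernel that responds to a left-to-right increase in intensity.
edge_kernel = [[-1.0, 1.0], [-1.0, 1.0]]

features = cnn_unit(image, edge_kernel)
print(features)  # [[0.0, 2.0], [0.0, 2.0]]
```

The output is a smaller map that is non-zero only where the edge was found. Feeding such maps into further units is what yields progressively more abstract features.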
State-of-the-art work
The current state-of-the-art method was proposed by Sarkhel et
al., using multi-scale deep quad tree based feature extraction [7].
They achieved 98.12% recognition accuracy on the
CMATERdb 3.1.3.3 database.
[7] R. Sarkhel, N. Das, A. Das, M. Kundu, and M. Nasipuri, “A multiscale deep quad tree based feature extraction method for the recognition of isolated handwritten characters of popular indic scripts”, Pattern Recognition, 2017.
Database
We use the CMATERdb 3.1.3.3 database, which has 171
unique classes of isolated grayscale images of Bangla
compound characters. The database consists of 34439
individual training images and 8520 test images.
The characters in the database have different scales and
are neither uniform nor centred.
Later, we evaluated feature-map alignment on a rotated
version of the database. To build it, we applied 9
rotations (−60° to +60° at 15° intervals) to each training
image and 9 random rotations in [−60°, +60°] to each test image.
After taking random rotations the size of the augmented
database became too large, so we took one third of the
augmented data for our experiment.
Parameter        Non-augmented    Augmented
Scale            Non-uniform      Non-uniform
Translation      Non-centred      Non-centred
Classes          171              171
Training data    34439            103317
Testing data     8520             25560
Table 1: Non-augmented and augmented databases (number of samples)
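The sample counts in Table 1 follow from the rotation scheme by simple arithmetic. The sketch below reproduces them. The variable names are illustrative, and drawing the random test angles from a uniform distribution is an assumption, since the slides only give the range. Which third of the augmented data was kept is not stated, so only the sizes are computed.

```python
import random

train_images, test_images = 34439, 8520

# Fixed training rotations: -60 to +60 degrees in 15-degree steps (9 angles).
train_angles = list(range(-60, 61, 15))

# Nine random test rotations per image; a uniform draw is assumed here.
rng = random.Random(0)  # fixed seed only so the sketch is repeatable
test_angles = [rng.uniform(-60, 60) for _ in range(9)]

full_train = train_images * len(train_angles)  # every training image, every angle
full_test = test_images * len(test_angles)

# One third of the augmented data was retained for the experiments.
kept_train = full_train // 3
kept_test = full_test // 3

print(train_angles)           # [-60, -45, -30, -15, 0, 15, 30, 45, 60]
print(kept_train, kept_test)  # 103317 25560
```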
Preprocessing
Preprocessing takes the complement of each original image
and then normalizes it to zero mean and unit variance.
Figure 3: Preprocessing of the samples
This allows the network to learn only from the
character structure or shape. Taking the complement of the
original images removes the influence of the
background pixels. Normalization was applied to reduce
noise and artifacts.
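A minimal sketch of the two preprocessing steps is given below. It assumes 8-bit grayscale input (0 to 255) and per-image statistics; the slides do not say whether the mean and variance are computed per image or over the whole database, and the function name is illustrative.

```python
from statistics import fmean, pstdev


def preprocess(image, max_value=255):
    """Complement a grayscale image, then standardise it to zero mean and unit variance."""
    # Complement: dark strokes on a bright page become bright strokes on a zero background.
    complemented = [[max_value - p for p in row] for row in image]
    pixels = [p for row in complemented for p in row]
    mean = fmean(pixels)
    std = pstdev(pixels)
    if std == 0:
        std = 1.0  # flat image: avoid dividing by zero
    return [[(p - mean) / std for p in row] for row in complemented]


# Toy 3x3 "character": a dark vertical stroke (0) on a white background (255).
image = [
    [255, 0, 255],
    [255, 0, 255],
    [255, 0, 255],
]
out = preprocess(image)
```

After this step the stroke pixels carry the largest values and the background pixels share a single low value, so the background contributes no structure of its own.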
Recognition Network
Before designing a model for Bangla character recognition,
we took care of two important issues:
differently scaled characters
non-centred characters
Most previous works did not consider the non-uniformity
of scale among characters in the CMATERdb 3.1.3 database.
A spatial transformer network can handle both problems.
We use multiple STNs to properly align the input image
as well as the feature-maps during the learning stage.
Figure 4: Proposed network for Bangla compound character recognition. Each spatial
transformer network learns an effective feature-map alignment, which improves the overall
recognition rate
STN 1 is used for coarse, image-level correction.
STN 2 and STN 3 are used for finer correction of middle-level
feature maps.
STN 4 is used for finer correction of high-level feature maps and to
bring recognition performance up to that of the state of the art.
What is a Spatial Transformer Network (STN)?
Formulating Spatial Transforms
Three main differentiable blocks:
Localisation network
Grid generator
Sampler
Why do we need it?
To make the CNN invariant to scale, rotation and translation.
Spatial Transformer Network
Figure 5: Spatial Transformer Network [8]
[8] M. Jaderberg, K. Simonyan, and A. Zisserman, “Spatial transformer networks”, in Advances in Neural Information Processing Systems, 2015, pp. 2017-2025.
Intuition behind STN
(a) The sampling grid is the regular grid
G = T_I(G), where I denotes the identity
transformation parameters
(b) The sampling grid is the result of
warping the regular grid with an affine
transformation T_θ(G)
Figure 6: Applying a parametrised sampling grid to an image U to produce the output V
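The two cases in Figure 6 can be sketched in plain Python as a grid generator followed by a bilinear sampler. This is a simplified illustration under stated assumptions: coordinates are normalised to [-1, 1], points outside U read as zero, and the 2x3 matrix θ is supplied by hand rather than predicted by a localisation network. Function names are illustrative.

```python
import math


def affine_grid(theta, height, width):
    """Map each output pixel's normalised coordinates (x_t, y_t) in [-1, 1]
    to source coordinates (x_s, y_s) using the 2x3 affine matrix theta."""
    grid = []
    for r in range(height):
        y_t = -1.0 + 2.0 * r / (height - 1)
        row = []
        for c in range(width):
            x_t = -1.0 + 2.0 * c / (width - 1)
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            row.append((x_s, y_s))
        grid.append(row)
    return grid


def bilinear_sample(U, grid):
    """Sample image U at the grid's source coordinates; points outside U read as 0."""
    h, w = len(U), len(U[0])

    def pixel(r, c):
        return U[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    V = []
    for grid_row in grid:
        out_row = []
        for x_s, y_s in grid_row:
            # Convert normalised coordinates back to fractional pixel indices.
            x = (x_s + 1.0) * (w - 1) / 2.0
            y = (y_s + 1.0) * (h - 1) / 2.0
            x0, y0 = math.floor(x), math.floor(y)
            dx, dy = x - x0, y - y0
            # Weighted average of the four neighbouring pixels.
            out_row.append(
                pixel(y0, x0) * (1 - dx) * (1 - dy)
                + pixel(y0, x0 + 1) * dx * (1 - dy)
                + pixel(y0 + 1, x0) * (1 - dx) * dy
                + pixel(y0 + 1, x0 + 1) * dx * dy
            )
        V.append(out_row)
    return V


U = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Case (a): identity parameters leave the image unchanged.
identity = [[1, 0, 0], [0, 1, 0]]
V_identity = bilinear_sample(U, affine_grid(identity, 3, 3))

# Case (b): an affine warp, here a quarter-turn written with exact entries.
quarter_turn = [[0, -1, 0], [1, 0, 0]]
V_rotated = bilinear_sample(U, affine_grid(quarter_turn, 3, 3))

print(V_identity)  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
print(V_rotated)   # [[3.0, 6.0, 9.0], [2.0, 5.0, 8.0], [1.0, 4.0, 7.0]]
```

Because both steps are built from multiplications and additions, gradients can flow back to θ, which is what lets a localisation network learn the alignment during training.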
Recognition Network
In addition, multi-scale learning is employed by
using filters of different sizes when pooling or downsampling
the feature-maps before they pass through the STNs.
This enables learning feature-map alignment from
differently scaled input feature-maps. The complete
recognition network is shown in Figure 4.
Parameter Selection
Although the parameters do not affect recognition accuracy
much, we give the detailed parameters of each STN to convey how
multi-scale learning works within the proposed network.
In our network, (1,1) and (3,3) pooling filters are applied before
STN 2 and STN 3, respectively. This allows multi-scale
learning of feature-map alignment.
Similarly, it is possible to introduce (5,5) pooling or larger, but
shrinking the input feature-map by a factor of 5 makes it difficult to retrieve
any useful information, so (1,1) and (3,3) max-pooling filters are
sufficient for our objective.
The downsampling factor should be kept in mind when using STNs,
because it decreases the height and width of the feature-maps, which may
rule out further pooling or convolutional layers after
downsampling.
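The effect of the pooling size on the feature-map dimensions can be checked with a few lines of arithmetic. In this sketch the 30x30 input size is hypothetical (the slides do not give the feature-map sizes) and the stride is assumed equal to the pooling size.

```python
def pooled_size(height, width, pool, stride=None):
    """Spatial size after pooling without padding; stride defaults to the pool size."""
    stride = stride or pool
    return ((height - pool) // stride + 1, (width - pool) // stride + 1)


feature_map = (30, 30)  # hypothetical size, not taken from the slides

sizes = {k: pooled_size(*feature_map, k) for k in (1, 3, 5)}
print(sizes)  # {1: (30, 30), 3: (10, 10), 5: (6, 6)}
```

With (5,5) pooling only a 6x6 map remains in this example, which leaves little room for the layers that follow.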
Figure 7: Spatial transformer network details in the proposed network
Performance analysis
Figure 8: Performance of the recognition network. (a) Training loss versus number of epochs. (b) Test accuracy versus number of epochs.
Each panel plots four curves over 0 to 140 epochs: without STN; with STN 1; with STN 1 + STN 2 + STN 3; with STN 1 + STN 2 + STN 3 + STN 4.
Figure 9: Performance of the proposed model with various combinations of training and
test sets from the CMATERdb 3.1.3 database.
Results
Figure 10: Output from STN 1 in the proposed model, panels (a) and (b). In each panel, the left image is the input to the STN and the right image is its output.
Contribution
Table 2: Comparison of recognition performance
Method               Technique                      Database (CMATERdb)                    Recognition accuracy
Roy et al. [9]       Greedy layer based CNN         original                               90.33%
Sarkhel et al. [10]  Multi-scale deep quad tree     original                               98.12%
                     based feature extraction
Ours                 Preprocessing + CNN + 4 STN    original (trained on non-augmented,    97.68%
                                                    tested on non-augmented)
Ours                 Preprocessing + CNN + 4 STN    augmented (trained on augmented,       96.34%
                                                    tested on augmented)
Ours                 Preprocessing + CNN + 4 STN    augmented (trained on augmented,       98.22%
                                                    tested on original)
[9] S. Roy, N. Das, M. Kundu, and M. Nasipuri, “Handwritten isolated bangla compound character recognition: A new benchmark using a novel deep learning approach”, Pattern Recognition Letters, vol. 90, pp. 15-21, 2017.
[10] R. Sarkhel, N. Das, A. Das, M. Kundu, and M. Nasipuri, “A multiscale deep quad tree based feature extraction method for the recognition of isolated handwritten characters of popular indic scripts”, Pattern Recognition, 2017.
Examples of correctly recognized characters
Figure 11: Examples of correctly recognized characters, panels (a) and (b)
Examples of falsely recognized characters
Figure 12: Examples of falsely recognized characters, panels (a) and (b)
Conclusion
In this paper, we have shown that the performance of OCR systems
based on a deep learning framework can be improved through
feature-map alignment.
We have used multiple spatial transformer networks and highlighted
their contribution to the performance improvement.
Our proposed network achieves good recognition results
on non-rotated and non-uniformly scaled characters, i.e. the CMATERdb
3.1.3 database.
The network demonstrates recognition performance similar to the
state of the art, although it is shallower than the existing
deep networks.
This work may also help in rotation-invariant object detection.
Questions?
sarkar0499pinaki@gmail.com
Thank you.

Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZTE
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝soniya singh
 

Recently uploaded (20)

Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptx
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 

Improving Isolated Bangla Character Recognition Through Feature-map Alignment

  • 1. Improving Isolated Bangla Compound Character Recognition Through Feature-map Alignment. Pinaki Ranjan Sarkar (a), Deepak Mishra (a) and Gorthi R.K.S.S. Manyam (b). (a) Indian Institute of Space Science and Technology, Trivandrum; (b) Indian Institute of Technology, Tirupati. 9th International Conference on Advances in Pattern Recognition, ISI-Bangalore, December 29, 2017.
  • 2. Outline: 1 Introduction (Motivation, Difficulties, Objective); 2 Background Theories (Deep Learning); 3 Proposed Approach (Preprocessing, Recognition Network, Spatial Transformer Network); 4 Results; 5 Conclusion. P. R. Sarkar, December 29, 2017, 2/29
  • 3. Introduction: Motivation. Although tremendous strides have been made in character recognition, it is still considered a difficult problem when the data is rotated and non-uniform in scale. Very few works on Indian languages use a deep learning framework. In this work, we take up the problem of improving the recognition of handwritten Bangla compound characters using feature-map alignment.
  • 4. Introduction: Difficulties. The handwritten characters in the database have different scales; they are neither uniform in scale nor centralized. Figure 1: The same characters with differences in scale and orientation, panels (a) and (b).
  • 5. Introduction: Objective. What should the objectives be, then? (1) Correct recognition of characters that are rotated and highly non-uniform in scale. (2) To maximize the recognition accuracy, use some form of transformer that aligns the characters. (3) Create a highly accurate classifier with few false positives.
  • 6. Background Theories: Deep Learning. “Deep Learning is an algorithm which has no theoretical limitations of what it can learn; the more data you give and the more computational time you provide, the better it is.” - Geoffrey Hinton, Google. Deep learning may be loosely defined as an attempt to train a hierarchy of feature detectors, with each layer learning a higher-level representation of the preceding layer. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change the internal parameters that are used to compute the representation in each layer from the representation in the previous layer [1]. [1] LeCun, Y., Bengio, Y. and Hinton, G., 2015. Deep learning. Nature, 521(7553), pp. 436-444.
  • 7. Background Theories: Successful Architectures in DL. Many variants of deep learning architectures have been proposed, and some of them have proved successful, such as: Convolutional Neural Network (CNN) [2], Deep Boltzmann Machine (DBM) [3], Deep Belief Networks (DBN) [4] and Stacked Denoising Auto-encoders (SDAE) [5]. Recently, Hinton et al. published another breakthrough paper in the field of deep learning, “Dynamic routing between capsules”. [2] A. Krizhevsky, “Imagenet classification with deep convolutional neural networks”. [3] R. Salakhutdinov and G. E. Hinton, “Deep boltzmann machines”, in AISTATS, vol. 1, p. 3, 2009. [4] G. E. Hinton, “Deep belief networks”, Scholarpedia, vol. 4, no. 5, p. 5947, 2009. [5] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion”.
  • 8. Background Theories: Convolutional Neural Network. Figure 2: The CNN architecture is composed of hierarchical units, and each unit extracts a different level of features. Combining more units produces a deeper network with more semantic features [6]. [6] P. R. Sarkar, Deepak Mishra and Gorthi R.K.S.S. Manyam, “Classification of Breast Masses Using Convolutional Neural Network as Feature Extractor and Classifier”, International Conference on Computer Vision and Image Processing (CVIP), 2017.
  • 9. Proposed Approach: State-of-the-art work. The current state-of-the-art method was proposed by Sarkhel et al., using multi-scale deep quad tree based feature extraction [7]. They achieved 98.12% recognition accuracy on the CMATERdb 3.1.3.3 database. [7] R. Sarkhel, N. Das, A. Das, M. Kundu, and M. Nasipuri, “A multiscale deep quad tree based feature extraction method for the recognition of isolated handwritten characters of popular indic scripts”, Pattern Recognition, 2017.
  • 10. Proposed Approach: Database. We use the CMATERdb 3.1.3.3 database, which has 171 unique classes of isolated grayscale images of Bangla compound characters. The database consists of 34439 individual training images and 8520 test images. The characters in the database have different scales and are neither uniform nor centralized. Later, we evaluated feature-map alignment on a rotated database. To do that, we took 9 rotations of each training image (from −60° to +60° at 15° intervals) and 9 random rotations in [−60°, +60°] of each test image.
  • 11. Proposed Approach: Database. After taking random rotations, the augmented database became too large, so we took one third of the augmented data for our experiment. Table 1: Non-augmented and augmented databases.
      Parameter        Non-augmented    Augmented
      Scale            Non-uniform      Non-uniform
      Translation      Non-centred      Non-centred
      Class            171              171
      Training data    34439            103317
      Testing data     8520             25560
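The sample counts in Table 1 can be checked with a little arithmetic. The sketch below assumes that every image contributes exactly 9 rotated copies and that "one third" means an exact integer third of the rotated set; the function and variable names are illustrative and do not come from the slides.

```python
# Rotation angles for the training set: -60 to +60 degrees in 15-degree steps.
angles = list(range(-60, 61, 15))

TRAIN_ORIGINAL = 34439  # training images in CMATERdb 3.1.3.3 (from the slides)
TEST_ORIGINAL = 8520    # test images (from the slides)


def augmented_subset_size(n_images, n_rotations, keep_one_in=3):
    """Images left after rotating each image n_rotations times and keeping one third."""
    return (n_images * n_rotations) // keep_one_in


train_augmented = augmented_subset_size(TRAIN_ORIGINAL, len(angles))
test_augmented = augmented_subset_size(TEST_ORIGINAL, 9)  # 9 random rotations per test image

print(len(angles), train_augmented, test_augmented)  # prints: 9 103317 25560
```

Under these assumptions the counts match the augmented column of Table 1.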
  • 12. Proposed Approach: Preprocessing. After preprocessing, normalized images (zero mean and unit variance) computed from the complement of the original images were used. Figure 3: Preprocessing of the samples. This lets the network learn only from the structure or shape of the character. Taking the complement of the original images removes the influence of the background pixels. Normalization was applied to reduce noise and artifacts.
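A minimal sketch of this preprocessing step, assuming 8-bit grayscale input (dark strokes on light paper) and per-image standardisation. The slides do not say whether statistics are computed per image or over the whole dataset, so that choice, the `eps` guard and the function name are assumptions.

```python
import numpy as np


def preprocess(image, eps=1e-8):
    """Complement an 8-bit grayscale image, then scale it to zero mean and unit variance."""
    img = image.astype(np.float64)
    complemented = 255.0 - img  # paper (255) becomes 0, dark strokes become bright
    mean = complemented.mean()
    std = complemented.std()
    return (complemented - mean) / (std + eps)  # eps avoids division by zero on blank images


# Toy 4x4 "character": white paper with a dark diagonal stroke.
sample = np.full((4, 4), 255, dtype=np.uint8)
np.fill_diagonal(sample, 0)
out = preprocess(sample)
```

After this step the stroke pixels carry the largest values and the background sits at a single constant level, so the network sees the character shape rather than the paper.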
  • 13. Proposed Approach: Recognition Network. Before designing a model for the Bangla character recognition problem, we took care of two important things: differently scaled characters and non-centered characters. Most previous works did not consider the non-uniformity of scale among characters in the CMATERdb 3.1.3 database. A spatial transformer network (STN) can handle both problems. The reason for using multiple STNs is to properly align the input image as well as the feature-maps during the learning stage.
  • 14. Proposed Approach: Recognition Network. Figure 4: Proposed network for Bangla compound character recognition. Each spatial transformer network learns an effective feature-map alignment, which improves the overall recognition rate. STN 1 is used for coarse, image-level correction. STN 2 and STN 3 are used for finer correction of middle-level feature maps. STN 4 is used for finer correction of high-level feature maps and to obtain recognition performance comparable to the state of the art.
  • 15. Proposed Approach: Spatial Transformer Network. What is a Spatial Transformer Network (STN)? Formulating spatial transforms: three main differentiable blocks, namely the localisation network, the grid generator and the sampler. Why do we need it? To make the CNN invariant to scale, rotation and translation.
  • 16. Proposed Approach: Spatial Transformer Network. Figure 5: Spatial Transformer Network [8]. [8] M. Jaderberg, K. Simonyan, and A. Zisserman, “Spatial transformer networks”, in Advances in Neural Information Processing Systems, 2015, pp. 2017-2025.
  • 17. Proposed Approach: Spatial Transformer Network. Intuition behind STN. (a) The sampling grid is the regular grid G = T_I(G), where I denotes the identity transformation parameters. (b) The sampling grid is the result of warping the regular grid with an affine transformation T_θ(G). Figure 6: Parametrised sampling grid applied to an image U, producing the output V.
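The grid generator and sampler can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes a single-channel image U, coordinates normalised to [-1, 1], bilinear interpolation with zero padding outside the image, and a hand-picked 2x3 matrix `theta` standing in for the output of the localisation network. All function names are hypothetical.

```python
import numpy as np


def affine_grid(theta, height, width):
    """Grid generator: source coordinates T_theta(G) for every output pixel."""
    ys, xs = np.meshgrid(np.linspace(-1.0, 1.0, height),
                         np.linspace(-1.0, 1.0, width), indexing="ij")
    target = np.stack([xs.ravel(), ys.ravel(), np.ones(height * width)])  # (3, H*W)
    source = theta @ target                                               # (2, H*W)
    return source[0].reshape(height, width), source[1].reshape(height, width)


def bilinear_sample(image, x_src, y_src):
    """Sampler: bilinear interpolation of `image` at normalised coordinates."""
    h, w = image.shape
    x = (x_src + 1.0) * (w - 1) / 2.0  # back to pixel coordinates
    y = (y_src + 1.0) * (h - 1) / 2.0
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = x - x0, y - y0

    def pixel(yy, xx):
        # Zero padding for coordinates that fall outside the image.
        inside = (yy >= 0) & (yy < h) & (xx >= 0) & (xx < w)
        values = image[np.clip(yy, 0, h - 1), np.clip(xx, 0, w - 1)]
        return np.where(inside, values, 0.0)

    return ((1 - wy) * (1 - wx) * pixel(y0, x0) + (1 - wy) * wx * pixel(y0, x1)
            + wy * (1 - wx) * pixel(y1, x0) + wy * wx * pixel(y1, x1))


U = np.arange(25, dtype=np.float64).reshape(5, 5)

# Case (a): identity parameters reproduce the regular grid, so V equals U.
identity = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
V_identity = bilinear_sample(U, *affine_grid(identity, 5, 5))

# Case (b): an affine warp, here a 90-degree rotation of the sampling grid.
rotation = np.array([[0.0, -1.0, 0.0],
                     [1.0, 0.0, 0.0]])
V_rotated = bilinear_sample(U, *affine_grid(rotation, 5, 5))
```

Because both steps are built from differentiable arithmetic on `theta` and the pixel values, gradients can flow back to whatever network predicts `theta`, which is what allows an STN to be trained end to end.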
  • 18. Proposed Approach: Recognition Network. In addition, multi-scale learning is employed by using filters of different sizes while pooling or downsampling the feature-maps before they pass through the STNs. This enables the network to learn feature-map alignment from differently scaled input feature-maps. The complete recognition network is shown in Figure 4.
  • 19. Proposed Approach: Parameter Selection. Although the parameters do not affect the recognition accuracy much, we give the detailed parameters of each STN to convey how multi-scale learning works within the proposed network. In our network, (1,1) and (3,3) pooling filters are used before STN 2 and STN 3 respectively, which allows multi-scale learning of feature-map alignment. It is possible to introduce (5,5) pooling or larger, but reducing the input feature-map by a factor of 5 makes it difficult to retrieve any useful information, so (1,1) and (3,3) max-pooling filters are sufficient for our objective. The downsampling factor should be noted when using STNs, because it decreases the height and width of the feature-maps, which might not allow multiple pooling or convolutional layers after downsampling.
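The effect of the pooling size on feature-map resolution can be illustrated with a short calculation. The sketch assumes non-overlapping pooling (stride equal to the window size, no padding) and a hypothetical 32x32 feature-map; the slides do not state the actual feature-map sizes here.

```python
def pooled_size(size, pool):
    """Side length after non-overlapping max-pooling with a (pool, pool) window."""
    return size // pool


side = 32  # hypothetical feature-map side length, not taken from the slides
sizes = {pool: pooled_size(side, pool) for pool in (1, 3, 5)}
print(sizes)  # prints: {1: 32, 3: 10, 5: 6}
```

With these assumed numbers, (5,5) pooling leaves only a 6x6 map, which shows how quickly the spatial detail available to later layers shrinks as the pooling window grows.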
  • 20. Proposed Approach: Parameter Selection. Figure 7: Spatial transformer network details in the proposed network.
  • 21. Results: Performance analysis. Figure 8: Performance of the recognition network. (a) Training loss against number of epochs (0 to 140). (b) Test accuracy against number of epochs (0 to 140). Curves in both panels: without STN; with STN 1; with STN 1 + STN 2 + STN 3; with STN 1 + STN 2 + STN 3 + STN 4.
  • 22. Results: Performance analysis. Figure 9: Performance of the proposed model with various combinations of train and test sets of the CMATERdb 3.1.3 database.
  • 23. Results. Figure 10: Output from STN 1 in the proposed model, panels (a) and (b). In each panel, the left image is the input to the STN and the right image is its output.
  • 24. Results: Contribution. Table 2: Comparison of recognition performance.
      Method                 Technique                                              Database (CMATERdb)   Train / test set                                  Recognition accuracy
      Roy et al. [9]         Greedy layer based CNN                                 original              -                                                 90.33%
      Sarkhel et al. [10]    Multi-scale deep quad tree based feature extraction    original              -                                                 98.12%
      Ours                   Preprocessing + CNN + 4 STN                            original              trained on non-augmented, tested on non-augmented 97.68%
      Ours                   Preprocessing + CNN + 4 STN                            augmented             trained on augmented, tested on augmented         96.34%
      Ours                   Preprocessing + CNN + 4 STN                            augmented             trained on augmented, tested on original          98.22%
    [9] S. Roy, N. Das, M. Kundu, and M. Nasipuri, “Handwritten isolated bangla compound character recognition: A new benchmark using a novel deep learning approach”, Pattern Recognition Letters, vol. 90, pp. 15-21, 2017. [10] R. Sarkhel, N. Das, A. Das, M. Kundu, and M. Nasipuri, “A multiscale deep quad tree based feature extraction method for the recognition of isolated handwritten characters of popular indic scripts”, Pattern Recognition, 2017.
  • 25. Results. Figure 11: Examples of correctly recognized characters, panels (a) and (b).
  • 26. Results. Figure 12: Examples of falsely recognized characters, panels (a) and (b).
  • 27. Conclusion. In this paper, we have shown that the performance of OCR systems based on a deep learning framework can be improved through feature-map alignment. We used multiple spatial transformer networks and highlighted their contribution to the performance improvement. Our proposed network shows good recognition results on non-rotated, non-uniformly scaled characters, i.e. the CMATERdb 3.1.3 database. The network demonstrates recognition performance similar to the state of the art, although it is shallower than existing deep networks. This work may also help in rotation-invariant object detection.