1. Improving Isolated Bangla Compound Character Recognition Through Feature-map Alignment
Pinaki Ranjan Sarkar^a, Deepak Mishra^a and Gorthi R.K.S.S. Manyam^b
a: Indian Institute of Space Science and Technology, Trivandrum
b: Indian Institute of Technology, Tirupati
9th International Conference on Advances in Pattern Recognition, ISI Bangalore
December 29, 2017
3. Introduction: Motivation
Though tremendous strides have been made in character recognition, it is still considered a difficult problem when the data is rotated and non-uniform in scale.
Very few works have addressed Indian languages using deep learning frameworks.
In this work, we address the problem of improving the recognition of handwritten Bangla compound characters using feature-map alignment.
P. R. Sarkar December 29, 2017 3/29
4. Introduction: Difficulties
The handwritten characters in the database vary in scale and are neither uniformly scaled nor centred.
Figure 1: The same character shown with differences in scale and orientation (panels (a) and (b))
5. Introduction: Objectives
What should the objectives be?
Correct recognition of characters which are highly non-uniform
in scale and rotated.
To maximize the recognition accuracy, we should use some
sort of transformer which will align the characters.
To create a highly accurate classifier with low false positives.
6. Background Theories: Deep Learning
“Deep Learning is an algorithm which has no theoretical limitations of
what it can learn; the more data you give and the more computational
time you provide, the better it is.”
- Geoffrey Hinton, Google
Deep learning may be loosely defined as an attempt to train a hierarchy of feature detectors, with each layer learning a higher-level representation of the preceding layer's output.
Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer.¹
¹ LeCun, Y., Bengio, Y. and Hinton, G., 2015. Deep learning. Nature, 521(7553), pp. 436-444.
7. Background Theories: Successful Architectures in DL
Many variants of deep learning architectures have been proposed, and some have proved successful, such as:
Convolutional Neural Network (CNN)²
Deep Boltzmann Machine (DBM)³
Deep Belief Network (DBN)⁴
Stacked Denoising Auto-encoder (SDAE)⁵
Recently, Hinton et al. published another breakthrough paper in the field of deep learning, "Dynamic routing between capsules".
² A. Krizhevsky, "ImageNet classification with deep convolutional neural networks".
³ R. Salakhutdinov and G. E. Hinton, "Deep Boltzmann machines," in AISTATS, vol. 1, p. 3, 2009.
⁴ G. E. Hinton, "Deep belief networks," Scholarpedia, vol. 4, no. 5, p. 5947, 2009.
⁵ P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion".
8. Background Theories: Convolutional Neural Network
Figure 2: The CNN architecture is composed of hierarchical units, each extracting a different level of features. Combining more units produces a deeper network along with more semantic features.⁶
⁶ P. R. Sarkar, Deepak Mishra and Gorthi R.K.S.S. Manyam, "Classification of Breast Masses Using Convolutional Neural Network as Feature Extractor and Classifier", International Conference on Computer Vision and Image Processing (CVIP), 2017.
9. Proposed Approach: State-of-the-art Work
The current state of the art was proposed by Sarkhel et al., using multi-scale deep quad-tree-based feature extraction.⁷
They achieved 98.12% recognition accuracy on the CMATERdb 3.1.3.3 database.
⁷ R. Sarkhel, N. Das, A. Das, M. Kundu, and M. Nasipuri, "A multiscale deep quad tree based feature extraction method for the recognition of isolated handwritten characters of popular indic scripts," Pattern Recognition, 2017.
10. Proposed Approach: Database
We are using the CMATERdb 3.1.3.3 database which has 171
unique classes of isolated grayscale images of Bangla
compound characters. The database consists of 34439
individual training images and 8520 test images.
The characters in the database have different scales and they
are neither uniform nor centralized.
Later, we evaluated feature-map alignment on a rotated database; to do so, we took 9 rotations ([−60°, +60°] at 15° intervals) of each training image and 9 random rotations in [−60°, +60°] of each test image.
11. Proposed Approach: Database
After augmenting with rotations, the database became too large, so we used one third of the augmented data in our experiments.
Parameter       Non-augmented   Augmented
Scale           Non-uniform     Non-uniform
Translation     Non-centred     Non-centred
Classes         171             171
Training data   34439           103317
Testing data    8520            25560

Table 1: Non-augmented and augmented databases
12. Proposed Approach: Preprocessing
Normalized images (zero mean and unit variance) were obtained from the complement of the original images after preprocessing.
Figure 3: Preprocessing of the samples
This is done so that the network learns only from the character's structure or shape. Taking the complement of the original images removes any influence of the background pixels, and the normalization reduces noise and artifacts.
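The two preprocessing steps above can be sketched as follows (a minimal illustration assuming 8-bit grayscale input and per-image standardisation; the exact pipeline in the paper may differ):

```python
import numpy as np

def preprocess(gray):
    """Complement an 8-bit grayscale character image, then standardise it
    to zero mean and unit variance (per image)."""
    x = 255.0 - gray.astype(np.float64)       # complement: remove background influence
    return (x - x.mean()) / (x.std() + 1e-8)  # zero mean, unit variance

# Hypothetical 32x32 grayscale sample
sample = np.random.randint(0, 256, size=(32, 32))
out = preprocess(sample)
print(out.mean(), out.std())  # approximately 0 and 1
```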
13. Proposed Approach: Recognition Network
Before designing a model for the Bangla character recognition problem, we took care of two important issues:
differently scaled characters
non-centred characters
Most previous works did not consider the non-uniformity of scale among characters in the CMATERdb 3.1.3 database.
A spatial transformer network (STN) can handle both problems.
The reason for using multiple STNs is to properly align the input image as well as the feature-maps during the learning stage.
14. Proposed Approach: Recognition Network
Figure 4: Proposed network for Bangla compound character recognition. Each spatial transformer network learns an effective feature-map alignment, which improves the overall recognition rate.
STN 1 is used for coarse, image-level correction.
STN 2 and STN 3 are used for finer correction of middle-level feature maps.
STN 4 is used for finer correction of high-level feature maps and to obtain recognition performance comparable to the state of the art.
15. Proposed Approach: What is a Spatial Transformer Network (STN)?
Formulating Spatial Transforms
Three main differentiable blocks:
Localisation network
Grid generator
Sampler
Why do we need it?
To make the CNN invariant to scale, rotation and translation.
16. Proposed Approach: Spatial Transformer Network
Figure 5: Spatial Transformer Network⁸
⁸ M. Jaderberg, K. Simonyan, and A. Zisserman, "Spatial transformer networks", in Advances in Neural Information Processing Systems, 2015, pp. 2017-2025.
17. Proposed Approach: Intuition behind the STN
(a) The sampling grid is the regular grid G = T_I(G), where I denotes the identity transformation parameters.
(b) The sampling grid is the result of warping the regular grid with an affine transformation T_θ(G).
Figure 6: Applying a parametrised sampling grid to an image U, producing the output V
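The grid warping in Figure 6 can be sketched numerically (a toy NumPy illustration; the 2x3 affine parameterisation and the normalised [-1, 1] coordinate convention follow Jaderberg et al., but the function names here are our own):

```python
import numpy as np

def affine_grid(theta, H, W):
    """Build the warped sampling grid T_theta(G): for each output pixel,
    the (x, y) source coordinates, in normalised [-1, 1] coordinates."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W),
                         indexing="ij")
    G = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # homogeneous grid (H, W, 3)
    return G @ theta.T                                 # warped grid (H, W, 2)

identity = np.array([[1.0, 0.0, 0.0],   # theta = I gives G = T_I(G),
                     [0.0, 1.0, 0.0]])  # i.e. the regular grid itself
grid = affine_grid(identity, 4, 4)
print(grid.shape)  # (4, 4, 2)
```

A sampler would then interpolate the input image U at these source coordinates to produce the output V.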
18. Proposed Approach: Recognition Network
In addition, multi-scale learning is employed by using different filter sizes when pooling or downsampling the feature-maps before they pass through the STNs.
This enables the network to learn feature-map alignment from differently scaled input feature-maps. The complete recognition network is shown in the previous slide.
19. Proposed Approach: Parameter Selection
Though the parameters do not affect the recognition accuracy much, to give a feel for multi-scale learning within the proposed network we give the detailed parameters of each STN.
In our network, (1,1) and (3,3) pooling filters are used before STN 2 and STN 3 respectively. This allows multi-scale learning of feature-map alignment.
Similarly, it is possible to introduce (5,5) pooling or larger, but reducing the input feature-map by a factor of 5 makes it difficult to retrieve any useful information, so (1,1) and (3,3) max-pooling filters are sufficient for our objective.
The downsampling factor should be noted when using STNs, as it decreases the height and width of the feature-maps, which may preclude the use of multiple pooling or convolutional layers after downsampling.
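The effect of the (1,1) versus (3,3) pooling choice can be illustrated with a toy max-pooling routine (our own sketch on a hypothetical 6x6 feature-map, not the network's actual implementation):

```python
import numpy as np

def max_pool(x, k):
    """Non-overlapping k x k max pooling; k=1 leaves the map unchanged,
    k=3 shrinks height and width by a factor of 3."""
    H, W = x.shape
    H2, W2 = H // k, W // k
    return x[:H2 * k, :W2 * k].reshape(H2, k, W2, k).max(axis=(1, 3))

fmap = np.arange(36.0).reshape(6, 6)  # toy 6x6 feature-map
print(max_pool(fmap, 1).shape)  # (6, 6): full-scale branch, as before STN 2
print(max_pool(fmap, 3).shape)  # (2, 2): 3x-downsampled branch, as before STN 3
```

The (2,2) branch shows why still larger filters are counter-productive: a (5,5) pool of a small feature-map leaves too few spatial positions to recover useful alignment information.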
20. Proposed Approach: Parameter Selection
Figure 7: Spatial transformer network details in the proposed network
21. Results: Performance Analysis
Figure 8: Performance of the recognition network: (a) training loss and (b) test accuracy versus number of epochs, compared for four configurations: without STN; with STN 1; with STN 1 + STN 2 + STN 3; and with STN 1 + STN 2 + STN 3 + STN 4.
22. Results: Performance Analysis
Figure 9: Performance of the proposed model with various combinations of train and test sets from the CMATERdb 3.1.3 database.
23. Results
Figure 10: Output from STN 1 in the proposed model (panels (a) and (b)); in each pair, the left image is the STN input and the right image is the STN output.
24. Results: Contribution
Table 2: Comparison of recognition performance

Methods           Techniques                                   Database (CMATERdb)   Recognition accuracy
Das et al.⁹       Greedy layer based CNN                       original              90.33%
Sarkhel et al.¹⁰  Multi-scale deep quad tree                   original              98.12%
                  based feature extraction
Ours              Preprocessing + CNN + 4 STNs                 original              97.68%
                  (trained and tested on non-augmented)
Ours              Preprocessing + CNN + 4 STNs                 augmented             96.34%
                  (trained and tested on augmented)
Ours              Preprocessing + CNN + 4 STNs                 augmented             98.22%
                  (trained on augmented, tested on original)

⁹ S. Roy, N. Das, M. Kundu, and M. Nasipuri, "Handwritten isolated Bangla compound character recognition: A new benchmark using a novel deep learning approach," Pattern Recognition Letters, vol. 90, pp. 15-21, 2017.
¹⁰ R. Sarkhel, N. Das, A. Das, M. Kundu, and M. Nasipuri, "A multiscale deep quad tree based feature extraction method for the recognition of isolated handwritten characters of popular indic scripts," Pattern Recognition, 2017.
25. Results: Examples of Correctly Recognized Characters
Figure 11: Examples of correctly recognized characters (panels (a) and (b))
26. Results: Examples of Falsely Recognized Characters
Figure 12: Examples of falsely recognized characters (panels (a) and (b))
27. Conclusion
In this paper, we have shown that we can improve the performance
of OCR systems (based on deep learning framework) through
feature-map alignment.
We have used multiple spatial transformer networks and highlighted
their contribution in performance improvement.
Our proposed network achieves good recognition results on non-rotated and non-uniformly scaled characters, i.e., the CMATERdb 3.1.3 database.
This network demonstrates recognition performance similar to the state of the art, even though it is shallower than the existing deep networks.
This work may also help in rotation invariant object detection.