Critical Review on Off-Line Sinhala Handwriting Recognition
Off-Line Sinhala Handwriting Recognition
D.G.A.M.Wijayarathna (144193K)
Faculty of Information Technology
University of Moratuwa
amwijayarathna@gmail.com
Abstract
Handwriting recognition is one of the latest technologies of this
era. Using handwriting recognition, a computer can identify the
characters and letters that are written or printed on paper. Sinhala
handwriting recognition is currently a trending research topic. This
paper is a literature review that summarizes the sources on
“Off-Line Sinhala Handwriting Recognition” while identifying and
evaluating open issues in order to present solutions and
improvements for further research in this technology.
Key Terms- Handwriting Recognition, Sinhala, Segmentation, Data
Collection, Grouping.
I. INTRODUCTION
Sinhala is the most widely used language in Sri Lanka and one of
the country's official languages. The majority of the Sri Lankan
population speaks Sinhala. It is the language through which they
communicate in their daily activities as well as for most
official purposes.
The language belongs to the Indo-Aryan group of languages,
along with Sanskrit, Hindi and Bengali. The Sinhala character
set consists of 16 vowels, 40 consonants, 2 semi-consonants
and 13 consonant modifiers [1].
Handwriting recognition is a rapidly growing technology in
today's technical world. Sinhala handwriting recognition is much
needed in Sri Lanka because Sinhala is more familiar to Sri
Lankan people than any other language, yet very few tools exist
for Sinhala handwriting recognition.
So far, Sinhala handwriting recognition has been approached using
Zone Based Feature Extraction [1], Hidden Markov Models [2],
computational pattern recognition methods such as Artificial
Neural Networks (ANN) [3] and some other techniques. Sinhala
handwriting recognition techniques can be either off-line or
on-line. Off-line recognition of handwriting has many practical
uses in areas such as banking, census, mail sorting and
commerce [3].
Sinhala letters are written from left to right, and most of the
letters are curve shaped. All characters are written within three
horizontal layers: the upper layer, the middle layer and the
lower layer [Figure 1]. Some characters occupy all three layers,
some are written only in the middle layer, and others occupy the
middle layer and extend into the upper or lower layer without
necessarily occupying both.
Figure 1: Sinhala characters written with layers[2]
Generally, five main stages of character recognition can be
identified:
1) Preprocessing and data collection
2) Character segmentation
3) Representation
4) Training and recognition
5) Post-processing
These stages are reviewed in the following sections.
II. PREPROCESSING AND DATA COLLECTION
In every paper reviewed, data were collected on A4 sheets. Each
sheet contains 10 lines of sample handwriting, so each sheet
holds more than 20 words written in lines. The collected samples
include vowels, consonants and modifiers.
The documents are scanned at a suitable resolution in dots per
inch (dpi) and binarized using an adjustable thresholding
technique. The outputs are then stored as gray-scale images. The
thresholding technique is used to reduce the noise and extract
the handwriting.
Chamikara et al. [10] used the following methods for character
preprocessing:
i) RGB to grey-scale conversion
ii) Noise removal
iii) Image binarization
iv) Image skeletonization
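The first three preprocessing steps listed above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in [10]: the BT.601 luma weights, the 3x3 median filter for noise removal, the fixed threshold of 128 and all function names are assumptions made for the example. Skeletonization is omitted because a thinning algorithm would make the sketch considerably longer.

```python
from statistics import median


def rgb_to_grey(rgb_image):
    """Convert rows of (r, g, b) tuples to grey levels (BT.601 luma weights, assumed)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]


def median_filter(grey, size=3):
    """Noise removal: replace each pixel by the median of its neighbourhood.

    Pixels near the border use only the neighbours that exist.
    """
    height, width = len(grey), len(grey[0])
    k = size // 2
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [grey[j][i]
                      for j in range(max(0, y - k), min(height, y + k + 1))
                      for i in range(max(0, x - k), min(width, x + k + 1))]
            row.append(int(median(window)))
        out.append(row)
    return out


def binarize(grey, threshold=128):
    """Map dark pixels (ink) to 1 and light pixels (paper) to 0.

    The threshold is adjustable; 128 is an arbitrary default.
    """
    return [[1 if value < threshold else 0 for value in row] for row in grey]


def preprocess(rgb_image, threshold=128):
    """Hypothetical pipeline: grey conversion, noise removal, binarization."""
    return binarize(median_filter(rgb_to_grey(rgb_image)), threshold)
```

With these assumptions, an isolated dark speck on a white page is removed by the median filter before binarization, while strokes several pixels thick survive.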
III. SEGMENTATION
After preprocessing, the image is segmented into characters
using different kinds of image processing algorithms. The
resulting images of segmented characters are used for character
classification [Figure 2].
Figure 2: Character segmentation flow chart[1]
A. Segment Text Lines
Segmenting the text lines is the first step. The gaps between
the lines can be clearly identified using the horizontal projection
profile of the entire image [Figure 3]. In the projection profile
graph, the valleys represent the separation between the text
lines. If the graph contains a run of consecutive zero values,
that run is considered a boundary between lines of text.
Figure 3: Line segmentation using the horizontal projection
profile[5]
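Projection-profile line segmentation can be illustrated with a short Python sketch. It assumes a binary image stored as a list of rows in which 1 marks ink and 0 marks background; the function names and the `min_gap` parameter (how many consecutive empty rows count as a boundary) are assumptions made for the example, not taken from the reviewed papers.

```python
def horizontal_projection(binary):
    """Number of ink pixels in each row of a binary image (1 = ink)."""
    return [sum(row) for row in binary]


def segment_lines(binary, min_gap=1):
    """Return (start, end) row ranges, end exclusive, one per text line.

    A line ends once `min_gap` consecutive rows with a zero projection
    value have been seen.
    """
    profile = horizontal_projection(binary)
    lines, start, gap = [], None, 0
    for y, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = y          # first inked row of a new line
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:     # valley wide enough: close the line
                lines.append((start, y - gap + 1))
                start, gap = None, 0
    if start is not None:          # image ended while inside a line
        lines.append((start, len(profile) - gap))
    return lines
```

Because the vertical projection profile is simply the horizontal profile of the transposed image, the same function can be reused on `list(zip(*line_image))` to obtain column ranges for words or characters within one text line.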
B. Segment Horizontal Characters
Characters can be segmented horizontally using the vertical
projection profile, from which the boundaries between characters
and between words can be identified. If the characters overlap,
however, this method alone is not sufficient for
segmentation [Figure 4a & 4b].
Figure 4a: Word segmentation using the vertical projection
profile[1]
Figure 4b: Character segmentation using the vertical
projection profile[5]
C. Identifying Overlapping Characters
The average character width can be calculated from the image width
and the number of characters in the image [Equation 1].
$$\text{Average character width} = \frac{\text{Image width}}{\text{Number of characters}} \tag{1}$$
Overlapping characters can then be identified by the following
rule, in which a segment wider than 1.5 times the average width
is treated as a touching character [Equation 2].
If ( Width > ( 3 x Average Width / 2 ) )
    Touching Character                         (2)
Else
    Segmented Character
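A small Python sketch of this width test follows. It reads the rule as flagging segments that are wider than 1.5 times the average character width, which is an assumption about the intended direction of the comparison. It also assumes that the number of segments found by the vertical projection profile is used as the number of characters. The function names and the `factor` parameter are hypothetical.

```python
def average_character_width(image_width, n_characters):
    """Equation 1: image width divided by the number of characters."""
    if n_characters <= 0:
        raise ValueError("n_characters must be positive")
    return image_width / n_characters


def classify_segments(segment_widths, image_width, factor=1.5):
    """Label each segment 'touching' or 'segmented' using the width rule.

    Assumption: a segment wider than factor * average width contains
    more than one character.
    """
    avg = average_character_width(image_width, len(segment_widths))
    return ["touching" if width > factor * avg else "segmented"
            for width in segment_widths]
```

For example, with four segments of widths 10, 11, 30 and 9 pixels in an image 60 pixels wide, the average width is 15 pixels, so only the 30-pixel segment exceeds the 22.5-pixel limit and is passed on to the overlapping-character stage.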
D. Segment Overlapping Characters
Dharmapala et al. [1] used contours to segment overlapping
characters [Figure 5].
Figure 5. Segment characters using contours[1]
Karunanayaka et al. [3] used the Water Reservoir Concept, which
is a more complicated method than the former [Figure 6].
Figure 6. Apply water reservoir concept in touching
character[3]
IV. GROUPING
Segmented characters are grouped into four main groups
according to each character's layer structure [6]. The characters
are viewed within the three-layer structure [Figure 7].
Figure 7. Three layered structure of the text line[6]
The distances from the centre of the middle layer to the
character's upper and lower limits, together with the character
height, are considered in the classification.
V. REPRESENTATION
Image representation is one of the most important elements of a
handwriting recognition system. In the simplest case, gray-level
or binary images are fed directly to a recognizer. However, to
avoid extra complexity and to increase the accuracy of the
algorithms, most recognition systems need a more compact and
characteristic representation. For this purpose, a set of features
is extracted for each class that helps distinguish it from other
classes while remaining invariant to characteristic variations
within the class [7]. Character representation methods can be
divided into three main groups:
1) Global Transformation and Series Expansion:
Common transformation and series expansion methods used in
character recognition include the following.
a) Fourier Transforms
b) Gabor Transform
c) Wavelets
d) Moments
e) Karhunen–Loeve Expansion
2) Statistical Representation:
In this approach a character is represented by a statistical
distribution of points, which tolerates style variations to a
certain degree. There are three main methods:
a) Zoning
b) Crossings & Distances
c) Projections
3) Geometrical and Topological Representation:
Topological and geometrical attributes with high tolerance to style
variations and distortions are used to represent numerous local
and global properties of characters.
a) Extracting & Counting Topological Structures
b) Measuring & Approximating the Geometrical
Properties
c) Coding
d) Graphs and Trees
Sinhala handwriting recognition, however, mainly uses
statistical representation methods. Most researchers have used
the zone-based feature extraction method for character
representation.
a. Feature Extraction
The aim of feature extraction is to detect the properties of a
character that identify it uniquely, and thereby to maximize the
recognition rate. After feature extraction, the extracted
features are used as inputs to the training and recognition
systems [8].
Dharmapala et al. used zone-based feature extraction for
character representation [1].
Figure 8. Classification flowchart for Feature Extraction
As shown in Figure 8, characters are first divided into three
main zones using their height and width.
Figure 9: Character unit within the zones
The zoned characters are further divided into three horizontal
blocks and two vertical blocks. These blocks are used to build
the horizontal and vertical projection profiles of the characters,
which form the feature vector of each character. Once extracted,
the feature vectors are used to train ANNs in the recognition
stage.
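One possible reading of this block-based feature vector is sketched below in Python. The source does not specify how the projection profiles are aggregated, so the sketch makes an assumption: the ink in each of the three horizontal blocks and each of the two vertical blocks is summed and normalized by the total ink, giving a five-element vector. The function names are hypothetical, and the sketch should not be taken as the exact method of [1].

```python
def split_bounds(length, parts):
    """Boundaries that split range(length) into `parts` nearly equal blocks."""
    return [round(i * length / parts) for i in range(parts + 1)]


def zone_features(char, h_blocks=3, v_blocks=2):
    """Coarse projection-profile features of a binary character image.

    `char` is a list of rows with 1 = ink. Returns the fraction of ink
    in each horizontal block followed by the fraction in each vertical
    block (an assumed aggregation).
    """
    height, width = len(char), len(char[0])
    total = sum(map(sum, char)) or 1       # avoid division by zero
    rows = split_bounds(height, h_blocks)
    cols = split_bounds(width, v_blocks)
    horizontal = [sum(sum(r) for r in char[rows[i]:rows[i + 1]]) / total
                  for i in range(h_blocks)]
    vertical = [sum(sum(r[cols[j]:cols[j + 1]]) for r in char) / total
                for j in range(v_blocks)]
    return horizontal + vertical
```

Normalizing by the total ink makes the vector independent of stroke thickness and image size, which is one reason coarse zone features are convenient inputs for a small neural network.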
VI. TRAINING AND RECOGNITION TECHNIQUES
Character recognition systems use pattern recognition methods
that assign unknown samples to a predefined class. Most
character recognition methods can be studied under four
themes of pattern recognition [9], which are the main techniques
for off-line character recognition:
1. Template matching
2. Statistical techniques
3. Structural techniques
4. Artificial Neural Networks (ANNs)
In Sinhala handwriting recognition, however, two of these have
mainly been used so far. They are described briefly below.
a. Statistical Techniques
Statistical decision theory is concerned with statistical
decision functions and a set of optimality criteria that
maximize the probability of the observed pattern given the
model of a certain class [9]. Statistical techniques are mostly
based on three main assumptions.
2.a Character Recognition Using Hidden Markov Model
Hewavitharana et al. used the Hidden Markov Model (HMM) for
Sinhala handwriting recognition. Mixed cursive is the most
general and most difficult handwriting style, and HMMs are used
with a view to recognizing it automatically. An HMM is a doubly
stochastic process: an underlying stochastic process that is not
directly observable, but that can be observed through another
stochastic process which produces the observations. It consists
of a set of states linked to each other by transitions with
associated probabilities, while the observed process consists of
a set of observations.
An HMM computes the hidden state sequence from the observation
sequence using the Counter and Viterbi algorithms, both of which
return the most likely result [2].
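Recovering the most likely hidden state sequence can be illustrated with a generic Viterbi decoder in Python. This is a textbook sketch, not the model of [2]: the dictionary-based parameter layout and the function name are assumptions, and a real recognizer would normally work with log-probabilities to avoid numerical underflow on long sequences.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state path, its probability) for an HMM.

    start_p[s]    : probability of starting in state s
    trans_p[p][s] : probability of moving from state p to state s
    emit_p[s][o]  : probability of observing o while in state s
    """
    # best[t][s] = (probability of the best path ending in s at time t,
    #               previous state on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p)
                for p in states
            )
        best.append(column)

    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]
```

In a handwriting setting the hidden states would typically correspond to parts of characters and the observations to quantized features extracted from the image; the decoder itself is independent of that choice.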
b. Artificial Neural Networks (ANNs)
An Artificial Neural Network is an interconnected set of
artificial neurons that can be trained and then used for
information processing. ANNs are mostly based on a computational
or mathematical model. These systems are adaptive, so they can
change their structure according to their training and the
information they receive.
Chamikara et al. used an ANN for Sinhala handwriting
recognition [10], in an approach called the Fuzzy Neural Hybrid
method. In their method each character is separated into six
sections, and six neurons are used to recognize these sections
separately.
Figure 11. Architecture of the Fuzzy Neural Hybrid method
VII. DISCUSSION
Sinhala is the majority language of Sri Lanka. Because of the
complexity of its script, Sinhala handwriting recognition is
more complicated than recognition for many other languages. A
Sinhala handwriting recognition system is one of the systems
most needed by society, and off-line recognition in particular is
much needed because Sri Lanka, as a developing country, still
relies on handwriting for several official and unofficial tasks,
such as cheques and postcards. In this paper we have discussed
many handwriting recognition methods and procedures, all of
which have their own advantages and disadvantages. So far the
most difficult part of Sinhala handwriting recognition is
identifying overlapping and touching characters. This is more
complicated in Sinhala than in other languages because of the
curved shapes of its letters.
So far this field has been approached using Zone Based Feature
Extraction [1], Hidden Markov Models [2], computational pattern
recognition methods such as artificial neural networks [3] and
some other techniques. All these techniques share a common
procedure:
1. Preprocessing
2. Segmentation
3. Feature Extraction
4. Recognition
Noise removal and binarization of the document are done in the
preprocessing stage, after which the characters are segmented.
The feature extraction stage helps to maximize the recognition
rate. The segmented characters are then passed to the
recognition stage.
Among these recognition methods, zone-based feature extraction
achieved 94% accuracy, the Hidden Markov Model achieved 64%
accuracy, and the ANN technique achieved 94% accuracy on
unique shapes and 75% on confusing shapes.
VIII. CONCLUSION
Sinhala handwriting recognition is a much needed technology in
Sri Lanka because Sinhala is more familiar to Sri Lankan people
than any other language, yet very few tools exist for it. This
critical review has covered data collection methods, character
segmentation, character grouping, feature extraction and
character recognition methods. Character segmentation in
particular must be improved, because it is the most difficult
part of this technology. Among these techniques, I believe
developing ANN methods is the most promising direction because
it is an emerging field.
ACKNOWLEDGMENT
I would like to thank Dr. Lochandaka Ranathunga for guiding
and motivating me to complete this critical review paper. My
thanks and appreciation also go to my colleagues, who willingly
helped me with their abilities to make this review a success.
REFERENCES
[1] K. A. K. N. D. Dharmapala, W. P. M. V. Wijesooriya, C. P.
Chandrasekara, U. K. A. U. Rathnapriya, and L. Ranathunga,
“Sinhala Handwriting Recognition Mechanism Using Zone
Based Feature Extraction,” UoM IR. [Online]. Available:
http://dl.lib.mrt.ac.lk/handle/123/12501. [Accessed: 04-May-
2017].
[2] S. Hewavitharana, H. C. Fernando, and N. D. Kodikara,
“Off-line Sinhala Handwriting Recognition using Hidden
Markov Models,” ResearchGate. [Online]. Available:
https://www.researchgate.net/publication/2550174_Off-line_Sinhala_Handwriting_Recognition_using_Hidden_Markov_Models.
[Accessed: 04-May-2017].
[3] M. L. M. Karunanayaka, N. D. Kodikara, and G. D. S. P.
Wimalaratne, “Off Line Sinhala Handwriting Recognition with
an Application for Postal City Name Recognition,” International
Conference on Advances in ICT for Emerging Regions. [Online].
Available:
http://www.icter.org/conference/icter2016/?q=iitc2004%2Fabstract%2FIITC-2004p4.
[Accessed: 04-May-2017].
[4] I. Manamperi, K. K. Wijesinghe, J. C. K. Gamage, D. S. K.
Priyarathne, and S. M. I. P. B. Samarakoon, “Sinhala Online
Handwriting Recognition System,” SLIIT: Home, 16-Jun-2014.
[Online]. Available: http://dspace.sliit.lk/handle/123456789/126.
[Accessed: 04-May-2017].
[5] S. Hewavitharana and N. D. Kodikara, “A Statistical
Approach to Sinhala Handwriting Recognition,” ResearchGate.
[Online]. Available:
https://www.researchgate.net/publication/268302652_A_Statistical_Approach_to_Sinhala_Handwriting_Recognition.
[Accessed: 10-Jun-2017].
[6] B. Jayasekara and L. Udawatta, “Non-Cursive Sinhala
Handwritten Script Recognition: A Genetic Algorithm
Based Alphabet Training Approach,” Department of Electronic
and Telecommunication Engineering. [Online]. Available:
www.ent.mrt.ac.lk/iml/ICIA2005/Papers/SL013CRC.pdf.
[Accessed: 19-Jun-2017].
[7] R. Goswami, “A Review on Character Recognition
Techniques,” International Journal of Computer Applications -
IJCA. [Online]. Available:
http://www.ijcaonline.org/archives/volume83/number7/14460-2737.
[Accessed: 15-Aug-2017].
[8] H. Khandelwal, S. Gupta, and A. K. Jain, “Review of
Offline Handwriting Recognition Techniques in the fields of
HCR and OCR,” International Journal of Computer Trends
and Technology. [Online]. Available:
http://www.ijcttjournal.org/archives/ijctt-v47p123. [Accessed:
15-Aug-2017].
[9] N. Arica and F. T. Yarman-Vural, “An overview of character
recognition focused on off-line handwriting,” IEEE Xplore.
[Online]. Available:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=941845.
[Accessed: 15-Aug-2017].
[10] M. Chamikara, S. R. Kodituwakku, A. A. C. A.
Jayathilake, and K. R. Wijeweera, “Fuzzy Neural Hybrid
Method for Sinhala Character Recognition,” Academia.edu.
[Online]. Available:
http://www.academia.edu/8740975/Fuzzy_Neural_Hybrid_Method_for_Sinhala_Character_Recognition.
[Accessed: 10-Aug-2017].