SlideShare a Scribd company logo
1 of 11
Download to read offline
International Journal of Information Technology, Modeling and Computing (IJITMC) Vol. 2, No. 1, February 2014
DOI : 10.5121/ijitmc.2014.2102 11
FRAGMENTATION OF HANDWRITTEN TOUCHING
CHARACTERS IN DEVANAGARI SCRIPT
Shuchi Kapoor1
and Vivek Verma2
1,2
Department of Computer Engineering, Rajasthan College of Engineering for Women,
Jaipur, Rajasthan
ABSTRACT
Character Segmentation of handwritten words is a difficult task because of different writing styles and
complex structural features. Segmentation of handwritten text in Devanagari script is an uphill task. The
occurrence of header line, overlapped characters in middle zone & half characters make the segmentation
process more difficultt. Sometimes, interline space and noise makes line fragmentation a difficult task.
Sometimes, interline space and noise makes line fragmentation a difficult task. Without separating the
touching characters, it will be difficult to identify the characters, hence fragmentation is necessary of the
touching characters in a word. So, we devised a technique, according to that first step will be
preprocessing of a word, than identify the joint points, form the bounding boxes around all vertical &
horizontal lines and finally fragment the touching characters on the basis of their height and width.
KEYWORDS
Fragmentation, Joint point algorithm, Shirorekha, side bar, middle bar .
1. INTRODUCTION
After many years of research, hand written touching character segmentation remains a
challenging task. Touching character patterns emerge when two adjacent characters are written
too close, therefore, some parts of character are connected horizontally/left-right. In printed script
generally two consecutive characters do not overlap or touch but in handwritten everyone is
having their own writing style. In this present work we have proposed a method of segmenting
characters from handwritten Devanagari words. There is unpredictable variation in handwriting
style from person to person and Devanagari having large number of characters set makes
segmentation of handwritten words complex. Papers [1,2] give some surveys on segmentation
techniques for machine printed text. For segmentation of handwritten text also a survey paper [3]
is available. Very few papers are available on segmentation of handwritten characters from word.
No work for touching handwritten Devanagari character segmentation is found so far to the best
of our knowledge. In [4], Utpal Garain and Bidyut B. Chaudhuri, presents a new technique for
identification and segmentation of machine printed touching characters. They devised a fuzzy
multifactorial analysis technique which has been applied to printed documents in Devanagari and
Bangla. In [5], Veena Bansal and R.M.K.Sinha depict the system architecture and its components
for Devanagari text recognition system. The system consists of a solution blackboard and various
knowledge sources. In [6], M. Hanmandlu and Pooja Agrawal made an attempt to segment the
handwritten Hindi words. The segmentation problem is computed due to different writing styles
and presence of modifiers on all sides of characters. So, they proposed a structural approach to
identify the similarities and differences between structure classes. Veena Bansal and R.M. K.
Sinha [7], presents an algorithm for segmentation of machine printed touching Devanagari
International Journal of Information Technology, Modeling and Computing (IJITMC) Vol.2, No.1, February 2014
12
characters (also referred to as conjuncts) into its constituent symbols and characters. In the
Existing system the selective algorithm is developed for the identification and removal of header
line and for further segmentation from the handwritten Devanagari characters. Proposed
algorithm extensively uses structural properties of the script. It uses vertical projection, continuity
of collapsed horizontal projection. We use structural properties of Devanagari script for
fragmentation of characters.
1.1. Characteristics of Devanagari Script
Devanagari Script is the most common script in India which include Marathi, Hindi, Sanskrit &
Nepali. It has 12 vowels and 35 consonants as shown in figure.1. Vowels can be written isolated
or by using variety of modifiers overhead, down, foreword or backward after the consonants they
belong to & those characters are called conjuncts as shown in figure 2. At times new shapes are
formed after combining two or more consonants & these new shapes are called compound
characters as shown in figure 3. In Devanagari script there is a header line known as Shirorekha,
this feature distinguish Devanagari script from English, so English will be extracted from these
script. There are four imaginary lines that drawn for a word, Headline is s same as header line,
Baseline where characters completed but excluding below modifiers. After below modifiers line
drawn called Lowerline and the line above Headline after modifiers are called Upperline resultant
text is partitioned into 3 zones- Upper zone between Upperline and Headline, Middle zone
between Headline and Baseline & last is Lower zone between Baseline and Lowerline.
A typical zoning is shown in Figure.4.
In Hindi we have three types of characters as follows also shown in Figure 5a,5b, 5c
1.END-BAR Characters
2.MIDDLE-BAR Characters
3.NO-BAR Characters
Figure 1. Consonants.
International Journal of Information Technology, Modeling and Computing (IJITMC) Vol.2, No.1, February 2014
13
Figure 2. Modifiers(Ascenders and descenders)
Figure 3. Compound Characters
Figure 4. Different Zones of Devanagari Text.
Figure 5a. END-BAR Characters
Figure 5b. MIDDLE-BAR Characters
Figure 5c. NO-BAR Characters
As we are considering handwritten Devanagari, it again adds up complexity because while
writing, characters of the word may touch each other at different position as shown in figure 6.
For recognition, it is necessary to segment these touching characters of words. For this we
propose a system.
Figure 6. Samples of Touched Characters
International Journal of Information Technology, Modeling and Computing (IJITMC) Vol.2, No.1, February 2014
14
2.RESEARCH METHOD
In Devanagari script few categories are classified on the basis of structural properties that are
given in figure 5.
Proposed System
2.1. Pre-processing:
1. In Image Acquisition scan a document and storing it as an image. This will act as input to the
program. Higher resolution produces better results.
2. In every binary image, each pixel is in form of two value: 1 or 0. The input image is first
converted into binary or image will be complemented means background is black and foreground
i.e words are white and represented as 1’s. In binary image attached words are found and isolate
each word separately.
3. In binarized image all small black patches are removed because that spots can create confusion
while thinning process. Bounding boxes are formed around small black spots and converted into
white.
Protruding points
1. Protruding points are those unnecessary joint points that generate after thinning process. So,
these protruding points are removed so that actual joint points can be generated between those
actual line exists.
2. In 3x3 matrix sum will be calculated, around each pixel, minus centre pixel. If this sum is less
than or equal to 3 then the protruding pixel is removed. This process is visualized in the figure.
3. Image with protruding points highlighted with red.
International Journal of Information Technology, Modeling and Computing (IJITMC) Vol.2, No.1, February 2014
15
Figure 7. For eliminating unusable joint points in the image protruding points are removed
2.2. Word Processing Stage
2.2.1 The joint points
(i) Where two lines meet that point is called joint points. The lines can be straight or curved. On
the basis of these points characters are dissected but here we are using joint points just to form the
vertical bars and header lines.
(ii) By analyzing the design of pixels in 3x3 matrix adjacent to each pixel joint points are found.
(iii) With the help of joint points we will find vertical bars and header lines.
(iv) In 3 x 3 matrix sum will be calculated around each pixel minus centre pixel and the pixel is
considered as a joint point if sum is equal to more than 3. The result 3 means three lines meeting
at that point.
(v) If pixel is having only one line joining the point i.e only 1 pixel in the surrounding is called
terminating points.
Figure 8. Joint points (shown in green)
2.2.2 Form Bounding Box
To fragment each character in a word bounding boxes are formed in order to identify the header
line and vertical bars. Within each word horizontal rectangles are identified. In thinned image
bounding boxes are formed after removing the joint points.
2.2.3 Removal of Shirorekha
(i) The first aim is to find and remove shirorekha.
(ii) In thinned image bounding boxes are obtained leaving joint points.
(iii) Horizontal bounding boxes are recognized in every word.
(iv) Horizontal rectangles are form on the basis of width-height ratio i.e should be more than or
equal to 3:1.
(v) Heights of all pixels inside these rectangles are averaged.
(vi) If maximum number of pixels on one side of this averaged line than that rectangle are
considered as shirorekha.
(v) These rectangles are removed to remove the shirorekha.
International Journal of Information Technology, Modeling and Computing (IJITMC) Vol.2, No.1, February 2014
16
Figure 9. Horizontal Bounding Boxes
2.2.4 The vertical bar detection
(i) Vertical rectangles will help to detect the vertical bars.
(ii) Vertical rectangles are having width-height ratio less than or equal to 1:3.
(iii) To identify the vertical bars percentage is calculated on the basis of how much percentage of
white pixels (parts of vertical bars) is coming in vertical rectangles and if it is more than 60%
then the rectangle holds vertical bar.
(iv) Rectangles are stretched upto the shirorekha in the top and at the bottom & will get a new
image.
(v) New image contains bounding boxes.
Figure 10. Vertical bars are detected
2.3. Character Processing Stage
Shirorekha and vertical bar removed image is considered for character processing.
1. Category 1: Mid Bar
Form a bounding box and check width of bounding box. For category 1, if width of bounding box
is greater than single character width (means two characters are touching) then make a cut at
leftmost joint point of large bounding box. We get separated characters. We get separated
characters.
Figure 11. Example of category 1 for cut line finding.
2. Category 2: Side Bar
When two side bar characters are touching each other, form bounding box. If there is no large
bounding box and no equal size bounding boxes, then make a cut line after each detected vertical
bars.
International Journal of Information Technology, Modeling and Computing (IJITMC) Vol.2, No.1, February 2014
17
Figure 12. Fragmentation of two side bar touching characters
3. Category 3: Non Bar
If character without Vertical Box then make cut at the right of Bounding Box.
Figure 13. Fragmentation of two non bar touching characters.
3. RESULTS AND ANALYSIS
Different document consists of words are given as input to proposed system. Average result for
touching character segmentation of Devanagari words is 71%. Accurate character segmentation
having side bar is 76%, middle bar 81%, no bar 75%.
3.1. Result of Touching Character Fragmentation
We consider four documents where various characters from words are touched each other in
different way. Some can overlapped with next character, touched at the different position. Result
of touching character segmentation shown in table 1 consists of words are given as input to
proposed system. As we segments characters from words based on different categories (middle
bar, side bar, non-bar). So Table 2 shows Segmentation result of different character category.
Table 1. Fragmentation result of touching characters
International Journal of Information Technology, Modeling and Computing (IJITMC) Vol.2, No.1, February 2014
18
Table 2. Segmentation Result of Different Category Character
Images Characters
Category
Total No of characters
segmented correctly
% Result
a. Sidebar 25 16 64%
Middle bar 03 03 100%
No bar 08 07 87%
b Sidebar 25 17 68%
Middle bar 01 01 100%
No bar 13 09 69%
c Sidebar 10 10 100%
Middle bar 02 01 50%
No bar 02 02 100%
d Sidebar 19 15 78%
Middle bar - - -
No bar 05 03 60%
e Sidebar 13 08 61%
Middle bar 04 01 25%
No bar 04 03 75%
f Sidebar 13 11 84%
Middle bar 05 03 50%
No bar 05 04 80%
3.2. Result of Handwritten and Printed Document
We tested our system on handwritten and printed document without touching characters.
Handwritten Document
Figure 14. Handwritten Document without touching characters
International Journal of Information Technology, Modeling and Computing (IJITMC) Vol.2, No.1, February 2014
19
Figure 15. Character Fragmentation of Handwritten document
Printed Document
Figure 16. Printed Document without touching character
Figure 17. character Fragmentation of printed document
International Journal of Information Technology, Modeling and Computing (IJITMC) Vol.2, No.1, February 2014
20
Table 3. Character Fragmentation Result of Handwritten and Printed Documents
Imags Without
Touching
Characters
No of Characters
in Middle Zone
including Matra
No of Character
Correctly
Fragmented
% Result of
Correctly
Fragmented
Character
Handwritten
document
43 40 93%
Printed document 43 43 100%
3.3.Failure Reason for Characters Segmentation
There are some characters and writing style which degrade the performance of our system. Table
4 gives reasons for failure of some characters
Table 4. Reason for Failure of Some Characters Segmentation
4. CONCLUSION
We are successfully able to find joint points between characters of a word which makes us easy to
find vertical lines and horizontal lines of characters. We tried to do segmentation without skew
correction. Average result of handwritten touching characters from words is 71%. Accurate
character
segmentation having side bar is 76%, middle bar 81%, no bar 75%. For non-touching printed
documents and handwritten document it gives us 100% and 93%result respectively.This method
fails for certain characters because of constraints that we decide will not match always to a lot of
variation in handwriting i.e. , height of two vertical bar characters i.e.Characters like causes
problem as gap between parts of characters that can be solved by the recognition system because
our system consider first part as non-bar character and cut after that. As the future work i s conc e
rn, we pass these separated characters as input to the character recognition system. Again we have
detected baseline, header line of word which will again useful for the recognition system to detect
lower modifier and upper modifier. The character separation technique explained above can be
applied on other Indian scripts also.
International Journal of Information Technology, Modeling and Computing (IJITMC) Vol.2, No.1, February 2014
21
ACKNOWLEDGEMENTS
I sincerely feel that the credit of project work could not be narrowed down to only one
individual.This work is an integrated effort of all those concern with it, through whose able
cooperation and effective guidance I could achieve its completion.I express my gratitude to my
guide Assistant Prof. Vivek Verma, our co-ordinator Assistant Prof. Vandna Verma and
HOD Assistant Prof. Rakesh Sharma for being kind enough to spare his valuable time, provide
insights and guidance on a complex topic.I am also thankful to Principal and management of
RCEW for their support and encouragement.I also thank computer laboratory staff for their
valuable support
REFERENCES
[1] Y. Lu, “Machine Printed Character Segmentation – an Overview”,Pattern Recognition, vol. 28, No. 1,
pp. 67-80, 1995.
[2] R. G. Casey, E. Lecolinet, “A Survey of Methods and Strategies in Character Segmentation”, IEEE,
vol. 18 No. 7, pp. 690-706, July 1996.
[3] C. E. Dunn, P. S. P. Wang, “Character Segmentation Techniques for Handwritten Text-A Survey”,
Proc. 11th Int.Conf.on Recognition Methodology and Systems, vol. II, pp.577-580, 30 Aug.-3
Sept.1992.
[4] Utpal Garain and Bidyut B. Chaudhuri, “Segmentation of Touching Characters in Printed Devnagari
and Bangla Scripts Using Fuzzy Multifactorial Analysis”, IEEE transactions on systems, man, and
cybernetics—part c: applications and reviews, vol.32, 4, November 2002, 449-459.
[5] Veena Bansal and R. M. K. Sinha,”Integrating Knowledge Sources in Devanagari Text recognition
System”, IEEE transactions on systems, man, and cybernetics—part a: systems and humans, vol.30,
no. 4, July 2000, 500-505.
[6] M. Hanmandlu, Pooja Agrawal, “A Structural Approach for Segmentation of Handwritten Hindi
Text”, Proceeding of the International Conference on Cognition and Recognition, Mandya,
Karnataka, India, 22-23 December 2005, 589-597.
[7] Veena Bansal and R. M. K. Sinha,”Segmentation of Touching and fused Devanagari Characters”,
Pattern recognition, vol. 35:2002, 875-893.
Authors
She received her B.E degree from J.E.C.R.C,
Jaipur(Rajasthan University) in 2005.She is
pursuing her M.Tech in Computer Science from
R.C.E.W(Rajasthan Technical University). She
has presented and published papers in 4
conferences. Her areas of interest are image
processing and networking.
He is an Assistant Professor at Rajasthan
College of Engg For Women. He received his
B.Tech degree from Rajasthan Technical
University. He did his M.Tech in Computer
Science from C-DAC Noida (GGSIPU Delhi).
He has several International and National
publication in SCI journals.

More Related Content

What's hot

Optical Character Recognition IIT BHU
Optical Character Recognition IIT BHUOptical Character Recognition IIT BHU
Optical Character Recognition IIT BHUNitin Vishwari
 
Performance Comparison between Different Feature Extraction Techniques with S...
Performance Comparison between Different Feature Extraction Techniques with S...Performance Comparison between Different Feature Extraction Techniques with S...
Performance Comparison between Different Feature Extraction Techniques with S...IJERA Editor
 
1.p13 078 marathi handwritten
1.p13 078 marathi handwritten1.p13 078 marathi handwritten
1.p13 078 marathi handwrittenTulashiram Pisal
 
IRJET- Gesture Recognition for Indian Sign Language using HOG and SVM
IRJET-  	  Gesture Recognition for Indian Sign Language using HOG and SVMIRJET-  	  Gesture Recognition for Indian Sign Language using HOG and SVM
IRJET- Gesture Recognition for Indian Sign Language using HOG and SVMIRJET Journal
 
Ijartes v1-i2-001
Ijartes v1-i2-001Ijartes v1-i2-001
Ijartes v1-i2-001IJARTES
 
Character Recognition (Devanagari Script)
Character Recognition (Devanagari Script)Character Recognition (Devanagari Script)
Character Recognition (Devanagari Script)IJERA Editor
 
OPTICAL BRAILLE TRANSLATOR FOR SINHALA BRAILLE SYSTEM: PAPER COMMUNICATION TO...
OPTICAL BRAILLE TRANSLATOR FOR SINHALA BRAILLE SYSTEM: PAPER COMMUNICATION TO...OPTICAL BRAILLE TRANSLATOR FOR SINHALA BRAILLE SYSTEM: PAPER COMMUNICATION TO...
OPTICAL BRAILLE TRANSLATOR FOR SINHALA BRAILLE SYSTEM: PAPER COMMUNICATION TO...ijma
 
Recognition of Words in Tamil Script Using Neural Network
Recognition of Words in Tamil Script Using Neural NetworkRecognition of Words in Tamil Script Using Neural Network
Recognition of Words in Tamil Script Using Neural NetworkIJERA Editor
 
character recognition: Scope and challenges
 character recognition: Scope and challenges character recognition: Scope and challenges
character recognition: Scope and challengesVikas Dongre
 
PERFORMANCE EVALUATION OF STATISTICAL CLASSIFIERS USING INDIAN SIGN LANGUAGE ...
PERFORMANCE EVALUATION OF STATISTICAL CLASSIFIERS USING INDIAN SIGN LANGUAGE ...PERFORMANCE EVALUATION OF STATISTICAL CLASSIFIERS USING INDIAN SIGN LANGUAGE ...
PERFORMANCE EVALUATION OF STATISTICAL CLASSIFIERS USING INDIAN SIGN LANGUAGE ...IJCSEA Journal
 
A Review on Geometrical Analysis in Character Recognition
A Review on Geometrical Analysis in Character RecognitionA Review on Geometrical Analysis in Character Recognition
A Review on Geometrical Analysis in Character Recognitioniosrjce
 
Handwritten character recognition in
Handwritten character recognition inHandwritten character recognition in
Handwritten character recognition inijaia
 
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...iosrjce
 
Pattern Recognition of Japanese Alphabet Katakana Using Airy Zeta Function
Pattern Recognition of Japanese Alphabet Katakana Using Airy Zeta FunctionPattern Recognition of Japanese Alphabet Katakana Using Airy Zeta Function
Pattern Recognition of Japanese Alphabet Katakana Using Airy Zeta FunctionEditor IJCATR
 
Classification and Identification of Telugu Aksharas using Moment Invariants ...
Classification and Identification of Telugu Aksharas using Moment Invariants ...Classification and Identification of Telugu Aksharas using Moment Invariants ...
Classification and Identification of Telugu Aksharas using Moment Invariants ...Srikanth Chintakindi
 
A New Method for Identification of Partially Similar Indian Scripts
A New Method for Identification of Partially Similar Indian ScriptsA New Method for Identification of Partially Similar Indian Scripts
A New Method for Identification of Partially Similar Indian ScriptsCSCJournals
 
Handwriting Recognition Using Deep Learning and Computer Version
Handwriting Recognition Using Deep Learning and Computer VersionHandwriting Recognition Using Deep Learning and Computer Version
Handwriting Recognition Using Deep Learning and Computer VersionNaiyan Noor
 

What's hot (18)

Optical Character Recognition IIT BHU
Optical Character Recognition IIT BHUOptical Character Recognition IIT BHU
Optical Character Recognition IIT BHU
 
Performance Comparison between Different Feature Extraction Techniques with S...
Performance Comparison between Different Feature Extraction Techniques with S...Performance Comparison between Different Feature Extraction Techniques with S...
Performance Comparison between Different Feature Extraction Techniques with S...
 
Bh24380384
Bh24380384Bh24380384
Bh24380384
 
1.p13 078 marathi handwritten
1.p13 078 marathi handwritten1.p13 078 marathi handwritten
1.p13 078 marathi handwritten
 
IRJET- Gesture Recognition for Indian Sign Language using HOG and SVM
IRJET-  	  Gesture Recognition for Indian Sign Language using HOG and SVMIRJET-  	  Gesture Recognition for Indian Sign Language using HOG and SVM
IRJET- Gesture Recognition for Indian Sign Language using HOG and SVM
 
Ijartes v1-i2-001
Ijartes v1-i2-001Ijartes v1-i2-001
Ijartes v1-i2-001
 
Character Recognition (Devanagari Script)
Character Recognition (Devanagari Script)Character Recognition (Devanagari Script)
Character Recognition (Devanagari Script)
 
OPTICAL BRAILLE TRANSLATOR FOR SINHALA BRAILLE SYSTEM: PAPER COMMUNICATION TO...
OPTICAL BRAILLE TRANSLATOR FOR SINHALA BRAILLE SYSTEM: PAPER COMMUNICATION TO...OPTICAL BRAILLE TRANSLATOR FOR SINHALA BRAILLE SYSTEM: PAPER COMMUNICATION TO...
OPTICAL BRAILLE TRANSLATOR FOR SINHALA BRAILLE SYSTEM: PAPER COMMUNICATION TO...
 
Recognition of Words in Tamil Script Using Neural Network
Recognition of Words in Tamil Script Using Neural NetworkRecognition of Words in Tamil Script Using Neural Network
Recognition of Words in Tamil Script Using Neural Network
 
character recognition: Scope and challenges
 character recognition: Scope and challenges character recognition: Scope and challenges
character recognition: Scope and challenges
 
PERFORMANCE EVALUATION OF STATISTICAL CLASSIFIERS USING INDIAN SIGN LANGUAGE ...
PERFORMANCE EVALUATION OF STATISTICAL CLASSIFIERS USING INDIAN SIGN LANGUAGE ...PERFORMANCE EVALUATION OF STATISTICAL CLASSIFIERS USING INDIAN SIGN LANGUAGE ...
PERFORMANCE EVALUATION OF STATISTICAL CLASSIFIERS USING INDIAN SIGN LANGUAGE ...
 
A Review on Geometrical Analysis in Character Recognition
A Review on Geometrical Analysis in Character RecognitionA Review on Geometrical Analysis in Character Recognition
A Review on Geometrical Analysis in Character Recognition
 
Handwritten character recognition in
Handwritten character recognition inHandwritten character recognition in
Handwritten character recognition in
 
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
 
Pattern Recognition of Japanese Alphabet Katakana Using Airy Zeta Function
Pattern Recognition of Japanese Alphabet Katakana Using Airy Zeta FunctionPattern Recognition of Japanese Alphabet Katakana Using Airy Zeta Function
Pattern Recognition of Japanese Alphabet Katakana Using Airy Zeta Function
 
Classification and Identification of Telugu Aksharas using Moment Invariants ...
Classification and Identification of Telugu Aksharas using Moment Invariants ...Classification and Identification of Telugu Aksharas using Moment Invariants ...
Classification and Identification of Telugu Aksharas using Moment Invariants ...
 
A New Method for Identification of Partially Similar Indian Scripts
A New Method for Identification of Partially Similar Indian ScriptsA New Method for Identification of Partially Similar Indian Scripts
A New Method for Identification of Partially Similar Indian Scripts
 
Handwriting Recognition Using Deep Learning and Computer Version
Handwriting Recognition Using Deep Learning and Computer VersionHandwriting Recognition Using Deep Learning and Computer Version
Handwriting Recognition Using Deep Learning and Computer Version
 

Viewers also liked (14)

Laporan Metode Statistikia II
Laporan Metode Statistikia IILaporan Metode Statistikia II
Laporan Metode Statistikia II
 
Cambridge Beamer
Cambridge BeamerCambridge Beamer
Cambridge Beamer
 
Resume Film Red Cliff
Resume Film Red CliffResume Film Red Cliff
Resume Film Red Cliff
 
Special olympics
Special olympicsSpecial olympics
Special olympics
 
Uno
UnoUno
Uno
 
Charter bt
Charter btCharter bt
Charter bt
 
Iwona condensed
Iwona condensedIwona condensed
Iwona condensed
 
Szent istván első magyar király a királyok
Szent istván első magyar király a királyokSzent istván első magyar király a királyok
Szent istván első magyar király a királyok
 
Leading with Style-Understanding Your Members
Leading with Style-Understanding Your MembersLeading with Style-Understanding Your Members
Leading with Style-Understanding Your Members
 
Btsa (1)
Btsa (1)Btsa (1)
Btsa (1)
 
FACTORING CRYPTOSYSTEM MODULI WHEN THE CO-FACTORS DIFFERENCE IS BOUNDED
FACTORING CRYPTOSYSTEM MODULI WHEN THE CO-FACTORS DIFFERENCE IS BOUNDEDFACTORING CRYPTOSYSTEM MODULI WHEN THE CO-FACTORS DIFFERENCE IS BOUNDED
FACTORING CRYPTOSYSTEM MODULI WHEN THE CO-FACTORS DIFFERENCE IS BOUNDED
 
Hackd Beamer
Hackd BeamerHackd Beamer
Hackd Beamer
 
City of the future sudakov andrey
City of the future  sudakov andreyCity of the future  sudakov andrey
City of the future sudakov andrey
 
Memo seroja
Memo serojaMemo seroja
Memo seroja
 

Similar to Fragmentation of handwritten touching characters in devanagari script

Devnagari document segmentation using histogram approach
Devnagari document segmentation using histogram approachDevnagari document segmentation using histogram approach
Devnagari document segmentation using histogram approachVikas Dongre
 
An exhaustive font and size invariant classification scheme for ocr of devana...
An exhaustive font and size invariant classification scheme for ocr of devana...An exhaustive font and size invariant classification scheme for ocr of devana...
An exhaustive font and size invariant classification scheme for ocr of devana...ijnlc
 
Segmentation of Handwritten Text in Gurmukhi Script
Segmentation of Handwritten Text in Gurmukhi ScriptSegmentation of Handwritten Text in Gurmukhi Script
Segmentation of Handwritten Text in Gurmukhi ScriptCSCJournals
 
Critical Review on Off-Line Sinhala Handwriting Recognition
Critical Review on Off-Line Sinhala Handwriting RecognitionCritical Review on Off-Line Sinhala Handwriting Recognition
Critical Review on Off-Line Sinhala Handwriting RecognitionAmila Wijayarathna
 
Hand-written Hindi Word Recognition - A Comprehensive Survey
Hand-written Hindi Word Recognition - A Comprehensive SurveyHand-written Hindi Word Recognition - A Comprehensive Survey
Hand-written Hindi Word Recognition - A Comprehensive SurveyIRJET Journal
 
Character recognition of Devanagari characters using Artificial Neural Network
Character recognition of Devanagari characters using Artificial Neural NetworkCharacter recognition of Devanagari characters using Artificial Neural Network
Character recognition of Devanagari characters using Artificial Neural Networkijceronline
 
Review of research on devnagari character recognition
Review of research on devnagari character recognitionReview of research on devnagari character recognition
Review of research on devnagari character recognitionVikas Dongre
 
Bengali Numeric Number Recognition
Bengali Numeric Number RecognitionBengali Numeric Number Recognition
Bengali Numeric Number RecognitionAmitava Choudhury
 
Topographic Feature Extraction for Bengali and Hindi Character Images
Topographic Feature Extraction for Bengali and Hindi Character ImagesTopographic Feature Extraction for Bengali and Hindi Character Images
Topographic Feature Extraction for Bengali and Hindi Character Imagessipij
 
Topographic Feature Extraction for Bengali and Hindi Character Images
Topographic Feature Extraction for Bengali and Hindi Character ImagesTopographic Feature Extraction for Bengali and Hindi Character Images
Topographic Feature Extraction for Bengali and Hindi Character Imagessipij
 
SEGMENTATION OF CHARACTERS WITHOUT MODIFIERS FROM A PRINTED BANGLA TEXT
SEGMENTATION OF CHARACTERS WITHOUT MODIFIERS FROM A PRINTED BANGLA TEXTSEGMENTATION OF CHARACTERS WITHOUT MODIFIERS FROM A PRINTED BANGLA TEXT
SEGMENTATION OF CHARACTERS WITHOUT MODIFIERS FROM A PRINTED BANGLA TEXTcscpconf
 
FREEMAN CODE BASED ONLINE HANDWRITTEN CHARACTER RECOGNITION FOR MALAYALAM USI...
FREEMAN CODE BASED ONLINE HANDWRITTEN CHARACTER RECOGNITION FOR MALAYALAM USI...FREEMAN CODE BASED ONLINE HANDWRITTEN CHARACTER RECOGNITION FOR MALAYALAM USI...
FREEMAN CODE BASED ONLINE HANDWRITTEN CHARACTER RECOGNITION FOR MALAYALAM USI...acijjournal
 
Fuzzy rule based classification and recognition of handwritten hindi
Fuzzy rule based classification and recognition of handwritten hindiFuzzy rule based classification and recognition of handwritten hindi
Fuzzy rule based classification and recognition of handwritten hindiIAEME Publication
 
Fuzzy rule based classification and recognition of handwritten hindi
Fuzzy rule based classification and recognition of handwritten hindiFuzzy rule based classification and recognition of handwritten hindi
Fuzzy rule based classification and recognition of handwritten hindiIAEME Publication
 
Recognition of Offline Handwritten Hindi Text Using SVM
Recognition of Offline Handwritten Hindi Text Using SVMRecognition of Offline Handwritten Hindi Text Using SVM
Recognition of Offline Handwritten Hindi Text Using SVMCSCJournals
 
STRUCTURAL FEATURES FOR RECOGNITION OF HAND WRITTEN KANNADA CHARACTER BASED O...
STRUCTURAL FEATURES FOR RECOGNITION OF HAND WRITTEN KANNADA CHARACTER BASED O...STRUCTURAL FEATURES FOR RECOGNITION OF HAND WRITTEN KANNADA CHARACTER BASED O...
STRUCTURAL FEATURES FOR RECOGNITION OF HAND WRITTEN KANNADA CHARACTER BASED O...ijcseit
 
STRUCTURAL FEATURES FOR RECOGNITION OF HAND WRITTEN KANNADA CHARACTER BASED O...
STRUCTURAL FEATURES FOR RECOGNITION OF HAND WRITTEN KANNADA CHARACTER BASED O...STRUCTURAL FEATURES FOR RECOGNITION OF HAND WRITTEN KANNADA CHARACTER BASED O...
STRUCTURAL FEATURES FOR RECOGNITION OF HAND WRITTEN KANNADA CHARACTER BASED O...ijcseit
 

Similar to Fragmentation of handwritten touching characters in devanagari script (20)

Devnagari document segmentation using histogram approach
Devnagari document segmentation using histogram approachDevnagari document segmentation using histogram approach
Devnagari document segmentation using histogram approach
 
An exhaustive font and size invariant classification scheme for ocr of devana...
An exhaustive font and size invariant classification scheme for ocr of devana...An exhaustive font and size invariant classification scheme for ocr of devana...
An exhaustive font and size invariant classification scheme for ocr of devana...
 
Ijetcas14 399
Ijetcas14 399Ijetcas14 399
Ijetcas14 399
 
Segmentation of Handwritten Text in Gurmukhi Script
Segmentation of Handwritten Text in Gurmukhi ScriptSegmentation of Handwritten Text in Gurmukhi Script
Segmentation of Handwritten Text in Gurmukhi Script
 
L017116064
L017116064L017116064
L017116064
 
Critical Review on Off-Line Sinhala Handwriting Recognition
Critical Review on Off-Line Sinhala Handwriting RecognitionCritical Review on Off-Line Sinhala Handwriting Recognition
Critical Review on Off-Line Sinhala Handwriting Recognition
 
Hand-written Hindi Word Recognition - A Comprehensive Survey
Hand-written Hindi Word Recognition - A Comprehensive SurveyHand-written Hindi Word Recognition - A Comprehensive Survey
Hand-written Hindi Word Recognition - A Comprehensive Survey
 
Character recognition of Devanagari characters using Artificial Neural Network
Character recognition of Devanagari characters using Artificial Neural NetworkCharacter recognition of Devanagari characters using Artificial Neural Network
Character recognition of Devanagari characters using Artificial Neural Network
 
Isolated Kannada Character Recognition using Chain Code Features
Isolated Kannada Character Recognition using Chain Code FeaturesIsolated Kannada Character Recognition using Chain Code Features
Isolated Kannada Character Recognition using Chain Code Features
 
Review of research on devnagari character recognition
Review of research on devnagari character recognitionReview of research on devnagari character recognition
Review of research on devnagari character recognition
 
Bengali Numeric Number Recognition
Bengali Numeric Number RecognitionBengali Numeric Number Recognition
Bengali Numeric Number Recognition
 
Topographic Feature Extraction for Bengali and Hindi Character Images
Topographic Feature Extraction for Bengali and Hindi Character ImagesTopographic Feature Extraction for Bengali and Hindi Character Images
Topographic Feature Extraction for Bengali and Hindi Character Images
 
Topographic Feature Extraction for Bengali and Hindi Character Images
Topographic Feature Extraction for Bengali and Hindi Character ImagesTopographic Feature Extraction for Bengali and Hindi Character Images
Topographic Feature Extraction for Bengali and Hindi Character Images
 
SEGMENTATION OF CHARACTERS WITHOUT MODIFIERS FROM A PRINTED BANGLA TEXT
SEGMENTATION OF CHARACTERS WITHOUT MODIFIERS FROM A PRINTED BANGLA TEXTSEGMENTATION OF CHARACTERS WITHOUT MODIFIERS FROM A PRINTED BANGLA TEXT
SEGMENTATION OF CHARACTERS WITHOUT MODIFIERS FROM A PRINTED BANGLA TEXT
 
FREEMAN CODE BASED ONLINE HANDWRITTEN CHARACTER RECOGNITION FOR MALAYALAM USI...
FREEMAN CODE BASED ONLINE HANDWRITTEN CHARACTER RECOGNITION FOR MALAYALAM USI...FREEMAN CODE BASED ONLINE HANDWRITTEN CHARACTER RECOGNITION FOR MALAYALAM USI...
FREEMAN CODE BASED ONLINE HANDWRITTEN CHARACTER RECOGNITION FOR MALAYALAM USI...
 
Fuzzy rule based classification and recognition of handwritten hindi
Fuzzy rule based classification and recognition of handwritten hindiFuzzy rule based classification and recognition of handwritten hindi
Fuzzy rule based classification and recognition of handwritten hindi
 
Fuzzy rule based classification and recognition of handwritten hindi
Fuzzy rule based classification and recognition of handwritten hindiFuzzy rule based classification and recognition of handwritten hindi
Fuzzy rule based classification and recognition of handwritten hindi
 
Recognition of Offline Handwritten Hindi Text Using SVM
Recognition of Offline Handwritten Hindi Text Using SVMRecognition of Offline Handwritten Hindi Text Using SVM
Recognition of Offline Handwritten Hindi Text Using SVM
 
STRUCTURAL FEATURES FOR RECOGNITION OF HAND WRITTEN KANNADA CHARACTER BASED O...
STRUCTURAL FEATURES FOR RECOGNITION OF HAND WRITTEN KANNADA CHARACTER BASED O...STRUCTURAL FEATURES FOR RECOGNITION OF HAND WRITTEN KANNADA CHARACTER BASED O...
STRUCTURAL FEATURES FOR RECOGNITION OF HAND WRITTEN KANNADA CHARACTER BASED O...
 
STRUCTURAL FEATURES FOR RECOGNITION OF HAND WRITTEN KANNADA CHARACTER BASED O...
STRUCTURAL FEATURES FOR RECOGNITION OF HAND WRITTEN KANNADA CHARACTER BASED O...STRUCTURAL FEATURES FOR RECOGNITION OF HAND WRITTEN KANNADA CHARACTER BASED O...
STRUCTURAL FEATURES FOR RECOGNITION OF HAND WRITTEN KANNADA CHARACTER BASED O...
 

Recently uploaded

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 

Recently uploaded (20)

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 

Fragmentation of handwritten touching characters in devanagari script

  • 1. International Journal of Information Technology, Modeling and Computing (IJITMC) Vol. 2, No. 1, February 2014 DOI : 10.5121/ijitmc.2014.2102 11 FRAGMENTATION OF HANDWRITTEN TOUCHING CHARACTERS IN DEVANAGARI SCRIPT Shuchi Kapoor1 and Vivek Verma2 1,2 Department of Computer Engineering, Rajasthan College of Engineering for Women, Jaipur, Rajasthan ABSTRACT Character Segmentation of handwritten words is a difficult task because of different writing styles and complex structural features. Segmentation of handwritten text in Devanagari script is an uphill task. The occurrence of header line, overlapped characters in middle zone & half characters make the segmentation process more difficultt. Sometimes, interline space and noise makes line fragmentation a difficult task. Sometimes, interline space and noise makes line fragmentation a difficult task. Without separating the touching characters, it will be difficult to identify the characters, hence fragmentation is necessary of the touching characters in a word. So, we devised a technique, according to that first step will be preprocessing of a word, than identify the joint points, form the bounding boxes around all vertical & horizontal lines and finally fragment the touching characters on the basis of their height and width. KEYWORDS Fragmentation, Joint point algorithm, Shirorekha, side bar, middle bar . 1. INTRODUCTION After many years of research, hand written touching character segmentation remains a challenging task. Touching character patterns emerge when two adjacent characters are written too close, therefore, some parts of character are connected horizontally/left-right. In printed script generally two consecutive characters do not overlap or touch but in handwritten everyone is having their own writing style. In this present work we have proposed a method of segmenting characters from handwritten Devanagari words. There is unpredictable variation in handwriting style from person to person and Devanagari having large number of characters set makes segmentation of handwritten words complex. Papers [1,2] give some surveys on segmentation techniques for machine printed text. For segmentation of handwritten text also a survey paper [3] is available. Very few papers are available on segmentation of handwritten characters from word. No work for touching handwritten Devanagari character segmentation is found so far to the best of our knowledge. In [4], Utpal Garain and Bidyut B. Chaudhuri, presents a new technique for identification and segmentation of machine printed touching characters. They devised a fuzzy multifactorial analysis technique which has been applied to printed documents in Devanagari and Bangla. In [5], Veena Bansal and R.M.K.Sinha depict the system architecture and its components for Devanagari text recognition system. The system consists of a solution blackboard and various knowledge sources. In [6], M. Hanmandlu and Pooja Agrawal made an attempt to segment the handwritten Hindi words. The segmentation problem is computed due to different writing styles and presence of modifiers on all sides of characters. So, they proposed a structural approach to identify the similarities and differences between structure classes. Veena Bansal and R.M. K. Sinha [7], presents an algorithm for segmentation of machine printed touching Devanagari
  • 2. International Journal of Information Technology, Modeling and Computing (IJITMC) Vol.2, No.1, February 2014 12 characters (also referred to as conjuncts) into its constituent symbols and characters. In the Existing system the selective algorithm is developed for the identification and removal of header line and for further segmentation from the handwritten Devanagari characters. Proposed algorithm extensively uses structural properties of the script. It uses vertical projection, continuity of collapsed horizontal projection. We use structural properties of Devanagari script for fragmentation of characters. 1.1. Characteristics of Devanagari Script Devanagari Script is the most common script in India which include Marathi, Hindi, Sanskrit & Nepali. It has 12 vowels and 35 consonants as shown in figure.1. Vowels can be written isolated or by using variety of modifiers overhead, down, foreword or backward after the consonants they belong to & those characters are called conjuncts as shown in figure 2. At times new shapes are formed after combining two or more consonants & these new shapes are called compound characters as shown in figure 3. In Devanagari script there is a header line known as Shirorekha, this feature distinguish Devanagari script from English, so English will be extracted from these script. There are four imaginary lines that drawn for a word, Headline is s same as header line, Baseline where characters completed but excluding below modifiers. After below modifiers line drawn called Lowerline and the line above Headline after modifiers are called Upperline resultant text is partitioned into 3 zones- Upper zone between Upperline and Headline, Middle zone between Headline and Baseline & last is Lower zone between Baseline and Lowerline. A typical zoning is shown in Figure.4. In Hindi we have three types of characters as follows also shown in Figure 5a,5b, 5c 1.END-BAR Characters 2.MIDDLE-BAR Characters 3.NO-BAR Characters Figure 1. Consonants.
  • 3. International Journal of Information Technology, Modeling and Computing (IJITMC) Vol.2, No.1, February 2014 13 Figure 2. Modifiers(Ascenders and descenders) Figure 3. Compound Characters Figure 4. Different Zones of Devanagari Text. Figure 5a. END-BAR Characters Figure 5b. MIDDLE-BAR Characters Figure 5c. NO-BAR Characters As we are considering handwritten Devanagari, it again adds up complexity because while writing, characters of the word may touch each other at different position as shown in figure 6. For recognition, it is necessary to segment these touching characters of words. For this we propose a system. Figure 6. Samples of Touched Characters
  • 4. International Journal of Information Technology, Modeling and Computing (IJITMC) Vol.2, No.1, February 2014 14 2.RESEARCH METHOD In Devanagari script few categories are classified on the basis of structural properties that are given in figure 5. Proposed System 2.1. Pre-processing: 1. In Image Acquisition scan a document and storing it as an image. This will act as input to the program. Higher resolution produces better results. 2. In every binary image, each pixel is in form of two value: 1 or 0. The input image is first converted into binary or image will be complemented means background is black and foreground i.e words are white and represented as 1’s. In binary image attached words are found and isolate each word separately. 3. In binarized image all small black patches are removed because that spots can create confusion while thinning process. Bounding boxes are formed around small black spots and converted into white. Protruding points 1. Protruding points are those unnecessary joint points that generate after thinning process. So, these protruding points are removed so that actual joint points can be generated between those actual line exists. 2. In 3x3 matrix sum will be calculated, around each pixel, minus centre pixel. If this sum is less than or equal to 3 then the protruding pixel is removed. This process is visualized in the figure. 3. Image with protruding points highlighted with red.
  • 5. International Journal of Information Technology, Modeling and Computing (IJITMC) Vol.2, No.1, February 2014 15 Figure 7. For eliminating unusable joint points in the image protruding points are removed 2.2. Word Processing Stage 2.2.1 The joint points (i) Where two lines meet that point is called joint points. The lines can be straight or curved. On the basis of these points characters are dissected but here we are using joint points just to form the vertical bars and header lines. (ii) By analyzing the design of pixels in 3x3 matrix adjacent to each pixel joint points are found. (iii) With the help of joint points we will find vertical bars and header lines. (iv) In 3 x 3 matrix sum will be calculated around each pixel minus centre pixel and the pixel is considered as a joint point if sum is equal to more than 3. The result 3 means three lines meeting at that point. (v) If pixel is having only one line joining the point i.e only 1 pixel in the surrounding is called terminating points. Figure 8. Joint points (shown in green) 2.2.2 Form Bounding Box To fragment each character in a word bounding boxes are formed in order to identify the header line and vertical bars. Within each word horizontal rectangles are identified. In thinned image bounding boxes are formed after removing the joint points. 2.2.3 Removal of Shirorekha (i) The first aim is to find and remove shirorekha. (ii) In thinned image bounding boxes are obtained leaving joint points. (iii) Horizontal bounding boxes are recognized in every word. (iv) Horizontal rectangles are form on the basis of width-height ratio i.e should be more than or equal to 3:1. (v) Heights of all pixels inside these rectangles are averaged. (vi) If maximum number of pixels on one side of this averaged line than that rectangle are considered as shirorekha. (v) These rectangles are removed to remove the shirorekha.
  • 6. International Journal of Information Technology, Modeling and Computing (IJITMC) Vol.2, No.1, February 2014 16 Figure 9. Horizontal Bounding Boxes 2.2.4 The vertical bar detection (i) Vertical rectangles will help to detect the vertical bars. (ii) Vertical rectangles are having width-height ratio less than or equal to 1:3. (iii) To identify the vertical bars percentage is calculated on the basis of how much percentage of white pixels (parts of vertical bars) is coming in vertical rectangles and if it is more than 60% then the rectangle holds vertical bar. (iv) Rectangles are stretched upto the shirorekha in the top and at the bottom & will get a new image. (v) New image contains bounding boxes. Figure 10. Vertical bars are detected 2.3. Character Processing Stage Shirorekha and vertical bar removed image is considered for character processing. 1. Category 1: Mid Bar Form a bounding box and check width of bounding box. For category 1, if width of bounding box is greater than single character width (means two characters are touching) then make a cut at leftmost joint point of large bounding box. We get separated characters. We get separated characters. Figure 11. Example of category 1 for cut line finding. 2. Category 2: Side Bar When two side bar characters are touching each other, form bounding box. If there is no large bounding box and no equal size bounding boxes, then make a cut line after each detected vertical bars.
  • 7. International Journal of Information Technology, Modeling and Computing (IJITMC) Vol.2, No.1, February 2014 17 Figure 12. Fragmentation of two side bar touching characters 3. Category 3: Non Bar If character without Vertical Box then make cut at the right of Bounding Box. Figure 13. Fragmentation of two non bar touching characters. 3. RESULTS AND ANALYSIS Different document consists of words are given as input to proposed system. Average result for touching character segmentation of Devanagari words is 71%. Accurate character segmentation having side bar is 76%, middle bar 81%, no bar 75%. 3.1. Result of Touching Character Fragmentation We consider four documents where various characters from words are touched each other in different way. Some can overlapped with next character, touched at the different position. Result of touching character segmentation shown in table 1 consists of words are given as input to proposed system. As we segments characters from words based on different categories (middle bar, side bar, non-bar). So Table 2 shows Segmentation result of different character category. Table 1. Fragmentation result of touching characters
  • 8. International Journal of Information Technology, Modeling and Computing (IJITMC) Vol.2, No.1, February 2014 18 Table 2. Segmentation Result of Different Category Character Images Characters Category Total No of characters segmented correctly % Result a. Sidebar 25 16 64% Middle bar 03 03 100% No bar 08 07 87% b Sidebar 25 17 68% Middle bar 01 01 100% No bar 13 09 69% c Sidebar 10 10 100% Middle bar 02 01 50% No bar 02 02 100% d Sidebar 19 15 78% Middle bar - - - No bar 05 03 60% e Sidebar 13 08 61% Middle bar 04 01 25% No bar 04 03 75% f Sidebar 13 11 84% Middle bar 05 03 50% No bar 05 04 80% 3.2. Result of Handwritten and Printed Document We tested our system on handwritten and printed document without touching characters. Handwritten Document Figure 14. Handwritten Document without touching characters
  • 9. International Journal of Information Technology, Modeling and Computing (IJITMC) Vol.2, No.1, February 2014 19 Figure 15. Character Fragmentation of Handwritten document Printed Document Figure 16. Printed Document without touching character Figure 17. character Fragmentation of printed document
  • 10. International Journal of Information Technology, Modeling and Computing (IJITMC) Vol.2, No.1, February 2014 20 Table 3. Character Fragmentation Result of Handwritten and Printed Documents Imags Without Touching Characters No of Characters in Middle Zone including Matra No of Character Correctly Fragmented % Result of Correctly Fragmented Character Handwritten document 43 40 93% Printed document 43 43 100% 3.3.Failure Reason for Characters Segmentation There are some characters and writing style which degrade the performance of our system. Table 4 gives reasons for failure of some characters Table 4. Reason for Failure of Some Characters Segmentation 4. CONCLUSION We are successfully able to find joint points between characters of a word which makes us easy to find vertical lines and horizontal lines of characters. We tried to do segmentation without skew correction. Average result of handwritten touching characters from words is 71%. Accurate character segmentation having side bar is 76%, middle bar 81%, no bar 75%. For non-touching printed documents and handwritten document it gives us 100% and 93%result respectively.This method fails for certain characters because of constraints that we decide will not match always to a lot of variation in handwriting i.e. , height of two vertical bar characters i.e.Characters like causes problem as gap between parts of characters that can be solved by the recognition system because our system consider first part as non-bar character and cut after that. As the future work i s conc e rn, we pass these separated characters as input to the character recognition system. Again we have detected baseline, header line of word which will again useful for the recognition system to detect lower modifier and upper modifier. The character separation technique explained above can be applied on other Indian scripts also.
  • 11. International Journal of Information Technology, Modeling and Computing (IJITMC) Vol.2, No.1, February 2014 21 ACKNOWLEDGEMENTS I sincerely feel that the credit of project work could not be narrowed down to only one individual.This work is an integrated effort of all those concern with it, through whose able cooperation and effective guidance I could achieve its completion.I express my gratitude to my guide Assistant Prof. Vivek Verma, our co-ordinator Assistant Prof. Vandna Verma and HOD Assistant Prof. Rakesh Sharma for being kind enough to spare his valuable time, provide insights and guidance on a complex topic.I am also thankful to Principal and management of RCEW for their support and encouragement.I also thank computer laboratory staff for their valuable support REFERENCES [1] Y. Lu, “Machine Printed Character Segmentation – an Overview”,Pattern Recognition, vol. 28, No. 1, pp. 67-80, 1995. [2] R. G. Casey, E. Lecolinet, “A Survey of Methods and Strategies in Character Segmentation”, IEEE, vol. 18 No. 7, pp. 690-706, July 1996. [3] C. E. Dunn, P. S. P. Wang, “Character Segmentation Techniques for Handwritten Text-A Survey”, Proc. 11th Int.Conf.on Recognition Methodology and Systems, vol. II, pp.577-580, 30 Aug.-3 Sept.1992. [4] Utpal Garain and Bidyut B. Chaudhuri, “Segmentation of Touching Characters in Printed Devnagari and Bangla Scripts Using Fuzzy Multifactorial Analysis”, IEEE transactions on systems, man, and cybernetics—part c: applications and reviews, vol.32, 4, November 2002, 449-459. [5] Veena Bansal and R. M. K. Sinha,”Integrating Knowledge Sources in Devanagari Text recognition System”, IEEE transactions on systems, man, and cybernetics—part a: systems and humans, vol.30, no. 4, July 2000, 500-505. [6] M. Hanmandlu, Pooja Agrawal, “A Structural Approach for Segmentation of Handwritten Hindi Text”, Proceeding of the International Conference on Cognition and Recognition, Mandya, Karnataka, India, 22-23 December 2005, 589-597. [7] Veena Bansal and R. M. K. Sinha,”Segmentation of Touching and fused Devanagari Characters”, Pattern recognition, vol. 35:2002, 875-893. Authors She received her B.E degree from J.E.C.R.C, Jaipur(Rajasthan University) in 2005.She is pursuing her M.Tech in Computer Science from R.C.E.W(Rajasthan Technical University). She has presented and published papers in 4 conferences. Her areas of interest are image processing and networking. He is an Assistant Professor at Rajasthan College of Engg For Women. He received his B.Tech degree from Rajasthan Technical University. He did his M.Tech in Computer Science from C-DAC Noida (GGSIPU Delhi). He has several International and National publication in SCI journals.