International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol.1, No.3, August 2011
DOI : 10.5121/ijcseit.2011.1305 46
DEVNAGARI DOCUMENT SEGMENTATION USING
HISTOGRAM APPROACH
Vikas J Dongre 1
Vijay H Mankar 2
Department of Electronics & Telecommunication,
Government Polytechnic, Nagpur, India
1 dongrevj@yahoo.co.in; 2 vhmankar@gmail.com
ABSTRACT
Document segmentation is one of the critical phases in machine recognition of any language. Correct
segmentation of individual symbols determines the accuracy of the character recognition technique. Segmentation
decomposes the image of a sequence of characters into sub-images of individual symbols by first segmenting
lines and words. Devnagari is the most popular script in India. It is used for writing the Hindi, Marathi,
Sanskrit and Nepali languages. Moreover, Hindi is the third most popular language in the world. Devnagari
documents consist of vowels, consonants and various modifiers; hence, proper segmentation of Devnagari words
is challenging. A simple histogram-based approach to segmenting Devnagari documents is proposed in this paper.
Various challenges in the segmentation of Devnagari script are also discussed.
KEYWORDS
Devnagari Character Recognition, paragraph segmentation, Line segmentation, Word segmentation,
Machine learning.
1. INTRODUCTION
Machine learning and human-computer interaction have been among the most challenging research fields since
the advent of digital computers. In Optical Character Recognition (OCR), the text lines, words and
symbols in a document must be segmented properly before recognition. The accuracy of text line
segmentation directly affects the accuracy of word/character segmentation and, consequently, the
accuracy of word/character recognition [1]. Several techniques for text line segmentation
are reported in the literature [2-6]. These techniques may be classified into three groups as follows:
(i) projection-profile-based techniques, (ii) Hough-transform-based techniques and (iii) thinning-based
techniques. As a conventional technique for text line segmentation, global horizontal projection
analysis of black pixels has been utilized in [4, 7]. Piece-wise horizontal projection analysis of black
pixels is employed by many researchers to segment text pages of different languages [2, 9]. In
piecewise horizontal projection technique, the text-page image is decomposed into horizontal
stripes. The positions of potential piece-wise separating lines are obtained for each stripe using
horizontal projection on each stripe. The potential separating lines are then connected to achieve
complete separating lines for all the text lines in the page image. The Hough transform has been
employed in many areas of document analysis, such as skew detection, slant detection and text line
segmentation [8]. Thinning operations have also been used by researchers for text line segmentation
from documents [10].
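The piece-wise horizontal projection idea described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the implementation used in [2, 9].

It makes the following assumptions:
- The image is a list of rows, with 1 for a text (black) pixel and 0 for background.
- The function name piecewise_separators is hypothetical.
- The final step, which connects the per-stripe separators into complete separating lines, is omitted.

```python
def piecewise_separators(img, n_stripes):
    """For each vertical stripe, return the rows that contain no text pixels.

    img is a binary image given as a list of rows (1 = text, 0 = background).
    The page is cut into at most n_stripes stripes of equal width.
    """
    height, width = len(img), len(img[0])
    stripe_w = -(-width // n_stripes)  # ceiling division
    separators = []
    for start in range(0, width, stripe_w):
        stop = min(start + stripe_w, width)
        # Horizontal projection restricted to the current stripe.
        profile = [sum(row[start:stop]) for row in img]
        # Empty rows are potential piece-wise separating lines for this stripe.
        separators.append([y for y in range(height) if profile[y] == 0])
    return separators
```

Computing the separators per stripe lets them follow a slightly skewed or staggered text line. A single global projection would only find rows that are empty across the whole page.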
In this paper we propose a bounding-box method for segmenting the lines, words and characters of a
document. The method is based on pixel histograms (projection profiles). The paper is organized as
follows. Section 2 discusses the features of Devnagari script. Section 3 discusses image preprocessing
methods. Section 4 details the proposed segmentation approach. Experimental results are discussed in
Section 5, and conclusions and the scope for further research are discussed in Section 6.
2. FEATURES OF DEVNAGARI SCRIPT
India is a multilingual and multi-script country with eighteen official languages. Because there is
typically a letter for each phoneme in Indian languages, the alphabet set tends to be quite large.
Hindi, an official language of India, is written in the Devnagari script. Devnagari is also used for
writing Marathi, Sanskrit and Nepali. Moreover, Hindi is the third most popular language in the
world [1], spoken by more than 500 million people. Devnagari has 11 vowels and 33 consonants, which
are called basic characters. Vowels can be written as independent letters or by using a variety of
diacritical marks written above, below, before or after the consonant they belong to. When vowels are
written in this way they are known as modifiers, and the characters so formed are called conjuncts.
Sometimes two or more consonants combine and take new shapes; these new shaped clusters are known as
compound characters. Basic characters, compound characters and modifiers of this kind are present not
only in Devnagari but also in other Indian scripts.
All the characters have a horizontal line along the top, known as the shirorekha. In continuous text
(printed or handwritten), written from left to right, the shirorekha of one character joins the
shirorekha of the previous or next character of the same word. In this fashion, the characters and
modified shapes in a word appear as a single connected component joined through the common shirorekha.
Devnagari text also contains vowels, consonants, vowel modifiers, compound characters and numerals,
and many characters have similar shapes. All these variations make Devnagari optical character
recognition a challenging problem. Samples of the Devnagari character set are provided in Tables 1 to 6.
Table 1: Vowels and corresponding modifiers
Table 2: Consonants
Table 3: Half forms of consonants with vertical bar
Table 4: Examples of combinations of half-consonant and consonant
Table 5: Examples of special combinations of half-consonant and consonant
Table 6: Special symbols
3. IMAGE PREPROCESSING
We collected printed pages from various office correspondence. The pages were scanned with a flatbed
scanner at a resolution of 300 dpi. Each pixel of the scanned image takes the following values:
- OFF (0) or ON (1) for binary images;
- 0–255 for grayscale images;
- three channels of 0–255 colour values for colour images.

A colour image is converted to grayscale by eliminating the hue and saturation information while
retaining the luminance. The grayscale image is then processed further, as explained below.
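Retaining only the luminance amounts to a weighted sum of the three colour channels. The Python sketch below makes these assumptions:
- It uses the weights 0.2989, 0.5870 and 0.1140 that MATLAB documents for rgb2gray (close to the ITU-R BT.601 luma weights). The paper does not state which weights were used.
- The colour image is represented as rows of (r, g, b) tuples.
- The function name rgb_to_gray is illustrative.

```python
def rgb_to_gray(rgb):
    """Convert rows of (r, g, b) tuples (0-255) to rows of 0-255 luminance values.

    The hue and saturation are discarded; only the weighted luminance is kept.
    """
    return [[round(0.2989 * r + 0.5870 * g + 0.1140 * b) for (r, g, b) in row]
            for row in rgb]
```

For example, a pure white pixel maps to 255 and a pure red pixel to 76.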
3.1 Thresholding and Binarization:
The digitized text images are converted into binary images by thresholding with Otsu's method [17].
In the resulting image, object (text) pixels are 0 and background pixels are 1. The image is therefore
inverted, so that object pixels are represented by 1 and background pixels by 0.
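Otsu's method chooses the grey level that maximizes the variance between the "ink" and "paper" classes. The Python sketch below folds the inversion step into the binarization, so dark ink becomes 1 directly.

Assumptions:
- The grayscale image is a list of rows of integers 0-255.
- The function names are illustrative. In MATLAB the threshold itself is normally obtained with graythresh.
- The between-class variance is left unnormalized. Dividing by the squared pixel count does not change which threshold maximizes it.

```python
def otsu_threshold(gray):
    """Return the threshold t (0-255) maximizing the between-class variance.

    Class 0 holds pixels with value <= t, class 1 holds pixels with value > t.
    """
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # intensity sum of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so this split is meaningless
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize_inverted(gray):
    """Binarize with Otsu's threshold: dark ink -> 1 (object), light paper -> 0."""
    t = otsu_threshold(gray)
    return [[1 if v <= t else 0 for v in row] for row in gray]
```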
3.2 Noise reduction:
Noise introduced by the optical scanning device or the writing instrument causes disconnected line
segments, bumps and gaps in lines, filled loops, etc. Distortions, including local variations, rounding
of corners, dilation and erosion, are also a problem. These imperfections must be eliminated before
character recognition [11-12]. This is done using various morphological processing techniques.
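The paper does not list the exact morphological operations. One common option for removing isolated specks is an area opening: delete every 8-connected group of text pixels smaller than a minimum size, similar in spirit to MATLAB's bwareaopen. The Python sketch below assumes:
- a binary image stored as a list of rows with 1 = text;
- a hypothetical min_size parameter that would need tuning per scan resolution.

```python
from collections import deque


def remove_small_components(img, min_size):
    """Return a copy of img with 8-connected groups of 1-pixels smaller than min_size removed."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search collects one connected component.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Components below the size limit are treated as noise and erased.
            if len(component) < min_size:
                for cy, cx in component:
                    out[cy][cx] = 0
    return out
```

A min_size that is too large will also erase legitimate small marks such as the anuswara dot. This is the difficulty noted in Section 6.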
3.3 Skew Detection and Correction:
A document may be skewed originally, or skew may be introduced during scanning. This effect is usually
unintentional, and it should be eliminated because it dramatically reduces the accuracy of subsequent
processes such as segmentation and classification. Skewed lines are made horizontal by calculating the
skew angle and correcting the raw image accordingly, using Hu moments and various transforms [13-15].
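The paper does not detail how the moments are used. The Hu invariants themselves are rotation invariant, so they do not yield an angle directly. A simpler, related moment-based estimate is the orientation of the principal axis computed from the second-order central moments. The Python sketch below makes these assumptions:
- It is applied to a single text line or word, whose ink is elongated along the baseline. On a whole page with many lines, the principal axis may not follow the text direction.
- The image is a list of rows with 1 = text.
- The function name is illustrative.

```python
import math


def orientation_degrees(img):
    """Angle (degrees) of the principal axis of the 1-pixels, from central moments.

    Coordinates are image coordinates: x to the right, y downwards.
    0 means horizontal; positive angles slope down to the right.
    """
    pts = [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    mu20 = sum((x - cx) ** 2 for x, _ in pts)
    mu02 = sum((y - cy) ** 2 for _, y in pts)
    mu11 = sum((x - cx) * (y - cy) for x, y in pts)
    return math.degrees(0.5 * math.atan2(2 * mu11, mu20 - mu02))
```

To correct the skew, the line image would then be rotated by the estimated angle in the opposite direction. The sign convention of whatever rotation routine is used should be checked first.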
Figure 1: Preprocessed images: (a) original, (b) segmented, (c) shirorekha removed, (d) thinned,
(e) edge-detected
3.4 Thinning:
Thinning and boundary detection are applied to ease the subsequent detection of pertinent features and
objects of interest (see Fig. 1(a)-(e)). Standard MATLAB functions are available for these
operations [16].
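Thinning reduces each stroke to a one-pixel-wide skeleton. The authors use MATLAB's bwmorph, which implements a different thinning algorithm. As an illustrative substitute, the classic Zhang-Suen algorithm is sketched below in Python.

Assumptions:
- The image is a list of rows with 1 = text.
- The function name zhang_suen_thin is hypothetical.

```python
def zhang_suen_thin(img):
    """Thin a binary image (1 = object) to one-pixel-wide strokes (Zhang-Suen)."""
    h, w = len(img), len(img[0])
    # Pad with a background border so every pixel has 8 neighbours.
    g = [[0] * (w + 2)] + [[0] + row[:] + [0] for row in img] + [[0] * (w + 2)]

    def neighbours(y, x):
        # P2..P9: clockwise, starting from the pixel directly above.
        return [g[y - 1][x], g[y - 1][x + 1], g[y][x + 1], g[y + 1][x + 1],
                g[y + 1][x], g[y + 1][x - 1], g[y][x - 1], g[y - 1][x - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for y in range(1, h + 1):
                for x in range(1, w + 1):
                    if g[y][x] != 1:
                        continue
                    p = neighbours(y, x)
                    b = sum(p)  # number of object neighbours
                    # a = number of 0 -> 1 transitions in the ring P2..P9, P2
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_clear.append((y, x))
            # Deletions are applied only after the whole sub-iteration is scanned.
            for y, x in to_clear:
                g[y][x] = 0
            if to_clear:
                changed = True
    return [row[1:-1] for row in g[1:-1]]
```

For a 3 x 5 block of ones, the result is a short horizontal segment on the middle row. Zhang-Suen thinning also shortens stroke ends slightly.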
4. PROPOSED SEGMENTATION APPROACH
After the image is preprocessed using the methods discussed in Section 3, we apply various techniques to
segment the document into lines, words and characters. The segmentation process follows this pattern:
1) Identify the text lines in the page.
2) Identify the words in each line.
3) Finally, identify the individual characters in each word.
4.1 Line Segmentation
The global horizontal projection method is used to compute the sum of white pixels in every row and
construct the corresponding histogram. The steps for line segmentation are as follows:
• Construct the horizontal histogram of the image (Fig. 2(b)).
• Count the white pixels in each row.
• Using the histogram, find the rows containing no white pixels.
• Set all such rows to 1 (Fig. 2(c)).
• Invert the image so that the empty rows become 0 and the text lines retain their pixels.
• Mark the bounding box of each text line (Fig. 2(e)) using the standard MATLAB functions regionprops
and rectangle.
• Copy the pixels in each bounding box and save them in a separate file (separated lines are shown in
Fig. 2(f)).
Figure 2: Line segmentation: (a) original scanned document, (b) image histogram, (c) blank space between
the lines, (d) line separation, (e) regions of interest, (f) segmented lines
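The line segmentation steps amount to finding runs of non-empty rows in the horizontal histogram. The Python sketch below takes that shortcut directly rather than reproducing the fill, invert and regionprops sequence used with MATLAB.

Assumptions:
- The page is a list of rows with 1 = text.
- The names segment_lines and crop are illustrative.

```python
def segment_lines(img):
    """Return (top, bottom) row ranges, bottom exclusive, one per text line.

    A text line is a maximal run of rows whose horizontal histogram is non-zero.
    """
    # Horizontal histogram: number of text pixels in every row.
    row_hist = [sum(row) for row in img]
    bands, start = [], None
    for y, count in enumerate(row_hist):
        if count > 0 and start is None:
            start = y                   # first row of a new text line
        elif count == 0 and start is not None:
            bands.append((start, y))    # an empty row closes the current line
            start = None
    if start is not None:
        bands.append((start, len(img)))  # text runs to the bottom edge
    return bands


def crop(img, top, bottom, left=0, right=None):
    """Copy the pixels inside a bounding box (end indices exclusive)."""
    return [row[left:right] for row in img[top:bottom]]
```

These bands span the full page width. A tighter bounding box could be obtained by applying the column histogram to each band before cropping.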
4.2 Word Segmentation
The global vertical projection method is used here to compute the sum of white pixels in every column
and construct the corresponding histogram. The steps for word segmentation are as follows:
• Construct the vertical histogram of the line image (Fig. 3(b)).
• Count the white pixels in each column.
• Using the histogram, find the columns containing no white pixels.
• Set all such columns to 1.
• Invert the image so that the empty columns become 0 and the words retain their pixels.
• Mark the bounding box of each word (Fig. 3(c)).
• Copy the pixels in each bounding box and save them in a separate file (Fig. 3(d)).
Figure 3: Word segmentation: (a) original line, (b) word histogram, (c) regions of interest,
(d) segmented words
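Within a Devnagari word the shirorekha keeps every column non-empty, so empty columns in a line image fall between words. The Python sketch below finds those runs.

Assumptions:
- The line image is a list of rows with 1 = text.
- The function name is illustrative.
- The min_gap parameter is hypothetical. The paper treats any single empty column as a break, which corresponds to min_gap=1.

```python
def segment_words(line_img, min_gap=1):
    """Return (left, right) column ranges, right exclusive, one per word.

    A word ends once min_gap consecutive columns contain no text pixels.
    """
    width = len(line_img[0])
    # Vertical histogram: number of text pixels in every column.
    col_hist = [sum(row[x] for row in line_img) for x in range(width)]
    words, start, gap = [], None, 0
    for x, count in enumerate(col_hist):
        if count > 0:
            if start is None:
                start = x       # first column of a new word
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The word ended at the first empty column of this gap.
                words.append((start, x - gap + 1))
                start, gap = None, 0
    if start is not None:
        words.append((start, width - gap))
    return words
```

Raising min_gap would keep narrow gaps inside a word together, at the risk of merging closely spaced words. This trade-off relates to the numeral problem reported in Section 5, where digits without a shirorekha are split into separate words.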
4.3 Character Segmentation
A slight modification of the previous algorithm (Section 4.2) is used here. The steps for character
segmentation are as follows:
• Obtain the thinned image using the MATLAB bwmorph function (this normalizes the image against the
stroke thickness of the characters).
• Count the white pixels in each column.
• Find the columns containing a single white pixel.
• Set all such columns to 1.
• Invert the image so that these columns become 0 and the characters retain their pixels.
• Mark the bounding box of each character using standard MATLAB functions (Fig. 4(a)).
• Copy the pixels in each bounding box and save them in a separate file (separated characters are shown
in Fig. 4(b)).
Figure 4: Character segmentation: (a) regions of interest, (b) segmented characters
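One reading of the character segmentation steps is as follows. After thinning, the shirorekha is about one pixel thick, so a column containing a single white pixel usually crosses only the shirorekha, between two characters. The Python sketch below cuts a thinned word at such columns.

Assumptions:
- The input word image has already been thinned.
- The image is a list of rows with 1 = text.
- Empty columns are also treated as cuts.
- The function name is illustrative.

```python
def segment_characters(thin_word):
    """Return (left, right) column ranges, right exclusive, one per character.

    A column with at most one text pixel is taken to cross only the
    one-pixel-thick shirorekha (or nothing), so it separates characters.
    """
    width = len(thin_word[0])
    col_hist = [sum(row[x] for row in thin_word) for x in range(width)]
    chars, start = [], None
    for x, count in enumerate(col_hist):
        if count >= 2 and start is None:
            start = x                     # column cuts through a character body
        elif count <= 1 and start is not None:
            chars.append((start, x))      # shirorekha-only (or empty) column
            start = None
    if start is not None:
        chars.append((start, width))
    return chars
```

The same test also explains some of the over-segmentation. When two parts of one letter are joined only through the shirorekha, the columns between them also hold a single pixel and are cut.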
5. RESULTS AND DISCUSSION
Various documents were collected and tested. Line segmentation is achieved with nearly 100% accuracy.
Word segmentation is accurate as long as the document contains only letters. Devnagari numerals do not
carry a shirorekha, so when they are present the algorithm treats each digit as a separate word. This
reduces the accuracy marginally; for the document tested here, word segmentation accuracy is 91%.
Table 7: Character segmentation results for the document in Fig. 2(a)
Word (in Fig. 4)            1      2      3      4      5      6
Characters present          3      3      6      3      2      2
Characters recognized       5      5     12      7      6      6
Accuracy                  60%    60%    50%    42%    33%    33%
Table 8: Overall segmentation results for the document in Fig. 2(a)
Level          In document    Recognized    Accuracy
Lines                8              8          100%
Words               41             45           91%
Characters         133            242           55%
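The accuracy values in Tables 7 and 8 appear to be the number of symbols actually present divided by the number detected, as a percentage. For example, 41/45 ≈ 91% and 133/242 ≈ 55%.

A minimal Python helper for that ratio is sketched below. The function name is illustrative. The min/max form keeps the score at or below 100% if a method ever under-segments; this is an assumption not stated in the paper.

```python
def segmentation_accuracy(present, detected):
    """Percentage accuracy: symbols present over symbols detected.

    The paper's results involve over-segmentation (detected >= present).
    Using min/max is an assumption that keeps the value <= 100 otherwise.
    """
    if present == 0 or detected == 0:
        return 0.0
    return 100.0 * min(present, detected) / max(present, detected)
```

With this definition, the values in Tables 7 and 8 are reproduced to within rounding. For instance, 3 present and 7 detected gives about 42.9%, which Table 7 lists as 42%.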
In character segmentation, words are split into more symbols than are actually present in them, as
shown in Figure 4; the results are summarized in Tables 7 and 8. This error arises because the algorithm
scans the words only from top to bottom, whereas Devnagari is a two-dimensional script in which
consonants are modified from the top, bottom, left or right to form a meaningful letter. Unconnected
vertical lines in the words are therefore detected as separate symbols. For accurate segmentation, all
the modifiers must be segmented so that they can be recognized properly.
6. CONCLUSIONS AND FUTURE WORK
In this paper, we have presented preliminary work on the segmentation of lines, words and characters of
Devnagari script. Line and word segmentation were highly successful (100% and 91% accuracy,
respectively), but character-level segmentation needs more effort because it is complicated for
Devnagari script. This work is challenging for the following reasons:
• Compound letters are connected at various places, and it is difficult to identify the exact connecting
points for segmentation.
• Upper and lower modifiers need different segmentation approaches.
• Distinguishing the anuswara (.) and the full stop (.) from noise is critical, as both are small dots
that resemble noise. Natural language processing techniques need to be applied here.
• Segmentation of handwritten, unconnected compound letters is also critical.
• Segmentation of handwritten simple letters that are unintentionally connected is also critical.
All these issues will be addressed in future work on printed and handwritten Devnagari documents, using
various approaches.
REFERENCES
[1] Nallapareddy Priyanka, Srikanta Pal, Ranju Mandal, (2010) “Line and Word Segmentation
Approach for Printed Documents”, IJCA Special Issue on Recent Trends in Image Processing and
Pattern Recognition-RTIPPR, pp 30-36.
[2] K. Wong, R. Casey and F. Wahl, (1982) "Document Analysis System", IBM J. Res. Dev., 26(6),
pp. 647-656.
[3] G. Nagy, S. Seth, and M. Viswanathan, (1992) “A prototype document image analysis system for
technical journals”, Computer, vol. 25, pp. 10-22.
[4] Vijay Kumar, Pankaj K.Senegar, (2010) “Segmentation of Printed Text in Devnagari Script and
Gurmukhi Script”, IJCA: International Journal of Computer Applications, Vol.3,pp. 24-29.
[5] U. Pal and Sagarika Datta, (2003) “Segmentation of Bangla Unconstrained Handwritten Text”, Proc.
7th Int. Conf. on Document analysis and Recognition, pp. 1128-113.
[6] Vikas J Dongre, Vijay H Mankar, (July 2011) “Segmentation of Devnagari Documents”,
Communications in Computer and Information Science, 2011, Volume 198, Part 1, Springer
proceedings, 1st International conference, ACITY Chennai, India, pp 211-218.
[7] Vikas J Dongre, Vijay H Mankar, (2010) “A Review of Research on Devnagari Character
Recognition”, International Journal of Computer Applications (0975 – 8887) Volume 12– No.2, pp.
8-15.
[8] U. Pal, M. Mitra, and B. B. Chaudhuri, (2001) “Multi-skew detection of Indian script documents”,
Proc. 6th Int. Conf. Document Analysis Recognition, pp. 292-296.
[9] Likforman-Sulem L, Zahour A and Taconet B, (2007) “Text line Segmentation of Historical
Documents: a Survey”, International Journal on Document Analysis and Recognition, Springer,
Vol. 9, Issue 2, pp.123-138.
[10] G. Nagy, (2000) "Twenty Years of Document Image Analysis in PAMI", IEEE Trans. PAMI, Vol. 22,
pp. 38-61.
[11] J. Serra, (1994) “Morphological Filtering: An Overview”, Signal Processing, vol. 38, no.1, pp.3-11.
[12] Nafiz Arica, Fatos T. Yarman Vural, (2000) “An Overview of Character Recognition Focused On
Off-line Handwriting”, IEEE C99-06-C-203.
[13] Mohamed Cheriet, Nawwaf Kharma, Cheng-Lin Liu, Ching Y. Suen, (2007) "Character Recognition
Systems: A Guide for Students and Practitioners", John Wiley & Sons, Inc., Hoboken, New Jersey.
[14] Rajiv Kapoor, Deepak Bagai, T. S. Kamal, (2002) “Skew angle detection of a cursive handwritten
Devnagari script character image”, Journal of Indian Inst. Science, pp. 161–175.
[15] U. Pal, M. Mitra and B. B. Chaudhuri,(2001) “Multi-Skew Detection of Indian Script Documents”,
CVPRU IEEE, pp 292-296.
[16] V. H. Mankar et al, (2010) “Contour Detection and Recovery through Bio-Medical Watermarking
for Telediagnosis”, International Journal of Tomography & Statistics, Vol. 14 (Special Volume),
Number S10.
[17] Guo Jing; Rajan D.; Chng Eng Siong, (2005) “Motion Detection with Adaptive Background and
Dynamic Thresholds”, Fifth International Conference on Information, Communications and Signal
Processing, Bangkok, W B.4, pp 41-45.
Authors
Vikas J Dongre received the B.E. and M.E. in Electronics in 1991 and 1994, respectively. He served as a
lecturer at SSVPS Engineering College, Dhule (M.S.), India, from 1992 to 1994. He joined Government
Polytechnic, Nagpur, as a lecturer in 1994, where he is presently working as a lecturer (selection
grade). His areas of interest include microcontrollers, embedded systems, image recognition and
innovative laboratory practices. He is pursuing a PhD in offline handwritten Devnagari character
recognition. He has published one research paper in an international journal and two research papers
in international conferences.
Vijay H. Mankar received the M.Tech. degree in Electronics Engineering from VNIT, Nagpur University,
India, in 1995 and the Ph.D. (Engg.) from Jadavpur University, Kolkata, India, in 2009. He has more than
17 years of teaching experience and is presently working as a Lecturer (Selection Grade) at Government
Polytechnic, Nagpur (MS), India. He has published more than 30 research papers in international
conferences and journals. His fields of interest include digital image processing, data hiding and
watermarking.

An effective approach to offline arabic handwriting recognition
 
Fuzzy rule based classification and recognition of handwritten hindi
Fuzzy rule based classification and recognition of handwritten hindiFuzzy rule based classification and recognition of handwritten hindi
Fuzzy rule based classification and recognition of handwritten hindi
 
IRJET - A Survey on Recognition of Strike-Out Texts in Handwritten Documents
IRJET - A Survey on Recognition of Strike-Out Texts in Handwritten DocumentsIRJET - A Survey on Recognition of Strike-Out Texts in Handwritten Documents
IRJET - A Survey on Recognition of Strike-Out Texts in Handwritten Documents
 
Dimensionality Reduction and Feature Selection Methods for Script Identificat...
Dimensionality Reduction and Feature Selection Methods for Script Identificat...Dimensionality Reduction and Feature Selection Methods for Script Identificat...
Dimensionality Reduction and Feature Selection Methods for Script Identificat...
 
Texture features based text extraction from images using DWT and K-means clus...
Texture features based text extraction from images using DWT and K-means clus...Texture features based text extraction from images using DWT and K-means clus...
Texture features based text extraction from images using DWT and K-means clus...
 
A Novel Approach for Bilingual (English - Oriya) Script Identification and Re...
A Novel Approach for Bilingual (English - Oriya) Script Identification and Re...A Novel Approach for Bilingual (English - Oriya) Script Identification and Re...
A Novel Approach for Bilingual (English - Oriya) Script Identification and Re...
 
A017240107
A017240107A017240107
A017240107
 
A SIGNATURE BASED DRAVIDIAN SIGN LANGUAGE RECOGNITION BY SPARSE REPRESENTATION
A SIGNATURE BASED DRAVIDIAN SIGN LANGUAGE RECOGNITION BY SPARSE REPRESENTATIONA SIGNATURE BASED DRAVIDIAN SIGN LANGUAGE RECOGNITION BY SPARSE REPRESENTATION
A SIGNATURE BASED DRAVIDIAN SIGN LANGUAGE RECOGNITION BY SPARSE REPRESENTATION
 
Topographic Feature Extraction for Bengali and Hindi Character Images
Topographic Feature Extraction for Bengali and Hindi Character ImagesTopographic Feature Extraction for Bengali and Hindi Character Images
Topographic Feature Extraction for Bengali and Hindi Character Images
 
Topographic Feature Extraction for Bengali and Hindi Character Images
Topographic Feature Extraction for Bengali and Hindi Character ImagesTopographic Feature Extraction for Bengali and Hindi Character Images
Topographic Feature Extraction for Bengali and Hindi Character Images
 
50120130406021
5012013040602150120130406021
50120130406021
 
DEVELOPMENT OF AN ALPHABETIC CHARACTER RECOGNITION SYSTEM USING MATLAB FOR BA...
DEVELOPMENT OF AN ALPHABETIC CHARACTER RECOGNITION SYSTEM USING MATLAB FOR BA...DEVELOPMENT OF AN ALPHABETIC CHARACTER RECOGNITION SYSTEM USING MATLAB FOR BA...
DEVELOPMENT OF AN ALPHABETIC CHARACTER RECOGNITION SYSTEM USING MATLAB FOR BA...
 
F045053236
F045053236F045053236
F045053236
 
Handwritten character recognition in
Handwritten character recognition inHandwritten character recognition in
Handwritten character recognition in
 
Text Extraction of Colour Images using Mathematical Morphology & HAAR Transform
Text Extraction of Colour Images using Mathematical Morphology & HAAR TransformText Extraction of Colour Images using Mathematical Morphology & HAAR Transform
Text Extraction of Colour Images using Mathematical Morphology & HAAR Transform
 
FREEMAN CODE BASED ONLINE HANDWRITTEN CHARACTER RECOGNITION FOR MALAYALAM USI...
FREEMAN CODE BASED ONLINE HANDWRITTEN CHARACTER RECOGNITION FOR MALAYALAM USI...FREEMAN CODE BASED ONLINE HANDWRITTEN CHARACTER RECOGNITION FOR MALAYALAM USI...
FREEMAN CODE BASED ONLINE HANDWRITTEN CHARACTER RECOGNITION FOR MALAYALAM USI...
 
Improvement of telugu ocr by segmentation of touching characters
Improvement of telugu ocr by segmentation of touching charactersImprovement of telugu ocr by segmentation of touching characters
Improvement of telugu ocr by segmentation of touching characters
 

Recently uploaded

ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZTE
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLDeelipZope
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 

Recently uploaded (20)

ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCL
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 

Devnagari Document Segmentation Using Histogram Approach

International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol.1, No.3, August 2011
DOI : 10.5121/ijcseit.2011.1305

DEVNAGARI DOCUMENT SEGMENTATION USING HISTOGRAM APPROACH

Vikas J Dongre (1), Vijay H Mankar (2)
Department of Electronics & Telecommunication, Government Polytechnic, Nagpur, India
(1) dongrevj@yahoo.co.in; (2) vhmankar@gmail.com

ABSTRACT
Document segmentation is one of the critical phases in machine recognition of any language. Correct segmentation of individual symbols determines the accuracy of the character recognition technique. Segmentation decomposes the image of a sequence of characters into sub-images of individual symbols by segmenting lines and words. Devnagari is the most popular script in India. It is used for writing the Hindi, Marathi, Sanskrit and Nepali languages. Moreover, Hindi is the third most popular language in the world. Devnagari documents consist of vowels, consonants and various modifiers, so proper segmentation of Devnagari words is challenging. A simple histogram-based approach to segmenting Devnagari documents is proposed in this paper. Various challenges in the segmentation of Devnagari script are also discussed.

KEYWORDS
Devnagari Character Recognition, Paragraph Segmentation, Line Segmentation, Word Segmentation, Machine Learning

1. INTRODUCTION
Machine learning and human-computer interaction have been among the most challenging research fields since the advent of digital computers. In Optical Character Recognition (OCR), the text lines, words and symbols in a document must be segmented properly before recognition. The correctness of text line segmentation directly affects the accuracy of word and character segmentation, and consequently the accuracy of word and character recognition [1]. Several techniques for text line segmentation are reported in the literature [2-6].
These techniques may be classified into three groups: (i) projection-profile-based techniques, (ii) Hough-transform-based techniques, and (iii) thinning-based techniques. Global horizontal projection analysis of black pixels is a conventional technique for text line segmentation and has been used in [4, 7]. Many researchers have employed piecewise horizontal projection analysis of black pixels to segment text pages of different languages [2, 9]. In the piecewise horizontal projection technique, the text-page image is decomposed into horizontal stripes. The positions of potential piecewise separating lines are then obtained for each stripe from its horizontal projection. These potential separating lines are connected to form complete separating lines for all the text lines in the page image. The Hough transform is employed in many areas of document analysis, such as skew detection, slant detection and text line segmentation [8]. Thinning operations have also been used for text line segmentation [10].

In this paper, we propose a bounding-box method for segmenting the lines, words and characters of a document. The method is based on pixel histograms. The paper is organized as follows:
• Section 2 discusses the features of Indian scripts.
• Section 3 discusses image preprocessing methods.
• Section 4 details the proposed segmentation approach.
• Section 5 discusses the experimental results.
• Section 6 outlines the scope for further research.

2. FEATURES OF DEVNAGARI SCRIPT
India is a multilingual and multi-script country with eighteen official languages. Indian languages typically have a letter for each phoneme, so their alphabets tend to be quite large. Hindi, the national language of India, is written in the Devnagari script. Devnagari is also used for writing Marathi, Sanskrit and Nepali. Moreover, Hindi is the third most popular language in the world [1], and it is spoken by more than 500 million people.

Devnagari has 11 vowels and 33 consonants, which are called basic characters. Vowels can be written as independent letters. They can also be written as diacritical marks placed above, below, before or after the consonant they belong to. Vowels written in this way are known as modifiers, and the characters so formed are called conjuncts. Two or more consonants can also combine and take new shapes, and these new clusters are known as compound characters. Basic characters, compound characters and modifiers of this kind are present not only in Devnagari but also in other scripts.

All the characters have a horizontal line along the top, known as the shirorekha. In continuous writing from left to right, the shirorekha of one character joins the shirorekha of the previous or next character in the same word. As a result, the characters and modified shapes of a word appear as a single connected component, joined through the common shirorekha.
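The shirorekha is usually the row with the most ink near the top of a word. A common way to locate it is therefore to take the row with the largest horizontal projection. That row can then be removed, as in the "shirorekha removed" panel of Fig. 1. The sketch below is illustrative only and is not the authors' MATLAB code. It assumes a binary word image stored as a list of rows, with 1 for ink. The function names, the "upper half" search window and the `thickness` parameter are hypothetical choices.

```python
def find_shirorekha_row(word):
    """Index of the densest row in the upper half of a binary word image.

    Assumption (illustrative): ink = 1, background = 0, and the shirorekha
    is the row with the largest horizontal projection near the top.
    """
    if not word:
        raise ValueError("empty word image")
    upper = word[: max(1, len(word) // 2)]          # search window (assumed)
    counts = [sum(row) for row in upper]            # horizontal projection
    return max(range(len(counts)), key=counts.__getitem__)


def remove_shirorekha(word, thickness=1):
    """Return a copy of `word` with `thickness` rows from the headline set to 0."""
    top = find_shirorekha_row(word)
    out = [list(row) for row in word]               # do not mutate the input
    for r in range(top, min(top + thickness, len(out))):
        out[r] = [0] * len(out[r])
    return out


if __name__ == "__main__":
    # Two toy "characters" hanging from a shared headline in row 1.
    word = [
        [0, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1],
        [0, 1, 0, 0, 0, 1, 0],
        [0, 1, 0, 0, 0, 1, 0],
        [0, 1, 1, 0, 1, 1, 0],
    ]
    print(find_shirorekha_row(word))  # 1
    for row in remove_shirorekha(word):
        print(row)
```

Once the headline row is cleared, the two toy characters no longer touch, which illustrates why the shirorekha makes a whole word behave as a single connected component.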
Besides vowels, consonants, vowel modifiers and compound characters, Devnagari also has its own numerals, and many of its characters have similar shapes. All these variations make Devnagari Optical Character Recognition a challenging problem. Samples of the Devnagari character set are given in Tables 1 to 6.

Table 1: Vowels and Corresponding Modifiers
Table 2: Consonants
Table 3: Half Forms of Consonants with Vertical Bar
Table 4: Examples of Combinations of Half-Consonant and Consonant
Table 5: Examples of Special Combinations of Half-Consonant and Consonant
Table 6: Special Symbols
3. IMAGE PREPROCESSING
We collected printed pages from various office correspondence. The pages were scanned with a flatbed scanner at a resolution of 300 dpi. Pixel values depend on the image type:
• binary images: OFF (0) or ON (1);
• grayscale images: 0 to 255;
• colour images: three channels, each 0 to 255.
A colour image is converted to grayscale by eliminating the hue and saturation information while retaining the luminance. The grayscale image is then processed further, as explained below.

3.1 Thresholding and Binarization: The digitized text images are converted into binary images by thresholding with Otsu's method [17]. In the binarized image, object pixels are 0 and background pixels are 1. The image is therefore inverted, so that object pixels are represented by 1 and background pixels by 0.

3.2 Noise Reduction: Noise introduced by the optical scanning device or the writing instrument causes disconnected line segments, bumps and gaps in lines, filled loops, etc. Distortions such as local variations, rounding of corners, dilation and erosion are also a problem. These imperfections must be eliminated before character recognition [11-12], and this is done using various morphological processing techniques.

3.3 Skew Detection and Correction: A document may be skewed originally, or skew may be introduced during scanning. The effect is unintentional in most real cases. It should be eliminated, because it dramatically reduces the accuracy of subsequent processes such as segmentation and classification. Skewed lines are made horizontal by calculating the skew angle and correcting the raw image accordingly, using Hu moments and various transforms [13-15].
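The conversion and binarization steps of Sections 3 and 3.1 can be sketched as follows. The sketch is a minimal pure-Python version and not the authors' MATLAB code. It assumes 8-bit images stored as lists of rows, with colour pixels as `(r, g, b)` tuples. The luminance weights are the ITU-R BT.601 weights that MATLAB documents for `rgb2gray`. The Otsu routine picks the threshold that maximizes the between-class variance. The function names are hypothetical.

```python
def rgb_to_gray(rgb):
    """Keep luminance only (hue and saturation are discarded).
    Uses the BT.601 weights documented for MATLAB's rgb2gray."""
    return [[round(0.2989 * r + 0.5870 * g + 0.1140 * b) for (r, g, b) in row]
            for row in rgb]


def otsu_threshold(gray):
    """Otsu's method: the level t (0..255) that maximizes the between-class
    variance of the two classes {v <= t} and {v > t}."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_t, best_score = 0, -1.0
    w0 = sum0 = 0
    for t in range(256):
        w0 += hist[t]                    # pixels at or below t
        sum0 += t * hist[t]
        w1 = total - w0                  # pixels above t
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        score = w0 * w1 * (mu0 - mu1) ** 2   # proportional to between-class variance
        if score > best_score:               # strict: keeps the first best level
            best_t, best_score = t, score
    return best_t


def binarize(gray, t):
    """Convention before inversion: dark object = 0, light background = 1."""
    return [[1 if v > t else 0 for v in row] for row in gray]


def invert(binary):
    """Swap so that object (ink) = 1 and background = 0, as in Section 3.1."""
    return [[1 - v for v in row] for row in binary]
```

A typical call chain is `invert(binarize(g, otsu_threshold(g)))` with `g = rgb_to_gray(image)`. In practice a library routine (for example MATLAB's `graythresh`) would replace the hand-written Otsu loop.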
Figure 1: Preprocessed images: (a) original, (b) segmented, (c) shirorekha removed, (d) thinned, (e) edge-detected.

3.4 Thinning: The image is thinned, and its boundaries are detected, to make the subsequent detection of pertinent features and objects of interest easier (see Fig. 1(a) to (e)). Standard MATLAB functions are available for all of the above operations [16].
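The paper relies on MATLAB's `bwmorph` for thinning. As a language-neutral illustration only, the sketch below implements the classic Zhang-Suen two-subiteration thinning algorithm. This is a swapped-in technique and not necessarily the algorithm `bwmorph` uses. The sketch assumes a binary image stored as a list of rows with 1 for ink, and pixels outside the image are treated as background. The function name is hypothetical.

```python
def zhang_suen_thin(img):
    """Thin a binary image (1 = ink) to a one-pixel-wide skeleton
    with the Zhang-Suen algorithm."""
    h, w = len(img), len(img[0]) if img else 0
    # Pad with a zero border so every pixel has eight neighbours.
    g = [[0] * (w + 2)] + [[0] + list(row) + [0] for row in img] + [[0] * (w + 2)]

    def neighbours(r, c):
        # P2..P9, clockwise, starting from the pixel directly above.
        return [g[r - 1][c], g[r - 1][c + 1], g[r][c + 1], g[r + 1][c + 1],
                g[r + 1][c], g[r + 1][c - 1], g[r][c - 1], g[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(1, h + 1):
                for c in range(1, w + 1):
                    if g[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    b = sum(p)  # number of ink neighbours
                    # number of 0 -> 1 transitions around P2..P9, P2
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))
                    p2, p3, p4, p5, p6, p7, p8, p9 = p
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_clear.append((r, c))
            # Deletions are applied only after the whole pass (parallel update).
            for r, c in to_clear:
                g[r][c] = 0
            if to_clear:
                changed = True
    return [row[1:-1] for row in g[1:-1]]
```

Thinning normalizes stroke thickness, so a column crossed only by the headline contains a single ink pixel. The character segmentation step in Section 4.3 relies on this property.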
4. PROPOSED SEGMENTATION APPROACH
After the image is preprocessed using the methods discussed in Section 3, we apply various techniques to segment the document into lines, words and characters. The segmentation process follows this sequence:
1) Identify the text lines in the page.
2) Identify the words in each line.
3) Identify the individual characters in each word.

4.1 Line Segmentation
The global horizontal projection method is used to compute the sum of all white pixels in every row and to construct the corresponding histogram. The steps for line segmentation are as follows:
• Construct the horizontal histogram of the image (Fig. 2(b)).
• Count the white pixels in each row.
• Using the histogram, find the rows containing no white pixels.
• Replace all such rows with 1 (Fig. 2(c)).
• Invert the image so that the empty rows become 0 and the text lines keep their original pixels.
• Mark the bounding box of each text line (Fig. 2(e)) using the standard MATLAB functions regionprops and rectangle.
• Copy the pixels inside each bounding box and save them in a separate file (the separated lines are shown in Fig. 2(f)).

Figure 2: Line Segmentation: (a) original scanned document, (b) image histogram, (c) blank space between the lines, (d) line separation, (e) regions of interest, (f) segmented lines.

4.2 Word Segmentation
Here a vertical projection is used instead: the sum of all white pixels in every column is computed, and the corresponding histogram is constructed. The steps for word segmentation are as follows:
• Construct the vertical histogram of the image (Fig. 3(b)).
• Count the white pixels in each column.
• Using the histogram, find the columns containing no white pixels.
• Replace all such columns with 1.
• Invert the image so that the empty columns become 0 and the words keep their original pixels.
• Mark the bounding box of each word (Fig. 3(c)).
• Copy the pixels inside each bounding box and save them in a separate file (Fig. 3(d)).

Figure 3: Word Segmentation: (a) original line, (b) word histogram, (c) regions of interest, (d) segmented words.
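The line and word segmentation procedures of Sections 4.1 and 4.2 can be sketched in a few lines. The sketch assumes a binary image stored as a list of rows with 1 for ink. Instead of filling empty rows or columns, inverting, and calling `regionprops`, it directly finds the maximal runs of non-empty histogram bins. It then crops each run to the tight bounding box of its ink. This is an equivalent reformulation chosen for brevity and is not the authors' code. All function names are hypothetical.

```python
def ink_runs(profile):
    """Maximal runs (start, end_exclusive) of non-zero histogram bins."""
    spans, start = [], None
    for i, count in enumerate(profile):
        if count and start is None:
            start = i
        elif not count and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def crop_to_ink(img):
    """Crop to the tight bounding box of the ink (img must contain ink)."""
    rows = [i for i, row in enumerate(img) if any(row)]
    cols = [j for j, col in enumerate(zip(*img)) if any(col)]
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]


def segment_lines(page):
    """Rows with no ink in the horizontal histogram separate text lines."""
    h_hist = [sum(row) for row in page]
    return [crop_to_ink(page[r0:r1]) for r0, r1 in ink_runs(h_hist)]


def segment_words(line):
    """Columns with no ink in the vertical histogram separate words."""
    v_hist = [sum(col) for col in zip(*line)]
    return [crop_to_ink([row[c0:c1] for row in line]) for c0, c1 in ink_runs(v_hist)]


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0, 0, 0, 0],
        [1, 1, 0, 1, 1, 1, 0],
        [1, 0, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0, 0, 0],
        [0, 1, 0, 1, 0, 0, 0],
    ]
    for line in segment_lines(page):
        print(segment_words(line))
```

Because the shirorekha normally bridges the characters of a word, empty columns occur mainly between words. This is also why numerals without a shirorekha end up as separate "words".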
4.3 Character Segmentation
A slight modification of the previous algorithm (Section 4.2) is used here. The steps for character segmentation are as follows:
• Thin the image using the MATLAB bwmorph function, to normalize the image against the stroke thickness of the characters.
• Count the white pixels in each column.
• Find the columns containing a single white pixel.
• Replace all such columns with 1.
• Invert the image so that such columns become 0 and the characters keep their original pixels.
• Mark the bounding box of each character using standard MATLAB functions (Fig. 4(a)).
• Copy the pixels inside each bounding box and save them in a separate file (the separated characters are shown in Fig. 4(b)).

Figure 4: Character Segmentation: (a) region of interest, (b) segmented characters.

5. RESULTS AND DISCUSSION
Various documents were collected and tested. Line segmentation was observed to be nearly 100% accurate. Word segmentation is accurate as long as the document contains only characters. Devnagari numerals, however, have no shirorekha, so the algorithm treats each digit as a separate word. This reduces the accuracy; for the test document it is 91%.

Table 7: Character segmentation results for the document in Fig. 2(a) (words shown in Fig. 4)

Word                     1      2      3      4      5      6
Characters present       3      3      6      3      2      2
Characters segmented     5      5     12      7      6      6
Accuracy               60%    60%    50%    42%    33%    33%
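The character segmentation rule of Section 4.3 can be sketched on a thinned word image. The sketch also shows the accuracy measure that the numbers in Tables 7 and 8 appear to follow: accuracy = characters present / segments produced × 100. For example, word 1 gives 3/5 = 60%, the words give 41/45 ≈ 91.1%, and the characters give 133/242 ≈ 55.0%. Word 4 (3/7 ≈ 42.9%) appears to be truncated rather than rounded to 42%. The sketch assumes 1 for ink and a one-pixel-thick headline after thinning. It treats columns holding at most `max_pixels` ink pixels as cut points, which also covers empty columns; this is an assumption. The function names are hypothetical.

```python
def segment_characters(thinned_word, max_pixels=1):
    """Column spans (start, end_exclusive) of characters in a thinned word.

    A column holding at most `max_pixels` ink pixels is assumed to contain
    only the one-pixel-thick shirorekha and is treated as a cut point.
    """
    v_hist = [sum(col) for col in zip(*thinned_word)]   # vertical projection
    spans, start = [], None
    for j, count in enumerate(v_hist):
        if count > max_pixels and start is None:
            start = j
        elif count <= max_pixels and start is not None:
            spans.append((start, j))
            start = None
    if start is not None:
        spans.append((start, len(v_hist)))
    return spans


def segmentation_accuracy(true_count, segmented_count):
    """Accuracy (%) as in Tables 7 and 8: true units / produced segments."""
    return 100.0 * true_count / segmented_count


if __name__ == "__main__":
    word = [
        [1, 1, 1, 1, 1, 1, 1, 1],   # thinned shirorekha
        [0, 1, 0, 0, 1, 0, 0, 1],
        [0, 1, 0, 0, 1, 1, 0, 1],
        [0, 1, 0, 0, 0, 1, 0, 1],
    ]
    print(segment_characters(word))        # [(1, 2), (4, 6), (7, 8)]
    print(segmentation_accuracy(3, 5))     # 60.0
```

The vertical stroke in column 7 is not joined to its neighbour except through the headline, so it becomes a segment of its own. If it actually belonged to the preceding character, the word would have 2 characters but 3 segments, giving 2/3 ≈ 67% by the same measure. This is the kind of over-segmentation reported in Table 7.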
Table 8: Overall segmentation results for the document in Fig. 2(a)

Level         In document   Segmented   Accuracy
Lines               8            8        100%
Words              41           45         91%
Characters        133          242         55%

In character segmentation, words are split into more symbols than are actually present, as shown in Fig. 4. The results are summarized in Table 8. This error arises because the algorithm scans the words only from top to bottom. Devnagari, however, is a two-dimensional script: consonants are modified from the top, bottom, left or right to form a meaningful letter. Unconnected vertical lines in the words are therefore detected as separate symbols. For accurate segmentation, all the modifiers must be segmented so that they can be recognized properly.

6. CONCLUSIONS AND FUTURE WORK
In this paper, we have presented preliminary work on segmenting the lines, words and characters of Devnagari script. High accuracy was achieved for line segmentation (100%) and word segmentation (91%). Character-level segmentation, however, needs more effort because it is complicated for Devnagari script. The work is challenging for the following reasons:
• Compound letters are connected at various places, and it is difficult to identify the exact connecting points for segmentation.
• Upper and lower modifiers need different segmentation approaches.
• Separating the anuswara (.) and the full stop (.) from noise is critical, because they look alike. Natural language processing techniques need to be applied here.
• Segmentation of handwritten, unconnected compound letters is also critical.
• Segmentation of handwritten simple letters that are unintentionally connected is also critical.
All these issues will be addressed in future work on printed and handwritten Devnagari documents, using various approaches.

REFERENCES
[1] Nallapareddy Priyanka, Srikanta Pal, Ranju Mandal, (2010) "Line and Word Segmentation Approach for Printed Documents", IJCA Special Issue on Recent Trends in Image Processing and Pattern Recognition (RTIPPR), pp. 30-36.
[2] K. Wong, R. Casey and F. Wahl, (1982) "Document Analysis System", IBM J. Res. Dev., 26(6), pp. 647-656.
[3] G. Nagy, S. Seth and M. Viswanathan, (1992) "A prototype document image analysis system for technical journals", Computer, Vol. 25, pp. 10-22.
[4] Vijay Kumar, Pankaj K. Senegar, (2010) "Segmentation of Printed Text in Devnagari Script and Gurmukhi Script", International Journal of Computer Applications (IJCA), Vol. 3, pp. 24-29.
[5] U. Pal and Sagarika Datta, (2003) "Segmentation of Bangla Unconstrained Handwritten Text", Proc. 7th Int. Conf. on Document Analysis and Recognition, pp. 1128-113.
[6] Vikas J Dongre, Vijay H Mankar, (2011) "Segmentation of Devnagari Documents", Communications in Computer and Information Science, Vol. 198, Part 1, Springer Proceedings of the 1st International Conference ACITY, Chennai, India, July 2011, pp. 211-218.
[7] Vikas J Dongre, Vijay H Mankar, (2010) "A Review of Research on Devnagari Character Recognition", International Journal of Computer Applications (0975-8887), Vol. 12, No. 2, pp. 8-15.
[8] U. Pal, M. Mitra and B. B. Chaudhuri, (2001) "Multi-skew detection of Indian script documents", Proc. 6th Int. Conf. on Document Analysis and Recognition, pp. 292-296.
[9] L. Likforman-Sulem, A. Zahour and B. Taconet, (2007) "Text Line Segmentation of Historical Documents: a Survey", International Journal on Document Analysis and Recognition, Springer, Vol. 9, Issue 2, pp. 123-138.
[10] G. Nagy, (2000) "Twenty years of Document Analysis in PAMI", IEEE Trans. on PAMI, Vol. 22, pp. 38-61.
[11] J. Serra, (1994) "Morphological Filtering: An Overview", Signal Processing, Vol. 38, No. 1, pp. 3-11.
[12] Nafiz Arica, Fatos T. Yarman Vural, (2000) "An Overview of Character Recognition Focused On Off-line Handwriting", IEEE C99-06-C-203.
[13] Mohamed Cheriet, Nawwaf Kharma, Cheng-Lin Liu, Ching Y. Suen, (2007) "Character Recognition Systems: A Guide for Students and Practitioners", John Wiley & Sons, Inc., Hoboken, New Jersey.
[14] Rajiv Kapoor, Deepak Bagai, T. S. Kamal, (2002) "Skew angle detection of a cursive handwritten Devnagari script character image", Journal of Indian Inst. Science, pp. 161-175.
[15] U. Pal, M. Mitra and B. B. Chaudhuri, (2001) "Multi-Skew Detection of Indian Script Documents", CVPRU IEEE, pp. 292-296.
[16] V. H. Mankar et al., (2010) "Contour Detection and Recovery through Bio-Medical Watermarking for Telediagnosis", International Journal of Tomography & Statistics, Vol. 14 (Special Volume), No. S10.
[17] Guo Jing, D. Rajan, Chng Eng Siong, (2005) "Motion Detection with Adaptive Background and Dynamic Thresholds", Fifth International Conference on Information, Communications and Signal Processing, Bangkok, W B.4, pp. 41-45.

Authors
Vikas J Dongre received the B.E. and M.E. degrees in Electronics in 1991 and 1994, respectively. He served as a lecturer at SSVPS Engineering College, Dhule (M.S.), India, from 1992 to 1994. He joined Government Polytechnic Nagpur as a lecturer in 1994, where he is presently a lecturer (selection grade). His areas of interest include microcontrollers, embedded systems, image recognition and innovative laboratory practices. He is pursuing a PhD in offline handwritten Devnagari character recognition. He has published one research paper in an international journal and two research papers in international conferences.

Vijay H. Mankar received the M.Tech. degree in Electronics Engineering from VNIT, Nagpur University, India, in 1995, and the Ph.D. (Engg) from Jadavpur University, Kolkata, India, in 2009. He has more than 17 years of teaching experience and is presently a Lecturer (Selection Grade) at Government Polytechnic, Nagpur (MS), India. He has published more than 30 research papers in international conferences and journals. His fields of interest include digital image processing, data hiding and watermarking.