A Multi-tier Holistic approach for Urdu Nastaliq Recognition
Syed. Afaq Husain* and Syed. Hassan Amin**
Faculty of Computer Science and Engineering
Ghulam Ishaq Khan (GIK) Institute of Engineering Sciences and Technology
Topi, 23460, Dist. Swabi, NWFP, PAKISTAN
Email: *syed_a_h@giki.edu.pk, **shassan@giki.edu.pk
Abstract
Character recognition is an active area of research
with numerous applications including web publishing,
document analysis and text to speech conversion. In this
paper, we present a new approach for the off-line
recognition of cursive Urdu text. The methodology has
been developed for the Noori Nastaliq script [Ahmed1].
Word (ligature) based identification has been adopted
instead of character based identification. A multi-tier
holistic approach is used to recognize ligatures
from a pre-defined ligature set. In the first step, the special
ligatures (Dots, Tay, Hamza and Mad) are separated from
the base ligatures. In the second step, each special ligature
is associated with the most probable neighboring base
ligature. Finally, this information, along with some
RTS (rotation, translation and scale) invariant features of
the base ligature, is presented to a feed-forward
back-propagation neural network that performs the final
recognition task.
Keywords: OCR, Urdu Character Recognition, Noori
Nastaliq, Ligature based identification, Back-propagation
Neural Network.
1. Objective
Urdu is the national language of Pakistan and is
understood by over 300 million people in Pakistan, India
and Bangladesh. Because of its large body of historical
literature, there is a need for automatic systems that
convert this literature into electronic form accessible on
the World Wide Web. The proposed Urdu text recognition
system aims to convert scanned Urdu documents
automatically into computerized text files in UZT format.
Diacritics (Aerab) and punctuation are ignored in the
current version of the system; they may, however, be
treated as a further category of symbols. Multi-font and
multi-lingual support has also been left out for
simplicity.
2. Introduction
The Urdu character set is based on the Arabic
character set, and Urdu is cursive even in its printed
form. Much research has been done on the
automatic recognition of text in Roman-based languages
[Guyon],[Ha], Chinese [Guo],[Ding],
Arabic [Amin1] and Persian [Khorsheed3], but no serious
research has been published on Urdu text recognition.
Arabic and Persian, which share basic
characters and writing styles with Urdu, have seen
worthwhile research in the past decade. However, those
solutions do not carry over to Urdu because of a number of
inherent differences in the script and styles of Urdu text.
Nasakh and Nastaliq are the two most popular writing styles
(scripts) in Urdu, and both have unique features
that make them different from, and more complicated than,
their close counterparts. Table 1 compares the complexity of
Urdu script with that of some other languages.
Like Arabic, Urdu script presents the recognition
challenges of cursive orthography and context-sensitive
letter shapes [Khorsheed2]. However, in contrast to Arabic
text, in which connected characters follow a base line, the
joined characters in Nastaliq and Nasakh are positioned
according to the preceding and succeeding characters as well
as the vertical justification of the ligature.
Table 1: Comparative features of some languages
Characteristics            Urdu     Arabic   Latin    Hebrew   Hindi
H-Justification            R→L      R→L      L→R      R→L      L→R
V-Justification            Centre   Base     No       No       Top
Cursive                    Yes      Yes      No       No       Yes
Diacritics                 Yes      Yes      No       No       Yes
# Vowels                   2        2        5        11       -
# Letters                  37       28       26       22       40
Letter Shapes              1-28     1-4      2        1        1
Complementary Characters   5        3        -        -        -
Word recognition strategies are generally
classified into three categories: the holistic approach,
the analytic approach and feature sequence matching
[Shridher]. However, some researchers regard
sequence matching techniques as a form of the holistic
approach. The analytic approach segments the word
into characters before recognition, while the
holistic approach recognizes the word or its sub-part
(ligature) as a whole [Khorsheed1]. Applied to Urdu, the
analytic approach segments words into characters, whereas
the holistic approach segments words into symbols, each of
which may be a character, a ligature or possibly a fraction
of a character.
In this paper, we present an approach to
recognizing commonly used ligatures of the Noori Nastaliq
script developed by Ahmad Mirza Jamil [Ahmed1].
Nastaliq is one of the most beautiful and most
complex scripts. It was originally created by the
calligrapher Mir Ali Tabrezi. Attempts to mechanize
Urdu script met with no success for a long time, and as
a result a typewriter that can type in the Nastaliq style is
not available even today. There are two approaches to
computerizing Nastaliq: the ligature based approach (more
glyphs) and the character based approach (more rules). For
example, the word has three ligatures or separate
shapes , and . Noori Nastaliq describes about
20000 ligatures, which are required to write almost all words
contained in the Urdu dictionary. Because ligature based
recognition depends on the ligatures used for training,
it carries context information, which gives it higher
performance. However, it has the disadvantage that adding
new ligatures requires re-training
the system. For example, the Urdu word for "computer" is a
single ligature that is not in the formal dictionary of
ligatures, although it is widely written in Urdu text.
3. Character Recognition Schemes
The problem of Urdu text recognition is closely
related to Arabic text recognition. Arabic text
recognition systems generally have the following stages:
image acquisition, preprocessing, segmentation, feature
extraction, classification and recognition [Khorsheed3].
Arabic text recognition systems are further
divided into segmentation-based and segmentation-free
systems. Here we briefly describe approaches to Arabic
text recognition, since they give valuable
insight into the problem of Urdu text recognition [Bunke].
3.1 Segmentation Free Systems
In these systems, the word is recognized as a
whole without trying to segment and recognize characters
or primitives [7]. One approach for such systems is to
calculate a single feature vector for each word; this feature
vector is then used to recognize the word.
3.2 Segmentation Based Systems
In segmentation-based systems, each word is
further divided into a number of subparts. These
systems are subdivided into
four categories: isolated or pre-segmented characters,
segmentation of a word into characters, segmentation of a
word into primitives, and integration of recognition and
segmentation. These systems are either impractical, because
they only recognize digits and isolated characters, or have a
low recognition rate because of segmentation errors
[Khorsheed2].
4. Ligature Identification System
In the proposed system, after preprocessing, the
text is segmented into a number of ligatures ordered from
right to left and top to bottom. At this stage a ligature
is defined as any connected set of characters. These
ligatures also include the special symbols used in Urdu,
namely Tau, Mad, Dots, Hamza and Ha. A number of
features are calculated and fed into a feed-forward
back-propagation neural network to distinguish special
ligatures from base ligatures. The special ligatures are then
associated with base ligatures, and this association forms
part of the feature vector used to recognize each base
ligature, thus aiding its recognition. The final feature
vector is classified by a second feed-forward
back-propagation neural network.
Figure 1: Stages of Urdu Character Recognition
4.1 Preprocessing
The preprocessing stage involves Smoothing,
Skew detection and correction, Document decomposition,
Slant normalization etc.
4.2 Segmentation
In document image analysis, four commonly used
segmentation algorithms are connected component
labeling, X-Y tree decomposition, run-length smearing,
and Hough Transform.
We have applied connected component labeling
to the image of Urdu text. This technique assigns a distinct
label to each connected component of a binary image. The
labels are usually the natural numbers from 1 to the number
of connected components in the input image. The algorithm
scans the image from left to right and top to bottom. On
the first line containing black pixels, a unique label is
assigned to each contiguous run of black pixels. For each
subsequent black pixel, the already scanned pixels in its
eight-neighborhood are examined: if any of them has been
labeled, the same label is assigned to the current pixel;
otherwise a new label is assigned to it. When the labeled
neighbors carry different labels, those labels are recorded
as equivalent and merged afterwards. The procedure continues
to the bottom of the image [Khorsheed3].
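The labeling scan described above can be sketched as follows. This is a minimal illustration and not the implementation used in the system: the function name `label_components`, the list-of-rows image format (1 for black, 0 for white) and the union-find bookkeeping for merging equivalent labels are all assumptions made for the example.

```python
def label_components(image):
    """Two-pass 8-connected labeling of a binary image (list of rows, 1 = black)."""
    rows = len(image)
    cols = len(image[0]) if image else 0
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # union-find forest over provisional labels; 0 is background

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    # First pass: scan left to right, top to bottom.
    for r in range(rows):
        for c in range(cols):
            if not image[r][c]:
                continue
            # 8-neighbors that were already scanned: W, NW, N, NE.
            seen = []
            for dr, dc in ((0, -1), (-1, -1), (-1, 0), (-1, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and labels[rr][cc]:
                    seen.append(find(labels[rr][cc]))
            if not seen:
                parent.append(len(parent))      # open a new provisional label
                labels[r][c] = len(parent) - 1
            else:
                smallest = min(seen)
                labels[r][c] = smallest
                for root in seen:               # record label equivalences
                    parent[root] = smallest

    # Second pass: replace provisional labels by final labels 1..n.
    final = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                root = find(labels[r][c])
                labels[r][c] = final.setdefault(root, len(final) + 1)
    return labels, len(final)
```

The second pass matters for shapes such as a "U", whose two arms first receive different labels and are only found to be connected when the scan reaches the bottom stroke.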
4.3 Feature Extraction I
In this stage, we extract only those features that
help in recognizing the special ligatures (see
figure). These features are solidity, number of holes, axis
ratio, eccentricity, moments, normalized segment length,
curvature, and the ratio of bounding box width to height.
[Figure 1 stages: Preprocessing → Segmentation → Feature Extraction I → Special Ligature Identification → Feature Extraction II → Ligature Identification]
4.3.1 Solidity
Solidity is a scalar quantity defined as the
proportion of the pixels in the convex hull that are also in
the region. It is computed as
Solidity = Ligature Area / Convex Hull Area
where
Ligature Area = ∑∑ f(x, y)
for all x, y in the binary image of the ligature, and
Convex Hull Area = ∑∑ h(x, y)
for all x, y in the binary image of the convex hull of the
ligature. Here f(x, y) is 1 for ligature pixels and 0
otherwise, and h(x, y) is 1 for pixels inside the convex hull
and 0 otherwise.
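As an illustration of the solidity ratio, the sketch below counts ligature pixels and divides by the area of the convex hull. It rests on assumptions that are not in the paper: each pixel is treated as a unit square, the hull is taken over the pixel corners and its area is measured as a polygon area rather than as a pixel count, and the names `convex_hull`, `polygon_area` and `solidity` are hypothetical.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices in order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula for a simple polygon."""
    total = 0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def solidity(image):
    """Ligature area divided by convex hull area (image: rows of 0/1)."""
    pixels = [(x, y) for y, row in enumerate(image)
              for x, value in enumerate(row) if value]
    if not pixels:
        return 0.0
    # Each pixel is a unit square, so the hull is built from its four corners.
    corners = [(x + dx, y + dy) for x, y in pixels
               for dx in (0, 1) for dy in (0, 1)]
    return len(pixels) / polygon_area(convex_hull(corners))
```

A filled rectangle has solidity 1, while shapes with concavities score lower, which is why the feature helps separate compact marks such as dots from open strokes.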
4.3.2 Axes Ratio
It is the ratio of the major axis to the minor axis
of the best-fit ellipse of the ligature.
Axis Ratio = a/b
Where a and b are the lengths of semi-major axis and
semi-minor axis of the best-fit ellipse.
4.3.3 Eccentricity
It is the ratio of the distance between the foci of the
best-fit ellipse to the length of its major axis.
Eccentricity = distance between foci / 2a
where a is the length of the semi-major axis, as above.
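The two ellipse features can be sketched together. The paper does not say how the best-fit ellipse is obtained; the example below assumes it is the ellipse with the same second-order central moments as the ligature, which is a common convention, and it follows the textual definition of eccentricity as the distance between the foci divided by the major axis length. The function name `ellipse_features` is hypothetical.

```python
import math


def ellipse_features(image):
    """Return (axis_ratio, eccentricity) of the moment-matched ellipse."""
    pixels = [(x, y) for y, row in enumerate(image)
              for x, value in enumerate(row) if value]
    n = len(pixels)
    if n == 0:
        raise ValueError("image contains no ligature pixels")
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    # Normalized second-order central moments (covariance of the pixel cloud).
    mxx = sum((x - cx) ** 2 for x, _ in pixels) / n
    myy = sum((y - cy) ** 2 for _, y in pixels) / n
    mxy = sum((x - cx) * (y - cy) for x, y in pixels) / n
    # Eigenvalues of the 2x2 covariance matrix.
    spread = math.sqrt((mxx - myy) ** 2 + 4 * mxy ** 2)
    lam_major = (mxx + myy + spread) / 2
    lam_minor = max((mxx + myy - spread) / 2, 0.0)
    a = 2 * math.sqrt(lam_major)   # semi-major axis
    b = 2 * math.sqrt(lam_minor)   # semi-minor axis
    axis_ratio = a / b if b > 0 else math.inf
    focal_distance = 2 * math.sqrt(max(a * a - b * b, 0.0))
    eccentricity = focal_distance / (2 * a) if a > 0 else 0.0
    return axis_ratio, eccentricity
```

Under this convention a square blob gives an axis ratio of 1 and an eccentricity of 0, while a one-pixel-thick straight stroke gives an eccentricity of 1.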
4.3.4 Moment based features
These are functions of image moments
that are invariant to geometric transformations such as
translation, scaling and rotation [6]. Such features are
useful for identifying objects with unique shapes
regardless of their location, size and orientation.
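The paper does not state which moment functions are used. As one possible instance, the sketch below computes the first two of Hu's moment invariants from normalized central moments; the choice of Hu's invariants and the name `hu_first_two` are assumptions made for illustration only.

```python
def hu_first_two(image):
    """First two Hu moment invariants of a binary image (rows of 0/1)."""
    pixels = [(x, y) for y, row in enumerate(image)
              for x, value in enumerate(row) if value]
    m00 = len(pixels)
    if m00 == 0:
        raise ValueError("image contains no ligature pixels")
    cx = sum(x for x, _ in pixels) / m00
    cy = sum(y for _, y in pixels) / m00

    def mu(p, q):
        # Central moment: removes the dependence on position.
        return sum((x - cx) ** p * (y - cy) ** q for x, y in pixels)

    def eta(p, q):
        # Normalized central moment: removes the dependence on size.
        return mu(p, q) / m00 ** (1 + (p + q) / 2)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2
```

On a pixel grid the values are exactly unchanged by translation and by rotation through multiples of 90 degrees; for other rotations and for scaling the invariance holds only approximately because of discretization.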
4.3.5 Normalized Length Feature
First the normalized length of a segment i is
calculated relative to other segment lengths in the same
word. Then normalized length of the ligature is calculated
as
Normalized Length = ∑ L(i)
4.3.6 Curvature Feature:
In a similar fashion, first the curvature of a segment is
measured by simply dividing the Euclidean distance
between the two feature points of that segment by its
actual length. This feature equals zero when the segment is
a loop and 1 when the segment is a straight line.
C(i) = (Euclidean distance between
endpoints) / segment length
Then curvature feature of the ligature is calculated as a
sum of curvature features of all of its segments.
Curvature Feature = ∑ C(i)
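A short sketch of the curvature feature follows. It assumes each segment is available as an ordered list of (x, y) points whose first and last entries are the segment's two feature points; that representation and the function names are hypothetical.

```python
import math


def segment_curvature(points):
    """C(i): distance between the segment's end points over its path length."""
    path_length = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    if path_length == 0:
        return 0.0  # degenerate segment with a single point
    return math.dist(points[0], points[-1]) / path_length


def curvature_feature(segments):
    """Curvature feature of a ligature: the sum of C(i) over its segments."""
    return sum(segment_curvature(segment) for segment in segments)
```

For example, a straight stroke `[(0, 0), (1, 0), (2, 0)]` gives 1, and a closed loop `[(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]` gives 0, matching the two limiting cases stated above.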
4.3.7 Number of Holes:
This feature gives the total number of holes in a ligature.
If the feature points of the ligature are considered as a set
of vertices V, and its segments as a set of edges E, of a
graph G(V, E), then the number of holes in the ligature can
be found using graph theory as follows:
Number of Holes = E - Est
Here,
E = number of edges in G
Est = number of edges in the spanning tree of G.
A connected graph with N vertices has N-1 edges in its
spanning tree.
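The hole count can be sketched directly from the graph. The example assumes vertices are numbered 0 to N-1 and edges are given as pairs of vertex indices. As a small generalization that is not in the paper, it counts connected components so that the spanning-forest size N minus the number of components is used when the graph is not connected; the name `count_holes` is hypothetical.

```python
def count_holes(num_vertices, edges):
    """Number of holes = E - Est, with Est the spanning tree (forest) size."""
    parent = list(range(num_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    components = num_vertices
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v        # this edge joins two components
            components -= 1
    spanning_edges = num_vertices - components  # N - 1 for a connected graph
    return len(edges) - spanning_edges
```

A triangle of three feature points yields one hole, and a segment that starts and ends at the same feature point counts as one hole as well.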
4.4 Special Ligature Identification
To identify special ligatures, a feed-forward
back-propagation neural network with 15 input, 25
hidden and 25 output neurons was used. The feature
vectors obtained from the Feature Extraction I stage of the
system are fed to this network, which then classifies each
ligature as either a special ligature or a base ligature.
Figure 2: Some special ligatures
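To make the network structure concrete, the following is a minimal sketch of a one-hidden-layer feed-forward network trained by back-propagation, with the layer sizes passed in as parameters. Everything beyond the layer sizes is an assumption: the sigmoid activation, the squared-error loss, the weight initialization range, the learning rate and the class name `FeedForwardNet` are not given in the paper.

```python
import math
import random


class FeedForwardNet:
    """One hidden layer, sigmoid units, trained by plain back-propagation."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # One weight row per neuron; the last entry of each row is the bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                      for _ in range(n_out)]

    @staticmethod
    def _layer(weights, inputs):
        extended = inputs + [1.0]  # constant input for the bias weight
        return [1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, extended))))
                for row in weights]

    def forward(self, features):
        hidden = self._layer(self.w_hidden, list(features))
        return hidden, self._layer(self.w_out, hidden)

    def train_step(self, features, target, rate=0.1):
        """One gradient step on one sample; returns the error before the step."""
        features = list(features)
        hidden, output = self.forward(features)
        # Error signals for squared error with sigmoid units.
        d_out = [(o - t) * o * (1 - o) for o, t in zip(output, target)]
        d_hid = [h * (1 - h) * sum(d * row[j] for d, row in zip(d_out, self.w_out))
                 for j, h in enumerate(hidden)]
        for row, d in zip(self.w_out, d_out):
            for j, x in enumerate(hidden + [1.0]):
                row[j] -= rate * d * x
        for row, d in zip(self.w_hidden, d_hid):
            for j, x in enumerate(features + [1.0]):
                row[j] -= rate * d * x
        return sum((o - t) ** 2 for o, t in zip(output, target)) / 2
```

With the sizes quoted above, the network would be created as `FeedForwardNet(15, 25, 25)`, and the predicted class would be the index of the largest output.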
4.5 Feature Extraction II
In this stage, we associate special ligatures with
base ligatures. Each special ligature is associated with the
base ligature at the minimum centroid-to-centroid
distance. A number of lines are grown from the centre of
each special ligature; when one of these lines touches a
base ligature, the special ligature is associated with
that base ligature.
As a result of this association, twenty new features are
added to the feature vector of the base ligature.
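The centroid-distance criterion can be sketched as below. The example covers only the minimum centroid-to-centroid rule and not the line-growing step; it assumes each ligature is available as a list of (x, y) pixel coordinates, and the function names are hypothetical.

```python
import math


def centroid(pixels):
    """Mean position of a ligature's pixels."""
    n = len(pixels)
    return (sum(x for x, _ in pixels) / n, sum(y for _, y in pixels) / n)


def associate(special_ligatures, base_ligatures):
    """For each special ligature, return the index of the nearest base ligature."""
    base_centroids = [centroid(base) for base in base_ligatures]
    assignment = []
    for special in special_ligatures:
        c = centroid(special)
        nearest = min(range(len(base_centroids)),
                      key=lambda i: math.dist(c, base_centroids[i]))
        assignment.append(nearest)
    return assignment
```

For instance, a dot lying just above the second of two base ligatures is assigned index 1, the position of that base ligature in the input list.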
4.6 Ligature Identification
In this stage, the final feature vector of
34 features is fed into a feed-forward back-propagation
neural network with 34
inputs, 65 hidden neurons and 45 output neurons.
5. Results
The system was trained on a training set of
two hundred carefully selected ligatures. Testing was
done on bitmap images of Urdu text typed in a
Nastaliq font using a text editor.
This simplified the problem by removing the need for the
pre-processing stage that handles noise introduced during
image acquisition. The training set contained the
simpler and more commonly used ligatures.
The recognition rate of the system on images
containing only the trained ligatures was 100%. However,
in cases where an image contained additional ligatures, they
were classified as the closest match in the training set. No
rejection class was used.
6. Conclusion
In this paper, we have presented a method for the
recognition of cursive Urdu text written in Nastaliq script.
The system is currently trained on a small number of
ligatures but has the potential to be expanded for more
practical use. Our approach minimizes the errors due to
segmentation by using a segmentation-free approach. By
using multiple classes of features, we have increased the
number of ligatures that can be identified.
7. Future Directions
A number of directions are under
consideration for enhancing the system for practical
use, namely:
1. Increasing the number of ligatures used for
training.
2. Addition of special characters, numerals and Aerab
for recognition as special ligatures.
3. Recognition of intonation marks in the document.
4. Addition of multi-lingual support to the system.
References
1. [Ahmed] Ahmad Mirza Jamil, “Noori Nastaliq,
Computerized Urdu Calligraphy”, Elite Publishers,
1982.
2. [Amin] A.Amin and S.Al-Fedaghi, “Machine
recognition of printed Arabic text utilizing a natural
language morphology”, Int. J. of Man-machine
Studies 35,6 (1991), 768-788.
3. [Badr] Badr Al-Badr, Robert M. Haralick,
“Segmentation–Free word recognition with
application to Arabic”, IJDAR1(3):147-166(1998)
4. [Bunke] H. Bunke, P. Wang, “Handbook of character
recognition and document image analysis”, World
Scientific, 2000.
5. [Ding] X.Q.Ding, Y.S.Wu, “Recognition of multi-font
printed Chinese characters”, CCIPP/CLCS, 1988,
Toronto, Canada.
6. [Guo] H.Guo, X.Q.Ding, “The development of high
performance Chinese/English bi-lingual OCR
system”, Proc. CMIN ’95, Beijing, China, March 95,
248-253.
7. [Guyon] I.Guyon, J.Bromley, N.Matic, et al., “A neural
network system for recognizing on-line handwriting”,
Models of Neural Networks, Springer Verlag, 1996.
8. [Ha] J.Y.Ha, S.C.Oh, J.H.Kim, and Y.B.Kwon,
“Unconstrained handwritten word recognition with
interconnected hidden Markov models”, 3rd Int.
Workshop on Frontiers in Handwriting Recognition,
Buffalo, May 93, 455-460.
9. [Khorsheed1] Mohammad S. Khorsheed, William F.
Clocksin, “Structural features of cursive Arabic
script”, Proc. of 10th British Vision Conference,
University of Nottingham, UK, September 1999.
10. [Khorsheed2] M.S.Khorsheed, “Off-Line Arabic
Character Recognition: A Review”.
11. [Khorsheed3] Mohammad S. Khorsheed, “Automatic
recognition of words in Arabic manuscripts”, PhD
Dissertation, Churchill College, University of
Cambridge, June 2000.
12. [Shridher] N.Shridher, F.Kimura, “Segmentation
based cursive handwriting recognition”, Handbook of
character recognition and document image analysis,
126-127, World Scientific, 1997.
13. [Trier] Øivind Due Trier, Anil K. Jain, and Torfinn
Taxt, “Feature Extraction Methods for Character
Recognition – A Survey”, Pattern Recognition, Vol.
29, No. 4, pp. 641-662, 1996.
Off line system for the recognition of handwritten arabic character
 
International Journal of Image Processing (IJIP) Volume (3) Issue (3)
International Journal of Image Processing (IJIP) Volume (3) Issue (3)International Journal of Image Processing (IJIP) Volume (3) Issue (3)
International Journal of Image Processing (IJIP) Volume (3) Issue (3)
 

Multitier holistic Approach for urdu Nastaliq Recognition

A Multi-tier Holistic Approach for Urdu Nastaliq Recognition

Syed. Afaq Husain* and Syed. Hassan Amin**
Faculty of Computer Science and Engineering
Ghulam Ishaq Khan (GIK) Institute of Engineering Sciences and Technology
Topi, 23460, Dist. Swabi, NWFP, PAKISTAN
Email: *syed_a_h@giki.edu.pk, **shassan@giki.edu.pk

Abstract

Character recognition is an active area of research with numerous applications, including web publishing, document analysis and text-to-speech conversion. In this paper, we present a new approach for the off-line recognition of cursive Urdu text. The methodology has been developed for the Noori Nastaliq script [Ahmed]. Word (ligature) based identification has been adopted instead of character based identification. A multi-tier holistic approach is used to recognize ligatures from a pre-defined ligature set. First, the special ligatures (Dots, Tay, Hamza and Mad) are distinguished from the base ligatures. Second, each special ligature is associated with the most probable neighboring base ligature. Finally, this information, along with some other RTS (rotation, translation and scale) invariant features of the base ligature, is presented to a Feed Forward Back Propagation neural network, which performs the final recognition task.

Keywords: OCR, Urdu Character Recognition, Noori Nastaliq, Ligature based identification, Back-propagation Neural Network.

1. Objective

Urdu is the national language of Pakistan. It is understood by over 300 million people in Pakistan, India and Bangladesh. Owing to its large body of historical literature, there is a need for automatic systems that convert this literature into an electronic form accessible on the World Wide Web. The proposed Urdu text recognition system aims to convert scanned Urdu documents automatically into computerized text files in UZT format.
Diacritics (Aerab) and punctuation are ignored in the current version of the system; however, they may be classified as another category of symbols. Multi-font and multi-lingual support has also been left out for simplicity.

2. Introduction

The Urdu character set is based on the Arabic character set. Urdu script is cursive even in its printed form. In the past, a lot of research has been done on automatic recognition of text written in Roman-based languages [Guyon], [Ha], Chinese [Guo], [Ding], Arabic [Amin] and Persian [Khorsheed3], but no serious research has been published on Urdu text recognition. Arabic and Persian, which share basic characters and writing styles with Urdu, have seen worthwhile research in the past decade. However, those solutions do not carry over to Urdu because of a number of inherent differences in the script and styles of Urdu text.

Nasakh and Nastaliq are the two most popular writing styles (scripts) in Urdu, and both have unique features that make them different from, and more complicated than, their close counterparts. Table 1 compares the complexity of Urdu script with that of some other languages. Like Arabic, Urdu script presents the challenges of cursive orthography and context-sensitive letter shapes [Khorsheed2]. However, in contrast to Arabic text, in which connected characters follow a base line, the joined characters in Nastaliq and Nasakh are positioned according to the preceding and succeeding characters as well as the vertical justification of the ligature.

Table 1: Comparative features of some languages

Characteristics           Urdu    Arabic  Latin  Hebrew  Hindi
H-Justification           R-L     R-L     L-R    R-L     L-R
V-Justification           Centre  Base    No     No      Top
Cursive                   Yes     Yes     No     No      Yes
Diacritics                Yes     Yes     No     No      Yes
# Vowels                  2       2       5      11      -
# Letters                 37      28      26     22      40
Letter Shapes             1-28    1-4     2      1       1
Complementary Characters  5       3       -      -       -

Word recognition strategies are generally classified into three categories: the Holistic Approach, the Analytic Approach and Feature Sequence Matching [Shridher]. However, some researchers regard sequence matching techniques as a form of the holistic approach. The analytic approach tries to segment the word into characters before recognition, while the holistic approach tries to recognize the word or its sub-part (ligature) as a whole [Khorsheed1]. The first approach segments Urdu words into characters, and the second segments words into symbols. These symbols may be a character, a ligature or possibly a fraction of a character.

In this paper, we present an approach to recognize commonly used ligatures from the Noori Nastaliq script developed by Ahmad Mirza Jamil [Ahmed]. Nastaliq is one of the most beautiful and one of the most complex scripts. The script was originally created by the calligrapher Mir Ali Tabrezi. Attempts to mechanize Urdu script met with no success for a long time, and as a result a typewriter that can type in the Nastaliq style is not available even today.

There are two approaches to computerizing Nastaliq: the ligature based approach (more glyphs) and the character based approach (more rules). For example, the word [word image] has three ligatures or separate shapes: [ligature image], [ligature image] and [ligature image]. Noori Nastaliq describes about 20,000 ligatures, which are required to write almost all words contained in the Urdu dictionary. Since ligature based recognition depends on the ligatures used for training, it carries context information, which gives it a higher performance. Its disadvantage is that adding new ligatures to the system requires re-training. For example, the Urdu word "Computer" is a single ligature that is not in the formal dictionary of ligatures, though it is widely written in Urdu text.

3. Character Recognition Schemes

The problem of Urdu text recognition is closely related to Arabic text recognition. Arabic text recognition systems generally have the following stages: image acquisition, preprocessing, segmentation, feature extraction, classification and recognition [Khorsheed3]. These systems are further divided into segmentation based and segmentation-free systems. Here we briefly describe approaches to Arabic text recognition, since they give valuable insight into the problem of Urdu text recognition [Bunke].

3.1 Segmentation Free Systems

In these systems, the word is recognized as a whole without trying to segment and recognize characters or primitives [7]. One approach for such systems is to calculate a single feature vector for each word; this feature vector is then used to recognize the word.

3.2 Segmentation Based Systems

In segmentation based systems, each word is further divided into a number of subparts.
The segmentation-based systems are further subdivided into four categories: isolated or pre-segmented characters, segmentation of a word into characters, segmentation of a word into primitives, and integration of recognition and segmentation. These systems are either impractical, because they only try to recognize digits and isolated characters, or they have a low recognition rate because of segmentation errors [Khorsheed2].

4. Ligature Identification System

In our proposed system, after preprocessing, the text is segmented into a number of ligatures ordered from right to left and top to bottom. At this stage, a ligature is defined as any connected set of characters. These ligatures also include the special symbols used in Urdu, namely Tau, Mad, Dots, Hamza and Ha. A number of features are calculated and fed into a Feed Forward Back Propagation neural net to separate the special ligatures from the base ligatures. The special ligatures are then associated with base ligatures, and this association forms part of the feature vector of the base ligature, thus aiding its recognition. The resulting feature vector is then used to recognize ligatures using a second Feed Forward Back Propagation neural net.

Figure 1: Stages of Urdu Character Recognition

4.1 Preprocessing

The preprocessing stage involves smoothing, skew detection and correction, document decomposition, slant normalization, etc.

4.2 Segmentation

In document image analysis, four commonly used segmentation algorithms are connected component labeling, X-Y tree decomposition, run-length smearing and the Hough transform. We have applied connected component labeling to the image of Urdu text. This technique assigns a distinct label to each connected component of a binary image. The labels are usually natural numbers from 1 to the number of connected components in the input image. The algorithm scans the image from left to right and top to bottom.
On the first line containing black pixels, a unique label is assigned to each contiguous run of black pixels. For each subsequent black pixel, the pixels in its 8-neighborhood are examined: if any of them has already been labeled, the same label is assigned to the current pixel; otherwise a new label is assigned to it. The procedure continues to the bottom of the image [Khorsheed3].

4.3 Feature Extraction I

In this stage, we extract only those features that help in the recognition of special ligatures (see Figure 2). These features are solidity, number of holes, axis ratio, eccentricity, moments, normalized segment length, curvature, and the ratio of bounding box width to height.

[Figure 1 stages: Preprocessing, Segmentation, Feature Extraction I, Special Ligature Identification, Feature Extraction II, Ligature Identification]
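The connected component labeling of Section 4.2 can be illustrated with a short Python sketch. It is not the authors' implementation: the function name, the list-of-lists image format (1 for a black pixel, 0 for background) and the union-find step used to merge labels that turn out to belong to the same component are all assumptions, since the paper does not say how such label equivalences are resolved.

```python
def label_components(image):
    """Two-pass 8-connected component labeling of a binary image.

    `image` is a list of rows; 1 marks a black (ink) pixel, 0 background.
    Returns (labels, count), where `labels` has the same shape as `image`
    and components are numbered 1..count in scan order.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # union-find forest; index 0 stands for the background

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # First pass: scan left to right, top to bottom.
    for r in range(rows):
        for c in range(cols):
            if not image[r][c]:
                continue
            # Neighbours already visited in scan order: W, NW, N, NE.
            roots = []
            for dr, dc in ((0, -1), (-1, -1), (-1, 0), (-1, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and labels[rr][cc]:
                    roots.append(find(labels[rr][cc]))
            if roots:
                root = min(roots)
                labels[r][c] = root
                for other in roots:
                    parent[other] = root  # record label equivalence
            else:
                parent.append(len(parent))  # start a new provisional label
                labels[r][c] = len(parent) - 1

    # Second pass: map provisional labels to consecutive final labels.
    final = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                root = find(labels[r][c])
                labels[r][c] = final.setdefault(root, len(final) + 1)
    return labels, len(final)


# A U-shaped stroke: its two arms get different provisional labels
# that are merged when the scan reaches the bottom row.
stroke = [
    [1, 0, 1],
    [1, 0, 1],
    [1, 1, 1],
]
labels, count = label_components(stroke)
```

In a full system each labeled component would then be cropped and passed to the feature extraction stage as one candidate ligature.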
4.3.1 Solidity

Solidity is a scalar quantity. It is defined as the proportion of the pixels in the convex hull that are also in the region. It is computed as

Solidity = Ligature Area / Convex Hull Area

where
Ligature Area = ∑∑ f(x, y), for all x, y in the binary image of the ligature
Convex Hull Area = ∑∑ f(x, y), for all x, y in the convex hull of the ligature

4.3.2 Axes Ratio

It is the ratio of the major axis to the minor axis of the best-fit ellipse of the ligature.

Axis Ratio = a / b

where a and b are the lengths of the semi-major axis and semi-minor axis of the best-fit ellipse.

4.3.3 Eccentricity

It is the ratio of the distance between the foci of the best-fit ellipse to the length of its major axis.

Eccentricity = (distance between foci) / 2a

4.3.4 Moment based features

These refer to certain functions of moments that are invariant to geometric transformations such as translation, scaling and rotation [6]. Such features are useful in identifying objects with unique shapes, regardless of their location, size and orientation.

4.3.5 Normalized Length Feature

First, the normalized length L(i) of a segment i is calculated relative to the other segment lengths in the same word. Then the normalized length of the ligature is calculated as the sum over its segments:

Normalized Length = ∑ L(i)

4.3.6 Curvature Feature

In a similar fashion, the curvature of a segment is first measured by dividing the Euclidean distance between the two feature points of that segment by its actual length. This feature equals zero when the segment is a loop and 1 when the segment is a straight line.

C(i) = (Euclidean distance between endpoints) / segment length

Then the curvature feature of the ligature is calculated as the sum of the curvature features of all of its segments.

Curvature Feature = ∑ C(i)

4.3.7 Number of Holes

This feature gives the total number of holes in a ligature.
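A minimal Python sketch of this count, assuming the ligature skeleton has already been reduced to a connected graph whose segments are given as pairs of feature points. The function name and the edge-list input format are illustrative and do not come from the paper.

```python
def count_holes(edges):
    """Count the holes (independent loops) of a connected ligature graph.

    `edges` is a list of (u, v) pairs, one per segment, where u and v are
    hashable identifiers of feature points. The graph is assumed to be
    connected, so any spanning tree has (number of vertices - 1) edges.
    """
    vertices = {v for edge in edges for v in edge}
    if not vertices:
        return 0
    spanning_tree_edges = len(vertices) - 1
    return len(edges) - spanning_tree_edges


# A closed loop through three feature points encloses one hole.
loop = [("p1", "p2"), ("p2", "p3"), ("p3", "p1")]
# An open stroke through the same points encloses none.
stroke = [("p1", "p2"), ("p2", "p3")]
holes_in_loop = count_holes(loop)
holes_in_stroke = count_holes(stroke)
```

If the skeleton could split into several disconnected pieces, the count would have to be applied to each piece separately; the sketch leaves that case out.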
If the feature points of a ligature are considered as a set of vertices V, and its segments as a set of edges E, of a graph G(V, E), then the total number of holes in the ligature can be found using graph theory as follows:

Number of Holes = E - E_st

Here, E is the number of edges in G and E_st is the number of edges in a spanning tree of G. A connected graph with N vertices has N-1 edges in its spanning tree.

4.4 Special Ligature Identification

For identifying special ligatures, a Feed Forward Back Propagation neural network with 15 inputs, 25 hidden and 25 output neurons was used. The feature vectors obtained from the Feature Extraction I stage are fed to this neural network, which then identifies each ligature as either a special ligature or a base ligature.

Figure 2: Some special ligatures

4.5 Feature Extraction II

In this stage, we associate special ligatures with the base ligatures. Each special ligature is associated with the base ligature whose centroid-to-centroid distance is minimum. A number of lines are grown from the centre of each special ligature; when one of these lines touches a base ligature, the special ligature is associated with that base ligature. As a result of this association, twenty new features are added to the feature vector of the base ligature.

4.6 Ligature Identification

In this stage, the final feature vector consisting of 34 features is fed into a Feed Forward Back Propagation neural network. The network architecture consists of 34 inputs, 65 hidden neurons and 45 output neurons.

5. Results

The system was trained using a training set of two hundred carefully selected ligatures. Testing was done on bitmap images containing Urdu written in the Nastaliq font using a text editor. This simplified the problem by removing the need for the pre-processing stage that handles noise introduced during image acquisition. The training set contained the simpler and more commonly used ligatures.
The performance of the system on images containing only the trained ligatures was 100%. However, in cases where an image contained additional ligatures, they were classified to the closest match in the training set. No rejection class was used.

6. Conclusion

In this paper, we have presented a method for the recognition of cursive Urdu text written in the Nastaliq script. The system is currently trained for a small number of ligatures but has the potential to be expanded for more practical use. Our approach minimizes the errors due to segmentation by using a segmentation-free approach. By using multiple classes of features, we have increased the number of ligatures that can be identified.
7. Future Directions

A number of possible directions are under consideration for enhancing the system for practical use, namely:

1. Increasing the number of ligatures used for training.
2. Adding special characters, numerals and Aerab for recognition as special ligatures.
3. Recognizing intonation marks in the document.
4. Adding multi-lingual support to the system.

References

1. [Ahmed] Ahmad Mirza Jamil, "Noori Nastaliq, Computerized Urdu Calligraphy", Elite Publishers, 1982.
2. [Amin] A. Amin and S. Al-Fedaghi, "Machine recognition of printed Arabic text utilizing a natural language morphology", Int. J. of Man-Machine Studies 35, 6 (1991), 768-788.
3. [Badr] Badr Al-Badr, Robert M. Haralick, "Segmentation-Free word recognition with application to Arabic", IJDAR 1(3): 147-166 (1998).
4. [Bunke] H. Bunke, P. Wang, "Handbook of character recognition and document image analysis", World Scientific, 2000.
5. [Ding] X.Q. Ding, Y.S. Wu, "Recognition of multi-font printed Chinese characters", CCIPP/CLCS, 1988, Toronto, Canada.
6. [Guo] H. Guo, X.Q. Ding, "The development of high performance Chinese/English bi-lingual OCR system", Proc. CMIN '95, Beijing, China, March 95, 248-253.
7. [Guyon] I. Guyon, J. Bromley, N. Matic, etc., "A neural network system for recognizing on-line handwriting", Models of Neural Networks, Springer Verlag, 1996.
8. [Ha] J.Y. Ha, S.C. Oh, J.H. Kim, and Y.B. Kwon, "Unconstrained handwritten word recognition with interconnected hidden Markov Models", 3rd Int. Workshop on Frontiers in Handwriting Recognition, Buffalo, May 93, 455-460.
9. [Khorsheed1] Mohammad S. Khorsheed, William F. Clocksin, "Structural features of cursive Arabic script", Proc. of 10th British Machine Vision Conference, University of Nottingham, UK, September 1999.
10. [Khorsheed2] M. S. Khorsheed, "Off-Line Arabic Character Recognition: A Review".
11. [Khorsheed3] Mohammad S. Khorsheed, "Automatic recognition of words in Arabic manuscripts", PhD Dissertation, Churchill College, University of Cambridge, June 2000.
12. [Shridher] N. Shridher, F. Kimura, "Segmentation based cursive handwriting recognition", Handbook of character recognition and document image analysis, 126-127, World Scientific, 1997.
13. [Trier] Øivind Due Trier, Anil K. Jain, and Torfinn Taxt, "Feature Extraction Methods for Character Recognition - A Survey", Pattern Recognition, Vol. 29, No. 4, pp. 641-662, 1996.