Arabic Handwritten Script Recognition Towards Generalization: A Survey

1
Arabic Handwritten Script Recognition Towards Generalization: A Survey
Authors:
Randa I. M. Elanwar
Assistant Researcher, Electronic Research Institute
Prof. Dr. Mohsen A. A. Rashwan
Professor of Digital Signal Processing, Electronic and Communication Dept., Cairo University
Prof. Dr. Samia A. A. Mashali
Head of Computers and Systems Dept., Electronic Research Institute

2
Presentation Contents
Introduction
Paper Objective
Arabic handwriting recognition problem
Main Challenges
Recent off-line Arabic handwriting recognition systems
Recent on-line Arabic handwriting recognition systems
Summary and Conclusion

3
Introduction
Handwriting recognition can be defined as the task
of transforming text represented in the spatial form
of graphical marks into its symbolic representation
The main components of a recognizer are:
1. Capturing Data & acquisition
2. Preprocessing & segmentation
3. Defining patterns and model selection
4. Feature Extraction
5. Training
6. Classification

4
• First the input device captures an image and converts
it to a usable format
• Data is then preprocessed to eliminate noise for
simplification without losing relevant information
and may also be segmented into smaller data units

5
• The information of each data unit is sent to a feature
extractor, which reduces it by measuring certain
“features” or “properties”
• Patterns (or classes) should be defined and models
should be selected. These models are trained using
the extracted features.

6
• The model for a pattern may be a single specific set
of features
• To recognize (or classify) a novel pattern means to
recover the model that generated the pattern based
on the extracted features

7
The feature extractor has reduced the data unit to a
point or feature vector X in a 2D feature space (or
observation space)
Classification rule: Classify the input as Class I if its
feature vector falls below the decision boundary shown,
and as Class II otherwise.
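
The classification rule above can be sketched in a few lines. This is only an illustration: the boundary is assumed to be a straight line, and the `slope` and `intercept` values are hypothetical, since the boundary drawn on the slide is not reproduced here.

```python
# Hypothetical linear decision boundary in a 2D feature space:
#     x2 = slope * x1 + intercept
# The default slope and intercept are illustrative values only.
def classify(x, slope=-1.0, intercept=1.0):
    """Return 'I' if feature vector x = (x1, x2) falls below the boundary, else 'II'."""
    x1, x2 = x
    boundary_x2 = slope * x1 + intercept
    return "I" if x2 < boundary_x2 else "II"


# Two example feature vectors, one on each side of the assumed boundary.
labels = [classify(point) for point in [(0.2, 0.3), (0.9, 0.8)]]
print(labels)
```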

8
The problem is that designing a very complex
recognizer is unlikely to give good generalization since it
seems to be “tuned” to the particular training samples
The question is how to optimize this tradeoff:
generalization versus simple classifier

9
Usually there is an action taken based on the
classification decision. Each action should be assigned a
certain cost.
We design our decision boundary (classification rule)
so that on the average, the Risk will be as small as
possible.
The Risk (R) is the expected value of cost
Minimizing (R) leads to complex boundaries
generalization versus minimum risk?
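
As a minimal sketch of the idea that the Risk is the expected value of cost, the snippet below picks the action with the smallest expected cost. The posterior probabilities and the cost table are made-up numbers, and the function names are hypothetical.

```python
def conditional_risks(posteriors, cost):
    """Expected cost of each action.

    posteriors[c] is the probability that the true class is c;
    cost[a][c] is the cost of taking action a when the true class is c.
    """
    return [sum(row[c] * posteriors[c] for c in range(len(posteriors)))
            for row in cost]


def min_risk_action(posteriors, cost):
    """Return (index of the action with minimum expected cost, that cost)."""
    risks = conditional_risks(posteriors, cost)
    best = min(range(len(risks)), key=risks.__getitem__)
    return best, risks[best]


# Illustrative numbers: class 0 is more probable, but wrongly choosing
# action 0 when the truth is class 1 is ten times more costly.
posteriors = [0.7, 0.3]
cost = [[0.0, 10.0],
        [1.0, 0.0]]
action, risk = min_risk_action(posteriors, cost)
print(action, risk)
```

With these numbers the minimum-risk action differs from the most probable class, which is why cost-aware boundaries can differ from minimum-error boundaries.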

10
To achieve a general-purpose (unbiased) recognizer we
should have a sufficient number of training samples (N)
for each class in the data set.
A theoretical estimate claims that
N ≅ 100 / P where P ≡ prob. of misclassification
I.e., for P ≈ 0.01, N ≈ 10,000 and for P ≈ 0.03, N ≈ 3,333
Such a large data set (if available) needs large storage
and long processing time (time complexity)
generalization versus complexity?
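
The rule of thumb N ≅ 100 / P is easy to check numerically. The helper name below is hypothetical, and rounding to the nearest integer is an assumption.

```python
def samples_per_class(p_error):
    """Rule-of-thumb number of training samples per class, N = 100 / P."""
    if not 0 < p_error <= 1:
        raise ValueError("p_error must be in (0, 1]")
    return round(100 / p_error)


for p in (0.01, 0.03):
    print(p, samples_per_class(p))
```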

11
Paper Objective
Our concern in this paper is to:
1. provide a comprehensive review of recent off-line
and on-line trends in Arabic cursive handwriting
recognition (publications from the last 10 years)
2. clarify the challenges standing against obtaining a
reliable, accurate, simple, general purpose recognizer
based on these trends.

12
Arabic Handwriting Recognition Problem
Arabic Script Recognition Systems are categorized as:
1. On-line or Off-line
2. Writer Dependent or Writer Independent
3. Open-vocabulary or closed-vocabulary

13
Types of Recognition:
When the input device is a digitizer tablet that
transmits the signal in real time or includes timing
information together with pen position, this is mostly
referred to as on-line or dynamic recognition

14
Types of Recognition:
When the input device is a still camera or a scanner,
which captures the position of digital ink on the page
but not the order in which it was laid down, this is
defined as off-line or image-based OCR

15
Special Characteristics of Arabic Script:
Always written from right to left
An Arabic word consists of one or more portions; each
portion has one or more characters
Many characters differ only by the position and the
number of dots attached

16
Every character has more than one shape, depending
on its position
Characters overlap

17
Existence of ligatures
Due to having these special characteristics, Arabic
handwriting recognition systems still need more research
to be established commercially

18
Main Challenges
Feature Extraction
Noise
Model Selection and Complexity
Segmentation
Context
Evidence Pooling
Costs and Risks
Computational Complexity
Learning and Adaptation

19
Feature Extraction:
A good feature set should help distinguish a class
from other classes, be invariant to irrelevant differences and
contain no redundant information

20
Feature Extraction:
… How do we know which features are most
promising?
… Are there ways to automatically learn which features are
best for a classifier?

21
Feature Extraction:
It should be limited in number for computational ease
and to limit the amount of training data

22
Feature Extraction:
… How many features
to use?

23
Noise:
Random error in a pixel value (deformation) due to
signal-independent, signal-dependent and salt &
pepper noise.
Noise cannot always be totally eliminated, but
smoothing can reduce it
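
One common smoothing step for salt & pepper noise is a median filter. The sketch below is a plain-Python 3x3 version; it is an illustration of smoothing in general, not the filter used by any of the surveyed systems, and the function name is hypothetical.

```python
from statistics import median_low


def median_filter_3x3(image):
    """Replace each pixel by the median of its 3x3 neighbourhood.

    image is a list of rows of integer gray values. Border pixels use
    the part of the window that lies inside the image. median_low keeps
    the result an actual pixel value even for even-sized windows.
    """
    rows, cols = len(image), len(image[0])
    smoothed = [row[:] for row in image]
    for r in range(rows):
        for c in range(cols):
            window = [image[rr][cc]
                      for rr in range(max(0, r - 1), min(rows, r + 2))
                      for cc in range(max(0, c - 1), min(cols, c + 2))]
            smoothed[r][c] = median_low(window)
    return smoothed


# A flat gray patch with a single "salt" pixel in the middle.
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 255
clean = median_filter_3x3(noisy)
print(clean[2][2])
```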

24
Noise:
… Is a deformation in the signal noise, or natural
variation in the true models?
… How can we use this information to improve
our classifier?

25
Model Selection and Complexity:
Determining the complexity of the model: not so
simple that it cannot explain the differences between
the categories, yet not so complex as to give poor
classification on novel patterns.

26
Model Selection and Complexity:
… How do we know when to reject a class of models and
try another one?
… Are there principled methods for finding the best
complexity for a classifier?
… Or is it a matter of random trial and error, not even guided by
expectations of performance?

27
Segmentation:
Segmentation subdivides image into its constituent
regions or objects. Segmentation should stop when the
objects of interest in an application have been isolated.

28
Segmentation:
… How do we know where one character “ends” and the
next one “begins”?
… Should we segment the images before they have been categorized, or
categorize them before they have been segmented?

29
Context:
The accuracy of automatic handwriting recognition
systems based on purely visual information seems to
have a ceiling
Incorporating semantic and syntactic knowledge
sources into the automatic recognition of text can offer
potential improvements in performance
… How, precisely, should we incorporate such
information?

30
Evidence Pooling:
For high classification performance or for increased
class coverage, different classification tools are
developed either in parallel or sequentially
When there are several component classifiers and
they agree on a particular pattern, there
is no difficulty. But suppose they disagree.

31
Evidence Pooling:
… How should a “super” classifier pool the evidence from the component
recognizers to achieve the best decision?
… How would the “super” categorizer know when to base a decision on
a minority opinion when required?
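
One simple way for a “super” classifier to pool evidence is a weighted vote. The sketch below is an assumption for illustration (the function name and the reliability weights are hypothetical); it also shows how a single trusted classifier can outweigh a majority.

```python
from collections import defaultdict


def pool_votes(predictions, weights=None):
    """Combine one predicted label per component classifier.

    weights are hypothetical reliability scores (e.g. validation
    accuracy); with no weights this is a plain majority vote.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


# Three component classifiers disagree on a character.
votes = ["ب", "ت", "ت"]
print(pool_votes(votes))                    # plain majority vote
print(pool_votes(votes, [0.9, 0.3, 0.3]))   # the trusted minority wins
```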

32
Costs and Risks:
A classifier is generally used to recommend actions,
each action having an associated cost or risk
We often design our classifier to recommend actions
that minimize some total expected cost or risk

33
Costs and Risks:
… How do we incorporate knowledge about such risks and how will they
affect the classification decision?
… Is there a way to estimate the total risk and thus tell whether our
classifier is acceptable even before we field it?

34
Computational Complexity:
Although we might achieve error-free recognition, the
time & storage requirements would be quite prohibitive
Some pattern recognition problems can be solved
using algorithms that are highly impractical.

35
Computational Complexity:
… What is the tradeoff between computational ease
and performance?
… How can we optimize an excellent recognizer within the
engineering constraints?

36
Learning and Adaptation:
Any method that incorporates information from training
samples in the design of a classifier employs learning
If the models were extremely complicated, the classifier
would have complex decision boundaries
To overcome this, more training samples are needed to
obtain a better estimate of the true underlying features
In case of limited training samples, we should incorporate
knowledge of the problem domain. The production
representation is the “best” representation for classification.

37
Learning and Adaptation:
… How many training samples are needed for good generalization?
… How can we ensure that the learning algorithm favors “simple”
solutions rather than complicated ones?

38
Recent off-line Arabic handwriting recognition systems
Example: the research of Pechwitz et al. [17]
The authors proposed a recognition system based on a semi-continuous 1-D
HMM, using the IFN/ENIT database of handwritten Tunisian
town/village names.
Preprocessing:
1. The image contour is extracted and noise-reduction filtering is performed.
2. Skeletonization and normalization are performed.
3. Baseline estimation and word length normalization are performed.

39
Feature Extraction:
1. A rectangular window is shifted from right to left across the
normalized gray-level script image.
2. A Karhunen-Loève transformation is performed on the gray values
of each frame to reduce the number of features.
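
The sliding-window step can be sketched as follows. The window width and shift are hypothetical parameters (the slides do not give them), and the Karhunen-Loève reduction that follows is not shown; each frame here is just the raw gray values under the window.

```python
def frames_right_to_left(image, width=3, shift=1):
    """Cut a gray-level image into overlapping frames, right to left.

    image is a list of rows of gray values. Each frame is the flattened
    (row-by-row) content of a window `width` columns wide. width and
    shift are illustrative values, not those of the surveyed system.
    """
    cols = len(image[0])
    frames = []
    for start in range(cols - width, -1, -shift):
        frame = [pixel for row in image for pixel in row[start:start + width]]
        frames.append(frame)
    return frames


# Tiny 2x4 "image"; the first frame covers the two rightmost columns.
image = [[0, 1, 2, 3],
         [4, 5, 6, 7]]
frames = frames_right_to_left(image, width=2, shift=1)
print(frames)
```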
Modeling:
1. An HMM is generated for each character shape (all possible
positions), up to 160 different HMMs.
2. Semi-continuous HMMs are used, with 7 states per character.

40
Database:
1. This database is split into four sets A, B, C & D.
2. The 4 sets contain 26,459 images of segmented Tunisian town
names (115,585 PAWs) handwritten by 411 unique writers.
3. 946 unique word labels, and 762 unique PAW labels.
4. For each image the ground truth information is available.
Lexicon:
The character shape HMM-models are combined to valid word
models using a tree structured lexicon with all 946 different

41
Recognition:
The standard Viterbi Algorithm is used together with the lexicon.
The authors applied the recognition algorithm to the database
twice: once using the baseline from the ground truth (GT) and
once using the baseline they estimated.
Results:
Recognition rates 82 – 89% are obtained using baseline estimation
Recognition rates 89 – 95% are obtained using GT baseline
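
For readers unfamiliar with the Viterbi algorithm, here is a minimal sketch on a toy discrete HMM. It is not the authors' system: their HMMs are semi-continuous and decoding is constrained by the lexicon, whereas the states, symbols and probabilities below are made up for illustration.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state path and its probability."""
    # best[t][s] = (probability of the best path ending in s at time t,
    #               previous state on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p)
                for p in states)
        best.append(column)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


# Toy two-state left-to-right model with two observation symbols.
states = ["s1", "s2"]
start_p = {"s1": 1.0, "s2": 0.0}
trans_p = {"s1": {"s1": 0.5, "s2": 0.5},
           "s2": {"s1": 0.0, "s2": 1.0}}
emit_p = {"s1": {"a": 0.9, "b": 0.1},
          "s2": {"a": 0.2, "b": 0.8}}
path, prob = viterbi(["a", "a", "b"], states, start_p, trans_p, emit_p)
print(path, prob)
```

A practical implementation would work with log-probabilities to avoid underflow on long sequences.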

42
Challenges:
1. Working on an available database sidesteps the challenge of limited training samples.

43
Challenges:
2. It is not easy to generalize this classifier to open-vocabulary applications,
because it works on a limited lexicon of words (segmentation-free
recognizer); otherwise context would be a must.

44
Challenges:
3. Model selection and complexity: the same HMM structure is generated for all
characters and ligatures. We think it would be much better to vary
the model structure according to each character's requirements (ض should not
have the same model as ة, for example).

45
Challenges:
4. Feature Extraction: The idea of normalizing the word width to use a sliding
window feature extractor is pretty good except for the great dependency on

46
Recent on-line Arabic handwriting recognition systems
Example: the research of Biadsy et al. [24]
Preprocessing:
1. Geometrical processing phase to minimize handwriting variations.
2. A low-pass filter is used to reduce noise and remove imperfections
caused by acquisition devices.
3. The writing speed is normalized by re-sampling the resulting
point sequences.
Feature Extraction:
Mainly angles (with the x-axis) and loop presence
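
The re-sampling and angle steps can be illustrated as below. This is a sketch under assumptions: equal arc-length spacing is one common way to remove writing-speed effects, the `spacing` value is hypothetical, and loop detection is not covered.

```python
import math


def resample(points, spacing):
    """Re-sample a pen trajectory at equal arc-length spacing.

    points is a list of (x, y) pen positions in the order written.
    """
    resampled = [points[0]]
    carried = 0.0  # distance travelled since the last emitted point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if seg == 0:
            continue
        d = spacing - carried  # distance along this segment to next sample
        while d <= seg:
            t = d / seg
            resampled.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            d += spacing
        carried = seg - (d - spacing)
    return resampled


def direction_angles(points):
    """Angle (degrees) between each successive segment and the x-axis."""
    return [math.degrees(math.atan2(y1 - y0, x1 - x0))
            for (x0, y0), (x1, y1) in zip(points, points[1:])]


# An L-shaped stroke sampled unevenly (few points on the fast part).
stroke = [(0, 0), (4, 0), (4, 3)]
even = resample(stroke, spacing=1.0)
angles = direction_angles(even)
print(len(even), angles)
```

Note that on most tablets the y-axis points downward, so the sign of the angles depends on the coordinate convention.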

47
Modeling:
1. The recognition framework uses discrete Left-to-right HMMs to
represent each Arabic letter shape (isolated, initial, medial, and
final).
2. The number of states for each letter shape model is based on the
geometric complexity of the letter shape. It varies from 5 to 11
states.
For example: 11 states are assigned to isolated ش, and 5 states to
isolated أ.

48
Lexicon:
1. The Arabic dictionary D is subdivided into a set of sub-dictionaries {D1, D2,
…, Dn} based on the number of word parts in each word.
2. Letter-shape models are embedded in a network that represents a word-
part dictionary. The segmentation of word parts into letter-shapes and their
recognition are performed simultaneously in an integrated process.
D = {انسان، التحدى، ثقافة، جامعة، رواية، فادى، محمد، محمود، معلم، هل، وسام}
Sub-dictionaries of D:
D1 = {محمد، معلم، هل}
D2 = {ثقافة، جامعة، محمود}
D3 = {انسان، التحدى، فادى، وسام}
D4 = {رواية}
Word-Part Dictionary for D3:
WPD3,1 = {ا، فا، و}
WPD3,2 = {نسا، لتحد، د، سا}
WPD3,3 = {ن، ى، م}
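
The sub-dictionary example can be reproduced with a short sketch. It assumes a word part ends after any letter that does not connect to the following letter; the `NON_CONNECTING` set and the function names are assumptions for illustration, not taken from the cited paper.

```python
from collections import defaultdict

# Letters that do not join to the following letter (assumed list,
# including common hamza/madda variants of alef and waw).
NON_CONNECTING = set("اأإآدذرزوؤ")


def word_parts(word):
    """Split an Arabic word into its connected word parts."""
    parts, current = [], ""
    for letter in word:
        current += letter
        if letter in NON_CONNECTING:
            parts.append(current)
            current = ""
    if current:
        parts.append(current)
    return parts


def sub_dictionaries(words):
    """Group words by their number of word parts (D1, D2, ...)."""
    groups = defaultdict(list)
    for word in words:
        groups[len(word_parts(word))].append(word)
    return dict(groups)


def word_part_dictionary(words, position):
    """Distinct word parts found at a 1-based position (e.g. WPD3,2)."""
    return {word_parts(word)[position - 1] for word in words}


D = ["انسان", "التحدى", "ثقافة", "جامعة", "رواية", "فادى",
     "محمد", "محمود", "معلم", "هل", "وسام"]
subs = sub_dictionaries(D)
print(subs[3])
print(word_part_dictionary(subs[3], 2))
```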

49
Database:
1. 4 trainers are asked to write 800 selected words each.
2. For testing, 10 testers (the 4 trainers, in addition to 6 new volunteers) are
asked to write 280 words not in the training data (2,358 words in total).
3. 5 different dictionary sizes (5K, 10K, 20K, 30K, and 40K words) selected
from different Arabic websites are used. The 280 test words are present in
all dictionary sizes.
Recognition:
Writer dependent (WD) and writer independent (WI) experiments are done
and average word recognition rates 88 – 96% are obtained. The

50
Challenges:
1. Feature Extraction: The features they use are not enough to give
satisfactory classification of general unconstrained handwriting.
Thus the system needs to work with a limited vocabulary:
the word parts must be present in the dictionary or they will not be
recognized.

51
Challenges:
2. The database they use looks unnatural. Volunteers are asked to follow
a strict writing methodology, which affects their individual writing
style. Besides, the system handles limited handwriting variety
because of the small number of volunteers who wrote the database.

52
Summary and Conclusion
Recognizers for other scripts have been on the
market as commercial products for years, while
Arabic recognizers still need more time.

53
In the case of Arabic handwritten words, many
researchers use a specific, more or less small data set
of their own, so it is impossible to compare different
results, which would be important for improving existing
methods

54
The complexity of the problem is greatly increased by
noise and by the infinite variability of handwritings

55
Cursive script requires the segmentation of words into
characters or parts of characters, i.e. graphemes, and
then the detection of individual features.

56
Generally, the holistic approach can be used if the
size of the vocabulary is small (such as the recognition
of the legal amount in cheques)

57
The character-based approach is the preferred
method for recognition applications that are
unconstrained or involve large-size vocabularies to
ensure good generalization together with reasonable
complexity
