Presentation Organization
1. Introduction
2. Document Analysis and Character Recognition
3. Objective
4. Rule-based Algorithm for Off-line Isolated Handwritten Character Recognition
5. Rule-based Algorithm for On-line Cursive Handwriting Segmentation and Recognition
6. Summary, Conclusion and Future Work
Prepared by:
Eng. Randa Ibrahim M. Elanwar
Research Assistant, Electronic Research Institute
Under the supervision of:
Prof. Dr. Mohsen A. A. Rashwan, Professor of Digital Signal Processing, Faculty of Engineering, Cairo University
Prof. Dr. Samia A. A. Mashaly, Head of Computers and Systems Dept., Electronic Research Institute
Introduction
The motivation of the Document Analysis and Recognition (DAR) and Character Recognition (CR) research fields
Arabic Character Recognition
Motivation of Document Analysis and Character Recognition
Advantages of having documents in computerized format:
1. Easy editing
2. High quality hard copies
3. Quick distribution across world-wide networks
4. Keyword or pattern searching
Motivation of Document Analysis and Character Recognition .. (cont’d)
Trillions of old documents, handwritten notes, forms or drawings are still not in computerized format.
The manual process used to enter the data from these documents into computers demands a great deal of time and money.
Motivation of Document Analysis and Character Recognition .. (cont’d)
The general objective of DAR research is to fully automate the process of understanding printed or handwritten data and entering it into the computer.
Optical Character Recognition (OCR) is the sub-field of document analysis concerned with the recognition of machine-printed or handwritten characters in a document.
Motivation of Document Analysis and Character Recognition .. (cont’d)
With the advent of the Personal Digital Assistant (PDA), there is a great need for handwriting recognition.
Recognizing writing in scanned images of handwritten documents is referred to as off-line handwriting recognition.
Recognizing writing as it is entered on a PDA is referred to as on-line handwriting recognition.
Arabic Character Recognition
Special characteristics of Arabic scripts:
Always written from right to left.
An Arabic word consists of one or more portions, each of which has one or more characters.
Many characters differ only by the position and the number of dots attached.
Arabic Character Recognition .. (cont’d)
Special characteristics of Arabic scripts:
Every character has more than one shape, depending on its position.
Characters overlap.
Arabic Character Recognition .. (cont’d)
Special characteristics of Arabic scripts:
Existence of ligatures.
As a result of these special characteristics, Arabic character recognition systems still need more research before they can be established commercially.
Document Analysis and Character Recognition
Off-line Document Analysis & CR
  Preprocessing
  Features
On-line Document Analysis & CR
  Preprocessing
  Features
Segmentation
Learning and Classification
The DACR field is subdivided into:
1. Off-line Document Analysis & CR
Applications: bank check processing, mail sorting, reading of commercial forms, etc.
2. On-line Document Analysis & CR
Applications: pen computing industry, signature verification, author authentication
1. Off-line Document Analysis & CR
1.1 Preprocessing
Binarization
Noise removal
Normalization
Morphological image processing: opening, closing, erosion, dilation, etc.
Segmentation: explicit, implicit, segmentation-free
1. Off-line Document Analysis & CR .. (cont’d)
1.2 Features
Structural decomposition (height contour and chain code features, end points, T-joints and X-joints)
Series expansion (moments, Fourier transform, Gabor transform and wavelets)
2. On-line Document Analysis & CR
2.1 Preprocessing
Noise removal (smoothing, filtering, de-hooking, etc.)
Normalization (slant correction, baseline drift correction, scale normalization, etc.)
Segmentation (explicit, implicit, segmentation-free)
2. On-line Document Analysis & CR .. (cont’d)
2.2 Features
Features are typically extracted at a sub-letter level:
Shape descriptors (ascender, descender, concavity, loop, cusp, curliness, lineness)
Tangent and curvature features for a window of points
Writing speed
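A minimal sketch of how tangent and curvature features might be computed over a window of pen points. The function name `tangent_curvature`, the window half-width `k`, and the use of the turning angle as a curvature proxy are assumptions made for illustration; the slides do not specify a formula.

```python
import math

def tangent_curvature(points, k=2):
    """Return (tangent_angle, turning_angle) for each interior pen point.

    points: list of (x, y) pen samples in writing order.
    k: assumed half-width of the window, in samples.
    """
    feats = []
    for i in range(k, len(points) - k):
        (x0, y0), (x1, y1), (x2, y2) = points[i - k], points[i], points[i + k]
        # Directions of the chords entering and leaving the point.
        a_in = math.atan2(y1 - y0, x1 - x0)
        a_out = math.atan2(y2 - y1, x2 - x1)
        # Tangent: direction of the chord spanning the whole window.
        tangent = math.atan2(y2 - y0, x2 - x0)
        # Curvature proxy: signed turning angle wrapped to [-pi, pi).
        turn = (a_out - a_in + math.pi) % (2 * math.pi) - math.pi
        feats.append((tangent, turn))
    return feats

if __name__ == "__main__":
    # A right-angle corner: the pen moves right, then up.
    print(tangent_curvature([(0, 0), (1, 0), (1, 1)], k=1))
```

A straight stroke gives a turning angle of zero at every point, while cusps and loops show up as large turning angles, which is why such features are useful at the sub-letter level.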
Segmentation
Segmentation based on contour analysis and baseline location
Segmentation based on vertical histogram
Stroke segmentation
Post-segmentation (segmentation by recognition)
Segmentation by neural network
Segmentation using dynamic programming (pre-stroke segmentation)
Segmentation .. (cont’d)
Segmentation based on contour analysis and baseline location
The chain code provides the information needed to find the baseline location.
Once the baseline is located, segmentation is done at the points where the contour makes a transition from the inside to the outside of the baseline.
Segmentation .. (cont’d)
Stroke segmentation
Segmentation .. (cont’d)
Segmentation based on vertical histogram
After plotting the vertical histogram of the word or sub-word, it is traversed by a predefined threshold.
The zones above this threshold are isolated.
The threshold value depends on the font, and is proportional to the lump of black pixels that joins characters together.
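The vertical-histogram idea can be sketched as follows. This is an illustrative sketch only: the binary image is assumed to be a list of rows with 1 for black pixels, and the names `vertical_histogram` and `segment_columns` and the threshold value used in the example are hypothetical.

```python
def vertical_histogram(image):
    """Count black pixels (1s) in each column of a binary image (list of rows)."""
    return [sum(col) for col in zip(*image)]

def segment_columns(image, threshold):
    """Return (start, end) column ranges whose histogram exceeds the threshold."""
    hist = vertical_histogram(image)
    zones, start = [], None
    for x, count in enumerate(hist):
        if count > threshold and start is None:
            start = x                      # a zone above the threshold begins
        elif count <= threshold and start is not None:
            zones.append((start, x - 1))   # the zone ended at the previous column
            start = None
    if start is not None:
        zones.append((start, len(hist) - 1))
    return zones

if __name__ == "__main__":
    word = [
        [1, 1, 0, 0, 1, 1],
        [1, 1, 1, 1, 1, 1],   # thin connecting stroke joining two characters
        [1, 1, 0, 0, 1, 0],
    ]
    print(segment_columns(word, threshold=1))
```

In this toy image the connecting stroke contributes one black pixel per column, so a threshold of 1 isolates the two character bodies and drops the columns that only contain the joining stroke.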
Segmentation .. (cont’d)
Post-segmentation (segmentation by recognition)
The basic idea is to sequentially extract a set of features, accumulating their values while moving along the word; the accumulated values are then checked against the feature space of a given font.
This process is repeated until the character is recognized or the end of the word is reached.
Segmentation .. (cont’d)
Segmentation by neural network
Neural networks are trained on manually marked break points.
For the test words, the neural networks have to determine the location of the break points between characters.
Segmentation .. (cont’d)
Segmentation using dynamic programming (pre-stroke segmentation)
Valley points (festoon-like strokes) usually correspond to segmentation points between characters.
The basic idea is to use a dynamic programming algorithm to find a globally optimal set of cuts through the input string that minimizes a certain cost function.
The set of cuts and their precise shape are found simultaneously.
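A simplified sketch of the dynamic programming idea, assuming the input is a sequence of `n` points and that a caller-supplied function `segment_cost(i, j)` scores how well points `i..j-1` form one character. Both the function name `best_cuts` and the cost function are hypothetical, and this version only chooses cut positions; it does not recover the shape of each cut as the full method does.

```python
def best_cuts(n, segment_cost):
    """Choose cut positions in a sequence of n points minimizing the total cost.

    segment_cost(i, j): assumed cost of treating points[i:j] as one character.
    Returns (total_cost, cuts), where cuts are the interior boundaries.
    """
    inf = float("inf")
    best = [inf] * (n + 1)   # best[j]: minimal cost of segmenting points[:j]
    back = [0] * (n + 1)     # back[j]: start index of the last segment
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + segment_cost(i, j)
            if cost < best[j]:
                best[j], back[j] = cost, i
    # Walk the back-pointers from the end to recover the cuts.
    cuts, j = [], n
    while j > 0:
        j = back[j]
        if j > 0:
            cuts.append(j)
    return best[n], cuts[::-1]

if __name__ == "__main__":
    # Toy cost: segments of exactly 3 points are free, others are penalized.
    print(best_cuts(9, lambda i, j: (j - i - 3) ** 2))
```

Because every prefix is solved once and reused, the result is globally optimal for the given cost function, unlike a greedy left-to-right choice of cuts.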
Learning (Training)
Supervised learning
Unsupervised learning
Reinforcement learning
Learning (Training) .. (cont’d)
Supervised learning
A teacher provides a category label or cost for each pattern in a training set.
Unsupervised learning
There is no explicit teacher, and the system forms clusters or “natural groupings” of the input patterns.
Learning (Training) .. (cont’d)
Reinforcement learning
This is analogous to a critic who merely states that something is right or wrong, but does not say specifically how it is wrong.
(Thus only binary feedback is given to the classifier.)
Classification (Recognition)
Classification approaches
1. Holistic approach
Segmentation-free, closed vocabulary, global features
2. Analytical approach
Implicit or explicit segmentation, open vocabulary
Classification (Recognition) .. (cont’d)
Classification tools
1. Template matching (direct matching, string matching and elastic matching)
2. Statistical methods (k nearest neighbour, Bayesian classifier)
3. Stochastic processes (Markov chain)
Classification (Recognition) .. (cont’d)
Classification tools
4. Structural matching (trees, chains, etc.)
5. Neural networks
6. Rule-based methods (abstract description of writing)
7. Multiple classifiers (classifier ensemble)
On-line and off-line character recognition systems can be categorized as:
1. Recognition of Isolated Characters (ISR).
2. Explicit Segmentation into characters/primitives Before Recognition (SBR).
3. Simultaneous / Sequential recognition and segmentation (SSR).
4. Global Whole Word recognition (GWR).
Objective
1. Viewing the ACR problem from different sides:
Isolated and cursive
Off-line and on-line character problem
Single-writer and multi-writer variability (writer-dependent, WD, and writer-independent, WI)
2. Achieving the best possible character recognition accuracy using the most logical rule-based algorithms
Rule-based Algorithm for Off-line Isolated Handwritten Character Recognition
A. System Stages
1. Database Collection
2. Preprocessing
3. Feature Extraction, Learning & Classification
3.1) A single feature-based classifier system
3.2) Hierarchical mixture of feature-based classifiers system
B. Results and Discussion
1. Database Collection:
A single-writer database was used, consisting of 30 samples (20 for training and 10 for test) of each Arabic alphabetic character, i.e. 580 characters for training and 290 for test.
2. Preprocessing:
Character image binarization
Character image thresholding
3. Feature Extraction, Learning and Classification:
Recognition results were based upon the comparison between:
1. A single feature-based classifier system
2. Hierarchical mixture of feature-based classifiers system
3.1) A single feature-based classifier system
The feature used for this single classifier system was mainly the radial distances.
3.1) A single feature-based classifier system:
In the training stage, we compute a representative pattern for each class.
Each character was considered a separate class.
Classification uses the Euclidean distance measure.
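A minimal sketch of such a single-feature classifier. It assumes that the "representative pattern" of a class is the mean of its training feature vectors, which the slides do not state explicitly; the function names and the toy two-dimensional vectors are hypothetical stand-ins for the radial-distance features.

```python
import math
from collections import defaultdict

def train_representatives(samples):
    """samples: list of (label, feature_vector). Returns label -> mean vector."""
    groups = defaultdict(list)
    for label, vec in samples:
        groups[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }

def classify(vec, representatives):
    """Return the label whose representative is nearest in Euclidean distance."""
    return min(representatives, key=lambda lb: math.dist(vec, representatives[lb]))

if __name__ == "__main__":
    training = [
        ("alef", [1.0, 9.0]), ("alef", [1.2, 8.8]),
        ("baa", [8.0, 2.0]), ("baa", [7.8, 2.2]),
    ]
    reps = train_representatives(training)
    print(classify([1.1, 9.1], reps))
```

Note that every input is compared against the representatives of all classes, which is the property the following slides identify as a weakness.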
3.1) A single feature-based classifier system: .. (cont’d)
The average system accuracy = 70.06%
Most of the confusions make no sense. This is because:
The input pattern is compared to all classes.
One feature is not representative enough.
We need a better way of categorization.
We need to acquire more features.
Character images are composed of 1, 2, 3 or 4 objects.
We have a main object (character body) and secondaries.
To determine the number of dots associated, we need to discriminate between:
1. Single dot
2. Two stuck dots
3. Hamza
4. Separated Alef
3.2) Hierarchical Mixture of feature-based classifiers system
The recognition stage in our proposed system passed through 4 stages:
Stage 1: Using a classifier ensemble (hierarchical mixture of experts) gated by dots
Stage 2: Adding more structural features for gating between different feature-based classifiers
Stage 3: Adding more features and using feature fusion
Stage 4: Increasing the reliability of gating
3.2) Hierarchical Mixture of feature-based classifiers system .. (cont’d)
Stage 1: Using a classifier ensemble (hierarchical mixture of experts) gated by dots
Characters are clustered into groups according to the number of dots attached to them; the dot count works as the gate between redundant classifiers.
The same feature is used for recognition in each cluster, i.e., we now have a classifier ensemble of individual classifiers (obtained by varying the training data).
Classification uses the Euclidean distance measure.
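The gating idea can be sketched as below, assuming the dot count of each character is already known and that each cluster's classifier is a nearest-mean classifier over the same feature. The names `train_gated` and `classify_gated` and the toy feature vectors are hypothetical.

```python
import math
from collections import defaultdict

def train_gated(samples):
    """samples: list of (label, n_dots, feature_vector).

    Returns n_dots -> {label: mean feature vector}, i.e. one classifier
    per dot-count cluster, each trained only on its own cluster's data.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for label, n_dots, vec in samples:
        grouped[n_dots][label].append(vec)
    return {
        n: {lb: [sum(c) / len(vs) for c in zip(*vs)] for lb, vs in by_label.items()}
        for n, by_label in grouped.items()
    }

def classify_gated(n_dots, vec, experts):
    """Gate on the dot count, then pick the nearest representative in that cluster."""
    reps = experts[n_dots]
    return min(reps, key=lambda lb: math.dist(vec, reps[lb]))

if __name__ == "__main__":
    # Haa/Jeem and Raa/Zay share a body shape and differ by one dot.
    training = [
        ("haa", 0, [1.0, 1.0]), ("raa", 0, [5.0, 5.0]),
        ("jeem", 1, [1.0, 1.0]), ("zay", 1, [5.0, 5.0]),
    ]
    experts = train_gated(training)
    print(classify_gated(1, [1.1, 0.9], experts))
```

The gate means an input is only compared with the classes in its own cluster, so characters with identical bodies but different dot counts can no longer be confused by the body feature alone.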
3.2) Hierarchical Mixture of feature-based classifiers system .. (cont’d)
Stage 1: Using a classifier ensemble (hierarchical mixture of experts) gated by dots
The average system accuracy = 78.33%
3.2) Hierarchical Mixture of feature-based classifiers system .. (cont’d)
Stage 2: Adding more structural features for gating between different feature-based classifiers
Characters are clustered into groups according to the number of dots attached to them and the existence of loops and Hamzas (8 different classifiers).
The same feature is used for recognition in each cluster.
Classification uses the Euclidean distance measure.
The average system accuracy has risen to 80.86%.
3.2) Hierarchical Mixture of feature-based classifiers system .. (cont’d)
Stage 2: Adding more structural features for gating between different feature-based classifiers
New structural features are added:
Number and position of the character stroke end points
Number of cuts that vertical and horizontal lines make through the character body
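One plausible reading of the line-cuts feature, sketched under the assumption that a "cut" is a separate run of black pixels met along a scan line through a binary image (1 = black). The function names and the choice of which rows and columns to scan are hypothetical.

```python
def count_cuts(line):
    """Number of separate runs of black pixels (1s) met along one scan line."""
    cuts, prev = 0, 0
    for px in line:
        if px and not prev:
            cuts += 1   # entering a new black run
        prev = px
    return cuts

def line_cut_features(image, rows, cols):
    """Cuts along the chosen horizontal rows and vertical columns of a binary image."""
    horizontal = [count_cuts(image[r]) for r in rows]
    vertical = [count_cuts([row[c] for row in image]) for c in cols]
    return horizontal + vertical

if __name__ == "__main__":
    loop = [
        [1, 1, 1],
        [1, 0, 1],
        [1, 1, 1],
    ]
    # A line through the middle of a loop crosses the stroke twice.
    print(line_cut_features(loop, rows=[1], cols=[1]))
```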
3.2) Hierarchical Mixture of feature-based classifiers system .. (cont’d)
Stage 2: Adding more structural features for gating between different feature-based classifiers
The average system accuracy = 92.25%
3.2) Hierarchical Mixture of feature-based classifiers system .. (cont’d)
Stage 3: Adding more features and using feature fusion
A new feature-based classifier that uses the 45° inclined line cuts feature is added.
We used a fusion technique, weighted average, to combine the different features.
The average system accuracy has risen to 96%.
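A minimal sketch of weighted-average fusion, assuming each feature-based classifier outputs one distance per candidate class (lower is better) and that the fusion is applied to those distances. The weights, class labels, and function names are hypothetical; the slides do not give the weights actually used.

```python
def fuse_scores(per_feature_scores, weights):
    """Weighted average of per-class scores from several feature-based classifiers.

    per_feature_scores: list of dicts, each mapping label -> distance.
    weights: one non-negative weight per feature (assumed values).
    """
    total = sum(weights)
    labels = per_feature_scores[0].keys()
    return {
        lb: sum(w * s[lb] for w, s in zip(weights, per_feature_scores)) / total
        for lb in labels
    }

def classify_fused(per_feature_scores, weights):
    """Pick the label with the smallest fused distance."""
    fused = fuse_scores(per_feature_scores, weights)
    return min(fused, key=fused.get)

if __name__ == "__main__":
    radial = {"a": 0.4, "b": 0.5}      # distances from a radial-distance classifier
    line_cuts = {"a": 0.9, "b": 0.2}   # distances from a line-cuts classifier
    print(classify_fused([radial, line_cuts], weights=[1, 1]))
    print(classify_fused([radial, line_cuts], weights=[9, 1]))
```

With equal weights the line-cuts evidence dominates and class "b" wins; weighting the radial feature heavily flips the decision to "a", which shows why the choice of weights matters.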
3.2) Hierarchical Mixture of feature-based classifiers system .. (cont’d)
Stage 4: Increasing the reliability of gating
We raised the secondaries identification accuracy to 99.7% using some structural features:
Character body to secondary ratio,
Secondary black-to-white pixel ratio, and
Secondary height-to-width ratio.
We removed class overlapping in the feature space.
The average system accuracy has risen to 97%.
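The three structural features for identifying secondaries could be computed roughly as follows. This sketch assumes that the body and the secondary are binary images (1 = black) each cropped to its own bounding box, that the body-to-secondary ratio compares black pixel counts, and that the black-to-white ratio is taken inside the secondary's bounding box; none of these definitions is spelled out on the slides.

```python
def black_pixels(image):
    """Total number of black pixels (1s) in a binary image (list of rows)."""
    return sum(sum(row) for row in image)

def secondary_features(body, secondary):
    """Return (body_to_secondary, black_to_white, height_to_width) ratios."""
    height, width = len(secondary), len(secondary[0])
    black = black_pixels(secondary)
    white = height * width - black
    body_to_secondary = black_pixels(body) / black
    # Guard against a fully black bounding box (no white pixels).
    black_to_white = black / white if white else float("inf")
    height_to_width = height / width
    return body_to_secondary, black_to_white, height_to_width

if __name__ == "__main__":
    body = [[1] * 5 for _ in range(4)]   # toy character body, 20 black pixels
    dot = [[1, 1], [1, 1]]               # compact, square secondary
    print(secondary_features(body, dot))
```

A single dot tends to be compact and roughly square, while two stuck dots are wider than tall and a separated Alef is much taller than wide, so the height-to-width ratio alone already separates several of the cases listed earlier.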
Results and Discussion
The system stages above ended up with:
1. An average recognition accuracy of 97%.
2. A total increase in recognition accuracy of about 27 percentage points (from 70.06% to 97%) over the accuracy achieved by the single classifier system.
3. We were able to achieve high results using the most common features by proposing the idea of a multiple classifier system (classifier ensemble), together with a classification hierarchy based on the structural features of Arabic characters.
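The reported gain can be checked directly from the per-stage accuracies quoted on the slides. The helper name `gains_over_baseline` is hypothetical; the accuracy values are the ones reported above.

```python
# Average accuracies (%) reported on the slides for each system stage.
stage_accuracy = {
    "Single classifier": 70.06,
    "Stage 1": 78.33,
    "Stage 2": 92.25,
    "Stage 3": 96.0,
    "Stage 4": 97.0,
}

def gains_over_baseline(accuracy, baseline="Single classifier"):
    """Absolute gain, in percentage points, of each stage over the baseline."""
    base = accuracy[baseline]
    return {k: round(v - base, 2) for k, v in accuracy.items() if k != baseline}

gains = gains_over_baseline(stage_accuracy)
# Relative improvement of the final stage over the baseline, in percent.
relative = 100 * gains["Stage 4"] / stage_accuracy["Single classifier"]

if __name__ == "__main__":
    print(gains)
    print(round(relative, 1))
```

The final stage is 26.94 percentage points above the single classifier, which is the "about 27" figure; expressed relative to the 70.06% baseline the improvement is larger, so the two readings should not be mixed.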
Results and Discussion
Our system is very simple and the results are comparable to those obtained by other researchers.
Results and Discussion
Figure: average accuracy (%) at each stage
Single classifier: 70.06
Stage 1: 78.33
Stage 2: 92.25
Stage 3: 96
Stage 4: 97
Rule-based Algorithm for On-line Cursive Handwriting Segmentation & Recognition
Classically [11], on-line recognizers consist of:
1. A preprocessor
2. A classifier, which provides estimates of probabilities for the different categories of characters, and
3. A postprocessor, which eventually incorporates a language model
We propose a rule-based algorithm for the two early stages of an on-line recognizer for cursive Arabic handwriting.
A. System Stages
1. Database Collection
2. Preprocessing
3. Pattern Shapes Definition
4. Feature Extraction
5. Training
6. Recognition
B. Results and Discussion
1. Database Collection
Handwritten documents were collected on a slate tablet PC.
The collected database was unconstrained (open vocabulary).
No digits were included.
Writing is in the Naskh style only.
2. Preprocessing
Filter the document and clear it of unintended writer errors.
Break the document down into text lines and words or sub-words.
Detect the type of each stroke (either main-body or secondary).
2. Preprocessing (cont'd)
Break the document down into text lines and words or sub-words.
Two problems face the use of x-y axis projection histograms:
1. Baseline skew, which makes line separation difficult and requires a careful skew detection and correction stage.
2. Multi-word overlap, where the inter-word distance is smaller than the normal expected threshold for separating words.
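To make the projection-histogram baseline concrete, here is a minimal Python sketch that splits pen points into text-line bands wherever the Y-axis projection histogram is empty. The function name `split_lines_by_projection`, the `bin_height` parameter and the sample data are illustrative assumptions, not part of the presented system.

```python
def split_lines_by_projection(points, bin_height=1.0):
    """Return (y_start, y_end) bands where the Y-projection histogram is non-empty.

    points: iterable of (x, y) pen samples; bin_height is an assumed bin size.
    """
    points = list(points)
    if not points:
        return []
    ys = [y for _, y in points]
    y0 = min(ys)
    n_bins = int((max(ys) - y0) // bin_height) + 1
    hist = [0] * n_bins
    for y in ys:
        hist[int((y - y0) // bin_height)] += 1

    bands, start = [], None
    for i, count in enumerate(hist):
        if count and start is None:
            start = i                      # a band of ink begins
        elif not count and start is not None:
            bands.append((y0 + start * bin_height, y0 + i * bin_height))
            start = None                   # an empty bin closes the band
    if start is not None:
        bands.append((y0 + start * bin_height, y0 + n_bins * bin_height))
    return bands


# Two well-separated horizontal lines give two bands.
flat = [(x, y) for x in range(5) for y in (0, 1, 2)] + \
       [(x, y) for x in range(5) for y in (10, 11)]
print(split_lines_by_projection(flat))
```

When the two lines are skewed so that their Y ranges meet, the histogram has no empty bin between them and the sketch returns a single band, which is the first problem listed above.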
2. Preprocessing (cont'd)
Break the document down into text lines and words or sub-words.
E. Ratzlaff used a "bottom-up" clustering of discrete strokes into increasingly larger groups that eventually merge into complete text lines.
The initial bottom-up clustering began by creating Forward Projection (FP) groups.
Strokes were merged into FP groups if they had strongly overlapping Y-axis projections. A single unmerged stroke became an independent FP group.
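The FP grouping idea can be sketched as follows. This is a simplified illustration, not Ratzlaff's actual procedure: it assumes that each stroke is compared only with the most recent group, that "strongly overlapping" means an overlap ratio of at least `min_overlap` (an assumed threshold), and all function names are hypothetical.

```python
from typing import List, Tuple

Stroke = List[Tuple[float, float]]  # (x, y) pen samples in writing order


def y_extent(stroke: Stroke) -> Tuple[float, float]:
    ys = [y for _, y in stroke]
    return min(ys), max(ys)


def y_overlap_ratio(a, b) -> float:
    """Overlap of two Y-intervals relative to the shorter interval."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap < 0:
        return 0.0
    shorter = min(a[1] - a[0], b[1] - b[0])
    return 1.0 if shorter == 0 else overlap / shorter


def forward_projection_groups(strokes, min_overlap=0.5):
    """Group strokes (in temporal order) by overlap of their Y-axis projections."""
    groups = []   # lists of stroke indices
    extents = []  # running (y_min, y_max) of each group
    for index, stroke in enumerate(strokes):
        extent = y_extent(stroke)
        if groups and y_overlap_ratio(extents[-1], extent) >= min_overlap:
            groups[-1].append(index)
            lo, hi = extents[-1]
            extents[-1] = (min(lo, extent[0]), max(hi, extent[1]))
        else:
            groups.append([index])          # unmerged stroke: independent FP group
            extents.append(extent)
    return groups


strokes = [
    [(0, 0), (5, 10)],    # main stroke
    [(6, 2), (9, 9)],     # overlaps the first stroke on the Y axis
    [(0, 30), (5, 40)],   # a stroke on another line
    [(6, 50), (7, 50)],   # small stroke with no Y overlap
]
print(forward_projection_groups(strokes))
```

In this toy input the last small stroke shares no Y range with its neighbours, so it ends up in a group of its own.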
2. Preprocessing (cont'd)
Break the document down into text lines and words or sub-words.
Drawbacks:
1. The secondaries usually have no overlapping Y-axis projections.
2. Large baseline skews occur along the text line and even within one word.
2. Preprocessing (cont'd)
Break the document down into text lines and words or sub-words.
Another idea for text line separation was presented by Gareth Loudon et al. It worked successfully with English script because of its limited cursive nature, i.e. a stroke (pen down/up movement) usually represents a single character.
Several parameters were calculated for each stroke during the character segmentation step.
2. Preprocessing (cont'd)
Break the document down into text lines and words or sub-words.
Example:
if (si > max(xi)) or (-si > 2 * max(xi) & yi > max(xi)),
    then stroke i was a character at the end of a word,
else if (ci > 0)
    stroke i was a character within a word,
else
    stroke i must be merged with the next stroke to form a character.
2. Preprocessing (cont'd)
Break the document down into text lines and words or sub-words.
Drawbacks:
1. An Arabic stroke usually represents more than one character, which makes it impossible to estimate the Arabic stroke geometry (height, width, etc.).
2. Delayed strokes in English are usually written immediately after the main stroke, which is not the case in Arabic.
3. Variations in stroke size and stroke sequence among writers make the problem more difficult.
2. Preprocessing (cont'd)
Break the document down into text lines and words or sub-words.
Our new technique uses the same bottom-up clustering concept and uses the spatiotemporal relations between strokes to build the smallest possible FP groups.
The FP groups contain the main and secondary strokes of the same word regardless of the sequence in which they were written.
2. Preprocessing (cont'd)
Break the document down into text lines and words or sub-words.
By examining the states of successively written Arabic strokes, we found them to be spatially related to each other by one of the following relations:
1. Touching
   ∴ The two strokes should belong to the same word group.
2. Not touching but overlapping on the x-axis
   ∴ The two strokes should belong to the same word group.
3. Neither touching nor overlapping on the x-axis
   If the inter-stroke distance is less than the average stroke width
   ∴ The two strokes should belong to the same word group.
   Else
   ∴ The two strokes should belong to two different word groups.
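The three relations above can be sketched in Python as a simple grouping pass. Several details are assumptions made only for illustration: the touch test (two sample points closer than `tol`), the definition of average stroke width (mean x-extent over all strokes), the rule that a stroke joins the first group containing any related stroke, and all function names.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def x_extent(stroke: Stroke) -> Tuple[float, float]:
    xs = [x for x, _ in stroke]
    return min(xs), max(xs)


def touching(a: Stroke, b: Stroke, tol: float = 1.0) -> bool:
    """Assumed touch test: some pair of sample points lies within tol."""
    return any((ax - bx) ** 2 + (ay - by) ** 2 <= tol ** 2
               for ax, ay in a for bx, by in b)


def x_gap(a: Stroke, b: Stroke) -> float:
    """Horizontal gap between two strokes; zero or negative when they overlap."""
    (a_lo, a_hi), (b_lo, b_hi) = x_extent(a), x_extent(b)
    return max(a_lo, b_lo) - min(a_hi, b_hi)


def same_word(a: Stroke, b: Stroke, avg_width: float, tol: float = 1.0) -> bool:
    if touching(a, b, tol):          # relation 1
        return True
    gap = x_gap(a, b)
    if gap <= 0:                     # relation 2: overlapping on the x-axis
        return True
    return gap < avg_width           # relation 3: compare gap with average width


def word_groups(strokes: List[Stroke], tol: float = 1.0) -> List[List[int]]:
    if not strokes:
        return []
    widths = [hi - lo for lo, hi in map(x_extent, strokes)]
    avg_width = sum(widths) / len(widths)
    groups: List[List[int]] = []
    for index, stroke in enumerate(strokes):
        for group in groups:
            if any(same_word(strokes[k], stroke, avg_width, tol) for k in group):
                group.append(index)
                break
        else:
            groups.append([index])   # no related stroke found: new word group
    return groups


strokes = [
    [(0, 0), (10, 0)],    # main stroke
    [(12, 0), (20, 0)],   # small gap to the first stroke
    [(3, 5), (4, 5)],     # delayed dot above the first stroke
    [(60, 0), (70, 0)],   # far away: a different word
]
print(word_groups(strokes))
```

Because each stroke is compared with every member of the existing groups, the delayed dot joins its word even though it was written after the second stroke.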
2. Preprocessing (cont'd)
Break the document down into text lines and words or sub-words.
Example:
* Strokes 1 & 2: neither touching nor overlapping but belong to the same word.
* Strokes 2 & 5: neither touching nor overlapping but belong to 2 different words.
* Strokes 1 & 3: overlapping and belong to the same word.
* Strokes 7 & 8: touching and belong to the same word.
2. Preprocessing (cont'd)
Break the document down into text lines and words or sub-words.
We overcame these problems:
1. Secondaries with no overlapping Y-axis projections, which were usually separated as an independent text line
2. Baseline skew
3. Delayed strokes, which are now included in the same word regardless of the sequence in which they were written.
2. Preprocessing (cont'd)
Detect the type of each stroke (either main or secondary).
Many characters have the same main body and differ only by their dots. By erasing these dots, we can reduce the number of patterns.
If the FP group contains 1 stroke, then it should be main-type.
If the FP group contains 2 or more strokes, then the first one should be main-type. The following strokes may be secondary or main depending on their height, shape and location.
3. Pattern Shape Definition
Pattern shapes are defined by observing the collected handwriting. We have more than one shape for each handwritten character in all its known positions (Start, Middle, End, and Isolated).
4. Feature Extraction
Depending on the directions, lengths, and pen-up/down movements of substrokes, 25 substrokes of eight directions are defined: eight long strokes (A–H), eight short strokes (a–h), eight pen-up movements (1–8) and one pen-up movement (9).
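A direction code of this kind can be sketched as below. The sketch only covers the 24 directional codes; the single code 9 is left out because the slide does not say when it applies. The sector layout (code A pointing along +x, then counter-clockwise in 45° steps with y pointing up) and the `long_threshold` length are assumptions, since the slides do not specify them.

```python
import math

LONG = "ABCDEFGH"     # long pen-down substrokes
SHORT = "abcdefgh"    # short pen-down substrokes
PEN_UP = "12345678"   # pen-up movements


def direction_index(dx: float, dy: float) -> int:
    """Quantize a displacement into one of eight 45-degree sectors (0..7)."""
    angle = math.atan2(dy, dx) % (2 * math.pi)
    return int((angle + math.pi / 8) // (math.pi / 4)) % 8


def substroke_code(dx: float, dy: float, pen_down: bool,
                   long_threshold: float) -> str:
    """Map one substroke displacement to a single chain-code symbol."""
    d = direction_index(dx, dy)
    if not pen_down:
        return PEN_UP[d]
    length = math.hypot(dx, dy)
    return LONG[d] if length >= long_threshold else SHORT[d]


def encode(substrokes, long_threshold=5.0) -> str:
    """Encode a list of (dx, dy, pen_down) substrokes as a feature string."""
    return "".join(substroke_code(dx, dy, down, long_threshold)
                   for dx, dy, down in substrokes)


print(encode([(10, 0, True), (0, 2, True), (-3, 0, False)]))
```

The resulting string is the kind of feature vector that the later stages compare symbol by symbol.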
5. Training
The details of this stage depend greatly on the methodology that will be used in the recognition stage.
Approach 1: Segmentation-based systems (analytical).
Approach 2: Segmentation-free systems (holistic).
We followed the first approach, but by performing segmentation-by-recognition rather than explicit segmentation-before-recognition.
5. Training (cont'd)
S. El-Dabi [3, 9] sequentially extracted a set of features, accumulating their values while moving along the word image (column by column); the accumulated features were then checked against the feature space of a given font until a character was recognized or the end of the word was reached.
We need to build a registry comprising all skeleton patterns (feature space) of all pattern shapes.
We made transcription files of the training data to describe the content of each training file. These files stand for the manual segmentation of the word strokes.
5. Training (cont'd)
For each transcription file, pattern shape data are read and the direction features are extracted.
All the feature vectors belonging to the same pattern shape are clustered.
The most representative patterns (feature vectors) are stored to construct a registry for the recognition stage.
6. Recognition
In this stage, the main task was to find cuts that divide connected components into their individual characters.
The basic idea is to use a dynamic programming algorithm to find a globally optimal set of cuts through the input string (feature vector) that minimizes a certain cost function.
The set of cuts and their precise shape are found simultaneously.
The feature vector of the test stroke was compared against the registry (one direction after the other) until either a character was recognized (i.e., we decide on a segmentation point) or the feature vector reached its end.
6. Recognition (cont'd)
This comparison was performed using a dynamic programming technique called "Minimum Edit Distance".
Example: assuming insertion cost = deletion cost = 1 and substitution cost = 2.
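As an illustration of the technique, the following is a standard dynamic-programming implementation of minimum edit distance with the costs stated above (insertion = deletion = 1, substitution = 2). It is a textbook sketch rather than the authors' code; the function name and the sample chain-code strings are assumptions.

```python
def min_edit_distance(source: str, target: str,
                      ins_cost: int = 1, del_cost: int = 1,
                      sub_cost: int = 2) -> int:
    """Minimum total cost of edits turning `source` into `target`."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0] * cols for _ in range(rows)]

    # First column: delete every source symbol; first row: insert every target symbol.
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + del_cost
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + ins_cost

    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,                       # deletion
                dist[i][j - 1] + ins_cost,                       # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # match or substitution
            )
    return dist[-1][-1]


# Hypothetical chain-code strings: a test vector against a registry pattern.
print(min_edit_distance("AAb", "AAB"))   # one substitution -> 2
print(min_edit_distance("AB", "A"))      # one deletion -> 1
```

With these costs a substitution is never cheaper than a deletion followed by an insertion, so the distance equals the number of unmatched symbols on both sides of the best alignment.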
6. Recognition (cont'd)
Group 1 = ['A' 'B' 'C' 'D' 'E' 'F' 'G' 'H'], Group 2 = ['a' 'b' 'c' 'd' 'e' 'f' 'g' 'h'] & Group 3 = ['1' '2' '3' '4' '5' '6' '7' '8'];
The penalties are decided as follows:
6. Recognition (cont'd)
Insertion Cost = Substitution Cost/2 &
Deletion Cost = Substitution Cost/2
The factors '4' and '16' come from the assumption that short strokes (represented by Group 2 directions) are almost half the length of long strokes (represented by Group 1 directions).
Other value sets for these factors were tried: $\{1.5^2, (1.5^2)^2\}$, $\{2.5^2, (2.5^2)^2\}$, $\{3^2, (3^2)^2\}$, $\{3.5^2, (3.5^2)^2\}$, $\{4^2, (4^2)^2\}$. We chose the $\{2^2, (2^2)^2\}$ value set because it gives the smallest integer values, so the total distances do not get too large.
6. Recognition (cont'd)
The minimum-edit-distance technique is a good mathematical measure but cannot be used alone with the chain code feature.
We need either some off-line features or at least template matching information.
We used string matching to find the number of matches between the representative patterns from the registry and the test vector.
The final cost function is given by the following equation:
$$\text{Distance} = \text{minimum-edit-distance} \times \frac{\text{Length of representative pattern}}{\text{Number of matches}}$$
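A small worked sketch of this cost follows, reading the equation as the edit distance scaled by pattern length over number of matches, so that more matches lower the cost. The slides do not say how the matches are counted, so the count is taken as an input here; the function name and the handling of zero matches are assumptions.

```python
import math


def final_cost(edit_distance: float, pattern_length: int, n_matches: int) -> float:
    """Distance = minimum-edit-distance * (pattern length / number of matches)."""
    if n_matches == 0:
        return math.inf   # assumed convention: no matching symbol, worst cost
    return edit_distance * pattern_length / n_matches


# A pattern of 6 symbols, 4 of which match the test vector, at edit distance 2:
print(final_cost(2, 6, 4))   # 2 * 6 / 4 = 3.0
# The same edit distance with all 6 symbols matching is penalized less:
print(final_cost(2, 6, 6))   # 2 * 6 / 6 = 2.0
```

Under this reading, two candidates with the same edit distance are separated by how much of the representative pattern actually agrees with the test vector.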
6. Recognition (cont'd)
The probable pattern shapes of the first character in the stroke were stored as roots of individual trees.
Each tree was completed by comparing the unidentified region of the feature vector to the registry again and again to find the probable pattern shapes of the second, third and fourth characters until the whole stroke was recognized.
After tree construction, we were able to obtain a ranked list in which each member comprised the characters (without dots) representing the stroke, ranked by their total edit distance 'Distance'.
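The tree construction can be illustrated with a simplified recursive search. This is a sketch under several assumptions and not the authors' implementation: the registry contents and character labels are hypothetical, each pattern is compared with a prefix of the same length as the pattern, a candidate is accepted when its edit distance is at most `tolerance`, and the plain edit distance is used in place of the final cost function.

```python
def min_edit_distance(source, target, ins_cost=1, del_cost=1, sub_cost=2):
    """Standard dynamic-programming edit distance with the stated costs."""
    prev = [j * ins_cost for j in range(len(target) + 1)]
    for i, s in enumerate(source, start=1):
        curr = [i * del_cost]
        for j, t in enumerate(target, start=1):
            curr.append(min(prev[j] + del_cost,
                            curr[j - 1] + ins_cost,
                            prev[j - 1] + (0 if s == t else sub_cost)))
        prev = curr
    return prev[-1]


def ranked_segmentations(feature, registry, tolerance=1):
    """Enumerate character sequences covering `feature`, ranked by total distance.

    registry: dict mapping a character label to its representative patterns.
    """
    results = []

    def expand(rest, labels, total):
        if not rest:                          # whole stroke recognized: a leaf
            results.append((total, labels))
            return
        for label, patterns in registry.items():
            for pattern in patterns:
                prefix = rest[:len(pattern)]  # candidate region for this pattern
                cost = min_edit_distance(prefix, pattern)
                if cost <= tolerance:         # plausible character: grow the tree
                    expand(rest[len(pattern):], labels + [label], total + cost)

    expand(feature, [], 0)
    return sorted(results)


# Hypothetical registry of chain-code patterns for three character shapes.
registry = {"ba": ["AAb"], "ra": ["C"], "kaf": ["CA"]}
print(ranked_segmentations("AAbC", registry))
```

Each accepted first character acts as a tree root, each recursive call adds one level, and every leaf that consumes the whole feature string becomes one member of the ranked list.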
6. Recognition (cont'd)
The last step in this stage was dot restoration.
Two trials were made to assign dots to the characters representing the stroke.
Trial 1: The centroids of the dots were calculated, as well as the centroid of each character per stroke, and the dots were assigned to the character with the nearest centroid.
Despite the large reduction in list size and the promotion of correct results to the top of the list, drifts in dot position caused wrong dot assignments to characters and therefore the loss of many correct choices as well.
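Trial 1 amounts to a nearest-centroid assignment, which can be sketched as follows. The data layout (each dot and each character segment given as a list of (x, y) points) and the function names are assumptions made for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def centroid(points: List[Point]) -> Point:
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def assign_dots(dot_strokes: List[List[Point]],
                char_segments: List[List[Point]]) -> List[int]:
    """For each dot stroke, return the index of the character with the nearest centroid."""
    char_centroids = [centroid(segment) for segment in char_segments]
    assignment = []
    for dot in dot_strokes:
        dot_centre = centroid(dot)
        nearest = min(range(len(char_centroids)),
                      key=lambda k: math.dist(dot_centre, char_centroids[k]))
        assignment.append(nearest)
    return assignment


chars = [[(0, 0), (4, 0)], [(10, 0), (14, 0)]]   # two character segments
dots = [[(3, 5)], [(11, -4)]]                    # one dot above, one below
print(assign_dots(dots, chars))
```

A dot that drifts toward a neighbouring character moves closer to the wrong centroid, which is the failure mode reported for this trial.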
6. Recognition (cont'd)
Trial 2: Different distributions of dots over the stroke characters were tried, and the validity of their number and location was checked in order to remove invalid list members.
This trial was more successful: we were able to preserve almost all correct list members together with a reasonable reduction in the list size.
A new ranked list was obtained after removing the invalid members.
Results and Discussion
                  Training   Test
No. of writers    4          4
No. of words      317        94
No. of char.      1814       435
Results representation:
Neskovic and Cooper [14] developed an on-line segmentation-by-recognition system for English using HMMs together with a dynamic programming technique (Viterbi). The output of the system is a ranked set of words. The system's performance depends on the writer, on his style and the clarity of his writing: for good writers the correct word is in the top 5 words over 97% of the time; for bad writers the correct word is in the top 5 words over 90% of the time.
Results and Discussion (cont'd)
Using the same terminology as in [14], we can represent our results as follows:
Before dot restoration, the correct segmentation-recognition results of the test strokes exist within the top list members 93% of the time (96% of the time for the test characters).
After dot restoration, the correct segmentation-recognition results of the test strokes exist within the top list members 92% of the time (95% of the time for the test characters).
              Total Number   Correctly Recognized   Recognition Probability
Characters    435            415                    .95
Strokes       305            279                    .92
Words         94             70                     .74
Results and Discussion (cont'd)
Fortunately, most of the correct recognition results exist at the top of the ranked list.
[Figure: "Recognition Choices", number of correct choices versus location in the ranked list, plotted for characters and strokes.]
Results and Discussion (cont'd)
The list sizes after dot restoration were reduced significantly, with almost no loss of correct results.
Results and Discussion .. (cont’d)
The 5% loss in the number of characters recognized is the consequence of two problems:
1. Imperfect segmentation: the rules do not cover a large enough range of writing variations.
2. Wrong dot assignment: caused by writer drift and stroke overlaps.
∴ Increasing the training samples from multiple writers and avoiding overlaps is expected to give much better results.
Summary, Conclusion & Future Work
This work covered both branches of the handwritten Arabic character recognition problem, off-line and on-line, and attacked the problem from different sides:
1. Isolated and connected character problems
2. Single-writer and multi-writer variability problems
3. Single-output and multi-output decisions
using the simplest line of solution: rule-based algorithms.
We proposed an off-line recognition system for isolated handwritten Arabic characters. It achieved high accuracy, comparable to that reported by other researchers, by using a multiple classifier system together with a classification hierarchy based on the structural features of Arabic characters and feature fusion.
We proposed a rule-based algorithm for the two early stages of an on-line cursive Arabic handwriting recognizer.
We followed a segmentation-by-recognition approach and used the pen trajectory, with some modifications, as the feature. We were able to correctly segment and recognize most of the test words.
Following the pen trajectory loses the global pattern shape information that the off-line image provides (e.g., confusions between {و, ر} and {ـھـ, ـمفـ}).
On the other hand, converting the on-line data to bitmaps and solving the problem as an off-line one is a very hard task that is still under research and does not yet achieve reliable results [40, 41, 42].
Besides, the segmentation task is considerably harder in the off-line case than in the on-line case.
As future work, we can benefit from the advantages of both systems by using an on-line/off-line classifier ensemble system. In this case:
The segmentation decision in the off-line case is corrected using the on-line system, and
the classification decision in the on-line case is corrected using the off-line system.
As future work we need:
1. A much larger, neat training database from cooperative writers to enhance the results, and automation of the transcription file creation process.
2. Introducing a language model and resolving ambiguities linguistically to obtain a single output.
3. Handling a large degree of writing variability by using a multiple classifier system and decision fusion.
4. Work on numerals and the Reqaa font, which are still open issues and need more research.

Rule based algorithm for handwritten characters recognition

  • 1.
    Presentation Organization: 1. Introduction 2. Document Analysis and Character Recognition 3. Objective 4. Rule-based Algorithm for Off-line Isolated Handwritten character recognition 5. Rule-based Algorithm for On-line Cursive Handwriting Segmentation and Recognition 6. Summary, Conclusion and Future work
  • 2.
    Prepared by: Eng. Randa Ibrahim M. Elanwar, Research assistant, Electronic Research Institute. Under the supervision of: Prof. Dr. Mohsen A. A. Rashwan, Professor of Digital Signal Processing, Faculty of Engineering, Cairo University, and Prof. Dr. Samia A. A. Mashaly, Head of computers and systems dept, Electronic Research Institute.
  • 3.
    Presentation Organization: 1. Introduction 2. Document Analysis and Character Recognition 3. Objective 4. Rule-based Algorithm for Off-line Isolated Handwritten character recognition 5. Rule-based Algorithm for On-line Cursive Handwriting Segmentation and Recognition 6. Summary, Conclusion and Future work
  • 4.
  • 5.
    Introduction: The Motivation of Document Analysis and Recognition (DAR) & Character Recognition (CR) research fields. Arabic Character Recognition.
  • 6.
    Introduction: Motivation of Document Analysis and Character Recognition. Facilities of using documents in computerized format: 1. Easy editing 2. High quality hard copies 3. Quick distribution across world-wide networks 4. Key word or pattern searching
  • 7.
    Introduction: Motivation of Document Analysis and Character Recognition .. (cont’d) Trillions of old documents, handwritten notes, forms or drawings are still not in computerized format. The manual process used to enter the data from these documents into computers demands a great deal of time and money.
  • 8.
    Introduction: Motivation of Document Analysis and Character Recognition .. (cont’d) The general objective of DAR research is to fully automate the process of understanding printed or handwritten data and entering it into the computer. Optical Character Recognition (OCR) is the sub-field of document analysis concerned with the recognition of machine-printed or handwritten characters in a document.
  • 9.
    Introduction: Motivation of Document Analysis and Character Recognition .. (cont’d) With the advent of the Personal Digital Assistant (PDA) there is a great need for handwriting recognition. Recognizing writing in handwritten scanned document images is referred to as off-line handwriting recognition. Recognizing writing on PDAs is referred to as on-line handwriting recognition.
  • 10.
    Introduction: Arabic Character Recognition. Special characteristics of Arabic scripts: Always written from right to left. An Arabic word consists of one or more portions; each has one or more characters. Many characters differ only by the position and the number of dots attached.
  • 11.
    Introduction: Arabic Character Recognition .. (cont’d) Special characteristics of Arabic scripts: Every character has more than one shape, depending on its position. Characters overlap.
  • 12.
    Introduction: Arabic Character Recognition .. (cont’d) Special characteristics of Arabic scripts: Existence of ligatures. As a result of these special characteristics, Arabic character recognition systems still need more research to be established commercially.
  • 13.
  • 14.
    Document Analysis and Character Recognition: Off-line Document Analysis & CR (Preprocessing, Features); On-line Document Analysis & CR (Preprocessing, Features); Segmentation; Learning and Classification
  • 15.
    The DACR field is subdivided into: 1. Off-line Document Analysis & CR. Applications: Bank check processing, Mail sorting, Reading of commercial forms, etc. 2. On-line Document Analysis & CR. Applications: Pen computing industry, Signature verification, Author authentication
  • 16.
    1. Off-line Document Analysis & CR
  • 17.
    1. Off-line Document Analysis & CR .. (cont’d) 1.1 Preprocessing: Binarization; Noise removal; Normalization; Morphological image processing (Opening, Closing, Erosion, Dilation, etc.); Segmentation (Explicit, Implicit, Segmentation-free)
  • 18.
    1. Off-line Document Analysis & CR .. (cont’d) 1.2 Features: Structural Decomposition (Height contour and chain code features, End points, T-joints and X-joints); Series Expansion (Moments, Fourier Transform, Gabor Transform and Wavelets)
  • 19.
    2. On-line Document Analysis & CR
  • 20.
    2. On-line Document Analysis & CR .. (cont’d) 2.1 Preprocessing: Noise removal (Smoothing, Filtering, De-hooking, etc.); Normalization (Slant correction, Baseline drift correction, Scale normalization, etc.); Segmentation (Explicit, Implicit, Segmentation-free)
  • 21.
    2. On-line Document Analysis & CR .. (cont’d) 2.2 Features: Features are typically extracted at a sub-letter level: Shape Descriptors (Ascender, descender, concavity, loop, cusp, curliness, lineness); Tangent and curvature features for a window of points; Writing Speed
  • 22.
    Segmentation: Segmentation based on contour analysis and baseline location; Segmentation based on vertical histogram; Stroke Segmentation; Post-Segmentation (Segmentation by recognition); Segmentation by Neural Network; Segmentation using Dynamic programming (Pre-stroke segmentation)
  • 23.
    Segmentation .. (cont’d) Segmentation based on contour analysis and baseline location: The chain code provides information for finding the baseline location. After defining the baseline location, segmentation is done at the points where the contour makes a transition from the inside to the outside of the baseline.
  • 24.
    Segmentation .. (cont’d) Stroke Segmentation
  • 25.
    Segmentation .. (cont’d) Segmentation based on vertical histogram: After plotting the vertical histogram of the word or sub-word, it is traversed by a predefined threshold. The zones above this threshold are isolated. This threshold value depends on the font, and is proportional to the lump of black pixels that joins characters together.
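The vertical-histogram idea can be illustrated with a minimal sketch. It assumes a binary image stored as a list of rows (1 = ink, 0 = background) and a hand-picked threshold; the function names and the toy image are hypothetical and are not taken from the presentation.

```python
def vertical_histogram(image):
    """Count ink pixels (value 1) in each column of a binary image."""
    return [sum(column) for column in zip(*image)]

def zones_above_threshold(hist, threshold):
    """Return (start, end) column ranges, end exclusive, where hist > threshold."""
    zones, start = [], None
    for x, height in enumerate(hist):
        if height > threshold and start is None:
            start = x                      # a zone opens here
        elif height <= threshold and start is not None:
            zones.append((start, x))       # the zone closes before column x
            start = None
    if start is not None:                  # zone running to the right edge
        zones.append((start, len(hist)))
    return zones

# Toy sub-word: two character bodies joined by a one-pixel-thick stroke.
image = [
    [0, 1, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 0],
]
hist = vertical_histogram(image)
# The threshold equals the thickness of the joining stroke (an assumption).
zones = zones_above_threshold(hist, threshold=1)
print(hist, zones)
```

Here the threshold is set to the thickness of the connecting stroke, so the columns that contain only that stroke fall at or below it and separate the two isolated zones.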
  • 26.
    Segmentation .. (cont’d) Post-Segmentation (Segmentation by recognition): The basic idea is to sequentially extract a set of features, accumulating the values while moving along the word, and then check them against the feature space of a given font. This process is repeated until the character is recognized or the end of the word is reached.
  • 27.
    Segmentation .. (cont’d) Segmentation by Neural Network: Neural Networks are trained on manually marked break points. For the test words, the Neural Networks have to determine the location of break points between characters.
  • 28.
    Segmentation .. (cont’d) Segmentation using Dynamic programming (Pre-stroke segmentation): Valley points (festoon-like strokes) usually correspond to segmentation points between characters. The basic idea is to use a dynamic programming algorithm to find a globally optimal set of cuts through the input string which minimizes a certain cost function. The set of cuts and their precise shape are found simultaneously.
  • 29.
    Learning (Training): Supervised Learning; Unsupervised Learning; Reinforcement Learning
  • 30.
    Learning (Training) .. (cont’d) Supervised Learning: A teacher provides a category label or cost for each pattern in a training set. Unsupervised Learning: There is no explicit teacher, and the system forms clusters or “natural groupings” of the input patterns.
  • 31.
    Learning (Training) .. (cont’d) Reinforcement Learning: This is analogous to a critic who merely states that something is right or wrong, but does not say specifically how it is wrong. (Thus only binary feedback is given to the classifier.)
  • 32.
    Classification (Recognition): Classification Approaches: 1. Holistic Approach: Segmentation free, Closed Vocabulary, Global features 2. Analytical Approach: Implicit or Explicit Segmentation, Open Vocabulary
  • 33.
    Classification (Recognition) .. (cont’d) Classification Tools: 1. Template Matching (Direct matching, string matching and elastic matching) 2. Statistical Methods (k nearest neighbour, Bayesian Classifier) 3. Stochastic Processes (Markov Chain)
  • 34.
    Classification (Recognition) .. (cont’d) Classification Tools: 4. Structural Matching (Trees, Chains, etc.) 5. Neural Networks 6. Rule-based Methods (Abstract description of writing) 7. Multiple Classifiers (Classifier Ensemble)
  • 35.
    On-line and Off-line character recognition systems can be categorized as: 1. Recognition of Isolated Characters (ISR). 2. Explicit Segmentation into characters/primitives Before Recognition (SBR). 3. Simultaneous / Sequential recognition and segmentation (SSR). 4. Global Whole Word recognition (GWR).
  • 36.
  • 37.
    Objective: 1. Viewing the ACR problem from different sides: Isolated and cursive; Off-line and on-line character problem; Single writer and multi-writer variability (WD & WI) 2. Achieving the best possible character recognition accuracy using the most logical rule-based algorithms
  • 38.
  • 39.
    Rule-based Algorithm for Off-line Isolated Handwritten character recognition: A. System Stages: 1. Database Collection 2. Preprocessing 3. Feature Extraction, Learning & Classification: 3.1) A single feature-based classifier system 3.2) Hierarchical Mixture of feature-based classifiers system. B. Results and Discussion
  • 40.
    1. Database Collection: A single-writer database of 30 samples (20 for training and 10 for test) of each Arabic alphabetic character was used, i.e. 580 characters for training and 290 for test. 2. Preprocessing: Character Image Binarization; Character Image Thresholding
  • 41.
    3. Feature Extraction, Learning and Classification: Recognition results were based upon the comparison between: 1. A single feature-based classifier system 2. Hierarchical Mixture of feature-based classifiers system. 3.1) A single feature-based classifier system: The feature used for this single classifier system was mainly the radial distances.
  • 42.
    3.1) A single feature-based classifier system: In the training stage, we compute a representative pattern for each class. Each character was considered a separate class. Classification uses the Euclidean distance measure.
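A minimal sketch of such a classifier follows. It assumes that the representative pattern of a class is the mean of its training feature vectors, which the slides do not state explicitly; the labels and two-dimensional feature values are made up for illustration (the actual system used radial distances).

```python
import math

def train(samples):
    """Map each class label to the mean of its training feature vectors."""
    prototypes = {}
    for label, vectors in samples.items():
        dims = len(vectors[0])
        prototypes[label] = [
            sum(v[i] for v in vectors) / len(vectors) for i in range(dims)
        ]
    return prototypes

def classify(prototypes, x):
    """Return the label whose prototype is nearest to x in Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(prototypes[label], x))

# Hypothetical feature vectors for two character classes.
samples = {
    "alef": [[1.0, 0.0], [0.8, 0.2]],
    "baa": [[0.0, 1.0], [0.2, 0.8]],
}
prototypes = train(samples)
predicted = classify(prototypes, [0.7, 0.3])
print(predicted)
```

Because the input is compared against every class prototype, a single weak feature can easily pick an unrelated class, which is the limitation the following slides address with gating.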
  • 43.
    3.1) A single feature-based classifier system: .. (cont’d) The average system accuracy = 70.06%. Most of the confusions lack sense. This is because: The input pattern is compared to all classes. One feature is not representative enough. We need a better way of categorization. We need to acquire more features.
  • 44.
    Character images are composed of 1, 2, 3 or 4 objects. Example: We have a main object (character body) and secondaries. To determine the number of dots associated we need to discriminate between: 1. Single dot 2. Two stuck dots 3. Hamza 4. Separated Alef
  • 45.
    3.2) Hierarchical Mixture of feature-based classifiers system: The recognition stage in our proposed system passed through 4 stages: Stage 1: using classifier ensemble (hierarchical mixture of experts) gated by using dots. Stage 2: Adding more structural features for gating between different feature-based classifiers. Stage 3: Adding more features and using feature fusion. Stage 4: Increasing the reliability of gating.
  • 46.
    3.2) Hierarchical Mixture of feature-based classifiers system .. (cont’d) Stage 1: using classifier ensemble (hierarchical mixture of experts) gated by using dots. Characters are clustered into groups according to the number of dots attached to them, which works as gating between redundant classifiers. The same feature is used for recognition in each cluster, i.e., we now have a classifier ensemble of individual classifiers (by varying training data). Classification uses the Euclidean distance measure.
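The gating step of Stage 1 can be sketched as a lookup followed by a nearest-prototype search restricted to one group. The grouping table, labels, and feature values below are hypothetical; only the idea of selecting a classifier by dot count and then using Euclidean distance comes from the slide.

```python
import math

# Hypothetical prototypes, grouped by the number of dots on the character.
prototypes_by_dots = {
    0: {"raa": [0.2, 0.9], "dal": [0.6, 0.5]},
    1: {"zay": [0.2, 0.9], "thal": [0.6, 0.5]},
}

def classify_gated(dot_count, x):
    """Pick the classifier for this dot count, then find the nearest prototype."""
    group = prototypes_by_dots[dot_count]  # the gate: only this cluster competes
    return min(group, key=lambda label: math.dist(group[label], x))

body_features = [0.25, 0.85]
print(classify_gated(0, body_features), classify_gated(1, body_features))
```

Two characters with the same body shape end up in different groups, so the dot count alone resolves a confusion that the single classifier could not.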
  • 47.
    3.2) Hierarchical Mixtureof feature- based classifiers system .. (cont’d) StageStage 11:: using classifier ensemble (hierarchicalusing classifier ensemble (hierarchical mixture of experts) gated by using dotsmixture of experts) gated by using dots RuleRule--based Algorithm for Offbased Algorithm for Off--line Isolatedline Isolated Handwritten character recognitionHandwritten character recognition ٤٧ The average system accuracy =The average system accuracy = 7878..3333%%
  • 48.
    3.2) Hierarchical Mixtureof feature- based classifiers system .. (cont’d) StageStage 22:: Adding more structural features for gatingAdding more structural features for gating between different featurebetween different feature--based classifiersbased classifiers RuleRule--based Algorithm for Offbased Algorithm for Off--line Isolatedline Isolated Handwritten character recognitionHandwritten character recognition ٤٨ Characters are clustered into groups according to theCharacters are clustered into groups according to the number of dots attached to them and the existence ofnumber of dots attached to them and the existence of loops and Hamzas: (loops and Hamzas: (88 different classifiers).different classifiers). The same feature is used for recognition in eachThe same feature is used for recognition in each cluster.cluster. Classification using the Euclidean distance measureClassification using the Euclidean distance measure The average system accuracy has risen to beThe average system accuracy has risen to be 8080..8686%%
  • 49.
    3.2) Hierarchical Mixture of feature-based classifiers system (cont'd). Stage 2: adding more structural features for gating between different feature-based classifiers. New structural features are added: the number and position of the character stroke end points, and the number of vertical and horizontal line cuts made by the character body.
  • 50.
    3.2) Hierarchical Mixture of feature-based classifiers system (cont'd). Stage 2: adding more structural features for gating between different feature-based classifiers. The average system accuracy = 92.25%.
  • 51.
    3.2) Hierarchical Mixture of feature-based classifiers system (cont'd). Stage 2: adding more structural features for gating between different feature-based classifiers.
  • 52.
    3.2) Hierarchical Mixture of feature-based classifiers system (cont'd). Stage 3: adding more features and using feature fusion. A new feature-based classifier that uses the 45° inclined line cuts feature is added. We used a fusion technique, weighted average, to combine the different features. The average system accuracy has risen to 96%.
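    A weighted-average fusion step could look like the sketch below. The slide does not say what exactly is averaged, so this sketch assumes each feature-based classifier emits a per-class score (higher is better) and that the weights are fixed in advance. The function names, class labels and weights are illustrative assumptions.

```python
def fuse_weighted_average(scores_per_classifier, weights):
    """Combine per-class scores from several classifiers by a weighted average.

    scores_per_classifier: list of dicts {class_label: score}, higher is better.
    weights: one non-negative weight per classifier (assumed values, not from the slides).
    """
    total = sum(weights)
    fused = {}
    for scores, w in zip(scores_per_classifier, weights):
        for label, s in scores.items():
            fused[label] = fused.get(label, 0.0) + w * s / total
    return fused

def decide(scores_per_classifier, weights):
    """Return the class with the highest fused score."""
    fused = fuse_weighted_average(scores_per_classifier, weights)
    return max(fused, key=fused.get)
```

    For example, with scores `[{"a": 0.6, "b": 0.4}, {"a": 0.3, "b": 0.7}]`, the decision depends on which classifier is trusted more: weights `[1, 3]` favour the second classifier, weights `[3, 1]` the first.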
  • 53.
    3.2) Hierarchical Mixture of feature-based classifiers system (cont'd). Stage 3: adding more features and using feature fusion.
  • 54.
    3.2) Hierarchical Mixture of feature-based classifiers system (cont'd). Stage 4: increasing the reliability of gating. We raised the secondaries identification accuracy to 99.7% using some structural features: character body to secondary ratio, secondary black-to-white pixel ratio, and secondary height-to-width ratio. We removed class overlapping in the feature space. The average system accuracy has risen to 97%.
  • 55.
    3.2) Hierarchical Mixture of feature-based classifiers system (cont'd). Stage 4: increasing the reliability of gating.
  • 56.
    Results and Discussion. The system stages led to: 1. An average recognition accuracy of 97%. 2. A total increase in recognition accuracy of about 27 percentage points over the accuracy achieved by a single-classifier system. 3. High results using the most common features, obtained by proposing a multiple classifier system (classifier ensemble) together with a classification hierarchy based on the structural features of Arabic characters.
  • 57.
    Results and Discussion. Our system is very simple and the results are comparable to those obtained by other researchers.
  • 58.
    Results and Discussion. [Bar chart of average accuracy (%): Single Classifier 70.06, Stage 1 78.33, Stage 2 92.25, Stage 3 96, Stage 4 97]
  • 59.
  • 60.
    Rule-based Algorithm for On-line Cursive Handwriting Segmentation and Recognition. Classically [11], on-line recognizers consist of: 1. A preprocessor. 2. A classifier, which provides estimates of probabilities for the different categories of characters. 3. A postprocessor, which eventually incorporates a language model. We propose a rule-based algorithm for the two early stages of an on-line recognizer for cursive Arabic handwriting.
  • 61.
    A. System Stages: 1. Database Collection. 2. Preprocessing. 3. Pattern Shapes Definition. 4. Feature Extraction. 5. Training. 6. Recognition. B. Results and Discussion.
  • 62.
    1. Database Collection. Handwritten documents were collected on a slate tablet PC. The collected database was unconstrained (open vocabulary). No digits were included. Writing is in NASKH font only.
  • 63.
    2. Preprocessing. Filter the document and clear it of unintended writer errors. Break down the document into text lines and words or sub-words. Detect the type of each stroke (either main-body or secondary).
  • 64.
    2. Preprocessing (cont'd). Filter the document and clear it of unintended writer errors.
  • 65.
    2. Preprocessing (cont'd). Break down the document into text lines and words or sub-words. Two problems arise when using x-y axis projection histograms: 1. Baseline skew, which makes line separation difficult and requires a careful skew detection and correction stage. 2. Multi-word overlap, where the inter-word distance is smaller than the expected threshold for separating words.
  • 66.
    2. Preprocessing (cont'd). Break down the document into text lines and words or sub-words. E. Ratzlaff used a "bottom-up" clustering of discrete strokes into increasingly larger groups that eventually merge into complete text lines. The initial bottom-up clustering began by creating Forward Projection (FP) groups. Strokes were merged into FP groups if they had strongly overlapping Y-axis projections. A single unmerged stroke became an independent FP group.
  • 67.
    2. Preprocessing (cont'd). Break down the document into text lines and words or sub-words. Drawbacks: 1. The secondaries usually have no overlapping Y-axis projections. 2. Large baseline skews occur along the text line and even within one word.
  • 68.
    2. Preprocessing (cont'd). Break down the document into text lines and words or sub-words. Another idea for text line separation was proposed by Gareth Loudon et al. It worked successfully with English script because of its limited cursive nature, i.e., a stroke (pen down/up movement) usually represents a single character. Several parameters were calculated for each stroke during the character segmentation step.
  • 69.
    2. Preprocessing (cont'd). Break down the document into text lines and words or sub-words. Example: if (si > max(xi)) or (-si > 2 * max(xi) & yi > max(xi)), then stroke i was a character at the end of a word; else if (ci > 0), stroke i was a character within a word; else stroke i must be merged with the next stroke to form a character.
  • 70.
    2. Preprocessing (cont'd). Break down the document into text lines and words or sub-words. Drawbacks: 1. An Arabic stroke usually represents more than one character, which makes it impossible to estimate the Arabic stroke geometry (height, width, etc.). 2. Delayed strokes in English are usually written immediately after the main stroke, which is not the case for Arabic strokes. 3. Variations in stroke size and stroke sequence among writers make the problem more difficult.
  • 71.
    2. Preprocessing (cont'd). Break down the document into text lines and words or sub-words. Our new technique uses the same bottom-up clustering concept and uses the spatiotemporal relations between strokes to build the smallest possible FP groups. The FP groups contain the main and secondary strokes of the same word regardless of the order in which they were written.
  • 72.
    2. Preprocessing (cont'd). Break down the document into text lines and words or sub-words. By examining the states of successively written Arabic strokes, we found them to be spatially related to each other by one of the following relations: 1. Touching ∴ the two strokes should belong to the same word group. 2. Not touching but overlapping on the x-axis ∴ the two strokes should belong to the same word group.
  • 73.
    2. Preprocessing (cont'd). Break down the document into text lines and words or sub-words. 3. Neither touching nor overlapping on the x-axis: if the inter-stroke distance is less than the average stroke width ∴ the two strokes should belong to the same word group; else ∴ the two strokes should belong to two different word groups.
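    The three spatial relations can be sketched as a small grouping routine. This is an illustration under stated assumptions, not the authors' implementation: a stroke is a list of (x, y) points, "touching" is approximated by two points lying within a tolerance `tol`, the inter-stroke distance is the horizontal gap between x-extents, and the average stroke width is the mean x-extent of all strokes. All function names are hypothetical.

```python
def x_range(stroke):
    """Smallest and largest x coordinate of a stroke given as (x, y) points."""
    xs = [x for x, _ in stroke]
    return min(xs), max(xs)

def touching(a, b, tol=1.0):
    """Assumed test: two strokes touch if any pair of points is within `tol`."""
    return any((ax - bx) ** 2 + (ay - by) ** 2 <= tol ** 2
               for ax, ay in a for bx, by in b)

def x_gap(a, b):
    """Horizontal gap between the x-extents; zero or negative means overlap."""
    (a0, a1), (b0, b1) = x_range(a), x_range(b)
    return max(a0, b0) - min(a1, b1)

def same_word(a, b, avg_width, tol=1.0):
    if touching(a, b, tol):      # relation 1: touching
        return True
    gap = x_gap(a, b)
    if gap <= 0:                 # relation 2: overlapping on the x-axis
        return True
    return gap < avg_width       # relation 3: close enough to share a word group

def group_strokes(strokes, tol=1.0):
    """Group stroke indices into word groups, visiting strokes in writing order."""
    widths = [x_range(s)[1] - x_range(s)[0] for s in strokes]
    avg_width = sum(widths) / len(widths)
    groups = []
    for i, s in enumerate(strokes):
        related = [g for g in groups
                   if any(same_word(s, strokes[j], avg_width, tol) for j in g)]
        merged = [i]
        for g in related:        # a new stroke may bridge several existing groups
            merged.extend(g)
            groups.remove(g)
        groups.append(sorted(merged))
    return groups
```

    Because each new stroke is compared with every stroke already grouped, a delayed secondary stroke joins its word group no matter when it was written.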
  • 74.
    2. Preprocessing (cont'd). Break down the document into text lines and words or sub-words. Example: Strokes 1 & 2: neither touching nor overlapping but belong to the same word. Strokes 2 & 5: neither touching nor overlapping but belong to 2 different words. Strokes 1 & 3: overlapping and belong to the same word. Strokes 7 & 8: touching and belong to the same word.
  • 75.
    2. Preprocessing (cont'd). Break down the document into text lines and words or sub-words.
  • 76.
    2. Preprocessing (cont'd). Break down the document into text lines and words or sub-words. We overcame these problems: 1. Secondaries with no overlapping Y-axis projections, which were usually separated as an independent text line. 2. Baseline skew. 3. Delayed strokes, which are now included in the same word regardless of the order in which they were written.
  • 77.
    2. Preprocessing (cont'd). Detect the type of each stroke (either main or secondary). Many characters have the same main body and differ only by their dots. By erasing these dots, we can reduce the number of patterns. If the FP group contains 1 stroke, then it should be main-type. If the FP group contains 2 or more strokes, then the first one should be main-type. The following strokes may be secondary or main depending on their height, shape and location.
  • 78.
    2. Preprocessing (cont'd). Detect the type of each stroke (either main or secondary).
  • 79.
    3. Pattern Shape Definition. Pattern shapes are defined by observing the collected handwritings. We have more than one shape for the handwritten character in all its known positions (Start, Middle, End, and Isolated).
  • 80.
    4. Feature Extraction. Depending on the directions, lengths, and pen-up/down movements of substrokes, 25 substrokes of eight directions are defined: eight long strokes (A–H), eight short strokes (a–h), eight pen-up movements (1–8) and one pen-up movement (9).
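    A minimal sketch of how a segment between two pen samples might be mapped to one of these substroke codes. Several details are assumptions not stated on the slide: direction 0 is taken as east and numbered counter-clockwise, the letter and digit order follows that numbering, and `long_threshold` is a hypothetical length that separates long from short strokes. The single code 9 is not modelled here.

```python
import math

LONG, SHORT, PEN_UP = "ABCDEFGH", "abcdefgh", "12345678"

def direction_index(dx, dy):
    """Quantize a displacement into one of eight 45-degree sectors.

    Sector 0 is centred on east; sectors are numbered counter-clockwise (assumed).
    """
    angle = math.atan2(dy, dx) % (2 * math.pi)
    return int((angle + math.pi / 8) // (math.pi / 4)) % 8

def encode_segment(p, q, pen_down, long_threshold):
    """Return the substroke code for the move from point p to point q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    d = direction_index(dx, dy)
    if not pen_down:
        return PEN_UP[d]              # pen-up movement codes 1-8
    if math.hypot(dx, dy) >= long_threshold:
        return LONG[d]                # long stroke codes A-H
    return SHORT[d]                   # short stroke codes a-h
```

    Concatenating the codes of successive segments gives the direction string (chain code) of a stroke, which is the kind of feature vector the later training and recognition stages compare.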
  • 81.
    4. Feature Extraction (cont'd). Example:
  • 82.
    5. Training. The details of this stage depend greatly on the methodology that will be used in the recognition stage. Approach 1: segmentation-based systems (analytical). Approach 2: segmentation-free systems (holistic). We followed the first approach, but by performing segmentation-by-recognition rather than explicit segmentation-before-recognition.
  • 83.
    5. Training (cont'd). S. El-Dabi [3, 9] sequentially extracted a set of features, accumulating their values while moving along the word image column by column, and checked them against the feature space of a given font until a character was recognized or the end of the word was reached. We need to build a registry comprising all skeleton patterns (feature space) of all pattern shapes. We made transcription files of the training data to describe the content of each training file. These files stand for a manual segmentation of the word strokes.
  • 84.
    5. Training (cont'd). Example:
  • 85.
    5. Training (cont'd). For each transcription file, pattern shapes data are read and the direction features are extracted. All the feature vectors belonging to the same pattern shape are clustered. The most representative patterns (feature vectors) are stored to construct a registry for the recognition stage.
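    The slide does not say how the most representative patterns are chosen. One plausible reading, sketched here purely as an assumption, is to keep the medoid of each pattern shape's samples: the direction string with the smallest total distance to all the others. `simple_distance` is a stand-in measure, and the shape name and sample strings are made up.

```python
def simple_distance(a, b):
    """Stand-in distance: mismatched positions plus the length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

def medoid(vectors, distance=simple_distance):
    """Return the sample with the smallest summed distance to all samples."""
    return min(vectors, key=lambda v: sum(distance(v, w) for w in vectors))

def build_registry(samples_by_shape, distance=simple_distance):
    """Map each pattern shape to one representative direction string."""
    return {shape: medoid(vectors, distance)
            for shape, vectors in samples_by_shape.items()}
```

    A registry built this way stores one string per shape. Keeping several representatives per shape would only require clustering the samples first and taking one medoid per cluster.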
  • 86.
    6. Recognition. In this stage, the main task was to find cuts that divide connected components into their individual characters. The basic idea is to use a dynamic programming algorithm to find a globally optimal set of cuts through the input string (feature vector) that minimizes a certain cost function. The set of cuts and their precise shape are found simultaneously. The feature vector of the test stroke was compared against the registry (one direction after the other) until either a character was recognized (i.e., we decide a segmentation point) or the feature vector reached its end.
  • 87.
    6. Recognition (cont'd). This comparison was performed using a dynamic programming technique called "Minimum Edit Distance".
  • 88.
    6. Recognition (cont'd). Example: assuming insertion cost = deletion cost = 1, substitution cost = 2.
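    The standard minimum-edit-distance recurrence with the costs stated on this slide (insertion = deletion = 1, substitution = 2) can be sketched as follows. The function name and argument names are illustrative. The inputs can be any sequences, for example direction strings.

```python
def min_edit_distance(source, target, ins=1, dele=1, sub=2):
    """Dynamic-programming edit distance with configurable operation costs."""
    m, n = len(source), len(target)
    # dist[i][j] = cheapest way to turn source[:i] into target[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = dist[i - 1][0] + dele
    for j in range(1, n + 1):
        dist[0][j] = dist[0][j - 1] + ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if source[i - 1] == target[j - 1] else sub
            dist[i][j] = min(
                dist[i - 1][j] + dele,      # delete a source symbol
                dist[i][j - 1] + ins,       # insert a target symbol
                dist[i - 1][j - 1] + cost,  # match or substitute
            )
    return dist[m][n]
```

    With these costs, two direction strings that differ in one symbol, such as "AAb" and "AAc", are at distance 2, and dropping one symbol costs 1.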
  • 89.
    6. Recognition (cont'd). Group1 = ['A' 'B' 'C' 'D' 'E' 'F' 'G' 'H'], Group2 = ['a' 'b' 'c' 'd' 'e' 'f' 'g' 'h'] and Group3 = ['1' '2' '3' '4' '5' '6' '7' '8']. The penalties are decided as follows:
  • 90.
    6. Recognition (cont'd). Insertion Cost = Substitution Cost/2 and Deletion Cost = Substitution Cost/2. The factors '4' and '16' come from the assumption that short strokes (represented by Group 2 directions) are almost half the length of long strokes (represented by Group 1 directions). Other value sets for these factors were tried: $\{1.5^2, (1.5^2)^2\}$, $\{2.5^2, (2.5^2)^2\}$, $\{3^2, (3^2)^2\}$, $\{3.5^2, (3.5^2)^2\}$, $\{4^2, (4^2)^2\}$. We chose the $\{2^2, (2^2)^2\}$ value set because it gives the smallest integer values, so the total distances do not get too large.
  • 91.
    6. Recognition (cont'd). The minimum-edit-distance technique is a good mathematical measure but cannot be used solely with the chain code feature. We need either some off-line features or at least template matching information. We used string matching to find the number of matches between the representative patterns from the registry and the test vector. The final cost function is given by the following equation: $\text{Distance} = \text{minimum-edit-distance} \times \dfrac{\text{Length of representative pattern}}{\text{Number of matches}}$
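    The cost function above can be sketched as below. How the number of matches is counted is not specified on the slide, so this sketch assumes a simple position-wise comparison and treats zero matches as an infinite distance. Both choices, and the function names, are assumptions. The edit distance is passed in as a number computed elsewhere.

```python
import math

def count_matches(pattern, test):
    """Assumed definition: positions where both strings hold the same symbol."""
    return sum(1 for p, t in zip(pattern, test) if p == t)

def final_distance(edit_distance, pattern, test):
    """Distance = edit distance * len(representative pattern) / number of matches."""
    matches = count_matches(pattern, test)
    if matches == 0:
        return math.inf  # assumption: no matching symbol means no candidate
    return edit_distance * len(pattern) / matches
```

    The scaling penalises candidates that reach a low edit distance with few matching symbols: for pattern "AAb" against test "AAc" with an edit distance of 2, there are two matches over a pattern length of three, giving 2 × 3 / 2.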
  • 92.
    6. Recognition (cont'd). The probable pattern shapes of the first character in the stroke were stored as roots of individual trees. Each tree was completed by repeatedly comparing the unidentified region of the feature vector to the registry to find the probable pattern shapes of the second, third and fourth characters, until the whole stroke was recognized. After tree construction, we were able to obtain a ranked list in which each member comprised the characters (without dots) representing the stroke, ranked by their total edit distance 'Distance'.
  • 93.
    Rule-based Algorithm for On-line Cursive Handwriting Segmentation & Recognition
  • 94.
    Rule-based Algorithm for On-line Cursive Handwriting Segmentation & Recognition. File 1: File 2:
  • 95.
    6. Recognition (cont'd) The last step in this stage was dot restoration. Two trials were made for assigning dots to the characters representing the stroke.
    Trial 1: The centroids of the dots were calculated, as well as the centroid of each character in the stroke, and each dot was assigned to the character with the nearest centroid. Despite a large reduction in list size and the promotion of correct results to the top of the list, drifts in dot position caused wrong dot assignments and therefore the loss of many correct choices as well.
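    A minimal sketch of the nearest-centroid assignment of Trial 1. The data layout (each dot and each character given as a list of (x, y) pen samples) and the function names are assumptions made for illustration.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def centroid(points: Sequence[Point]) -> Point:
    """Mean position of a set of pen samples."""
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))


def assign_dots(dots: List[List[Point]],
                characters: List[List[Point]]) -> List[int]:
    """For each dot, return the index of the character with the nearest centroid."""
    char_centroids = [centroid(c) for c in characters]
    assignment = []
    for dot in dots:
        dot_centroid = centroid(dot)
        nearest = min(range(len(char_centroids)),
                      key=lambda i: dist(dot_centroid, char_centroids[i]))
        assignment.append(nearest)
    return assignment


# Two characters side by side, one dot above each.
characters = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 0.0), (12.0, 0.0)]]
dots = [[(1.0, 3.0)], [(9.0, 3.0), (10.0, 3.0)]]
assignment = assign_dots(dots, characters)
```

    Because the decision depends only on distance, a dot written slightly off its character moves to the neighbour as soon as it crosses the midpoint between the two centroids, which is consistent with the drift problem reported for this trial.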
  • 96.
    6. Recognition (cont'd) Trial 2: Different distributions of the dots over the stroke characters were tried, and the validity of their number and location was checked in order to remove unsuitable list members. This trial was more successful: we were able to preserve almost all correct list members together with a reasonable reduction in list size. A new ranked list was obtained after removing the unsuitable members.
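    Trial 2 can be sketched as follows. Everything specific here is an assumption for illustration: the shape labels (`tooth`, `bowl`, `stem`), the table of dot groups each shape may carry, the representation of a dot group as (count, position), and the premise that dot groups follow the same order as the characters. A candidate is kept if at least one distribution of the observed dot groups over its characters is valid.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

DotGroup = Tuple[int, str]  # (number of dots, "above" or "below")

# HYPOTHETICAL validity table: dot groups each dotless shape may carry.
ALLOWED: Dict[str, Set[DotGroup]] = {
    "tooth": {(1, "above"), (2, "above"), (3, "above"), (1, "below"), (2, "below")},
    "bowl": {(1, "above"), (1, "below")},
    "stem": set(),
}
# HYPOTHETICAL: shapes that are also valid with no dots at all.
MAY_BE_BARE: Set[str] = {"bowl", "stem"}


def is_valid(candidate: List[str], dot_groups: List[DotGroup]) -> bool:
    """True if some order-preserving distribution of the dots fits the candidate."""
    n, k = len(candidate), len(dot_groups)
    if k > n:
        return False
    for chosen in combinations(range(n), k):   # which characters receive dots
        owner = dict(zip(chosen, dot_groups))
        if all(owner[i] in ALLOWED[shape] if i in owner else shape in MAY_BE_BARE
               for i, shape in enumerate(candidate)):
            return True
    return False


def prune(ranked: List[Tuple[float, List[str]]],
          dot_groups: List[DotGroup]) -> List[Tuple[float, List[str]]]:
    """Drop list members that cannot absorb the observed dots; keep the ranking."""
    return [(d, c) for d, c in ranked if is_valid(c, dot_groups)]


ranked = [(1.0, ["tooth", "stem"]), (2.0, ["bowl", "stem"]), (3.0, ["stem", "stem"])]
kept = prune(ranked, [(2, "above")])
```

    Unlike the nearest-centroid rule, this check does not commit a dot to one character from geometry alone, so a drifting dot does not by itself eliminate a correct candidate.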
  • 97.
    6. Recognition (cont'd) Example:
  • 98.
    Results and Discussion
                      Training   Test
    No. of writers    4          4
    No. of words      317        94
    No. of char.      1814       435
    Results representation: Neskovic and Cooper [14] developed an on-line segmentation-by-recognition system for English using HMMs together with a dynamic programming technique (Viterbi). The output of the system is a ranked set of words. The system's performance depends on the writer, on his style and on the clarity of his writing: for good writers the correct word is in the top 5 words over 97% of the time; for bad writers the correct word is in the top 5 words over 90% of the time.
  • 99.
    Results and Discussion (cont'd) Using the same terminology as in [14], we can represent our results as follows. Before dot restoration, the correct segmentation-recognition results of the test strokes lie among the top list members 93% of the time (96% of the time for the test characters). After dot restoration, the correct segmentation-recognition results of the test strokes lie among the top list members 92% of the time (95% of the time for the test characters).
                  Total number   Correctly recognized   Recognition probability
    Characters    435            415                    0.95
    Strokes       305            279                    0.92
    Words         94             70                     0.74
  • 100.
    Results and Discussion (cont'd) Fortunately, most of the correct recognition results lie at the top of the ranked list. [Figure "Recognition Choices": number of correct choices versus location in the ranked list, for characters and strokes.]
  • 101.
    Results and Discussion (cont'd) The list sizes were reduced significantly after dot restoration, with almost no loss of correct results.
  • 102.
    Results and Discussion (cont'd) The 5% loss in the number of characters recognized is the consequence of two problems:
    1. Imperfect segmentation: the rules do not cover a large enough range of writing variations.
    2. Wrong dot assignment: caused by writer drift and overlapping strokes.
    ∴ Increasing the number of training samples from multiple writers and avoiding overlaps is expected to give much better results.
  • 103.
  • 104.
    Summary, Conclusion & Future Work
    The proposed work surveyed both branches of the handwritten Arabic character recognition problem, off-line and on-line, and attacked the problem from different sides:
    1. Isolated and connected character problems
    2. Single-writer and multi-writer variability problems
    3. Single-output and multi-output decisions
    using the simplest kind of solution: rule-based algorithms.
  • 105.
    We proposed an off-line system for isolated handwritten Arabic character recognition, and we were able to achieve high results, comparable to those achieved by other researchers, by proposing the idea of a multiple classifier system together with a classification hierarchy based on the structural features of Arabic characters and feature fusion.
  • 106.
    We proposed a rule-based algorithm for the two early stages of an on-line cursive Arabic handwriting recognizer. We followed a segmentation-by-recognition approach, and we used the pen trajectory as the feature, with some modifications. We were able to correctly segment and recognize most of the test words.
  • 107.
    Following the pen trajectory causes the loss of the global pattern shape information that the off-line image provides (e.g., confusions between {و, ر} and {ـھـ, ـمفـ}). On the other hand, converting the on-line data to bitmaps and trying to solve the problem as off-line is a very hard task that is still under research and is not yet achieving reliable results [40, 41, 42]. Besides, the segmentation task is considerably harder in the off-line case than in the on-line case.
  • 108.
    As future work, we can benefit from the advantages of both systems by using an on-line/off-line classifier ensemble. In this case:
    The segmentation decision in the off-line case is corrected using the on-line system, and
    The classification decision in the on-line case is corrected using the off-line system.
  • 109.
    As future work we need:
    1. A much larger, neat training database from cooperative writers to enhance the results, and automation of the transcription file creation process.
    2. A language model that resolves ambiguities linguistically to obtain a single output.
    3. A multiple classifier system with decision fusion to accommodate a large degree of writing variability.
    4. Further research on numerals and the Reqaa font, which are still considered open issues.
  • 110.