SlideShare a Scribd company logo
1 of 6
Download to read offline
IOSR Journal of Engineering (IOSRJEN) www.iosrjen.org
ISSN (e): 2250-3021, ISSN (p): 2278-8719
Vol. 05, Issue 04 (April. 2015), ||V2|| PP 36-41
International organization of Scientific Research 36 | P a g e
A Divide And Conquer Method For Arabic Character
Recognition
Nehad H A Hammad1, Mohammed Elhafiz2
1
(Palestine Technical College, P.O.Box6037, Gaza, Palestine,
2
(Sudan University for Science and Technology, P.O.Box382,Mokren , Khartoum ,Sudan
Abstract: - Optical Characters Recognition (OCR) has been an active research area since the early days of
computer science. Despite the age of the subject, it remains one of the most challenging and exciting areas of
research in computer science. In recent years it has grown into a mature discipline, producing a huge body of
work. Arabic character recognition has been one of the last major languages to receive attention. In this paper a
new divide and conquer algorithm is proposed to recognize isolated handwritten Arabic characters using
character's curve tracing and number of dots information. A robust connected component labeling algorithm
were used to divide the characters into four groups. To overcome the problem of size and scaling the tangent
direction is used as feature (Divide-and-conquer). Four neural networks (NN) were used to classify each groups.
The primary result is promising and outperform one neural network classifier.
Keywords: - Arabic Character Recognition, Resizing, Feature Extraction, Neural Network.
I. INTRODUCTION
Handwriting recognition refers to the identification of written characters. The problem can be viewed
as a classification problem where we need to identify the most similarity character to the input character. The
challenge of the recognition system increases with the increase of number of characters.
Actually ,the language contains more number of characters shape, the identification would be more difficult than
the other language contains lesser number of characters.
Similarly we need looking to how the various characters are written and the differences between them.
The Arabic alphabet contains basically 28 letters are written from write to left. The Arabic isolated
character recognition is more complex because the reason is the similarities among the different letters and the
differences among the same latter. For example, letters Baa(‫,)ب‬ Taa(‫,)ت‬ and Thaa(‫)ث‬ are three different letters,
but they have similar body shape, they only differ in number and position of dots (one, two or three dots under
or above the body of the character). Also Jeem(‫,)ج‬ Hhaa(‫)ح‬ and Khaa(‫)خ‬ they differ only in one dot, and same
situation happens with the most of the rest[1].
In proposed method reduce similarity between each character by using number of character to four object(s).
The proposed method using "Divide and conquer" theory to divide Arabic character in four group depend on
number of objects in characters.
The connected component labelling using to determine the character member of which group,works by scanning
an image, pixel-by-pixel (from top to bottom and left to right) in order to identify connected pixel regions after
that the dots using to make sub-group if is it available.
components and dot(s) level and number. The Arabic Isolated character contain one
II. RELATED WORK
Haraty and Ghaddar (2003) propose the use of two neural networks to classify previously segmented
characters. Their method uses a skeleton representation and structural and quantitative features such as the
number and density of black pixels and the numbers of endpoints, loops, corner points, and branch points. On
2,132 characters, the recognition rate is over 73%[2].
Amin (2003) presents an automatic technique to learn rules for isolated characters. Structural features
including open curves in several directions are detected from the Freeman code representation of the skeleton of
each character and the relationships are determined with Inductive Logic Programming (ILP). Test data consist
of 40 samples of 120 different characters by different writers with 30 character samples used for training and 10
for testing for most experiments. A character recognition rate of 86.65% is obtained[3].
Alaei et al. (2010) proposed a two-stage approach for isolated handwritten Persian character
recognition. They extracted features based on modified chain code directional frequencies and employed an
SVM for classification. They obtained 98.1% and 96.6% recognition accuracy with 8-class and 32-class
problems, respectively[4] .
A Divide And Conquer Method For Arabic Character Recognition
International organization of Scientific Research 37 | P a g e
Desai (2010) presented a technique for Gujarati handwritten numeral recognition. the author used features
abstracted from four different profiles of digits with a multilayered feed forward neural network, and achieved
an approximate 82% recognition accuracy for Gujarati handwritten digit identification[5] .
Sharma and Jhajj (2010) extracted zoning features for handwritten Gurmukhi character recognition. They
employed two classifiers, namely k-NN and SVM. They achieved a maximum recognition accuracy of about
72.5% and 72.0% with k-NN and SVM, respectively [6].
Kumar et al. (2011) extracted intersection and open end points features for offline handwritten Gurmukhi
character recognition. They used SVM for classification, taking 90% of the dataset as a training set and 10% of
the dataset as a testing set. They achieved a maximum recognition accuracy of about 94.3%[7].
III. PREPROCESSING
Character recognition system perform steps before recognition. The first is image acquisition where the
images are being inputted to the computer using an optical device usually this input image colored or grey scale
image. Therefore most of recognition system convert this image to binary (0-black,1-white) this step call
binarization [8] .Other important steps are normalization, noise removal ,thinning[9].
IV. THE PROPOSED ALGORITHM
The proposed Arabic isolated characters recognition method divides all Arabic characters into four groups. This
group based on the number of connected component in the character.
Angle(Q) = arctan(pn(y)- pn+1(y) /pn(x)- pn+1(x)) * 180 / PI (1)
The method feature vector contains 17 values 15 represent characters contours and 2 represent dot(s) count and
position level .
After thinning ,15 points are taken from character trace, the angles between two adjacent points are is calculated
using Equation (1).Figure 4 illustrates the directions which has been chosen for experiments of the proposed
method.
A Divide And Conquer Method For Arabic Character Recognition
International organization of Scientific Research 38 | P a g e
For Arabic character dot(s) appear in 3 different positions upper level , middle level and lower level represented
in one value from 1 to 3 , one for upper , two for middle and three for lower.
Therefore we have the following categories:
 Category 1: Characters have one Dot in the upper(11) like (Gain ‫,غ‬ Feh‫,ف‬ khah‫خ‬ , Noon ‫ن‬ , Zen ‫,ز‬ Theh ‫ذ‬ ,
Theh ‫,ظ‬Zah ‫,)ض‬two Dots in the upper(12)like (Teh‫,)ت‬ three in the upper(13) like ( Theh‫ث‬ , Sheen ‫)ش‬
 Category 2 :characters have just one Dot in the middle(21) like (Jeem‫.)ج‬
 Category 3: characters have one Dot in the bottom(31) like (Beh ‫,ب‬ ),two Dots in the bottom(32) like
(Yeh ‫.)ي‬
Table1. Illustrates these Categories.
Dot Detection process:
1. Calculate the number isolated objects in the image .
2. If number of objects = 1 then pattern be belong to group one.
3. If number of objects >= 2 then
4. The Longest object size is character body and other is dot(s).
5. The character dots recognized using other process to calculate number of dots and position level.
6. The number of dots depend on dot length using thresholds and dots shape.
7. The dots level extracted when remove all space around character after divide the character height into 3
area on Y axis.
Number of Dots One Dot Two Dots Three Dots
No Dots [00] [00] [00]
Level 1(Up) [11] [12] [13]
Level 2(Mid) [21] - -
Level 3(Bottom) [31] [32] -
A Divide And Conquer Method For Arabic Character Recognition
International organization of Scientific Research 39 | P a g e
V. DATASET
The off-line Arabic isolated character Dataset is Sudan University For Science and Technology Arabic
Recognition Group (SUST ARG) used in proposed methods.
VI. EXPERIMENTAL AND RESULT
Group one contains all isolated Arabic characters with one part, so the character body written using connected
line, This group contains 12 characters ( ‫ا‬,‫د‬,‫ح‬,‫ع‬,‫هـ‬,‫ص‬,‫ط‬,‫م‬,‫ل‬,‫س‬,‫و‬,‫ر‬ ) .See Table 2 and Fig 9.
Table 2. Group One Isolated Arabic Characters Handwritten.
A Divide And Conquer Method For Arabic Character Recognition
International organization of Scientific Research 40 | P a g e
.
As this is biggest category ,neural network has been designed to classify the member of this group.6
experiments has been performed to test the six chosen directions . Table 3 shows the result of these six
experiments. These results show that eight direction outperform the other directions. That mean experiments on
more direction should be performed[10].
Table 3.Cross Validation for Group One Recognition.
Direction No Three Four Five Six Seven Eight
1 75.3 86.4 85.5 81.2 81.3 88.6
2 85.2 89.3 85.2 89.2 84.1 89.2
3 82.6 81.7 79.0 87.9 84.1 90.1
4 78.1 86.5 85.4 85.6 83.6 88.9
5 82.7 85.6 86.2 84.2 84.0 89.3
6 81.8 86.0 85.2 84.7 80.3 88.9
7 78.8 84.5 84.4 86.4 85.6 89.2
8 79.7 86.0 84.4 82.5 83.7 85.6
9 84.7 83.1 82.8 82.1 85.8 83.0
10 84.2 88.4 84.4 85.1 79.7 88.3
Average 81.31 85.75 84.25 84.89 83.22 88.11
Standard
Deviation
2.668 1.62 1.34 1.95 1.672 1.524
VII. CONCLUSION AND FURTHER WORK
Arabic character recognition methods are affected by several factors like characters shape, complexity
of drawing the character and the amount of rotation of the character on the horizontal line . To overcame these
problems we propose divide and conquer system the propose method divide the Arabic characters into 4 groups
.A neural network has been designed to classify each groups.
The result of this experiment can give however these result show that more directions should be tested.
A Divide And Conquer Method For Arabic Character Recognition
International organization of Scientific Research 41 | P a g e
Finally we look in this study to apply the proposed method to recognize other languages handwritten characters
and the development method to be more performed and efficient in online and offline handwritten characters.
REFERENCES
[1] Abdurazzag Ali ABURAS and Salem M. A. REHIEL ,(2007),"Off-line Omni-style Handwriting Arabic
Character Recognition System Based on Wavelet Compression", International Islamic University
Malaysia, Electrical and Computer Engineering, PP10, 50728, 53100.
[2] Haraty, R. and Ghaddar, C. Neuro-Classification for Handwritten Arabic Text. Proceedings ACS/IEEE
International Conference on Computer Systems and Applications,2003.
[3] Amin, A. Recognition of Hand-Printed Characters Based on Structural Description and Inductive Logic
Programming. Pattern Recognition Letters, vol. 24, pp. 3187-3196, 2003.
[4] A. Alaei, P. Nagabhushan, U. Pal, “A new two-stage scheme for the recognition of Persian handwritten
characters,” in Proc. of 12th ICFHR, pp.130-135, 2010.
[5] A. A. Desai, “Gujarati handwritten numeral optical character reorganization through neural network,”
Pattern Recognition, vol. 43, no. 7, pp. 2582-2589, July 2010.
[6] D. V. Sharma, P. Jhajj, “Recognition of isolated handwritten characters in Gurmukhi script,”
International Journal of Computer Applications, vol. 4, no. 8, pp. 9-17, 2010.
[7] M. Kumar, M. K. Jindal, R. K. Sharma, “k-NN based offline handwritten Gurmukhi character
recognition,” in Proc. of ICIIP, pp. 1-4, 2011.
[8] JiaYonghong, “Digital image processing(The Second Edition)”.WuHan China: Wu Han university press,
pp 114-116,2010.
[9] Lam, Seong-Whan Lee, and Ching Y. Suen, "Thinning Methodologies-A Comprehensive Survey," IEEE
Transactions on Pattern Analysis and Machine Intelligence, Vol 14, No. 9, 1992.
[10] Kohavi, R.," A Study of Cross Validation and Bootstrap for Accuracy Estimation and model Selection",
Proceedings Of the 15th International Conference on Artificial Intelligence (IJCAI). pp. 1137-1143,
1995.

More Related Content

Similar to E05423641

Off line system for the recognition of handwritten arabic character
Off line system for the recognition of handwritten arabic characterOff line system for the recognition of handwritten arabic character
Off line system for the recognition of handwritten arabic charactercsandit
 
Critical Review on Off-Line Sinhala Handwriting Recognition
Critical Review on Off-Line Sinhala Handwriting RecognitionCritical Review on Off-Line Sinhala Handwriting Recognition
Critical Review on Off-Line Sinhala Handwriting RecognitionAmila Wijayarathna
 
Paper id 24201433
Paper id 24201433Paper id 24201433
Paper id 24201433IJRAT
 
A Novel Framework For Numerical Character Recognition With Zoning Distance Fe...
A Novel Framework For Numerical Character Recognition With Zoning Distance Fe...A Novel Framework For Numerical Character Recognition With Zoning Distance Fe...
A Novel Framework For Numerical Character Recognition With Zoning Distance Fe...IJERD Editor
 
A Survey of Modern Character Recognition Techniques
A Survey of Modern Character Recognition TechniquesA Survey of Modern Character Recognition Techniques
A Survey of Modern Character Recognition Techniquesijsrd.com
 
Optical Character Recognition System for Urdu (Naskh Font)Using Pattern Match...
Optical Character Recognition System for Urdu (Naskh Font)Using Pattern Match...Optical Character Recognition System for Urdu (Naskh Font)Using Pattern Match...
Optical Character Recognition System for Urdu (Naskh Font)Using Pattern Match...CSCJournals
 
TEXT EXTRACTION FROM RASTER MAPS USING COLOR SPACE QUANTIZATION
TEXT EXTRACTION FROM RASTER MAPS USING COLOR SPACE QUANTIZATIONTEXT EXTRACTION FROM RASTER MAPS USING COLOR SPACE QUANTIZATION
TEXT EXTRACTION FROM RASTER MAPS USING COLOR SPACE QUANTIZATIONcsandit
 
Dimensionality Reduction and Feature Selection Methods for Script Identificat...
Dimensionality Reduction and Feature Selection Methods for Script Identificat...Dimensionality Reduction and Feature Selection Methods for Script Identificat...
Dimensionality Reduction and Feature Selection Methods for Script Identificat...ITIIIndustries
 
SVM Based Identification of Psychological Personality Using Handwritten Text
SVM Based Identification of Psychological Personality Using Handwritten Text SVM Based Identification of Psychological Personality Using Handwritten Text
SVM Based Identification of Psychological Personality Using Handwritten Text IJERA Editor
 
AUTOMATIC TRAINING DATA SYNTHESIS FOR HANDWRITING RECOGNITION USING THE STRUC...
AUTOMATIC TRAINING DATA SYNTHESIS FOR HANDWRITING RECOGNITION USING THE STRUC...AUTOMATIC TRAINING DATA SYNTHESIS FOR HANDWRITING RECOGNITION USING THE STRUC...
AUTOMATIC TRAINING DATA SYNTHESIS FOR HANDWRITING RECOGNITION USING THE STRUC...ijaia
 
Texture features based text extraction from images using DWT and K-means clus...
Texture features based text extraction from images using DWT and K-means clus...Texture features based text extraction from images using DWT and K-means clus...
Texture features based text extraction from images using DWT and K-means clus...Divya Gera
 
Classifier fusion method to recognize
Classifier fusion method to recognizeClassifier fusion method to recognize
Classifier fusion method to recognizeIJCI JOURNAL
 
An Application of Pattern matching for Motif Identification
An Application of Pattern matching for Motif IdentificationAn Application of Pattern matching for Motif Identification
An Application of Pattern matching for Motif IdentificationCSCJournals
 
DEVNAGARI DOCUMENT SEGMENTATION USING HISTOGRAM APPROACH
DEVNAGARI DOCUMENT SEGMENTATION USING HISTOGRAM APPROACHDEVNAGARI DOCUMENT SEGMENTATION USING HISTOGRAM APPROACH
DEVNAGARI DOCUMENT SEGMENTATION USING HISTOGRAM APPROACHijcseit
 
Multitier holistic Approach for urdu Nastaliq Recognition
Multitier holistic Approach for urdu Nastaliq RecognitionMultitier holistic Approach for urdu Nastaliq Recognition
Multitier holistic Approach for urdu Nastaliq RecognitionDr. Syed Hassan Amin
 
Recognition of Words in Tamil Script Using Neural Network
Recognition of Words in Tamil Script Using Neural NetworkRecognition of Words in Tamil Script Using Neural Network
Recognition of Words in Tamil Script Using Neural NetworkIJERA Editor
 

Similar to E05423641 (20)

Off line system for the recognition of handwritten arabic character
Off line system for the recognition of handwritten arabic characterOff line system for the recognition of handwritten arabic character
Off line system for the recognition of handwritten arabic character
 
Critical Review on Off-Line Sinhala Handwriting Recognition
Critical Review on Off-Line Sinhala Handwriting RecognitionCritical Review on Off-Line Sinhala Handwriting Recognition
Critical Review on Off-Line Sinhala Handwriting Recognition
 
Ijetcas14 399
Ijetcas14 399Ijetcas14 399
Ijetcas14 399
 
Paper id 24201433
Paper id 24201433Paper id 24201433
Paper id 24201433
 
A Novel Framework For Numerical Character Recognition With Zoning Distance Fe...
A Novel Framework For Numerical Character Recognition With Zoning Distance Fe...A Novel Framework For Numerical Character Recognition With Zoning Distance Fe...
A Novel Framework For Numerical Character Recognition With Zoning Distance Fe...
 
A Survey of Modern Character Recognition Techniques
A Survey of Modern Character Recognition TechniquesA Survey of Modern Character Recognition Techniques
A Survey of Modern Character Recognition Techniques
 
Optical Character Recognition System for Urdu (Naskh Font)Using Pattern Match...
Optical Character Recognition System for Urdu (Naskh Font)Using Pattern Match...Optical Character Recognition System for Urdu (Naskh Font)Using Pattern Match...
Optical Character Recognition System for Urdu (Naskh Font)Using Pattern Match...
 
TEXT EXTRACTION FROM RASTER MAPS USING COLOR SPACE QUANTIZATION
TEXT EXTRACTION FROM RASTER MAPS USING COLOR SPACE QUANTIZATIONTEXT EXTRACTION FROM RASTER MAPS USING COLOR SPACE QUANTIZATION
TEXT EXTRACTION FROM RASTER MAPS USING COLOR SPACE QUANTIZATION
 
Dimensionality Reduction and Feature Selection Methods for Script Identificat...
Dimensionality Reduction and Feature Selection Methods for Script Identificat...Dimensionality Reduction and Feature Selection Methods for Script Identificat...
Dimensionality Reduction and Feature Selection Methods for Script Identificat...
 
50120140505010
5012014050501050120140505010
50120140505010
 
SVM Based Identification of Psychological Personality Using Handwritten Text
SVM Based Identification of Psychological Personality Using Handwritten Text SVM Based Identification of Psychological Personality Using Handwritten Text
SVM Based Identification of Psychological Personality Using Handwritten Text
 
AUTOMATIC TRAINING DATA SYNTHESIS FOR HANDWRITING RECOGNITION USING THE STRUC...
AUTOMATIC TRAINING DATA SYNTHESIS FOR HANDWRITING RECOGNITION USING THE STRUC...AUTOMATIC TRAINING DATA SYNTHESIS FOR HANDWRITING RECOGNITION USING THE STRUC...
AUTOMATIC TRAINING DATA SYNTHESIS FOR HANDWRITING RECOGNITION USING THE STRUC...
 
Text Detection and Recognition
Text Detection and RecognitionText Detection and Recognition
Text Detection and Recognition
 
Texture features based text extraction from images using DWT and K-means clus...
Texture features based text extraction from images using DWT and K-means clus...Texture features based text extraction from images using DWT and K-means clus...
Texture features based text extraction from images using DWT and K-means clus...
 
E123440
E123440E123440
E123440
 
Classifier fusion method to recognize
Classifier fusion method to recognizeClassifier fusion method to recognize
Classifier fusion method to recognize
 
An Application of Pattern matching for Motif Identification
An Application of Pattern matching for Motif IdentificationAn Application of Pattern matching for Motif Identification
An Application of Pattern matching for Motif Identification
 
DEVNAGARI DOCUMENT SEGMENTATION USING HISTOGRAM APPROACH
DEVNAGARI DOCUMENT SEGMENTATION USING HISTOGRAM APPROACHDEVNAGARI DOCUMENT SEGMENTATION USING HISTOGRAM APPROACH
DEVNAGARI DOCUMENT SEGMENTATION USING HISTOGRAM APPROACH
 
Multitier holistic Approach for urdu Nastaliq Recognition
Multitier holistic Approach for urdu Nastaliq RecognitionMultitier holistic Approach for urdu Nastaliq Recognition
Multitier holistic Approach for urdu Nastaliq Recognition
 
Recognition of Words in Tamil Script Using Neural Network
Recognition of Words in Tamil Script Using Neural NetworkRecognition of Words in Tamil Script Using Neural Network
Recognition of Words in Tamil Script Using Neural Network
 

More from IOSR-JEN

More from IOSR-JEN (20)

C05921721
C05921721C05921721
C05921721
 
B05921016
B05921016B05921016
B05921016
 
A05920109
A05920109A05920109
A05920109
 
J05915457
J05915457J05915457
J05915457
 
I05914153
I05914153I05914153
I05914153
 
H05913540
H05913540H05913540
H05913540
 
G05913234
G05913234G05913234
G05913234
 
F05912731
F05912731F05912731
F05912731
 
E05912226
E05912226E05912226
E05912226
 
D05911621
D05911621D05911621
D05911621
 
C05911315
C05911315C05911315
C05911315
 
B05910712
B05910712B05910712
B05910712
 
A05910106
A05910106A05910106
A05910106
 
B05840510
B05840510B05840510
B05840510
 
I05844759
I05844759I05844759
I05844759
 
H05844346
H05844346H05844346
H05844346
 
G05843942
G05843942G05843942
G05843942
 
F05843238
F05843238F05843238
F05843238
 
E05842831
E05842831E05842831
E05842831
 
D05842227
D05842227D05842227
D05842227
 

Recently uploaded

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 

Recently uploaded (20)

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

E05423641

  • 1. IOSR Journal of Engineering (IOSRJEN) www.iosrjen.org ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 05, Issue 04 (April. 2015), ||V2|| PP 36-41 International organization of Scientific Research 36 | P a g e A Divide And Conquer Method For Arabic Character Recognition Nehad H A Hammad1, Mohammed Elhafiz2 1 (Palestine Technical College, P.O.Box6037, Gaza, Palestine, 2 (Sudan University for Science and Technology, P.O.Box382,Mokren , Khartoum ,Sudan Abstract: - Optical Characters Recognition (OCR) has been an active research area since the early days of computer science. Despite the age of the subject, it remains one of the most challenging and exciting areas of research in computer science. In recent years it has grown into a mature discipline, producing a huge body of work. Arabic character recognition has been one of the last major languages to receive attention. In this paper a new divide and conquer algorithm is proposed to recognize isolated handwritten Arabic characters using character's curve tracing and number of dots information. A robust connected component labeling algorithm were used to divide the characters into four groups. To overcome the problem of size and scaling the tangent direction is used as feature (Divide-and-conquer). Four neural networks (NN) were used to classify each groups. The primary result is promising and outperform one neural network classifier. Keywords: - Arabic Character Recognition, Resizing, Feature Extraction, Neural Network. I. INTRODUCTION Handwriting recognition refers to the identification of written characters. The problem can be viewed as a classification problem where we need to identify the most similarity character to the input character. The challenge of the recognition system increases with the increase of number of characters. Actually ,the language contains more number of characters shape, the identification would be more difficult than the other language contains lesser number of characters. Similarly we need looking to how the various characters are written and the differences between them. The Arabic alphabet contains basically 28 letters are written from write to left. The Arabic isolated character recognition is more complex because the reason is the similarities among the different letters and the differences among the same latter. For example, letters Baa(‫,)ب‬ Taa(‫,)ت‬ and Thaa(‫)ث‬ are three different letters, but they have similar body shape, they only differ in number and position of dots (one, two or three dots under or above the body of the character). Also Jeem(‫,)ج‬ Hhaa(‫)ح‬ and Khaa(‫)خ‬ they differ only in one dot, and same situation happens with the most of the rest[1]. In proposed method reduce similarity between each character by using number of character to four object(s). The proposed method using "Divide and conquer" theory to divide Arabic character in four group depend on number of objects in characters. The connected component labelling using to determine the character member of which group,works by scanning an image, pixel-by-pixel (from top to bottom and left to right) in order to identify connected pixel regions after that the dots using to make sub-group if is it available. components and dot(s) level and number. The Arabic Isolated character contain one II. RELATED WORK Haraty and Ghaddar (2003) propose the use of two neural networks to classify previously segmented characters. Their method uses a skeleton representation and structural and quantitative features such as the number and density of black pixels and the numbers of endpoints, loops, corner points, and branch points. On 2,132 characters, the recognition rate is over 73%[2]. Amin (2003) presents an automatic technique to learn rules for isolated characters. Structural features including open curves in several directions are detected from the Freeman code representation of the skeleton of each character and the relationships are determined with Inductive Logic Programming (ILP). Test data consist of 40 samples of 120 different characters by different writers with 30 character samples used for training and 10 for testing for most experiments. A character recognition rate of 86.65% is obtained[3]. Alaei et al. (2010) proposed a two-stage approach for isolated handwritten Persian character recognition. They extracted features based on modified chain code directional frequencies and employed an SVM for classification. They obtained 98.1% and 96.6% recognition accuracy with 8-class and 32-class problems, respectively[4] .
  • 2. A Divide And Conquer Method For Arabic Character Recognition International organization of Scientific Research 37 | P a g e Desai (2010) presented a technique for Gujarati handwritten numeral recognition. the author used features abstracted from four different profiles of digits with a multilayered feed forward neural network, and achieved an approximate 82% recognition accuracy for Gujarati handwritten digit identification[5] . Sharma and Jhajj (2010) extracted zoning features for handwritten Gurmukhi character recognition. They employed two classifiers, namely k-NN and SVM. They achieved a maximum recognition accuracy of about 72.5% and 72.0% with k-NN and SVM, respectively [6]. Kumar et al. (2011) extracted intersection and open end points features for offline handwritten Gurmukhi character recognition. They used SVM for classification, taking 90% of the dataset as a training set and 10% of the dataset as a testing set. They achieved a maximum recognition accuracy of about 94.3%[7]. III. PREPROCESSING Character recognition system perform steps before recognition. The first is image acquisition where the images are being inputted to the computer using an optical device usually this input image colored or grey scale image. Therefore most of recognition system convert this image to binary (0-black,1-white) this step call binarization [8] .Other important steps are normalization, noise removal ,thinning[9]. IV. THE PROPOSED ALGORITHM The proposed Arabic isolated characters recognition method divides all Arabic characters into four groups. This group based on the number of connected component in the character. Angle(Q) = arctan(pn(y)- pn+1(y) /pn(x)- pn+1(x)) * 180 / PI (1) The method feature vector contains 17 values 15 represent characters contours and 2 represent dot(s) count and position level . After thinning ,15 points are taken from character trace, the angles between two adjacent points are is calculated using Equation (1).Figure 4 illustrates the directions which has been chosen for experiments of the proposed method.
  • 3. A Divide And Conquer Method For Arabic Character Recognition International organization of Scientific Research 38 | P a g e For Arabic character dot(s) appear in 3 different positions upper level , middle level and lower level represented in one value from 1 to 3 , one for upper , two for middle and three for lower. Therefore we have the following categories:  Category 1: Characters have one Dot in the upper(11) like (Gain ‫,غ‬ Feh‫,ف‬ khah‫خ‬ , Noon ‫ن‬ , Zen ‫,ز‬ Theh ‫ذ‬ , Theh ‫,ظ‬Zah ‫,)ض‬two Dots in the upper(12)like (Teh‫,)ت‬ three in the upper(13) like ( Theh‫ث‬ , Sheen ‫)ش‬  Category 2 :characters have just one Dot in the middle(21) like (Jeem‫.)ج‬  Category 3: characters have one Dot in the bottom(31) like (Beh ‫,ب‬ ),two Dots in the bottom(32) like (Yeh ‫.)ي‬ Table1. Illustrates these Categories. Dot Detection process: 1. Calculate the number isolated objects in the image . 2. If number of objects = 1 then pattern be belong to group one. 3. If number of objects >= 2 then 4. The Longest object size is character body and other is dot(s). 5. The character dots recognized using other process to calculate number of dots and position level. 6. The number of dots depend on dot length using thresholds and dots shape. 7. The dots level extracted when remove all space around character after divide the character height into 3 area on Y axis. Number of Dots One Dot Two Dots Three Dots No Dots [00] [00] [00] Level 1(Up) [11] [12] [13] Level 2(Mid) [21] - - Level 3(Bottom) [31] [32] -
  • 4. A Divide And Conquer Method For Arabic Character Recognition International organization of Scientific Research 39 | P a g e V. DATASET The off-line Arabic isolated character Dataset is Sudan University For Science and Technology Arabic Recognition Group (SUST ARG) used in proposed methods. VI. EXPERIMENTAL AND RESULT Group one contains all isolated Arabic characters with one part, so the character body written using connected line, This group contains 12 characters ( ‫ا‬,‫د‬,‫ح‬,‫ع‬,‫هـ‬,‫ص‬,‫ط‬,‫م‬,‫ل‬,‫س‬,‫و‬,‫ر‬ ) .See Table 2 and Fig 9. Table 2. Group One Isolated Arabic Characters Handwritten.
  • 5. A Divide And Conquer Method For Arabic Character Recognition International organization of Scientific Research 40 | P a g e . As this is biggest category ,neural network has been designed to classify the member of this group.6 experiments has been performed to test the six chosen directions . Table 3 shows the result of these six experiments. These results show that eight direction outperform the other directions. That mean experiments on more direction should be performed[10]. Table 3.Cross Validation for Group One Recognition. Direction No Three Four Five Six Seven Eight 1 75.3 86.4 85.5 81.2 81.3 88.6 2 85.2 89.3 85.2 89.2 84.1 89.2 3 82.6 81.7 79.0 87.9 84.1 90.1 4 78.1 86.5 85.4 85.6 83.6 88.9 5 82.7 85.6 86.2 84.2 84.0 89.3 6 81.8 86.0 85.2 84.7 80.3 88.9 7 78.8 84.5 84.4 86.4 85.6 89.2 8 79.7 86.0 84.4 82.5 83.7 85.6 9 84.7 83.1 82.8 82.1 85.8 83.0 10 84.2 88.4 84.4 85.1 79.7 88.3 Average 81.31 85.75 84.25 84.89 83.22 88.11 Standard Deviation 2.668 1.62 1.34 1.95 1.672 1.524 VII. CONCLUSION AND FURTHER WORK Arabic character recognition methods are affected by several factors like characters shape, complexity of drawing the character and the amount of rotation of the character on the horizontal line . To overcame these problems we propose divide and conquer system the propose method divide the Arabic characters into 4 groups .A neural network has been designed to classify each groups. The result of this experiment can give however these result show that more directions should be tested.
  • 6. A Divide And Conquer Method For Arabic Character Recognition International organization of Scientific Research 41 | P a g e Finally we look in this study to apply the proposed method to recognize other languages handwritten characters and the development method to be more performed and efficient in online and offline handwritten characters. REFERENCES [1] Abdurazzag Ali ABURAS and Salem M. A. REHIEL ,(2007),"Off-line Omni-style Handwriting Arabic Character Recognition System Based on Wavelet Compression", International Islamic University Malaysia, Electrical and Computer Engineering, PP10, 50728, 53100. [2] Haraty, R. and Ghaddar, C. Neuro-Classification for Handwritten Arabic Text. Proceedings ACS/IEEE International Conference on Computer Systems and Applications,2003. [3] Amin, A. Recognition of Hand-Printed Characters Based on Structural Description and Inductive Logic Programming. Pattern Recognition Letters, vol. 24, pp. 3187-3196, 2003. [4] A. Alaei, P. Nagabhushan, U. Pal, “A new two-stage scheme for the recognition of Persian handwritten characters,” in Proc. of 12th ICFHR, pp.130-135, 2010. [5] A. A. Desai, “Gujarati handwritten numeral optical character reorganization through neural network,” Pattern Recognition, vol. 43, no. 7, pp. 2582-2589, July 2010. [6] D. V. Sharma, P. Jhajj, “Recognition of isolated handwritten characters in Gurmukhi script,” International Journal of Computer Applications, vol. 4, no. 8, pp. 9-17, 2010. [7] M. Kumar, M. K. Jindal, R. K. Sharma, “k-NN based offline handwritten Gurmukhi character recognition,” in Proc. of ICIIP, pp. 1-4, 2011. [8] JiaYonghong, “Digital image processing(The Second Edition)”.WuHan China: Wu Han university press, pp 114-116,2010. [9] Lam, Seong-Whan Lee, and Ching Y. Suen, "Thinning Methodologies-A Comprehensive Survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol 14, No. 9, 1992. [10] Kohavi, R.," A Study of Cross Validation and Bootstrap for Accuracy Estimation and model Selection", Proceedings Of the 15th International Conference on Artificial Intelligence (IJCAI). pp. 1137-1143, 1995.