Machine Printed Handwritten Text Discrimination
                    Using Radon Transform and SVM Classifier


ET-Tahir Zemouri¹ and Youcef Chibani²
Signal Processing Laboratory, Faculty of Electronic and Computer Sciences
University of Sciences and Technology Houari Boumediene
USTHB, EL-Alia, B.P. 32, 16111, Algiers, Algeria
¹tzemouri@usthb.dz, ²ychibani@usthb.dz



Abstract—Discrimination of machine printed and handwritten text is a major problem in the recognition of mixed texts. In this paper, we address the problem of identifying each type by using the Radon transform and Support Vector Machines (SVM). The method is conducted in three steps: preprocessing, feature generation and classification. A new set of features is generated from each word using the Radon transform. Classification is then used to distinguish printed text from handwritten text. The proposed system is tested on the IAM database. The recognition rate of the proposed method is over 98%.

Keywords-document analysis; machine printed and handwritten text discrimination; Radon transform; Support Vector Machines (SVM).

I. INTRODUCTION

Machine printed and handwritten text often appear together in application forms, question papers and mail, as well as in notes, corrections and instructions added to printed documents.

In all these cases it is crucial to detect, distinguish and process differently the areas of handwritten and printed text (OCR for machine printed text and ICR for handwritten annotations), for reasons such as: (a) retrieval of important information (identification of handwriting in application forms), (b) removal of unnecessary information (removal of handwritten notes from official documents), and (c) application of different recognition algorithms in each case.

The main difference between machine printed and handwritten text is their shape structure. Characters in machine printed text have a uniform shape, whereas handwritten text exhibits arbitrary, curly allograph styles. This difference can be exploited for generating features that capture the regularity of machine printed words compared with handwritten words.

There exist a few papers on the discrimination of machine printed and handwritten text. Kuhnke et al. [1] proposed a neural network-based approach with straightness and symmetry as features. Pal and Chaudhuri [2] used horizontal projection profiles for separating printed and handwritten lines in Bangla script. Guo and Ma [3] proposed an approach based on the vertical projection profile of the segmented words, with a Hidden Markov Model (HMM) as the classifier. Zheng et al. [4] reported on printed and handwritten text segmentation using k-NN, SVM and Fisher classifiers with features such as pixel density, aspect ratio and Gabor features. Kandan et al. [5] used invariant moments, which are insensitive to translation, scale, mirroring and rotation, as features for distinguishing printed and handwritten elements with an SVM classifier.

In this paper we propose a new method for text discrimination using the Radon transform and Support Vector Machines.

The Radon transform is well suited to detecting linear features. Hence, printed words generate Radon coefficients that are more regular than those of handwritten words. This property can be used for distinguishing between printed and handwritten words. The SVM, in turn, is well adapted to a robust separation of two classes.

The paper is organized as follows. In Section II, we describe the proposed system. Experiments and conclusions are discussed in Sections III and IV, respectively.

II. THE PROPOSED SYSTEM

The system for the discrimination between machine printed and handwritten text can be decomposed into three stages [1], as shown in Fig. 1. The first stage is preprocessing, in which the document is cleaned of noise components such as spurious dots and lines. In the second stage, features are generated based on the Radon transform. In the third stage, the elements are classified as printed or handwritten using SVM classifiers.

A. Preprocessing stage

Due to large variations in image data, preprocessing, which reduces variations and produces a more consistent set of data, is essential for accurate character recognition. In our system, preprocessing includes filtering, binarization, skew angle correction, smoothing, and word segmentation.
[Figure 1 (flow diagram): Document image -> Preprocessing (filtering, binarization, skew correction, smoothing, segmentation) -> Feature generation -> Classification -> Machine printed / Handwritten]

Figure 1. Block-diagram of the classification system.

1) Image filtering: Generally, an image acquired from a scanner contains noise, which can be reduced using a 3x3 Wiener filter [6].

2) Binarization: The text is separated from the background by automatic thresholding. The Wolf approach [7] is used to obtain the binary image.

3) Skew angle correction: Skew estimation and correction is an important step in any document analysis and recognition system. We use the projection profile for estimating the skew angle [8]: the profile is computed for different angles, and the angle giving the largest magnitude variations corresponds to the skew angle.

4) Smoothing: For smoothing binary document images, four filters [9] are used to smooth the edges and remove small pieces of noise.

5) Segmentation: Segmentation aims to extract the words from the document. It is performed in two consecutive steps: line segmentation and word segmentation. Both steps make use of projection profiles [10].

B. Feature Generation

Many kinds of features can be generated for distinguishing printed from handwritten text. Kuhnke et al. [1] proposed the straightness of vertically/horizontally oriented lines and symmetry relative to different points as features. Pal and Chaudhuri [2] used distinctive structural and statistical features. Guo and Ma [3] evaluated their scheme using the vertical projection profile. Zheng et al. [4] used features such as Gabor filters and run-length histograms. Kandan et al. [5] used invariant moments, which are invariant under translation, scaling, rotation and reflection.

The main idea of our approach is to take advantage of the structural properties that help to discriminate printed from handwritten text. More precisely, the shape of printed characters is more or less stable within a text word. On the other hand, the distribution of the shape of handwritten characters is quite diverse.

The Radon transform has been used in many pattern recognition applications such as shape recognition [11]. In our approach, the Radon transform is used as a tool for generating a feature vector. Hence, we briefly review its main properties.

1) Radon Transform

The Radon transform computes projections of an image along specified directions. A projection of a two-dimensional function I(x, y) is a set of line integrals. The Radon transform computes the line integrals from multiple sources along parallel paths in a certain direction. To represent an image, the Radon transform takes multiple parallel projections of the image from different angles by rotating the source around the center of the image. Formally, the Radon transform of an image is defined as [12]:

$$ TR_I(\rho, \theta) = \int_x \int_y I(x, y)\, \delta(x \cos\theta + y \sin\theta - \rho)\, dx\, dy \qquad (1) $$

where $\delta$ is the Dirac function, $\theta \in\, ]0, 180°]$ and $\rho \in\, ]-\infty, +\infty[$. In other words, $TR_I$ is the integral of I(x, y) over the line defined by $\rho = x \cos\theta + y \sin\theta$.

The Radon transform has several useful properties, such as periodicity and symmetry, as well as simple, well-defined behavior under translation, rotation and scaling.

In our approach, we are only interested in periodicity and symmetry. Fig. 2 shows an example of the Radon transform computed on printed and handwritten words.

Figure 2. A shape (a) and its Radon transform (b).

It can be seen that the Radon transform generates more coefficients for the handwritten word than for the printed word.
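As an illustration of Eq. (1), the following is a minimal sketch of a discrete Radon transform for a binary word image, not the implementation used in the paper. The function name `radon_transform`, the parameter `n_angles`, the choice of the image center as origin, the angle sampling over [0°, 180°) and the nearest-bin rounding that stands in for the Dirac function are all assumptions made for this example.

```python
import math


def radon_transform(image, n_angles):
    """Discrete approximation of Eq. (1) for a 2-D image given as a list of rows.

    Returns (sinogram, rho_max), where sinogram[r][k] accumulates the pixel
    values whose projection x*cos(theta_k) + y*sin(theta_k) rounds to the
    radial position rho = r - rho_max.
    """
    height = len(image)
    width = len(image[0])
    # Assumption: coordinates are measured from the image center.
    cx = (width - 1) / 2.0
    cy = (height - 1) / 2.0
    # Largest |rho| that any pixel can reach, used to size the radial axis.
    rho_max = int(math.ceil(math.hypot(cx, cy)))
    n_rho = 2 * rho_max + 1
    sinogram = [[0.0] * n_angles for _ in range(n_rho)]

    for k in range(n_angles):
        # Assumption: angles are sampled uniformly over [0, 180) degrees.
        theta = math.pi * k / n_angles
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        for y, row in enumerate(image):
            for x, value in enumerate(row):
                if value:
                    rho = (x - cx) * cos_t + (y - cy) * sin_t
                    # Nearest-bin rounding replaces the Dirac delta of Eq. (1).
                    sinogram[int(round(rho)) + rho_max][k] += value
    return sinogram, rho_max
```

In this sketch each column of the sinogram is one projection, so a straight stroke concentrates its mass in a single radial bin at the angle perpendicular to it, which is the kind of regularity the text attributes to printed words.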
2) Feature vector generation

To generate features of printed and handwritten words, we fix the number of angular directions, denoted $N_\theta$ ($\theta \in\, ]0, 360°]$). Since the Radon transform generates redundant coefficients (Fig. 2(b)), we keep only the positive radial projections and take all directions from 0 to 360°. The feature vector is then generated by computing, for each column (angle) in the positive half-space of the Radon transform, the mean of the squared coefficients. The feature values $E_I(\theta)$ are defined as:

$$ E_I(\theta) = \frac{1}{N_\rho} \sum_{\rho=1}^{N_\rho} TR_I(\rho, \theta)^2 \qquad (2) $$

where $N_\rho$ denotes the number of radial positions $\rho$ retained. Fig. 3 illustrates an example of the feature values, i.e., the Radon transform energy for each angle $\theta$.

Figure 3. Feature vector generation: (a) printed word and its Radon transform, (b) handwritten word and its Radon transform, (c) Radon energy versus angle.

We can see that the Radon transform energy is higher for the handwritten word than for the printed word.

3) Feature vector normalization

In many practical situations, a designer is confronted with features whose values lie within different dynamic ranges. Thus, features with large values may have a larger influence on the cost function than features with small values, although this does not necessarily reflect their respective significance in the design of the classifier. The problem is overcome by normalizing the features so that their values lie within similar ranges. This is achieved by using a nonlinear transformation [13].

C. Classification

SVMs are supervised learning methods that have been widely and successfully used for pattern recognition in different applications such as digit recognition [14]. The main concept of the SVM is to find a hyperplane that separates two classes while leaving the largest margin between the vectors of the two classes [14]. However, in real life, problems may not be linearly separable. To deal with this problem, a nonlinear decision surface is obtained by lifting the feature space into a higher dimensional space. A linear separating hyperplane found in the higher dimensional space gives a nonlinear decision surface in the original feature space. The decision function of the SVM can be expressed as follows:

$$ f(x) = \sum_i \alpha_i y_i K(x, x_i) + b \qquad (3) $$

where $(x_i, y_i) \in \mathbb{R}^d \times \{\pm 1\}$ are the feature vectors and labels, respectively. In our case, the feature vectors $x_i$ correspond to the Radon energy, and the labels are +1 for printed words and -1 for handwritten words. Parameters $\alpha_i$ and $b$ are found by maximizing a quadratic function subject to some constraints [14]. $K(x, x_i)$ is the kernel function, which implicitly maps the feature vectors into a higher dimensional inner product space. In our case, we use the RBF (Radial Basis Function) kernel since it offers better discrimination than other kernels. The RBF kernel is defined as:

$$ K(x, x_i) = \exp\left(-\frac{d(x, x_i)}{2\sigma^2}\right) \qquad (4) $$

$$ d(x, x_i) = \| x - x_i \|^2 \qquad (5) $$

where $\sigma$ is a user-defined parameter.

The optimization algorithm adopted for training SVMs is the Sequential Minimal Optimization (SMO), which provides practical advantages [15].
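Equations (2) to (5) can be illustrated with a short sketch, given here under stated assumptions rather than as the authors' code. The names `radon_energy`, `rbf_kernel`, `svm_decision` and `classify` are hypothetical. The energy is averaged over every radial bin of a sinogram stored as `sinogram[rho][angle]`, so the selection of positive radial projections described in the text is left to the caller. The support vectors, coefficients `alphas` and bias `b` are assumed to come from an already trained SVM; no SMO training is shown.

```python
import math


def radon_energy(sinogram):
    """Eq. (2): mean of the squared Radon coefficients over rho, per angle."""
    n_rho = len(sinogram)
    n_angles = len(sinogram[0])
    return [
        sum(sinogram[r][k] ** 2 for r in range(n_rho)) / n_rho
        for k in range(n_angles)
    ]


def rbf_kernel(x, xi, sigma):
    """Eqs. (4)-(5): exp(-||x - xi||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def svm_decision(x, support_vectors, alphas, labels, b, sigma):
    """Eq. (3): f(x) = sum_i alpha_i * y_i * K(x, x_i) + b."""
    return sum(
        alpha * y * rbf_kernel(x, sv, sigma)
        for alpha, y, sv in zip(alphas, labels, support_vectors)
    ) + b


def classify(x, support_vectors, alphas, labels, b, sigma):
    """Return +1 (printed) or -1 (handwritten) from the sign of f(x).

    Assumption: a decision value of exactly zero is assigned to +1.
    """
    f = svm_decision(x, support_vectors, alphas, labels, b, sigma)
    return 1 if f >= 0 else -1
```

With the label convention of the text, a feature vector lying close to a support vector labelled +1 and far from those labelled -1 yields a positive decision value and is therefore reported as printed.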
III. EXPERIMENTAL RESULTS

A. Data set

For evaluating the performance of the proposed method, we use the IAM database (Institut für Informatik und angewandte Mathematik) [16]. The documents are scanned at a resolution of 300 dpi, 8 bits/pixel, gray-scale, and converted into binary images using the Wolf binarization method. This database comprises more than 1500 documents containing printed and handwritten text. An example of a document can be seen in Fig. 4. Regions of printed and handwritten words are easily separable. They present no auxiliary lines to fill or to supply with written text. This characteristic facilitates the identification and classification of each type of word.

For testing the performance of our system, 21 images are chosen and preprocessed. The set of words is divided into three subsets for training (1/3), validation (1/3) and testing (1/3), respectively. Table I summarizes the data set.

For each word, a feature vector based on the Radon transform energy is calculated. We use the recognition rate (RR) as a metric to evaluate the performance of our system, which is defined as:

$$ RR = \frac{\text{number of words correctly classified}}{\text{total number of words}} \times 100\,\% \qquad (7) $$

Figure 4. IAM Database form.

TABLE I. DATA SET

    Data set           Training   Validation   Testing
    Machine printed       447         447         438
    Handwritten           525         525         484
    Total                 972         972         922

B. System validation

In order to validate our system, various experiments are conducted to find the SVM regularization parameter (fixed at 10), the kernel parameter ($\sigma$) and the best number of angular directions ($N_\theta$). Fig. 5 shows the recognition rate obtained on the validation set for each number of angular directions. We note that the RR is not very sensitive to the number of angular directions. However, the best performance (RR = 77.06%) is obtained for $N_\theta$ = 20 and $\sigma$ = 2.1.

Figure 5. Recognition rate using Radon transform for the system validation.

In order to improve the recognition rate, we concatenate statistical features to the Radon transform energy: mean, variance, variance of the projection profile (vertical and horizontal) and entropy. Fig. 6 shows the recognition rate versus the number of angular directions.

Figure 6. Recognition rate using Radon transform and statistical features.

We can see that the statistical features provide very suitable information for the discrimination between machine printed and handwritten text, since the RR is improved to 92.8% for $N_\theta$ = 10 and $\sigma$ = 2 on the validation set. This constitutes an additional advantage of adding the statistical features.

C. System testing

After the validation of the system, the testing set is used for evaluating its performance. Hence, the optimal values of
the system validation are used for computing the recognition rate. The obtained result is 98.32%, which constitutes encouraging performance compared to other works [1-5].

D. Comparison with other similar works

We compare our results with other published research works in terms of RR. Kuhnke et al. [1] proposed a neural network-based approach with straightness of vertically/horizontally oriented lines and symmetry relative to different points as features. The system reached an RR of 78.5%. Pal and Chaudhuri [2] proposed an approach based on the distinctive structural and statistical features of machine printed and handwritten text lines in Bangla script. Their classification scheme has an RR of 98.3%. Guo and Ma [3] evaluated their scheme using the vertical projection profile of the segmented word and obtained an RR of 92.86% using HMM. Zheng et al. [4] obtained an RR of 96% using an SVM classifier and features such as Gabor filters and run-length histograms. Kandan et al. [5] obtained an RR of 93.22% using invariant moments, which are invariant under translation, scaling, rotation and reflection, as features and an SVM classifier.

Our proposed method obtains an RR of 98.32% using the Radon transform, statistical features and an SVM classifier, which constitutes encouraging performance compared to other works.

IV. CONCLUSION

In this paper, we proposed a new method for discriminating printed and handwritten text in document images using the Radon transform and SVM classifiers. The system was implemented and tested on the IAM database.

Our approach presents encouraging results by combining Radon energy and statistical features using SVM classifiers with the RBF kernel.

In the future, we plan to apply our methodology to distinguish machine printed from handwritten text in Arabic and Latin scripts.

REFERENCES

[1] K. Kuhnke, L. Simoncini, and Z.M. Kovacs-V, “A System for Machine-Written and Hand-Written Character Distinction,” Proc. 3rd International Conference on Document Analysis and Recognition, vol. 2, pp. 811-814, 1995.
[2] U. Pal, and B. B. Chaudhuri, “Machine-printed and Hand-written Text Line Identification,” Pattern Recognition Letters, vol. 22, n. 3-4, pp. 431-441, 2001.
[3] J. K. Guo, and M. Y. Ma, “Separating Handwritten Material from Machine Printed Text Using Hidden Markov Models,” Proc. 6th International Conference on Document Analysis and Recognition, pp. 439-443, 2001.
[4] Y. Zheng, H. Li, and D. Doermann, “Machine Printed Text and Handwriting Identification in Noisy Document Images,” IEEE Trans on Pattern Analysis and Machine Intelligence, vol. 26, n. 3, pp. 337-353, 2004.
[5] R. Kandan, N. K. Reddy, K. R. Arvind, and A. G. Ramakrishnan, “A Robust Two Level Classification Algorithm for Text Localization in Documents,” Advances in Visual Computing, 3rd Int Symp, (ISVC 07), Part II, LNCS 4842, pp. 96-105, 2007.
[6] B. Gatos, I. Pratikakis and S. J. Perantonis, “Adaptive degraded document image binarization,” Pattern Recognition, vol. 39, pp. 317-327, 2006.
[7] C. Wolf, and J.M. Jolion, “Extraction and recognition of artificial text in multimedia documents,” Pattern Analysis and Applications, vol. 6, n. 4, pp. 309-326, 2003.
[8] T. Akiyama, and N. Hagita, “Automatic entry system for printed documents,” Pattern Recognition, vol. 23, n. 11, pp. 1141-1154, 1990.
[9] M. Cheriet, N. Kharma, C. L. Liu, and C. Suen, “Character Recognition Systems: A Guide for Students and Practitioners,” Wiley-Interscience editor, p. 321, 2007.
[10] E. Ataer, and P. Duygulu, “Retrieval of Ottoman Documents,” Proc 8th ACM international workshop on Multimedia information retrieval, pp. 155-162, 2006.
[11] S. Tabbone, L. Wendling, and J. P. Salmon, “A new shape descriptor defined on the Radon transform,” Computer Vision and Image Understanding, vol. 102, n. 1, pp. 42-51, 2006.
[12] S. R. Deans, “The Radon Transform and Some of Its Applications,” New York: Wiley, 1983.
[13] S. Theodoridis, and K. Koutroumbas, “Pattern Recognition,” 4th Ed, Elsevier Inc, 2009.
[14] H. Nemmour, Y. Chibani, “Handwritten digit recognition based on a neural-SVM combination,” Int journal of computers and applications (Acta Press Editor), vol. 32, n. 1, pp. 104-109, 2010.
[15] H. Nemmour, Y. Chibani, “Integrating class-dependant tangent vectors into SVMs for handwritten digit recognition,” Int Conf on Signals, Circuits and Systems (ICSCS), pp. 1-4, 2009.
[16] U.V. Marti, and H. Bunke, “The IAM-Database: an english sentence database for offline handwriting recognition,” International Journal on Document Analysis and Recognition, vol. 5, n. 1, pp. 39-46, 2002.

A Survey of Modern Character Recognition Techniques
 
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITIONA MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
 
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITIONA MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
 
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITIONA MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
 
SPEECH CLASSIFICATION USING ZERNIKE MOMENTS
SPEECH CLASSIFICATION USING ZERNIKE MOMENTSSPEECH CLASSIFICATION USING ZERNIKE MOMENTS
SPEECH CLASSIFICATION USING ZERNIKE MOMENTS
 
Script Identification for printed document images at text-line level using DC...
Script Identification for printed document images at text-line level using DC...Script Identification for printed document images at text-line level using DC...
Script Identification for printed document images at text-line level using DC...
 
Shot Boundary Detection using Radon Projection Method
Shot Boundary Detection using Radon Projection MethodShot Boundary Detection using Radon Projection Method
Shot Boundary Detection using Radon Projection Method
 
Texture features based text extraction from images using DWT and K-means clus...
Texture features based text extraction from images using DWT and K-means clus...Texture features based text extraction from images using DWT and K-means clus...
Texture features based text extraction from images using DWT and K-means clus...
 
L017248388
L017248388L017248388
L017248388
 
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 

Machine Printed Handwritten Text Discrimination

Machine Printed Handwritten Text Discrimination Using Radon Transform and SVM Classifier

ET-Tahir Zemouri (1) and Youcef Chibani (2)
Signal Processing Laboratory, Faculty of Electronic and Computer Sciences
University of Sciences and Technology Houari Boumediene
USTHB, EL-Alia, B.P. 32, 16111, Algiers, Algeria
(1) tzemouri@usthb.dz, (2) ychibani@usthb.dz

Abstract: Discrimination of machine printed and handwritten text is regarded as a major problem in the recognition of mixed texts. In this paper, we address the problem of identifying each type by using the Radon transform and Support Vector Machines. The method is conducted in three steps: preprocessing, feature generation and classification. A new set of features is generated from each word using the Radon transform. Classification is then used to distinguish printed text from handwritten text. The proposed system is tested on the IAM database. The recognition rate of the proposed method is calculated to be over 98%.

Keywords: document analysis; machine printed and handwritten text discrimination; Radon transform; Support Vector Machines (SVM).

I. INTRODUCTION

Machine printed and handwritten text are often found together in application forms, question papers and mail, as well as in notes, corrections and instructions added to printed documents.

In all of these cases it is crucial to detect, distinguish and process differently the areas of handwritten and printed text (OCR for machine printed text and ICR for handwritten annotations), for reasons such as: (a) retrieval of important information (identification of handwriting in application forms), (b) removal of unnecessary information (removal of handwritten notes from official documents), and (c) application of different recognition algorithms in each case.

The main difference between machine printed and handwritten text is their shape structure. Characters in machine printed text have a uniform shape, whereas handwritten text shows arbitrary, curly allograph styles. This difference can be exploited for generating features by exploring the regularity of machine printed words compared with handwritten words.

There exist a few papers on the discrimination of machine printed and handwritten text. Kuhnke et al. [1] proposed a neural network-based approach with straightness and symmetry as features. Pal and Chaudhuri [2] used horizontal projection profiles for separating the printed and handwritten lines in Bangla script. Guo and Ma [3] proposed an approach based on the vertical projection profile of the segmented words, which used a Hidden Markov Model (HMM) as the classifier. Zheng et al. [4] reported on printed and handwritten text segmentation using k-NN, Support Vector Machines (SVM) and the Fisher classifier with features like pixel density, aspect ratio and Gabor features. Kandan et al. [5] used invariant moments, which are insensitive to translation, scale, mirroring and rotation, as the features for distinguishing the printed and handwritten elements, together with an SVM classifier.

We propose in this paper a new method for text discrimination using the Radon transform and Support Vector Machines. The Radon transform is well suited to detecting linear features. Hence, printed words generate more regular Radon coefficients than handwritten words. This property can be used for distinguishing between printed and handwritten words. The SVM, in turn, is well suited to a robust separation of two classes.

The paper is organized as follows. In Section 2, we describe the proposed system. Experiments and conclusions are discussed in Sections 3 and 4, respectively.

II. THE PROPOSED SYSTEM

The system for the discrimination between machine printed and handwritten text can be decomposed into three stages [1], as shown in Fig. 1. The first stage is the preprocessing stage, in which the document is cleaned of noise components such as spurious dots and lines. In the second stage, features are generated based on the Radon transform. In the third stage, the elements are classified as printed or handwritten using SVM classifiers.

A. Preprocessing stage

Due to large variations in image data, preprocessing, which is used to reduce variations and produce a more consistent set of data, is essential for accurate character recognition. In our system, preprocessing includes filtering, binarization, skew angle correction, smoothing, and word segmentation.
[Figure 1. Block-diagram of the classification system: document image → preprocessing (filtering, binarization, skew correction, smoothing, segmentation) → feature generation → classification → machine printed / handwritten.]

1) Image filtering: Generally, the image acquired from a scanner contains noise, which can be reduced using a 3x3 Wiener filter [6].

2) Binarization: The text is separated from the background by automatic thresholding. The Wolf approach [7] is used to obtain the binary image.

3) Skew angle correction: Skew estimation and correction is an important step in any document analysis and recognition system. We use the projection profile for estimating the skew angle [8]: the profile is computed for a range of angles, and the angle giving the largest magnitude variations is taken as the skew angle.

4) Smoothing: For smoothing binary document images, four filters [9] can be used to smooth the edges and remove small pieces of noise.

5) Segmentation: Segmentation aims to extract the words from the document. It is performed in two consecutive steps: line segmentation and word segmentation. Both steps make use of projection profiles [10].

B. Feature Generation

Many kinds of features can be generated to distinguish printed from handwritten text. Kuhnke et al. [1] proposed the straightness of vertically/horizontally oriented lines and symmetry relative to different points as features. Pal and Chaudhuri [2] used distinctive structural and statistical features. Guo and Ma [3] evaluated their scheme using the vertical projection profile. Zheng et al. [4] used features such as Gabor filters and run-length histograms. Kandan et al. [5] used invariant moments, which are invariant under translation, scaling, rotation and reflection.

The main idea of our approach is to take advantage of the structural properties that help to discriminate printed from handwritten text. More precisely, the shape of printed characters is more or less stable within a text word. On the other hand, the distribution of the shape of handwritten characters is quite diverse.

The Radon transform has been used in many pattern recognition applications, such as shape recognition [11]. In our approach, the Radon transform is used as a tool for generating a feature vector. Hence, we briefly review its main properties.

1) Radon Transform

The Radon transform computes projections of an image along specified directions. A projection of a two-dimensional function $I(x, y)$ is a set of line integrals. The Radon transform computes the line integrals from multiple sources along parallel paths in a certain direction. To represent an image, the Radon transform takes multiple parallel projections of the image from different angles by rotating the source around the center of the image. Formally, the Radon transform of an image is defined as [12]:

$$TR_I(\rho, \theta) = \int_x \int_y I(x, y)\,\delta(x\cos\theta + y\sin\theta - \rho)\,dx\,dy \qquad (1)$$

where $\delta$ is the Dirac function, $\theta \in\, ]0, 180°]$ and $\rho \in\, ]-\infty, +\infty[$. In other words, $TR_I$ is the integral of $I(x, y)$ over the line defined by $\rho = x\cos\theta + y\sin\theta$.

The Radon transform has several useful properties, such as periodicity, symmetry, translation invariance, rotation invariance and scaling invariance. In our approach, we are only interested in periodicity and symmetry. Fig. 2 shows an example of the Radon transform computed on printed and handwritten words.

[Figure 2. A shape (a) and its Radon transform (b).]

We can easily see that the Radon transform generates more coefficients for the handwritten word than for the printed word.
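As an illustration of Eq. (1), the following minimal sketch approximates the Radon transform of a small binary image by accumulating ink pixels into bins of ρ = x cos θ + y sin θ. It is not the authors' implementation: the function name `radon_projections`, the nearest-bin rounding and the choice to measure coordinates from the image center are assumptions made for this example.

```python
import math

def radon_projections(image, n_angles, max_angle=180.0):
    """Rough discrete version of Eq. (1) for a binary image.

    image: list of rows, with non-zero values marking ink pixels.
    Returns (angles_in_degrees, sinogram), where sinogram[k][r] is the
    sum of the pixels whose rho falls into bin r for the k-th angle.
    All names here are hypothetical, chosen for illustration only.
    """
    h, w = len(image), len(image[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0      # origin at the image center
    half = int(math.ceil(math.hypot(w, h) / 2.0))
    n_rho = 2 * half + 1                       # bins for rho in [-half, +half]
    # Angles sampled uniformly over ]0, max_angle]
    angles = [max_angle * (k + 1) / n_angles for k in range(n_angles)]
    sinogram = []
    for angle in angles:
        c = math.cos(math.radians(angle))
        s = math.sin(math.radians(angle))
        row = [0.0] * n_rho
        for y in range(h):
            for x in range(w):
                if image[y][x]:
                    rho = (x - cx) * c + (y - cy) * s
                    row[int(round(rho)) + half] += image[y][x]
        sinogram.append(row)
    return angles, sinogram

# A 5x9 image containing a single horizontal stroke.
stroke = [[0] * 9, [0] * 9, [1] * 9, [0] * 9, [0] * 9]
angles, sino = radon_projections(stroke, n_angles=2)
```

With two angles (90° and 180°), the stroke collapses into a single ρ bin at 90° and spreads evenly over nine bins at 180°, which is the kind of direction-dependent regularity the text attributes to straight, printed strokes.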
2) Feature vector generation

To generate features of printed and handwritten words, we fix the number of angular directions, denoted by $N_\theta$ ($\theta \in\, ]0, 360°]$). Since the Radon transform generates redundant coefficients (Fig. 2.b), we select only the positive radial projections and take all directions from 0 to 360°. The feature vector is then generated by computing, for each column (angle) in the positive half of the Radon transform, the sum of the squared coefficients, for the chosen number of angular directions $N_\theta$. The feature values $E_I(\theta)$ are defined as:

$$E_I(\theta) = \frac{1}{N_\rho} \sum_{\rho=1}^{N_\rho} TR_I(\rho, \theta)^2 \qquad (2)$$

Fig. 3 illustrates an example of the generated feature values, which consist of the Radon transform energy for each angle $\theta$.

[Figure 3. Feature vector generation: (a) printed word and its Radon transform, (b) handwritten word and its Radon transform, (c) Radon energy versus angle.]

We can see that the Radon-based energy is higher for the handwritten word than for the printed word.

3) Feature vector normalization

In many practical situations, a designer is confronted with features whose values lie within different dynamic ranges. Thus, features with large values may have a larger influence on the cost function than features with small values, although this does not necessarily reflect their respective significance in the design of the classifier. The problem is overcome by normalizing the features so that their values lie within similar ranges. This is achieved by using a nonlinear transformation [13].

C. Classification

SVMs are supervised learning methods, which have been widely and successfully used for pattern recognition in different applications such as digit recognition [14]. The main concept of the SVM is to find a hyperplane that separates two classes while leaving the largest margin between the vectors of the two classes [14]. However, real-life problems may not be linearly separable. To deal with this, a nonlinear decision surface is obtained by lifting the feature space into a higher dimensional space. A linear separating hyperplane is found in the higher dimensional space, which gives a nonlinear decision surface in the original feature space. The decision function of the SVM can be expressed as follows:

$$f(x) = \sum_i \alpha_i y_i K(x, x_i) + b \qquad (3)$$

where $(x_i, y_i) \in \Re^d \times \{\pm 1\}$ are the feature vectors and labels, respectively. In our case, the feature vectors $\{x_i\}$ correspond to the Radon energy, and the labels correspond to printed words $\{+1\}$ and handwritten words $\{-1\}$. The parameters $\alpha_i$ and $b$ are found by maximizing a quadratic function subject to some constraints [14]. $K(x, x_i)$ is the kernel function, which allows mapping the feature vectors into a higher dimensional inner product space. In our case, we use the RBF (Radial Basis Function) kernel since it offers better discrimination than other kernels. The RBF kernel is defined as:

$$K(x, x_i) = \exp\left(-\frac{d(x, x_i)}{2\sigma^2}\right) \qquad (4)$$

$$d(x, x_i) = \lVert x - x_i \rVert^2 \qquad (5)$$

where $\sigma$ is user defined.

The optimization algorithm adopted for training the SVMs is Sequential Minimal Optimization (SMO), which provides practical advantages [15].
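The formulas in this section can be sketched in a few lines. The example below computes the energy of Eq. (2) for each angle and evaluates the decision function of Eq. (3) with the RBF kernel of Eqs. (4) and (5). It is an illustration only: the function names are hypothetical, each sinogram row is assumed to already hold only the positive radial projections for one angle, and the support vectors, multipliers and bias in the usage lines are made-up values rather than the output of SMO training.

```python
import math

def radon_energy(sinogram):
    """Eq. (2): mean of the squared Radon coefficients over rho, per angle.

    sinogram[k] is assumed to hold the positive-rho coefficients for the
    k-th angle (an assumption of this sketch).
    """
    return [sum(v * v for v in row) / len(row) for row in sinogram]

def rbf_kernel(x, xi, sigma):
    """Eqs. (4) and (5): exp(-||x - xi||^2 / (2 sigma^2))."""
    d = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-d / (2.0 * sigma ** 2))

def svm_decision(x, support_vectors, labels, alphas, b, sigma):
    """Eq. (3): f(x) = sum_i alpha_i * y_i * K(x, x_i) + b."""
    return sum(
        a * y * rbf_kernel(x, sv, sigma)
        for a, y, sv in zip(alphas, labels, support_vectors)
    ) + b

# Toy values, NOT trained: one "printed" (+1) and one "handwritten" (-1) vector.
support_vectors = [[0.0, 0.0], [2.0, 0.0]]
labels = [+1, -1]
alphas = [1.0, 1.0]

energies = radon_energy([[1, 1, 1, 1], [0, 4, 0, 0]])
score_printed = svm_decision([0.0, 0.0], support_vectors, labels, alphas, 0.0, 1.0)
score_handwritten = svm_decision([2.0, 0.0], support_vectors, labels, alphas, 0.0, 1.0)
score_middle = svm_decision([1.0, 0.0], support_vectors, labels, alphas, 0.0, 1.0)
```

The sign of the score gives the predicted class, following the label convention of the text (+1 for printed, -1 for handwritten). A point equidistant from the two toy support vectors scores zero and lies on the decision surface.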
III. EXPERIMENTAL RESULTS

A. Data set

For evaluating the performance of the proposed method, we use the IAM database (Institut für Informatik und angewandte Mathematik) [16]. The documents are scanned at a resolution of 300 dpi, 8 bits/pixel, in gray-scale, and converted into binary images using the Wolf binarization method. The database comprises more than 1500 documents containing printed and handwritten text. An example of a document can be seen in Fig. 4. Regions of printed and handwritten words are easily separable, and the forms contain no auxiliary lines to be filled in with written text. This characteristic facilitates the identification and classification of each type of word.

[Figure 4. IAM Database form.]

For testing the performance of our system, 21 images are chosen and preprocessed. The set of words is divided into three subsets for training (1/3), validation (1/3) and testing (1/3), respectively. Table I summarizes the data set. For each word, a vector of Radon-based energy values is calculated.

TABLE I. DATA SET

Data set          Training   Validation   Testing
Machine printed   447        447          438
Handwritten       525        525          484
Total             972        972          922

We use the recognition rate (RR) as a metric to evaluate the performance of our system, which is defined as:

$$RR = \frac{\text{number of words correctly classified}}{\text{total number of words}} \times 100\ (\%) \qquad (7)$$

B. System validation

In order to validate our system, various experiments are conducted for finding the SVM regularization parameter (fixed at 10), the kernel parameter ($\sigma$) and the best number of angular directions ($N_\theta$). Fig. 5 shows the recognition rate obtained on the validation set for each number of angular directions. We can note that the RR is not very sensitive to the number of angular directions. However, the best performance (RR = 77.06%) is obtained for $N_\theta = 20$ and $\sigma = 2.1$.

[Figure 5. Recognition rate using Radon transform for the system validation.]

In order to improve the recognition rate, we concatenate statistical features to the Radon-based energy: mean, variance, variance of the projection profile (vertical and horizontal) and entropy. Fig. 6 shows the recognition rate versus the number of angular directions.

[Figure 6. Recognition rate using Radon transform and statistical features.]

We can see that the statistical features carry very suitable information for the discrimination between machine printed and handwritten text, since the RR is improved to 92.8% for $N_\theta = 10$ and $\sigma = 2$ on the validation set. This constitutes an additional advantage of adding the statistical features.

C. System testing

After the validation of the system, the testing set is used for evaluating its performance. Hence, the optimal values of
the system validation are used for computing the recognition rate. The obtained result is 98.32%, which constitutes encouraging performance compared to other works [1-5].

D. Comparison with other similar works

We compare our results with some other published research works in terms of RR. Kuhnke et al. [1] proposed a neural network-based approach with the straightness of vertically/horizontally oriented lines and symmetry relative to different points as features. Their system reached a RR of 78.5%. The approach of Pal and Chaudhuri [2] is based on the distinctive structural and statistical features of machine printed and handwritten text lines in Bangla script. Their classification scheme has a RR of 98.3%. Guo and Ma [3] evaluated their scheme using the vertical projection profile of the segmented word and obtained a RR of 92.86% using an HMM. Zheng et al. [4] obtained a RR of 96% using an SVM classifier and features such as Gabor filters and run-length histograms. Kandan et al. [5] obtained a RR of 93.22% using invariant moments, which are invariant under translation, scaling, rotation and reflection, as features with an SVM classifier.

Our proposed method obtains a RR of 98.32% by using the Radon transform and statistical features with an SVM classifier, which constitutes encouraging performance compared to other works.

IV. CONCLUSION

In this paper, we proposed a new method for discriminating printed and handwritten text in document images using the Radon transform and SVM classifiers. The system was implemented and tested on the IAM database. Our approach presents encouraging results by combining Radon energy and statistical features using SVM classifiers with the RBF kernel. In the future, we plan to apply our methodology to distinguishing machine printed from handwritten text in Arabic and Latin documents.

REFERENCES

[1] K. Kuhnke, L. Simoncini, and Z. M. Kovacs-V, "A System for Machine-Written and Hand-Written Character Distinction," Proc. 3rd International Conference on Document Analysis and Recognition, vol. 2, pp. 811-814, 1995.
[2] U. Pal and B. B. Chaudhuri, "Machine-printed and Hand-written Text Line Identification," Pattern Recognition Letters, vol. 22, n. 3-4, pp. 431-441, 2001.
[3] J. K. Guo and M. Y. Ma, "Separating Handwritten Material from Machine Printed Text Using Hidden Markov Models," Proc. 6th International Conference on Document Analysis and Recognition, pp. 439-443, 2001.
[4] Y. Zheng, H. Li, and D. Doermann, "Machine Printed Text and Handwriting Identification in Noisy Document Images," IEEE Trans on Pattern Analysis and Machine Intelligence, vol. 26, n. 3, pp. 337-353, 2004.
[5] R. Kandan, N. K. Reddy, K. R. Arvind, and A. G. Ramakrishnan, "A Robust Two Level Classification Algorithm for Text Localization in Documents," Advances in Visual Computing, 3rd Int Symp (ISVC 07), Part II, LNCS 4842, pp. 96-105, 2007.
[6] B. Gatos, I. Pratikakis, and S. J. Perantonis, "Adaptive degraded document image binarization," Pattern Recognition, vol. 39, pp. 317-327, 2006.
[7] C. Wolf and J. M. Jolion, "Extraction and recognition of artificial text in multimedia documents," Pattern Analysis and Applications, vol. 6, n. 4, pp. 309-326, 2003.
[8] T. Akiyama and N. Hagita, "Automatic entry system for printed documents," Pattern Recognition, vol. 23, n. 11, pp. 1141-1154, 1990.
[9] M. Cheriet, N. Kharma, C. L. Liu, and C. Suen, "Character Recognition Systems: A Guide for Students and Practitioners," Wiley-Interscience editor, p. 321, 2007.
[10] E. Ataer and P. Duygulu, "Retrieval of Ottoman Documents," Proc. 8th ACM international workshop on Multimedia information retrieval, pp. 155-162, 2006.
[11] S. Tabbone, L. Wendling, and J. P. Salmon, "A new shape descriptor defined on the Radon transform," Computer Vision and Image Understanding, vol. 102, n. 1, pp. 42-51, 2006.
[12] S. R. Deans, "The Radon Transform and Some of Its Applications," New York: Wiley, 1983.
[13] S. Theodoridis and K. Koutroumbas, "Pattern Recognition," 4th Ed, Elsevier Inc, 2009.
[14] H. Nemmour and Y. Chibani, "Handwritten digit recognition based on a neural-SVM combination," Int journal of computers and applications (Acta Press Editor), vol. 32, n. 1, pp. 104-109, 2010.
[15] H. Nemmour and Y. Chibani, "Integrating class-dependant tangent vectors into SVMs for handwritten digit recognition," Int Conf on Signals, Circuits and Systems (ICSCS), pp. 1-4, 2009.
[16] U. V. Marti and H. Bunke, "The IAM-Database: an english sentence database for offline handwriting recognition," International Journal on Document Analysis and Recognition, vol. 5, n. 1, pp. 39-46, 2002.