IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 34, NO. 8, AUGUST 2012

Handwritten Chinese Text Recognition by Integrating Multiple Contexts

Qiu-Feng Wang, Fei Yin, and Cheng-Lin Liu, Senior Member, IEEE

Abstract—This paper presents an effective approach for the offline recognition of unconstrained handwritten Chinese texts. Under the general integrated segmentation-and-recognition framework with character oversegmentation, we investigate three important issues: candidate path evaluation, path search, and parameter estimation. For path evaluation, we combine multiple contexts (character recognition scores, geometric and linguistic contexts) from the Bayesian decision view, and convert the classifier outputs to posterior probabilities via confidence transformation. In path search, we use a refined beam search algorithm to improve the search efficiency and, meanwhile, use a candidate character augmentation strategy to improve the recognition accuracy. The combining weights of the path evaluation function are optimized by supervised learning using a Maximum Character Accuracy criterion. We evaluated the recognition performance on a Chinese handwriting database, CASIA-HWDB, which contains nearly four million character samples of 7,356 classes and 5,091 pages of unconstrained handwritten texts. The experimental results show that confidence transformation and combining multiple contexts improve the text line recognition performance significantly. On a test set of 1,015 handwritten pages, the proposed approach achieved a character-level accurate rate of 90.75 percent and a correct rate of 91.39 percent, which are superior by far to the best results reported in the literature.

Index Terms—Handwritten Chinese text recognition, confidence transformation, geometric models, language models, refined beam search, candidate character augmentation, maximum character accuracy training.
1 INTRODUCTION

HANDWRITTEN Chinese character recognition has long been considered a challenging problem. It has attracted much attention since the 1970s and has achieved tremendous advances [1], [2]. Both isolated character recognition and character string recognition have been studied intensively but are not solved yet. In isolated Chinese character recognition, most methods were evaluated on data sets of constrained writing styles, though very high accuracies (say, over 99 percent on Japanese Kanji characters and over 98 percent on Chinese characters) have been reported [1]. The accuracy on unconstrained handwritten samples, however, is much lower [3]. In Chinese character string recognition, most works aimed at the recognition of text lines or phrases in rather constrained application domains, such as legal amount recognition on bank checks [4] and address phrase recognition for postal mail [5], [6], [7], [8], where the number of character classes is very small or there are very strong lexical constraints. Works on Chinese handwriting recognition of general texts have been reported only in recent years, and the reported accuracies are quite low. For example, Su et al. reported a character-level correct rate (CR) of 39.37 percent on the Chinese handwriting data set HIT-MW, which comprises 853 pages containing 186,444 characters [9]. Two later works on the same data set, using character classifiers and statistical language models (SLM) based on oversegmentation, reported character-level correct rates of 78.44 percent [10] and 73.97 percent [11], respectively. On the other hand, many works on online Japanese/Chinese handwritten text recognition have reported higher accuracies [12], [13], [14], [15]. Online handwriting recognition has the advantage over offline recognition in that the sequences of strokes are available for better segmenting and discriminating characters.

Handwritten Chinese text recognition (HCTR) is a challenging problem due to the large character set, the diversity of writing styles, the character segmentation difficulty, and the unconstrained language domain. Fig. 1 shows an example of a Chinese handwritten page. The large set of Chinese characters (tens of thousands of classes) brings difficulties to efficient and effective recognition. The divergence of writing styles among different writers and in different geographic areas aggravates the confusion between different classes. Handwritten text recognition is particularly difficult because the characters cannot be reliably segmented prior to character recognition. The difficulties of character segmentation originate from the variability of character size and position, and from character touching and overlapping. A text line of Chinese handwriting must be recognized as a whole because it cannot be trivially segmented into words (there is no more extra space between words than between characters). Last, handwritten text recognition is more difficult than bank check recognition and mail address reading because the lexical constraint is very weak: Under grammatical and semantic constraints, the number of sentence classes is infinite.

The authors are with the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 95 Zhongguancun East Road, Beijing 100190, P.R. China. E-mail: {wangqf, fyin, liucl}

Manuscript received 18 July 2011; revised 15 Nov. 2011; accepted 22 Nov. 2011; published online 20 Dec. 2011. Recommended for acceptance by R. Manmatha. For information on obtaining reprints of this article, please send e-mail and reference IEEECS Log Number TPAMI-2011-07-0476. Digital Object Identifier no. 10.1109/TPAMI.2011.264.

0162-8828/12/$31.00 © 2012 IEEE  Published by the IEEE Computer Society
Fig. 1. A page of handwritten Chinese text.

Due to the large number of character classes and the infinite sentence classes of Chinese texts, HCTR can only be solved by segmentation-based approaches using character models [16], preferably by explicit segmentation (also called oversegmentation), which can take advantage of character shape and of overlapping and touching characteristics to better separate the characters at their boundaries. The result of oversegmentation is a sequence of primitive segments, each corresponding to a character or a part of a character, such that candidate characters can be generated by concatenating consecutive segments [5]. The candidate character sequences can be represented in a network called a candidate lattice [17], and each candidate segmentation path in the lattice can be split into many segmentation-recognition paths by assigning character classes to the candidate characters. The result of character segmentation and recognition is obtained by evaluating the paths in the lattice and searching for the optimal path.

In integrated segmentation-and-recognition, the candidate segmentation-recognition paths are usually evaluated by combining the character recognition scores (classifier outputs), geometric context, and linguistic context [16]. Many efforts have been made in this direction, but there has not been a satisfactory solution. The existing methods either integrated incomplete contexts [9], [10], [18] or combined the contexts heuristically without optimizing the combining weights [12], [13], [19], [20]. Zhou et al. optimized the combining weights using the conditional random field (CRF) model [14], which is hard to incorporate into language models of higher order than the bigram. Zhu et al. optimized the combining weights using the genetic algorithm (GA) [15], which is computationally expensive and is sensitive to some artificial parameters. The previous works have addressed handwritten text (character string) recognition from different viewpoints and have contributed various techniques. However, none has investigated these techniques comprehensively and integrated them in a high-performance system for Chinese/Japanese handwritten text recognition.

In this study, we investigate three key issues of integrated segmentation-and-recognition for HCTR: candidate path evaluation, path search, and parameter estimation. By elaborating the techniques for these issues, we achieved significant improvements on unconstrained handwritten Chinese texts. In path evaluation, we integrate the character recognition scores, geometric context, and linguistic context from the Bayesian decision view, and convert the classifier outputs to posterior probabilities via confidence transformation (CT). In path search, a refined beam search algorithm is used to improve the search efficiency and, meanwhile, a candidate character augmentation (CCA) strategy is applied to benefit the recognition accuracy. To balance the multiple contexts in the path evaluation function, we optimize the combining weights on a data set of training text lines using a Maximum Character Accuracy (MCA) criterion. We evaluated the recognition performance on a large database CASIA-HWDB [21] of unconstrained Chinese handwritten characters and texts, and demonstrated superior performance by the proposed methods.

The rest of this paper is organized as follows: Section 2 reviews some related works, Section 3 gives an overview of our HCTR system, Section 4 provides a statistical foundation of the path evaluation issue from the Bayesian decision view, Section 5 describes the confidence transformation, geometric context, and linguistic context in detail, Section 6 introduces the refined beam search algorithm and the candidate character augmentation strategy, Section 7 presents the weights learning method, Section 8 reports the experimental results, and Section 9 draws concluding remarks.

2 RELATED WORKS

In the context of handwritten text (character string¹) recognition, many works have contributed to the related issues of oversegmentation, character classification, confidence transformation, language models, geometric models, path evaluation and search, and parameter estimation.

For oversegmentation, connected component analysis has been widely adopted, but the splitting of connected (touching) characters has been a concern [5], [22], [23]. After generating candidate character patterns by combining consecutive primitive segments, each candidate pattern is classified using a classifier to assign similarity/dissimilarity scores to some character classes. Character classification involves character normalization, feature extraction, and classifier design. The state-of-the-art methods have been reviewed in [24], [25]. For classification of Chinese characters with a large number of classes, the most popularly used classifiers are the modified quadratic discriminant function (MQDF) [26] and the nearest prototype classifier (NPC) [27]. The MQDF provides higher accuracy than the NPC but suffers from high expenses of storage and computation.

Transforming the similarity/dissimilarity measures output by classifiers to probabilistic confidence measures can benefit the fusion of multiple classifiers or multiple patterns, as has been demonstrated in previous works (e.g., [28], [29]). In character string recognition, Jiang et al. [18] transformed classifier outputs to confidence values under the soft-max framework. Li et al. [30] used the logistic regression model for confidence transformation. Our recent work [31] compared various confidence transformation methods in HCTR and found a better solution.

1. We will use the terms text line recognition and string recognition interchangeably because, in this study, a text line is treated as a character string.
Language models are widely used in speech recognition, machine translation, handwriting recognition, and so on [32]. The most popular language model is the n-gram, which characterizes the statistical dependency between characters or words. Character-level n-gram models have been popularly used in character string recognition (e.g., [12], [13], [14], [15], [18], [19], [20]). Word-level and hybrid language models were used in postprocessing for correcting recognition errors after character segmentation [30], [33], but have rarely been used in integrated segmentation-and-recognition [10].

In addition to the character recognition scores and linguistic context, the geometric context also plays an important role in character string recognition, particularly for disambiguating character segmentation [12], [13], [14], [15], [19], [20], [34]. Zhou et al. elaborated the geometric context models into unary and binary, character class-dependent and class-independent models in online handwriting recognition [13], [14]. Yin et al. elaborated the geometric context models for offline handwriting and applied them to transcript mapping of handwritten Chinese documents [35].

A key issue in character string recognition is to design an objective function evaluating each candidate segmentation-recognition path. The path evaluation function should be insensitive to the path length (the number of characters on the path). The summation of classifier output similarity/dissimilarity scores, or the product of class probabilities, is not appropriate since it is biased toward short paths. Normalizing the summation or product by the path length overcomes the bias problem [36], [37], but this normalized form does not enable optimal path search by dynamic programming (DP). Beam search can be used instead, but it does not guarantee optimality [37]. Another way to overcome the path length bias is to add a compensative constant to the summated path evaluation function [20], but the constant needs to be estimated empirically. Wuthrich et al. [38] called this constant a word insertion penalty, and Quiniou et al. [39] also used this constant to control the deletion and insertion of words. Another effective way is to weight the character classification score by the number of primitive segments forming the character pattern [10], [15], motivated by the variable duration HMM of Chen et al. [40]. This not only makes the number of summated terms in the path evaluation function equal the number of primitive segments (and thus independent of the path length), but also preserves the summation form and enables optimal path search by DP.
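The short-path bias of a summed score, and its removal by length normalization, can be seen with a toy calculation (the per-character probabilities below are invented for illustration only):

```python
import math

# Toy illustration of the path-length bias: a summed log-probability score
# prefers a 2-character path with weaker per-character scores over a
# 4-character path with stronger ones, because every extra character adds
# a negative term. Normalizing by path length removes the bias.

short_path = [0.7, 0.7]            # 2 characters, weaker per-character scores
long_path = [0.8, 0.8, 0.8, 0.8]   # 4 characters, stronger per-character scores

def summed(probs):
    return sum(math.log(p) for p in probs)

def normalized(probs):
    return summed(probs) / len(probs)

print(summed(short_path) > summed(long_path))          # True: biased to the short path
print(normalized(long_path) > normalized(short_path))  # True: bias removed
```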
In the weighted combination of context models for path evaluation, the weights were sometimes determined by trial and error. Some works have applied the supervised learning approach to estimate the weights by optimizing a string recognition criterion. Recently, Zhou et al. [14] proposed learning the weights by minimizing the negative log-likelihood (NLL) loss under the framework of the conditional random field, and compared its performance with the minimum classification error (MCE) criterion [41]. Yin et al. [35] optimized the weights by MCE learning for transcript mapping. Zhu et al. [15] optimized the combining weights for handwriting recognition using the genetic algorithm. More discriminative learning criteria have been proposed by the speech recognition community, such as minimum phone error (MPE) and its variant, minimum word error (MWE) [42], [43].

The search for the optimal path in Chinese character string recognition is not trivial because of the large number of candidate segmentation-recognition paths. The search is further complicated when using word-level language models because word segmentation is again a combinatorial problem [10]. The speech recognition community has contributed many efficient search algorithms based on dynamic programming and some variants (e.g., beam search) [44]. The beam search strategy provides a good tradeoff between efficiency and accuracy. The character-synchronous beam search strategy is appropriate for lexicon-driven string recognition [5], while the frame-synchronous (also called time-synchronous in speech recognition) strategy is appropriate for lexicon-free string recognition [37].

In character string recognition, the pruning or augmentation of character classes affects the search efficiency and accuracy. Ideally, a candidate character pattern is assigned as few classes as possible by the classifier, including the true class. For Chinese handwriting, however, it often takes a large number (e.g., several hundred) of candidate classes to guarantee a high probability of including the true class. This complicates the search space on one hand and, on the other hand, may deteriorate the recognition accuracy because there are too many wrong classes competing with the true class. Therefore, some works have attempted to reduce the candidate classes output by the classifier via confidence evaluation [45], [46], and some other works have attempted to supplement candidate classes to reduce the probability of missing the true class, according to the linguistic context [47] or the classification confusion matrix [33], [48]. These techniques, however, have not been evaluated in integrated segmentation-and-recognition.

3 SYSTEM OVERVIEW

This study focuses on the recognition of text lines, which are assumed to have been segmented externally. For the convenience of academic research and benchmarking, the text lines in our database have been segmented and annotated at the character level [49].

Fig. 2. System diagram of handwritten Chinese text line recognition.

Fig. 2 shows the block diagram of our system for text line recognition. First, the input text line image is oversegmented into a sequence of primitive segments (Fig. 3a) using the connected component-based method [5]. Consecutive primitive segments are combined to generate candidate character patterns, forming a segmentation candidate lattice (Fig. 3b).
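The candidate pattern generation step just described can be sketched in a few lines. This is a schematic stand-in: the `max_concat` bound on segments per character is an assumed illustrative parameter, and a real system would also apply geometric constraints when combining segments.

```python
# Schematic sketch: every run of up to `max_concat` consecutive primitive
# segments becomes one candidate character pattern, i.e., one edge in the
# segmentation candidate lattice. `max_concat` is an assumed bound, not a
# value taken from the paper.

def candidate_patterns(num_segments, max_concat=4):
    """Return (start, end) pairs; a pattern spans segments start..end-1."""
    patterns = []
    for start in range(num_segments):
        for end in range(start + 1, min(start + max_concat, num_segments) + 1):
            patterns.append((start, end))
    return patterns

# A line of 5 primitive segments with at most 3 segments per character:
print(len(candidate_patterns(5, max_concat=3)))  # 12 candidate patterns
```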
After that, each candidate pattern is classified to assign a number of candidate character classes, and all the candidate patterns in a candidate segmentation path generate a character candidate lattice (Fig. 3c). If a word-level language model is used, each sequence of candidate characters is matched with a word lexicon to segment it into candidate words, forming a word candidate lattice (Fig. 3d). All of these character (or word) candidate lattices are merged to construct the segmentation-recognition lattice of the text line image. Each path in this lattice is constructed by a character sequence paired with a candidate pattern sequence, and this path is called a candidate segmentation-recognition path. Finally, the task of string recognition is to find the optimal path in this segmentation-recognition lattice. Considering that the text lines are segmented from text pages, we utilize the linguistic dependency between consecutive lines to improve the recognition accuracy by concatenating multiple top-rank recognition results of the previous line to the current line for recognition.

Fig. 3. (a) Oversegmentation into a sequence of primitive segments (each bounded by a small box), (b) segmentation candidate lattice of the arrowed part of (a), (c) character candidate lattice of the thick path in (b), (d) word candidate lattice of (c).

4 STATISTICAL FOUNDATION

We formulate the problem of handwritten Chinese text recognition from the Bayesian decision view. According to Bayesian decision theory under the 0/1 loss, maximizing the posterior probability of the character sequence (string class) C = <c_1 ... c_m> given a text line image X is the optimal criterion for recognition. This posterior probability is formulated by

  P(C|X) = \sum_s P(C, s|X) = \sum_s P(s|X) P(C|s, X) = \sum_s P(s|X) P(C|X_s),   (1)

where s is the segmentation path index, P(s|X) denotes the posterior probability of the s-th segmentation path given the text line image, and P(C|X_s) represents the posterior probability of the string class given the s-th segmentation path. P(s|X) is formulated by

  P(s|X) = \prod_{i=1}^{m} p(z_i^p = 1 | g_i^{ui}) \, p(z_i^g = 1 | g_i^{bi}),   (2)

where m is the number of segmented candidate patterns (i.e., the character number) of the segmentation path, z_i^p = 1 means that the i-th candidate pattern is a valid character, and z_i^g = 1 means that the gap between the (i-1)-th and i-th candidate patterns is a valid between-character gap; the terms g_i^{ui} and g_i^{bi} are the class-independent geometric features extracted from the i-th candidate pattern and from the pair of the (i-1)-th and i-th candidate patterns, respectively. The two probabilistic terms in (2) correspond to the unary and binary class-independent geometric models (see Section 5.2), respectively. To avoid summing over a huge number of segmentation paths in (1), the optimal string class can be decided approximately by

  C* = \arg\max_{s,C} P(s|X) P(C|X_s).   (3)

This is to search for the optimal segmentation and string class simultaneously in the lattice.

4.1 Posterior Probability of String Class

The posterior probability of the string class can be obtained by Bayes' formula:

  P(C|X_s) = p(X_s|C) p(C) / p(X_s),   (4)

where the prior probability p(C) = p(c_1 ... c_m) is given by a statistical language model (Section 5.3). The likelihood function p(X_s|C) can be decomposed as

  p(X_s|C) = \prod_{i=1}^{m} p(x_i|c_i) \, p(g_i^{uc}|c_i) \, p(g_i^{bc}|c_{i-1} c_i),   (5)

where we assume that the feature vectors (x_i, g_i^{uc}, g_i^{bc}) depend only on the corresponding character patterns, and that they represent the intrinsic shape feature for character recognition and the unary and binary outline geometric features for the geometric models (Section 5.2), respectively. Similarly, we can get p(X_s) as

  p(X_s) = \prod_{i=1}^{m} p(x_i) \, p(g_i^{uc}) \, p(g_i^{bc}).   (6)

Consequently, the posterior probability P(C|X_s) is obtained by

  P(C|X_s) = p(C) \prod_{i=1}^{m} \frac{p(x_i|c_i)}{p(x_i)} \frac{p(g_i^{uc}|c_i)}{p(g_i^{uc})} \frac{p(g_i^{bc}|c_{i-1}c_i)}{p(g_i^{bc})},   (7)

and transformed to

  P(C|X_s) = p(C) \prod_{i=1}^{m} \frac{p(c_i|x_i)}{p_1(c_i)} \frac{p(c_i|g_i^{uc})}{p_2(c_i)} \frac{p(c_{i-1}c_i|g_i^{bc})}{p_3(c_{i-1}c_i)},   (8)

where the three posterior probabilities can be approximated by confidence transformation (see Section 5.1), and the three corresponding prior probabilities p_1(c_i), p_2(c_i), p_3(c_{i-1}c_i) are viewed as constants in classifier design (the uneven distribution p(c_i) is considered elsewhere, in the language model p(C)), denoted by p_1, p_2, p_3, respectively.
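The per-character product in (8) can be made concrete with a small numeric sketch. This is an illustration only: the probabilities are invented, `string_log_posterior` is a hypothetical helper, and the constant priors stand in for p_1, p_2, p_3.

```python
import math

# Illustrative sketch of Eq. (8): log P(C|X_s) accumulates, per character,
# the log ratios of model posteriors to constant priors, plus the language
# model term p(c_i|h_i). All probabilities below are invented toy numbers.

def string_log_posterior(p_char, p_ucg, p_bcg, p_lm, priors=(1e-4, 1e-4, 1e-8)):
    p1, p2, p3 = priors  # constant class priors p1(c_i), p2(c_i), p3(c_{i-1}c_i)
    logp = 0.0
    for pc, pu, pb, plm in zip(p_char, p_ucg, p_bcg, p_lm):
        logp += (math.log(pc / p1)      # character recognition model
                 + math.log(pu / p2)    # unary class-dependent geometry
                 + math.log(pb / p3)    # binary class-dependent geometry
                 + math.log(plm))       # language model p(c_i | h_i)
    return logp

# A 2-character string with strong classifier posteriors scores higher
# than the same string with one weak character recognition posterior:
good = string_log_posterior([0.9, 0.8], [0.7, 0.6], [0.5, 0.5], [0.01, 0.02])
weak = string_log_posterior([0.2, 0.8], [0.7, 0.6], [0.5, 0.5], [0.01, 0.02])
print(good > weak)  # True
```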
Substituting p(C) = \prod_{i=1}^{m} p(c_i|h_i) (see Section 5.3) into (8) gives

  P(C|X_s) = \prod_{i=1}^{m} \frac{p(c_i|x_i) \, p(c_i|g_i^{uc}) \, p(c_{i-1}c_i|g_i^{bc}) \, p(c_i|h_i)}{P},   (9)

where P = p_1 p_2 p_3. The four probabilistic terms in (9) correspond to the character recognition model, the unary and binary class-dependent geometric models, and the language model, respectively.

4.2 Path Evaluation Function

Combining the posterior probabilities of segmentation path (2) and string class (9), the optimal string class of (3) can be obtained by

  C* = \arg\max_{s,C} \frac{1}{P^m} \prod_{i=1}^{m} \left[ p(c_i|x_i) \, p(c_i|g_i^{uc}) \, p(c_{i-1}c_i|g_i^{bc}) \, p(z_i^p=1|g_i^{ui}) \, p(z_i^g=1|g_i^{bi}) \, p(c_i|h_i) \right].   (10)

Note that all the terms m, c_i, x_i, g_i^{uc}, g_i^{bc}, g_i^{ui}, g_i^{bi}, z_i^p, z_i^g, h_i are related to the s-th segmentation path, and the index s is dropped for simplification. However, the probability formulation (10) is still insufficient, because it does not consider the different contributions and reliabilities of the different models (character recognition, geometric, and language models). In the following, we take the logarithm of probability (denoted by lp_i^0 = \log p(c_i|x_i), lp_i^1 = \log p(c_i|g_i^{uc}), lp_i^2 = \log p(c_{i-1}c_i|g_i^{bc}), lp_i^3 = \log p(z_i^p=1|g_i^{ui}), lp_i^4 = \log p(z_i^g=1|g_i^{bi}), lp_i^5 = \log p(c_i|h_i), lp^6 = \log(1/P)) and incorporate the weights of the different models to get a generalized likelihood function f(X_s, C) for segmentation-recognition path evaluation:

  f(X_s, C) = \sum_{i=1}^{m} \left( lp_i^0 + \sum_{j=1}^{5} \lambda_j \cdot lp_i^j \right) + \lambda_6 \cdot m \cdot lp^6,   (11)

and C* = \arg\max_{s,C} f(X_s, C), where \lambda_j, j = 1, ..., 6, are the weights to balance the effects of the different models.

In the above, the positive constant lp^6 is also called the word insertion penalty in [38], and it is used to overcome the bias toward short strings (without this term, the path evaluation score decreases as the path length m increases). Besides this formulation, there are some heuristic methods to deal with the bias problem. One straightforward strategy used in previous works [13], [36], [37] is to normalize the evaluation function by the path length:

  f(X_s, C) = \frac{1}{m} \sum_{i=1}^{m} \left( lp_i^0 + \sum_{j=1}^{5} \lambda_j \cdot lp_i^j \right).   (12)

In our previous works [10], [31], we weighted lp_i^0 by the number of constituent primitive segments k_i (similar to the variable duration HMM of [40]), and got the evaluation function

  f(X_s, C) = \sum_{i=1}^{m} \left( k_i \cdot lp_i^0 + \sum_{j=1}^{5} \lambda_j \cdot lp_i^j \right).   (13)

This function is not sensitive to the path length because the sum of k_i is a constant (it equals the total number of primitive segments in the text line). Considering that the width of each primitive segment is variable, in this study we also try another form of evaluation function by replacing the segment number with the width of the candidate pattern:

  f(X_s, C) = \sum_{i=1}^{m} \left( w_i \cdot lp_i^0 + \sum_{j=1}^{5} \lambda_j \cdot lp_i^j \right),   (14)

where w_i is the width of the i-th character pattern after normalization by the estimated height of the text line. The four path evaluation functions above (adding a Word Insertion Penalty (11), Normalization with Path Length (12), Weighting with primitive Segments Number (13), and Weighting with Character pattern Width (14)) are abbreviated as WIP, NPL, WSN, and WCW, respectively.

5 PROBABILISTIC MODELS OF CONTEXTS

The path evaluation functions (11)-(14) entail the estimation of the context models and the combining weights. We describe the context models in the following, while the estimation of the weights is addressed in Section 7.

5.1 Confidence Transformation

The character recognition score, ideally the posterior probability p(\omega|x) (\omega refers to a class and x is the feature vector), is an important context for string recognition. Most classifiers, however, do not output class posterior probabilities. We hence resort to confidence transformation methods for converting classifier outputs to posterior probabilities [31].

Two commonly used functions for probabilistic confidence transformation are the sigmoidal function (15) and the soft-max function (16):

  p_{sg}(\omega_j|x) = \frac{\exp(-\alpha d_j(x) + \beta)}{1 + \exp(-\alpha d_j(x) + \beta)},   j = 1, ..., M,   (15)

  p_{sf}(\omega_j|x) = \frac{\exp(-\alpha d_j(x))}{\sum_{i=1}^{M} \exp(-\alpha d_i(x))},   j = 1, ..., M.   (16)

In the above, M is the total number of defined classes, d_j(x) is the dissimilarity score of class \omega_j output by the classifier, and \alpha and \beta are the confidence parameters. Both forms have insufficiencies: The sigmoidal form gives multiple one-versus-all two-class probabilities instead of multiclass probabilities, while the soft-max form forces the sum of posterior probabilities to one even on noncharacter (outlier) patterns.

For the sigmoidal form, we combine such two-class probabilities into multiclass probabilities according to the Dempster-Shafer (D-S) theory of evidence [50], and the probabilities can be formulated by [31]

  p_{ds}(\omega_j|x) = \frac{\exp(-\alpha d_j(x) + \beta)}{1 + \sum_{i=1}^{M} \exp(-\alpha d_i(x) + \beta)},   j = 1, ..., M.   (17)

We also introduce an outlier class dissimilarity score (assuming d_o(x) = \beta/\alpha) in the soft-max confidence, and the result is extended to the same form of (17) [31].
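The transformation forms can be written down directly. A minimal sketch follows, with invented dissimilarity scores and hand-picked values for the confidence parameters \alpha, \beta (the paper optimizes these parameters separately; nothing here reproduces that training):

```python
import math

# Minimal sketch of the confidence transformation forms, with invented
# dissimilarity scores d[j] and hand-picked parameters alpha, beta.

def sigmoid_conf(d, j, alpha, beta):
    """Eq. (15): one-versus-all two-class probability for class j."""
    e = math.exp(-alpha * d[j] + beta)
    return e / (1.0 + e)

def softmax_conf(d, j, alpha):
    """Eq. (16): multiclass probabilities that sum to one over the M classes."""
    z = sum(math.exp(-alpha * di) for di in d)
    return math.exp(-alpha * d[j]) / z

def ds_conf(d, j, alpha, beta):
    """Eq. (17): Dempster-Shafer combination of the sigmoidal two-class
    probabilities into multiclass probabilities."""
    z = 1.0 + sum(math.exp(-alpha * di + beta) for di in d)
    return math.exp(-alpha * d[j] + beta) / z

def ds_outlier(d, alpha, beta):
    """Complementary probability of the outlier (noncharacter) class."""
    return 1.0 / (1.0 + sum(math.exp(-alpha * di + beta) for di in d))

d = [1.2, 2.5, 3.0]  # invented dissimilarity scores for M = 3 classes
total = ds_outlier(d, 1.0, 0.5) + sum(ds_conf(d, j, 1.0, 0.5) for j in range(3))
print(round(total, 6))  # 1.0: class and outlier probabilities sum to one
```

Note how the D-S form keeps the sigmoidal numerators but shares one normalizer with a constant 1 added, so probability mass can flow to the outlier class on noncharacter patterns, unlike the plain soft-max.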
1474 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 34, NO. 8, AUGUST 2012

which is the complement probability to the M defined classes. The confidence parameters are optimized by minimizing the cross entropy (CE) loss function on a validation data set (preferably different from the data set for training the classifiers) [31].

5.2 Geometric Models

Considering that Chinese texts mix Chinese characters with alphanumeric characters and punctuation marks, and that different characters show distinct outline features (e.g., size, position, aspect ratio, and within-character gap), we design two class-dependent geometric models, namely, single-character geometry (unary geometric model) and between-character geometry (binary geometric model). In addition, two class-independent geometric models are designed to indicate whether a candidate pattern is a valid character or not, and whether a gap is a between-character gap or not. The four geometric models (unary and binary class-dependent, unary and binary class-independent) are abbreviated as "ucg," "bcg," "uig," and "big," respectively, and have been used successfully in transcript mapping of handwritten Chinese documents [35].

To build the geometric models, we extract features for unary and binary geometry from the bounding boxes and profiles of a candidate character pattern, and of two consecutive character patterns, respectively [35]. Since the number of Chinese characters is very large and many different characters have similar geometric features, we cluster the character classes into six superclasses using the EM algorithm. After clustering, we use a 6-class quadratic discriminant function (QDF) for the "ucg" model, and a 36-class QDF for the "bcg" model. In addition, we use a linear support vector machine (SVM) trained with character and noncharacter samples for the "uig" model, and similarly, a linear SVM for the "big" model. In the path evaluation function, we convert both QDF and SVM outputs to posterior probabilities via sigmoidal confidence transformation.

5.3 Statistical Language Models

In character string recognition, the statistical language model is used to give the prior probability of a certain character sequence [51]. If the sequence C contains m characters, p(C) can be decomposed by

  p(C) = Π_{i=1..m} p(c_i | c_1^{i-1}) = Π_{i=1..m} p(c_i | h_i),   (19)

where h_i = c_1^{i-1} = c_1 ... c_{i-1} denotes the history of character c_i (h_1 is null). An n-gram model only considers the n-1 history characters in (19):

  p(C) = Π_{i=1..m} p(c_i | c_{i-n+1}^{i-1}),   (20)

where n is called the order of the model. Due to the high complexity of high-order models, the character bigram and trigram are usually used:

  p_cbi(C) = Π_{i=1..m} p(c_i | c_{i-1}),   (21)

  p_cti(C) = Π_{i=1..m} p(c_i | c_{i-2} c_{i-1}).   (22)

Compared to character-level models, word-level models can better explore the syntactic and semantic meaning. Segmenting the character sequence C into the word sequence C = w_1 w_2 ... w_L, the word bigram model is

  p_wbi(C) = Π_{i=1..L} p(w_i | w_{i-1}).   (23)

Due to the large size of the word lexicon (about 0.3 million words), we only use the word bigram. Further, we cluster the words into a number of word classes by the exchange algorithm [52], and the word class bigram is calculated by

  p_wcb(C) = Π_{i=1..L} p(w_i | W_i) p(W_i | W_{i-1}),   (24)

where the term W_i is the class of word w_i, and the cluster number is set empirically to 1,000 [10]. In addition, the word class bigram is often used by interpolating with the word bigram [32]:

  log p_iwc(C) = log p_wbi(C) + λ · log p_wcb(C),   (25)

where λ is the interpolation weight and the logarithm is used for more general purposes; this model is called the interpolating word and class bigram.

We use the SRI Language Model (SRILM) toolkit [53] to estimate the parameters of the n-gram models. In the toolkit, the default smoothing technique (Katz smoothing) and entropy-based pruning are used. The pruning thresholds for the character bigram, character trigram, and word bigram are set empirically to 5 × 10^-8, 10^-7, and 10^-7, respectively [10]. Since the word class number (1,000) leads to a moderate model size, the parameters of the word class bigram are not pruned.

6 PATH SEARCH

On defining a score for each path in the segmentation-recognition lattice, the next issue is how to efficiently find the path of maximum score. In addition, to alleviate the loss incurred when the candidate classes assigned by the character classifier do not contain the true class, we propose an augmentation technique to supplement candidate classes in the lattice.

6.1 Search Algorithm

If the segmentation-recognition path is evaluated by an accumulated score (WIP, WSN, and WCW), it satisfies the principle of optimality, and the optimal path with maximum score can be found by dynamic programming (DP). Nevertheless, when binary or higher order contexts are used, the complexity of DP search is high. For the NPL function, which does not satisfy the principle of optimality, DP search does not guarantee finding the optimal path, and the beam search strategy can better find an approximately optimal solution. In beam search, it is critical to retain the correct partial path among few survived paths. A simple strategy of beam search is to retain the multiple top-rank partial paths ending at each primitive segment [16]. This simple strategy, though it works efficiently, is too rough, particularly when high-order context models are used.
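The character n-gram models in (19)-(22) can be illustrated with a small sketch. This is not the paper's SRILM-based implementation (which uses Katz smoothing and entropy-based pruning); it is a minimal example with add-one smoothing, where the `vocab_size` parameter and `<s>` start symbol are illustrative assumptions:

```python
import math
from collections import defaultdict

def train_char_ngrams(corpus_lines):
    """Count character unigrams, bigrams, and trigrams from a list of strings."""
    uni, bi, tri = defaultdict(int), defaultdict(int), defaultdict(int)
    for line in corpus_lines:
        chars = ["<s>", "<s>"] + list(line)  # pad history so h_1 can be "null"
        for i, c in enumerate(chars):
            uni[c] += 1
            if i >= 1:
                bi[(chars[i - 1], c)] += 1
            if i >= 2:
                tri[(chars[i - 2], chars[i - 1], c)] += 1
    return uni, bi, tri

def log_p_trigram(sentence, uni, bi, tri, vocab_size):
    """log p_cti(C) = sum_i log p(c_i | c_{i-2} c_{i-1}), add-one smoothed."""
    chars = ["<s>", "<s>"] + list(sentence)
    logp = 0.0
    for i in range(2, len(chars)):
        num = tri[(chars[i - 2], chars[i - 1], chars[i])] + 1
        den = bi[(chars[i - 2], chars[i - 1])] + vocab_size
        logp += math.log(num / den)
    return logp
```

The same counting scheme carries over to the word bigram of (23) by replacing characters with lexicon words after word segmentation.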
A refined beam search algorithm was presented in our previous work (called pruned DP there) [10], which is suitable for using high-order context models.

After oversegmentation, the text line image is represented as a sequence of primitive segments. A candidate pattern composed of k consecutive segments and ending at the ith segment is denoted by (i, k). A node in the search space is represented as a quadruple SN = {CP, CC, AS, PN}, where SN denotes a search node, CP is a candidate pattern, CC is a candidate character of CP, AS is the accumulated score from the root node (calculated by (11)-(14), where m is the length of the current partial path), and PN is a pointer to the parent node of SN. All nodes are stored in a list named LIST to backtrack the final path. The refined beam search process is described in detail as follows, and Fig. 4 gives an illustrative example.

Refined beam search in frame-synchronous fashion:

1. Initialize the first search node (i.e., the root) of LIST, SN_0 = {null, null, 0, null}, and set i = 1.
2. Generate nodes of CP = (i, k) over k (the second-level nodes in Fig. 4b; i - k >= 0, k <= K, where K is the maximum number of segments to be concatenated). For each CP, the top CN (Candidate Number) candidate characters are assigned by the character classifier (the third-level nodes in Fig. 4b). In total, at most K × CN nodes are generated.
3. Link the current nodes (CP = (i, k), CC = c_{i,k}) to parent nodes. For multiple such parent nodes (CP' = (i-k, k'), CC' = c_{i-k,k'}), the current node generates multiple copies, each linked to a respective parent node (PN) and associated with an accumulated score (AS) (the fourth-level nodes in Fig. 4b). Among these copies, only the node with maximum AS over (k', c_{i-k,k'}) is retained (the fifth-level nodes in Fig. 4b).
4. Sort the retained nodes in decreasing order of AS over (k, c_{i,k}); the leading BW (Beam Width) nodes are retained and added to LIST, while the others are pruned to accelerate search.
5. Set i = i + 1, go back to Step 2, and iterate until the last primitive segment is reached (such nodes are called terminal nodes).
6. Backtrack from the terminal node in LIST of maximum score along the element PN, and obtain the resulting character string.

Fig. 4. An illustrative example of refined beam search (K = 3, CN = 2, BW = 3) at a primitive segment. (a) A sequence of consecutive primitive segments (the upward arrow points to the current primitive segment and the leftward arrow points to the direction of segment combination to generate candidate patterns), (b) search space expansion at the pointed primitive segment of (a) (the pruned nodes are labeled).

We can see that if BW = K × CN, the above algorithm guarantees finding the optimal path for context models up to order 2 when the principle of optimality is satisfied, i.e., it is equivalent to DP. For context models of order 3 (e.g., character trigram) or higher, it does not guarantee finding the optimal path but significantly accelerates search compared to DP. Further, if BW < K × CN, the search procedure is further accelerated. Compared to simple beam search, the two-step pruning strategy in the refined beam search algorithm has at least two advantages: 1) the first-step pruning (in Step 3) observes the principle of optimality; 2) sorting the nodes has lower complexity.

If we use word-level n-grams, the search process works on a word candidate lattice, which is constructed from the character lattice by combining several consecutive characters according to the word lexicon, so search in the word candidate lattice is very complex [10]. To accelerate this search process, we first prune the original character lattice using the above character search process (many nodes are pruned in Steps 3 and 4), then use it to construct a succinct word lattice.

6.2 Candidate Character Augmentation

The character classifier assigns a number of candidate classes to each candidate pattern with the risk of missing the true class. In Chinese handwriting recognition, even assigning hundreds of classes cannot guarantee 100 percent inclusion of the true class. Therefore, we propose a Candidate Character Augmentation (CCA) method, as diagrammed in Fig. 5, to supplement candidate classes during search.

Fig. 5. Diagram of candidate character augmentation.

The CCA method exploits both confusion information of the classifier and linguistic context. First, a candidate pattern x_i is classified to assign a number of candidate classes, called the Original Candidate Set (OCS). Then, the confusion information and the linguistic context are used to supplement two types of candidate classes, forming the Augmented Candidate Set (ACS). Last, the Extended Candidate Set (ECS), as the union of the OCS and the ACS, is used to generate candidate nodes at Step 2 of the search process.

To predict the true class from the OCS by confusion information, we calculate the probability of the hypothesized truth class ω_t given an output class ω_o:

  p(ω_t | ω_o) = p(ω_o | ω_t) p(ω_t) / Σ_{ω_t} p(ω_o | ω_t) p(ω_t),   (26)

where p(ω_o | ω_t) = n_{t,o} / Σ_{ω_o} n_{t,o}, n_{t,o} is the number of times that characters of class ω_t are classified as ω_o, counted on a validation data set, and the prior probability p(ω_t) is usually regarded as equal for all classes.
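The confusion-based augmentation of (26) amounts to a Bayes inversion of the confusion matrix counted on the validation set. A minimal sketch, assuming equal class priors as in the paper; the count matrix, class labels, and the `top_k` parameter are illustrative:

```python
def augment_by_confusion(conf_counts, output_class, top_k=2):
    """Return the top_k most likely truth classes for an observed output class.

    conf_counts[t][o] = n_{t,o}, the number of validation samples of true class t
    classified as o.  With equal priors p(t), (26) reduces to
    p(t|o) ∝ p(o|t) = n_{t,o} / sum_o' n_{t,o'}.
    """
    scores = {}
    for t, row in conf_counts.items():
        total = sum(row.values())
        if total and row.get(output_class, 0):
            scores[t] = row[output_class] / total  # p(o|t); equal priors cancel
    if not scores:
        return []
    norm = sum(scores.values())  # denominator of (26)
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return [(t, scores[t] / norm) for t in ranked]
```

The returned classes would then be merged into the ACS alongside the linguistically predicted candidates.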
According to (26), we select several top most likely truth classes for each output class ω_o in the OCS, and all these likely truth classes form the first part of the ACS.

For augmenting candidate characters from the linguistic context, we use three strategies: forward character bigram, backward character bigram, and bigram cache model. In the forward character bigram, we predict the character with the maximum probability:

  c_i* = arg max_{c_i} p(c_i | c_{i-1}),   (27)

where c_{i-1} is an immediately preceding character in the search space, whereas Kigo [47] used all characters whose preceding character is c_{i-1} in his smaller bigram table. Similarly, we can predict characters by the backward character bigram:

  c_i* = arg max_{c_i} p(c_i | c_{i+1}),   (28)

where p(c_i | c_{i+1}) = p(c_{i+1} | c_i) p(c_i) / p(c_{i+1}) and the character c_{i+1} is from the immediately succeeding OCS. For prediction from the cache model, we assume that a document covers a single topic, such that a character sequence is likely to repeat in the document. Accordingly, the candidate character is predicted by both forward and backward bigrams: {c_i : (c_{i-1}, c_i) ∈ cache or (c_i, c_{i+1}) ∈ cache}, where the cache is the history text (the best result string of the document up to the current character pattern).

7 MAXIMUM CHARACTER ACCURACY TRAINING

Since the parameters of the multicontext models in the path evaluation function are estimated in advance, the object of training is to tune the combining weights to optimize the recognition performance. To do this, we optimize a Maximum Character Accuracy (MCA) criterion similar to the Minimum Word Error (MWE) [42] criterion in speech recognition. MCA is a smoothed approximation to the accuracy of the R string samples (text line images) in the training data set:

  Ψ(Λ) = (1/R) Σ_{r=1..R} Σ_{j=1..N_r} P_Λ(C_r^j | X_r) A(C_r^j, T_r),   (29)

where N_r is the number of all segmentation-recognition paths in the lattice of the rth text line image X_r, and C_r^j is the character sequence of the jth path. The term A(C_r^j, T_r) is the character accuracy score, which equals the number of characters in the ground-truth transcript T_r minus the number of errors in C_r^j (including substitution, insertion, and deletion errors; see Section 8.2). Note that the posterior probability P_Λ(C_r^j | X_r) can be computed by

  P_Λ(C_r^j | X_r) = exp[α f_Λ(X_r, C_r^j)] / Σ_{i=1..N_r} exp[α f_Λ(X_r, C_r^i)],   (30)

where α is a scaling constant, and f_Λ(X_r, C_r^j) can be any path evaluation function in (11)-(14) under the weight set Λ. MCA degenerates to MCE [41] if the character accuracy score is calculated as A(C_r^j, T_r) = δ(C_r^j, T_r) ∈ {0, 1} [43].

We optimize the MCA objective (29) by the stochastic gradient ascent method. However, the gradients are difficult to calculate precisely due to the huge number N_r; moreover, the precise calculation of A(C_r^j, T_r) needs a completed path. Therefore, we only consider the top N paths of maximum evaluation score while viewing the probabilities of the remaining paths as zero.

8 EXPERIMENTAL RESULTS

We evaluated the performance of our approach on a large database of unconstrained Chinese handwriting, CASIA-HWDB [21], and on a small data set, HIT-MW [54].

8.1 Database and Experimental Setting

The CASIA-HWDB database contains both isolated characters and unconstrained handwritten texts, and is divided into a training set of 816 writers and a test set of 204 writers. The training set contains 3,118,477 isolated character samples of 7,356 classes (7,185 Chinese characters, 109 frequently used symbols, 10 digits, and 52 English letters) and 4,076 pages of handwritten texts. The text pages have a few miswritten characters and characters beyond the 7,356 classes, which we call noncharacters and outlier characters, respectively. The characters in the training text pages (except for the noncharacters and outlier characters; 1,080,017 samples) were also segmented and used together with the isolated samples for training the character classifier. We evaluated the text line recognition performance on the 1,015 handwritten pages of the 204 test writers, which were segmented into 10,449 text lines containing 268,629 characters (including 723 noncharacters and 368 outlier characters).

To compare our results with those reported in the literature [9], [10], [11], we also tested on the data set HIT-MW [54], from which a test set of 383 text lines contains 8,448 characters (7,405 Chinese characters, 780 symbols, 230 digits, eight English letters, 16 noncharacters, and nine outlier characters).

To build the character classifier, we extract features from gray-scale character images (background eliminated) using the normalization-cooperated gradient feature (NCGF) method [55]. Before feature extraction, the gray levels of foreground pixels in each image are normalized to a standard mean and deviation. The 512D feature vector obtained is reduced to 160D by Fisher linear discriminant analysis (FLDA), and then input to a modified quadratic discriminant function classifier. The classifier parameters were learned on 4/5 of the samples of the training set, and the remaining 1/5 of the samples were used for confidence parameter estimation and confusion matrix construction. For parameter estimation of the geometric models, we extracted geometric features from 41,781 text lines of the training text pages. The statistical language models were trained on a text corpus containing about 50 million characters (about 32 million words) [10]. On obtaining the context models, the combining weights of the path evaluation function were learned on 300 training text pages.

Table 1 shows some statistics of character samples segmented from the test text pages of CASIA-HWDB. The "number" row gives the numbers of different types of characters (including noncharacters and outlier characters).
TABLE 1
Statistics of Character Types, Recognition, and Segmentation Correct Rates on the Test Set

TABLE 2
Recognition Results of Different Path Evaluation Functions

We can see that the majority of segmented characters are Chinese characters, and the number of symbols (mostly punctuation marks) is appreciable. Some samples of noncharacters and outlier characters are shown in Figs. 6a and 6b, respectively. The "rec" row gives the correct rate of segmented character recognition by the character classifier, and "rec20" and "rec200" are the cumulative accuracies of the top 20 and 200 ranks, respectively. We can see that the correct rate of Chinese characters is highest (87.28 percent). The overall correct rate, 83.78 percent, is lower because of the low correct rates of symbols, digits, and letters. The noncharacters and outlier characters cannot be recognized by the classifier, which covers a defined character set of 7,356 classes.

Table 1 (the last row, "seg") also shows the accuracy of oversegmentation (a character is correctly oversegmented when it is separated from other characters despite the within-character splits). We observe that 4.46 percent of characters were not correctly separated (i.e., they are undersegmented and cannot be correctly segmented and recognized by the subsequent character string recognition). This implies that the oversegmentation of characters is still a challenge. Some examples of such errors are shown in Fig. 6c.

Fig. 6. (a) Noncharacters (in small boxes), (b) outlier characters (in small boxes), (c) oversegmentation errors (indicated by the arrows).

8.2 Performance Metrics

We evaluate the recognition performance of text lines using two character-level accuracy metrics following [9]: Correct Rate (CR) and Accurate Rate (AR):

  CR = (N_t - D_e - S_e) / N_t,
  AR = (N_t - D_e - S_e - I_e) / N_t,   (31)

where N_t is the total number of characters in the transcript. The numbers of substitution errors (S_e), deletion errors (D_e), and insertion errors (I_e) are calculated by aligning the recognition result string with the transcript by dynamic programming. The metric CR denotes the percentage of characters that are correctly recognized. Further, the metric AR also considers the number of characters that are inserted due to oversegmentation, and is possibly negative. Vinciarelli et al. [51] suggested that the AR (called recognition rate there) is an appropriate measure for document transcription, while the CR (called accuracy rate there) is a good metric for tasks of content modeling (e.g., document retrieval). For analyzing the performance on different types of characters, we also give the CR for four types: Chinese characters (ch), symbols (sb), digits (dg), and letters (lt).

8.3 Text Line Recognition Results

We evaluated the effects of different techniques. First, we compared the effects of different path evaluation functions. Second, the effects of different confidence transformation methods and of combinations of geometric models and language models were evaluated. Last, we show the results of different numbers of candidate character classes, beam widths, and candidate character augmentation methods in path search. We report the recognition rates of the different techniques on the CASIA-HWDB test set, and give the processing time on all test pages (1,015 pages) consumed on a desktop computer with a 2.66 GHz CPU, programmed in Microsoft Visual C++. With several selected combinations of techniques, we also report results on the HIT-MW test set.

8.3.1 Comparing Path Evaluation Functions

In evaluating the effects of path evaluation functions and CT methods, the character trigram language model and all geometric models were used. The search algorithm was the refined beam search with K = 4, CN = 20, and BW = 10, but CCA methods were not used in the search process. In evaluating the path evaluation functions, the D-S evidence confidence was taken. The recognition results of the different path evaluation functions (11)-(14) are shown in Table 2, where "w/o" denotes the path evaluation function without word insertion penalty ((11) with the last term removed). We can see that by balancing the path length using the different heuristics, the string recognition performance is largely improved. Among the four strategies, weighting with character width performs best with respect to both AR and CR. The normalized path function gives a little lower CR but significantly lower AR. This is because NPL tends to generate more oversegmentation. The performance of weighting with primitive segment number is higher than that of NPL, but lower than that of WCW. We hence used the strategy WCW for all the following experiments.

8.3.2 Comparing CT Methods

Table 3 shows the results of different CT methods for the character classifier introduced in Section 5.1. Compared to recognition without CT (the "w/o" row, meaning that lp_j, j = 0, ..., 4, take the classifier's similarity outputs directly in the WCW function (14)), the sigmoidal confidence improves the AR from 83.60 to 89.42 percent and the CR from 85.52 to 90.19 percent; the D-S evidence improves the AR from 83.60 to 90.20 percent and the CR from 85.52 to 90.80 percent.
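The CR and AR metrics of (31) follow from a standard edit-distance alignment between the recognized string and the transcript. A minimal sketch, counting S_e, D_e, and I_e by DP backtracking:

```python
def cr_ar(result, transcript):
    """Align result with transcript by edit distance; return (CR, AR) per (31)."""
    n, m = len(transcript), len(result)
    # d[i][j]: minimal edit distance between transcript[:i] and result[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if transcript[i - 1] == result[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost,  # substitution or match
                          d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1)         # insertion
    # backtrack to count substitution, deletion, and insertion errors
    se = de = ie = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (transcript[i - 1] != result[j - 1]):
            se += transcript[i - 1] != result[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            de += 1
            i -= 1
        else:
            ie += 1
            j -= 1
    nt = n
    return (nt - de - se) / nt, (nt - de - se - ie) / nt
```

As the text notes, AR can go negative when the result string contains many insertions, while CR is bounded below by zero only when D_e + S_e <= N_t.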
TABLE 3
Effects of Different CT Methods

TABLE 5
Effects of Different Language Models

The soft-max confidence performs inferiorly, however, because it does not consider the outlier probability. The benefit of CT (particularly, sigmoidal and D-S evidence) is attributed to the fact that the converted posterior probabilities (of character classification and geometric models) and the statistical language model are more compatible to be combined. We thus used the D-S evidence confidence in the other experiments by default.

8.3.3 Comparing Context Models

The effects of different combinations of context models are shown in Table 4, where "cls," "cg," "ig," "g," and "cti" denote the character classifier, the class-dependent geometric models ("ucg+bcg"), the class-independent geometric models ("uig+big"), all geometric models ("cg+ig"), and the character trigram language model, respectively. We can see that when using the character classifier only ("cls"), the string recognition performance is inferior. Adding geometric models to the classifier, the string recognition performance is remarkably improved. By combining the four geometric models, the AR is improved from 47.89 to 77.34 percent and the CR is improved from 68.52 to 79.43 percent. It is observed that the binary geometric models yield larger improvement than the unary models. This justifies the importance of the between-character relationship. Also, the class-dependent geometric models ("cls+cg") perform better than the class-independent geometric models ("cls+ig"). Compared to the geometric models, the statistical language model ("cls+cti") is much more effective, yielding a large improvement of AR and CR. Further, adding both geometric and language models to the character classifier yields the best recognition result, justifying that the geometric context and linguistic context are complementary.

TABLE 4
Effects of Different Combinations of Contexts

Based on the character classifier and geometric models, we then evaluated different language models: character bigram ("cbi"), character trigram ("cti"), word bigram ("wbi"), word class bigram ("wcb"), and interpolating word and class bigram ("iwc"). The recognition results are shown in Table 5, where "w/o" denotes recognition without a language model. We can see that the character trigram outperforms the character bigram significantly. The advantage of the trigram is due to its capturing longer-distance text dependency. The extension to a character 4-gram is not trivial due to the high complexity, however. Even the modeling of third-order word dependency is intractable due to the huge number of words. The use of second-order word dependency models nevertheless shows promise: the "wbi" and "wcb" both perform comparably with the "cti." Further, by interpolating the word-level bigram models, the "iwc" yields the best recognition performance.

8.3.4 Comparing Search Strategies

The above experiments used the default number of 20 candidate classes assigned to each candidate pattern and refined beam search with beam width 10, and the CCA techniques were not used. We then evaluated the effects of different candidate class numbers (CN), beam widths (BW), and CCA techniques based on the combination of geometric models and the character trigram language model in the path evaluation function of WCW with the D-S evidence confidence. Figs. 7a and 7c show the effects of different CN in the refined beam search algorithm, and Figs. 7b and 7d show the effects of different BW for both the refined and simple beam search methods. Compared to the simple beam search, our refined beam search algorithm yields much higher recognition accuracy at comparable speed.

Fig. 7. Accuracies of different CN and BW in Steps 2 and 4 of two beam search methods ("-R" and "-S" denote refined and simple beam search, respectively). (a) Accuracies of different CN with BW = 10, (b) accuracies of different BW with CN = 20, (c) processing times of (a), (d) processing times of (b).

We can also see that 20 candidate classes and a beam width of 10 perform sufficiently well with respect to the recognition accuracy and the speed of the refined beam search.
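The refined beam search of Section 6.1, whose CN and BW parameters are examined above, can be sketched as follows. This is a simplified illustration: `classify` and `score` are hypothetical stand-ins for the character classifier and the context-combining path evaluation function, and the node bookkeeping (LIST, parent pointers) is reduced to storing partial strings:

```python
import heapq
from collections import defaultdict

def refined_beam_search(num_segs, classify, score, K=4, CN=20, BW=10):
    """Frame-synchronous beam search over primitive segments (Section 6.1 sketch).

    classify(i, k): top candidate (char, cls_score) pairs for the pattern made
    of segments i-k+1..i.  score(prev_score, char, cls_score): accumulated score.
    """
    beams = defaultdict(list)  # beams[e]: (acc_score, string) paths ending at segment e
    beams[0] = [(0.0, "")]
    for i in range(1, num_segs + 1):
        nodes = {}  # Step 3 pruning: best-scoring parent per (k, char)
        for k in range(1, min(K, i) + 1):
            if not beams[i - k]:
                continue
            for char, cs in classify(i, k)[:CN]:
                best = max((score(ps, char, cs), s + char) for ps, s in beams[i - k])
                if (k, char) not in nodes or best > nodes[(k, char)]:
                    nodes[(k, char)] = best
        beams[i] = heapq.nlargest(BW, nodes.values())  # Step 4 pruning
    return max(beams[num_segs])[1]
```

With an additive `score`, the first pruning step keeps the DP-optimal parent per (k, char) node, so setting BW = K × CN reproduces DP for contexts up to order 2, matching the discussion in Section 6.1.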
TABLE 6
Results of CCA Techniques

TABLE 7
Recognition Results on the HIT-MW Test Set

Increasing CN and BW, though it improves the coverage of the correct path, does not improve the recognition accuracy. This is because the search algorithm does not guarantee finding the correct path among a large number of candidate paths due to the insufficient quantitative evaluation of paths.

Table 6 shows the effects of different candidate character augmentation techniques, namely, confusion information ("ci"), forward character bigram ("fcb"), backward character bigram ("bcb"), bigram cache model ("bcm"), the combination of the three CCA techniques based on linguistic context ("lc," i.e., "fcb+bcb+bcm"), and the combination of all CCA techniques ("cca," i.e., "ci+lc"). It is shown that compared to recognition without CCA ("w/o"), the CCA techniques yield only slight improvement of recognition performance. Particularly, augmentation by confusion information makes almost no difference. This is because many noisy candidate characters are also added by the CCA techniques. To get the best performance of the proposed approach, we evaluated the effects of combining all CCA techniques based on the "iwc" language model, and the results are shown in the last row of Table 6 (the other rows are based on the "cti" language model). Compared to the result of "iwc" without CCA in Table 5, CCA improves the AR from 90.53 to 90.75 percent and the CR from 91.17 to 91.39 percent.

8.3.5 Performance on the HIT-MW Test Set

Finally, we show the recognition results of our approach on the HIT-MW test set. To apply our character classifier, trained with gray-scale character images, to the binary images of HIT-MW, we converted the binary images to gray-scale images of two levels: 0 for background pixels and G for foreground pixels (G is the mean value in gray-level normalization for gray-scale images). For evaluating the effects of several representative context models on this test set, we used the path evaluation function of WCW with the D-S evidence confidence, and the search algorithm was the refined beam search with K = 4, CN = 20, and BW = 10. The recognition results are shown in Table 7. We can see that both the geometric models and the language models improve the performance largely, and the best performance is achieved by combining all the contexts and the CCA methods ("cls+g+iwc+cca").

Compared to the previous results reported on this test set, namely, 34.64 percent AR and 39.37 percent CR in [9], 73.97 percent CR in [11], and 78.44 percent RCR (similar to CR, but matched by character boundaries) in [10], the proposed approach achieved 91.86 percent AR and 92.72 percent CR, demonstrating significant improvement and advantage.

8.3.6 Examples of Recognition Errors

The string recognition errors of our approach can be categorized into three types: 1) oversegmentation failure (undersegmentation), 2) character classification error, including the failure on noncharacters and outlier characters, and 3) path search failure. In Table 1, we can see that 4.46 percent of characters were not correctly separated by oversegmentation. Character classification error (about 1.53 percent of characters when CN = 20) implies that the truth class of a candidate pattern is missed in the top CN ranks, so that the candidate paths miss the correct one. Path search failure (about 2.62 percent of characters when K = 4, CN = 20, BW = 10) is the case where, even though the correct path is included in the candidate paths, it is not the "optimal" path with maximum score due to the imperfect evaluation of paths.

Some examples of noncharacters, outliers, and oversegmentation errors have been shown in Fig. 6. In addition, two examples of character classification error and path search failure are shown in Fig. 8. In Fig. 8a, the misclassified character (indicated by the arrow) was written as a scrawl, and the classifier failed to assign the correct class in the top 20 ranks; the CCA techniques also failed to pick up the correct class. In Fig. 8b, the first character was misclassified, while the second character was missegmented into two characters; both errors are due to the effect of the language model, because the general-purpose language model does not suit the context of this text of an ancient poem.

Fig. 8. Two examples of recognition errors; only the part with the recognition error is shown. (a) Character classification error, (b) path search failure. Upper: oversegmentation; middle: segmentation-recognition result; bottom: ground truth.

9 CONCLUSION

This paper presented an approach for handwritten Chinese text recognition under the character oversegmentation and candidate path search framework. We evaluate the paths from the Bayesian decision view by combining multiple contexts, including the character classification scores and the geometric and linguistic contexts. The combining weights of the path evaluation function are optimized by a string recognition objective, namely, the Maximum Character Accuracy criterion.
In path search, we use a refined beam search algorithm to improve the accuracy and efficiency. In experiments on the unconstrained Chinese handwriting database CASIA-HWDB, the proposed approach achieved a character-level accurate rate of 90.75 percent and correct rate of 91.39 percent. The experimental results justify the benefits of confidence transformation of classifier outputs, geometric context models, and language models. Nevertheless, the effect of candidate character augmentation is limited. We also evaluated performance on the HIT-MW test set and achieved an accurate rate of 91.86 percent and correct rate of 92.72 percent, which are significantly higher than those reported in the literature.

The analysis of recognition errors indicates that further research efforts are needed to improve character oversegmentation, character classification, and path evaluation. The objective of oversegmentation is to improve the tradeoff between the number of splitting points (affecting the complexity of the search space) and the accuracy of separating characters at their boundaries. The objective of character classification is to improve the classification accuracy and the tradeoff between the number of candidate classes and the probability of including the true class. For path evaluation, both the geometric model and the language model deserve elaboration. Particularly, our experimental results show that a mismatch between the language model and the text domain leads to inferior recognition performance. Therefore, domain adaptation of the language model will be an important research direction. In addition, real semantic context and long-distance context will also be considered in the future.

ACKNOWLEDGMENTS

This work was supported by the National Natural Science Foundation of China (NSFC) under Grants 60825301 and 60933010.

REFERENCES

[1] R.-W. Dai, C.-L. Liu, and B.-H. Xiao, "Chinese Character Recognition: History, Status and Prospects," Frontiers of Computer Science in China, vol. 1, no. 2, pp. 126-136, 2007.
[2] H. Fujisawa, "Forty Years of Research in Character and Document Recognition—An Industrial Perspective," Pattern Recognition, vol. 41, no. 8, pp. 2435-2446, Aug. 2008.
[3] C.-L. Liu, F. Yin, D.-H. Wang, and Q.-F. Wang, "Online and Offline Handwritten Chinese Character Recognition: Benchmarking on New Databases," Proc. Second CJK Joint Workshop Pattern Recognition, Oct. 2010.
[4] H.-S. Tang, E. Augustin, C.Y. Suen, O. Baret, and M. Cheriet, "Spiral Recognition Methodology and Its Application for Recognition of Chinese Bank Checks," Proc. Ninth Int'l Workshop Frontiers in Handwriting Recognition, pp. 263-268, Oct. 2004.
[5] C.-L. Liu, M. Koga, and H. Fujisawa, "Lexicon-Driven Segmentation and Recognition of Handwritten Character Strings for Japanese Address Reading," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 11, pp. 1425-1437, Nov. 2002.
[6] C.-H. Wang, Y. Hotta, M. Suwa, and S. Naoi, "Handwritten Chinese Address Recognition," Proc. Ninth Int'l Workshop Frontiers in Handwriting Recognition, pp. 539-544, Oct. 2004.
[7] Z. Han, C.-P. Liu, and X.-C. Yin, "A Two-Stage Handwritten Character Segmentation Approach in Mail Address Recognition," Proc. Eighth Int'l Conf. Document Analysis and Recognition, pp. 111-115, Aug. 2005.
[8] Q. Fu, X.-Q. Ding, T. Liu, Y. Jiang, and Z. Ren, "A Novel Segmentation and Recognition Algorithm for Chinese Handwritten Address Character Strings," Proc. 18th Int'l Conf. Pattern Recognition, pp. 974-977, Aug. 2006.
[9] T.-H. Su, T.-W. Zhang, D.-J. Guan, and H.-J. Huang, "Off-Line Recognition of Realistic Chinese Handwriting Using Segmentation-Free Strategy," Pattern Recognition, vol. 42, no. 1, pp. 167-182, 2009.
[10] Q.-F. Wang, F. Yin, and C.-L. Liu, "Integrating Language Model in Handwritten Chinese Text Recognition," Proc. 10th Int'l Conf. Document Analysis and Recognition, pp. 1036-1040, July 2009.
[11] N.-X. Li and L.-W. Jin, "A Bayesian-Based Probabilistic Model for Unconstrained Handwritten Offline Chinese Text Line Recognition," Proc. IEEE Int'l Conf. Systems, Man, and Cybernetics, pp. 3664-3668, 2010.
[12] M. Nakagawa, B. Zhu, and M. Onuma, "A Model of On-Line Handwritten Japanese Text Recognition Free from Line Direction and Writing Format Constraints," IEICE Trans. Information and Systems, vol. 88, no. 8, pp. 1815-1822, Aug. 2005.
[13] X.-D. Zhou, J.-L. Yu, C.-L. Liu, T. Nagasaki, and K. Marukawa, "Online Handwritten Japanese Character String Recognition Incorporating Geometric Context," Proc. Ninth Int'l Conf. Document Analysis and Recognition, pp. 48-52, Sept. 2007.
[14] X.-D. Zhou, C.-L. Liu, and M. Nakagawa, "Online Handwritten Japanese Character String Recognition Using Conditional Random Fields," Proc. 10th Int'l Conf. Document Analysis and Recognition, pp. 521-525, July 2009.
[15] B. Zhu, X.-D. Zhou, C.-L. Liu, and M. Nakagawa, "A Robust Model for On-Line Handwritten Japanese Text Recognition," Int'l J. Document Analysis and Recognition, vol. 13, no. 2, pp. 121-131, 2010.
[16] M. Cheriet, N. Kharma, C.-L. Liu, and C.Y. Suen, Character Recognition Systems: A Guide for Students and Practitioners. John Wiley & Sons, Inc., 2007.
[17] H. Murase, "Online Recognition of Free-Format Japanese Handwritings," Proc. Ninth Int'l Conf. Pattern Recognition, pp. 1143-1147, 1988.
[18] Y. Jiang, X. Ding, Q. Fu, and Z. Ren, "Context Driven Chinese String Segmentation and Recognition," Proc. Structural, Syntactic, and Statistical Pattern Recognition: Joint IAPR Int'l Workshops, pp. 127-135, 2006.
[19] X. Ding and H. Liu, "Segmentation-Driven Offline Handwritten Chinese and Arabic Script Recognition," Proc. Conf. Arabic and Chinese Handwriting, pp. 61-73, 2006.
[20] S. Senda and K. Yamada, "A Maximum-Likelihood Approach to Segmentation-Based Recognition of Unconstrained Handwriting Text," Proc. Sixth Int'l Conf. Document Analysis and Recognition, pp. 184-188, Sept. 2001.
[21] C.-L. Liu, F. Yin, D.-H. Wang, and Q.-F. Wang, "CASIA Online and Offline Chinese Handwriting Databases," Proc. 11th Int'l Conf. Document Analysis and Recognition, pp. 37-41, Sept. 2011.
[22] L.Y. Tseng and R.C. Chen, "Segmenting Handwritten Chinese Characters Based on Heuristic Merging of Stroke Bounding Boxes and Dynamic Programming," Pattern Recognition Letters, vol. 19, no. 10, pp. 963-973, Aug. 1998.
[23] Z. Liang and P. Shi, "A Metasynthetic Approach for Segmenting Handwritten Chinese Character Strings," Pattern Recognition Letters, vol. 26, no. 10, pp. 1498-1511, July 2005.
[24] C.-L. Liu, "Handwritten Chinese Character Recognition: Effects of Shape Normalization and Feature Extraction," Proc. Conf. Arabic and Chinese Handwriting Recognition, S. Jaeger and D. Doermann, eds., pp. 104-128, 2008.
[25] C.-L. Liu and H. Fujisawa, "Classification and Learning in Character Recognition: Advances and Remaining Problems," Machine Learning in Document Analysis and Recognition, S. Marinai and H. Fujisawa, eds., pp. 139-161, Springer, 2008.
[26] F. Kimura, K. Takashina, S. Tsuruoka, and Y. Miyake, "Modified Quadratic Discriminant Functions and the Application to Chinese Character Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 9, no. 1, pp. 149-153, Jan. 1987.
[27] C.-L. Liu and M. Nakagawa, "Evaluation of Prototype Learning Algorithms for Nearest Neighbor Classifier in Application to Handwritten Character Recognition," Pattern Recognition, vol. 34, no. 3, pp. 601-615, Mar. 2001.