A View on the Past and Future of Character and Document Recognition

Hiromichi Fujisawa
Central Research Laboratory, Hitachi, Ltd.
Kokubunji, Tokyo, Japan 185-8601

Abstract

The paper first gives an overview of the technical advances in the field of character and document recognition, decade by decade. Then, it highlights key technical developments, especially for Kanji (Chinese character) recognition in Japan. Technical issues around postal address recognition are then discussed; these issues have promoted advanced techniques, including information integration. Robustness design principles are introduced. Finally, future prospects are discussed.

1. Introduction

An industrial view of character and document recognition technology is presented, looking from the past to the future. Since the birth of commercial Optical Character Readers (OCRs) in the 1950s, character and document recognition technology has made tremendous advances, always supporting industrial and commercial applications. At the same time, these business applications have always promoted investment in new technology development. We can see a virtuous cycle here: new technologies made new applications possible, and the new applications supported new technology development.

It seems, however, that the wave of IT is surging over this area, which has been cultivated for more than fifty years. Most information now seems to be born digital, possibly diminishing the demand for this technology. As a matter of fact, this is the second wave of this kind. The first was the wave of Office Automation in the 1980s. There was a strong expectation that paper documents would disappear because all documents would be produced electronically. The result was the contrary: the peak sales of OCRs in Japan came in the 1980s. It is felt, however, that the second wave might be different; this time, it might change the perspective completely. Other views are, of course, possible. For instance, search technologies, which have established their ubiquity, are expanding their territory to image documents. Handwriting is being given a second look for its importance. Paper does not seem to go away.

2. Brief historical view

2.1. Overview

The first commercial OCR appeared in the 1950s in the US, and since then each decade has seen representative developments. In the 1960s, IBM produced several models of “optical readers” for machine-printed and hand-printed numbers for business use. One of the models could read 200 fonts of printed documents. In this decade, postal automation for mechanical letter sorting adopted OCRs for the first time, independently in the US, Europe, and Japan, to read postal codes automatically and determine destinations. In Japan, Toshiba and NEC developed hand-printed digit recognition apparatus in 1967, which were put into operation in 1968.

In the 1970s, commercial OCRs were becoming pervasive in Japan. Hitachi introduced the first hand-printed numeral OCR for business use in 1973, and NEC introduced the first hand-printed Katakana OCR in 1976. The Japanese Ministry of International Trade and Industry (today’s Ministry of Economy, Trade and Industry) led a ten-year national project on pattern recognition, including Kanji recognition and handwritten character recognition, from 1971. It attracted many students and researchers into pattern recognition. In the US, IBM introduced a deposit processing system (IBM 3895) in 1977, which could recognize unconstrained handwritten numbers on bank checks. The author had a chance to see its operation at Mellon Bank in Pittsburgh in 1981. He was told that it could read about 50% of the checks, while the others were hand-coded.

The 1980s was a decade that benefited from technological progress in semiconductor devices such as image sensors, microprocessors, memories, and custom-designed LSIs. The hardware became smaller
than ever, small enough to place on a desktop, thanks to microprocessors and custom-designed LSIs. Then, larger, cheaper memories and image sensors enabled whole-page images to be scanned and stored in memory for further processing, allowing more advanced recognition and wider applications. For example, a handwritten numeral OCR that could recognize touching characters was introduced for the first time in 1983. In the late 1980s, Japanese commercial OCRs introduced machine-printed and hand-printed Kanji recognition capabilities, which could recognize about 2,400 classes of Kanji. Another important feature of this decade is that optical disks for computer use were developed and put into use for patent automation systems in the US and in Japan. They can be considered the first “digital libraries.” The Japanese patent office system currently stores approximately 50 million documents, or 200 million pages, most of them as scanned digital images. It was at this time, therefore, that studies on document understanding and document layout analysis began in Japan.

The changes in the 1990s were due to performance improvements in UNIX workstations and, later, personal computers. Though scanning and image preprocessing were still realized in hardware, the major part of recognition was implemented in software. The implication was that general-purpose programming languages like C and C++ could be used for recognition algorithms. This allowed engineers to develop more complicated algorithms and also expanded the research community further into academia. Software OCR packages running on PCs appeared on the market as well. Freely handwritten character recognition techniques were extensively studied and successfully applied to bank check readers and postal address readers. Advanced layout analysis techniques enabled recognition of a wide variety of business forms. We were also involved in the development of a postal address recognition system for Japanese mail pieces.

The IAPR conferences and research communities have been contributing to technical progress. Many of the methods playing key roles in today’s systems have been studied thoroughly. Examples are artificial neural networks, Hidden Markov Models (HMMs), polynomial function classifiers, modified quadratic discriminant function (MQDF) classifiers [1], support vector machines (SVMs), classifier combination [2, 3], information integration, and lexicon-directed character string recognition [4-7]. Some of these have original versions dating back to the 1960s.

2.2. Character recognition algorithms

The structural analysis approach and the pattern matching (or statistical) approach were the ideas competing with each other in the 1970s in Japan. Commercial OCRs were using structural methods for handwritten alphanumerics and Katakana, and pattern matching methods for machine-printed alphanumerics. By the late 1970s, a pattern matching method had been shown experimentally to be applicable to machine-printed Kanji recognition [8-10].

The problem for us in those days was finding a method for recognizing handwritten Kanji. It was like an unexplored, huge mountain standing in front of us. What was clear was that neither the structural approach nor the simple pattern matching approach could conquer it. The former was weak against the explosive topological variations caused by complex strokes, while the latter was weak against shape variations; however, the latter seemed to have a greater chance of success.

The concept of blurring as feature extraction was extended to directional features and found to be effective for handwritten Kanji recognition, though a preliminary study began with handwritten numeral recognition [11, 12]. With spatially continuous feature extraction, the optimum amount of blurring turned out to be surprisingly large. Non-linear shape normalization [13, 14] and statistical classifier methods [1] boosted recognition accuracy to a commercially valuable level. We learned that blurring should be considered a means to obtain a reduced set of prominent dimensions (a subspace) rather than a way to lower computational cost, though the effects seem similar. Normally, a feature vector for Kanji patterns consists of 8 × 8 × 4 = 256 elements (Figure 1), and the subspace after statistical analysis has around 100 dimensions. Recent advances in Chinese (Kanji) recognition methods are well presented in [15].

Figure 1. Directional features

2.3. Character segmentation algorithms

In the 1960s and 1970s, a flying-spot scanner, laser scanner, or other kind of mechanical scanner was used with a photomultiplier as a sensor to obtain character images. In a sense, character segmentation was done by these scanning devices. Then, in the 1980s, semiconductor sensors and memories appeared,
allowing OCRs to scan and store an image of one character line and, later, a full-page image.

This change relaxed the strict conditions on OCR form specifications. For example, it allowed smaller, non-separated writing boxes, although these required a touching-digit separation algorithm [16]. In 1983, Hitachi produced one of the first OCRs that could segment and recognize touching handwritten digits, based on a multiple-hypothesis segmentation-recognition method. Contour shape analysis could identify candidate touching points (Figure 2).

This direction of change led us to “forms processing,” whose ultimate target was to read unknown forms, or at least forms that were not specifically designed for OCRs. But this meant that users became less careful in their writing styles, and, therefore, OCRs had to be more accurate for freely written characters. Technically, techniques such as run-length-code-based preprocessing, connected component analysis, contour shape analysis, touching character-line separation, and segmentation-recognition integration were developed.

Figure 2. Segmentation of touching digits

2.4. Linguistic information integration

Kanji OCRs, which read Kanji as one of the extended functions of commercial OCRs, were used for reading handwritten Kanji names and addresses. The first generations used OCR forms with fixed, separated boxes, which caused no segmentation problem. The question was how to keep the phrase recognition accuracy high.

We could utilize a priori linguistic knowledge to pick the correct choices from the candidate lattice after character recognition. The method we developed used a finite state automaton, which was dynamically created from the lattice and was equivalent to the lattice contents [17]. A word (or a character string) from a lexicon is then fed into the automaton, and an active state makes transitions through edges whose labels coincide with the input characters. If the label on the traversed edge comes from the first place in the lattice, a penalty of zero is given; if no label coincides, the edge “Others” is selected, giving a penalty of 15. Generally, the penalty depends on the position in the lattice (Figure 3). In this way, every word in a lexicon is given a penalty value, and the word with the smallest penalty is determined to be the recognized word. This was used successfully for address phrases, provided that character segmentation was reliable enough.

When Furigana (the pronunciation written in syllabic characters) was available in addition to the Kanji version, both versions could be recognized and the results merged for higher accuracy. In Japanese business forms, it is normal that we are asked to fill in both the Kanji and the Furigana versions.

As discussed again later, when segmentation is not reliable, as with freely handwritten phrases, more complicated knowledge integration approaches are required.

Figure 3. Finite state automaton

3. Robustness against uncertainty and variability

Postal address recognition is one of the “ideal” applications that advance the technology. It is technology-rich and poses a lot of technical problems. It also promises post office innovation whose investments pay off. In the 1990s, R&D projects for developing systems that could read handwritten and machine-printed full addresses were led in the US, Europe, and Japan, in both industry and academia.

The recognition engine we developed for post offices became complex, as shown in Figure 4, as a result of coping with various problems. The recognition submodules output uncertain (intermediate) decisions. Dozens of such decisions must be made in series before reaching the final address interpretation. A natural solution is to hand over multiple candidates, which we call “hypotheses,” to the following stages. Then, we need to introduce a mechanism to control the sequential decision process, which is, after all, a kind of optimum search.
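The deferred-decision idea above can be illustrated with a small beam search. The following Python block is a minimal sketch under stated assumptions, not the engine described in the text. Each stage is modeled as a hypothetical function that proposes `(label, cost)` candidates for the interpretation built so far. Only the `beam_width` cheapest hypotheses are carried forward, and the others are pruned. The stage names and costs in the example are invented for illustration.

```python
import heapq
from typing import Callable, List, Tuple

# A hypothesis is (cumulative cost, partial interpretation); lower cost is better.
Hypothesis = Tuple[float, Tuple[str, ...]]
# A stage proposes (label, cost) candidates given the interpretation so far.
Stage = Callable[[Tuple[str, ...]], List[Tuple[str, float]]]


def deferred_decision_search(stages: List[Stage], beam_width: int = 3) -> List[Hypothesis]:
    """Pass up to `beam_width` hypotheses through every stage instead of
    committing to one decision per stage; return the survivors, best first."""
    beam: List[Hypothesis] = [(0.0, ())]
    for stage in stages:
        expanded: List[Hypothesis] = [
            (cost + step_cost, partial + (label,))
            for cost, partial in beam
            for label, step_cost in stage(partial)
        ]
        # Keep only the cheapest hypotheses; nsmallest returns them sorted.
        beam = heapq.nsmallest(beam_width, expanded)
    return beam


# Hypothetical two-stage example: the cheaper first-stage choice leads to an
# expensive second stage, so committing early picks the wrong overall answer.
def line_stage(partial: Tuple[str, ...]) -> List[Tuple[str, float]]:
    return [("line A", 1), ("line B", 5)]


def city_stage(partial: Tuple[str, ...]) -> List[Tuple[str, float]]:
    return [("city X", 20)] if partial[-1] == "line A" else [("city Y", 2)]


greedy = deferred_decision_search([line_stage, city_stage], beam_width=1)
deferred = deferred_decision_search([line_stage, city_stage], beam_width=2)
print(greedy[0])    # (21.0, ('line A', 'city X'))
print(deferred[0])  # (7.0, ('line B', 'city Y'))
```

With `beam_width=1`, the search commits to the locally best first-stage result and ends with a total cost of 21. Carrying two hypotheses lets the second-ranked candidate survive until the next stage, where it wins with a total cost of 7. A fixed-width beam is only one possible control mechanism for this kind of search.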
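The lexicon matching of Section 2.4 can be sketched in the same spirit. This minimal Python version assumes the fixed-box case, with one ranked candidate list per writing box. Instead of building the automaton explicitly, it walks each word's characters through those lists, which should give the same penalties when each box yields exactly one lattice position. It then returns the lexicon word with the smallest total penalty. The zero penalty for a first-place match and the penalty of 15 for the “Others” edge come from the text. The 5-per-rank schedule in `rank_penalty` is a hypothetical assumption, and the lattice and lexicon are invented examples.

```python
from typing import List, Optional, Sequence, Tuple

OTHERS_PENALTY = 15  # penalty of the "Others" edge, taken from the text


def rank_penalty(rank: int) -> int:
    # First place costs 0 (from the text); +5 per lower rank is an assumed
    # schedule, capped so a match never costs more than taking "Others".
    return min(5 * rank, OTHERS_PENALTY)


def word_penalty(word: str, lattice: Sequence[Sequence[str]]) -> Optional[int]:
    """Total penalty for feeding `word` through the lattice, or None if the
    word cannot reach the final state (its length differs from the lattice)."""
    if len(word) != len(lattice):
        return None
    total = 0
    for char, candidates in zip(word, lattice):
        if char in candidates:
            total += rank_penalty(list(candidates).index(char))
        else:
            total += OTHERS_PENALTY  # no edge label matches: take "Others"
    return total


def recognize(lattice: Sequence[Sequence[str]], lexicon: Sequence[str]) -> Tuple[str, int]:
    """Return (word, penalty) for the lexicon word with the smallest penalty."""
    scored: List[Tuple[int, str]] = []
    for word in lexicon:
        penalty = word_penalty(word, lattice)
        if penalty is not None:
            scored.append((penalty, word))
    if not scored:
        raise ValueError("no lexicon word matches the lattice length")
    best_penalty, best_word = min(scored)
    return best_word, best_penalty


# Hypothetical example: three writing boxes, top candidate first in each box.
EXAMPLE_LATTICE = [["車", "東"], ["京", "景"], ["都", "部"]]
EXAMPLE_LEXICON = ["東京都", "京都府", "大阪府"]

print(recognize(EXAMPLE_LATTICE, EXAMPLE_LEXICON))  # ('東京都', 5)
```

Here the top-1 reading “車京都” is not a lexicon word. “東京都” is still selected because its only mismatch with the top choices is a second-place candidate in the first box. The other two words fall through to “Others” in every box and score 45.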
Figure 4. Postal address recognition

Table 1. Design principles for robustness

Principles | Expected effects
Hypothesis-Driven | When the type of a problem is uncertain, set up hypotheses, process, and test the results
Deferred Decision / Multiple Hypotheses | Do not decide; leave the decision to the next experts, carrying over multiple hypotheses
Process Integration | Solve a problem by multiple different-field experts as a team
Information Integration: Combination-Based Integration | Decide as a team of multiple same-field experts
Information Integration: Corroboration-Based Integration | Utilize other input information; seek more evidence
Alternative Solutions | Solve a problem by multiple alternative approaches
Perturbation | Modify the problem slightly and try it again

Postal address phrases are semantically rich, so an “information integration” approach can be applied successfully. As in other intelligent handwriting recognition systems, we have developed a segmentation-recognition-interpretation integrated method [6, 7]. In this method, segmentation produces a segmentation candidate network, and an optimum path through it is selected by evaluating the likelihood. This is done by pattern matching against the language model for possible address phrases. Segmentation also required that geometrical alignment be evaluated [18].

The issue was how to design “robustness” into the system. We formulated the design principles for robustness shown in Table 1 while carrying out the project [19, 20]. By applying them, we successfully integrated many additional algorithms into the software recognition engine. Figure 5 shows the full-address recognition rates for handwritten addresses for four versions, V1 through V4. The horizontal axis shows sample dataset numbers, which have been rearranged so that the rates are in decreasing order.

Figure 5. Improvements in handwritten address recognition

4. Future prospects

A question, perhaps, is where we should go. The anticipated future needs for character and document recognition include the following:
• Archival records and image replacement documents in e-Government
• Books and historical documents for global search
• Handwriting captured by digital pens
• Text-in-the-scene captured by cameras
• Text in video

One thing that is almost clear is that we are moving into the “long tail” part of the market. The “head” part has already been computerized, either with existing OCR technology or by totally electronic means. The remaining part has an extremely wide variety of documents, with not many instances of each. Non-standardized business forms are an example of this kind. For instance, small and medium-sized companies in Japan are still using paper forms to make bank transactions. For each company, the number of transactions is not large, but banks receive many different types of forms from many companies. A more intelligent, versatile form reader may solve this problem.

The importance of handwriting is being reconsidered in education and in the context of knowledge work. The act of writing helps reading and thinking processes, and a digital pen can capture handwritten annotations and memos and store them in computers. As paper is
considered the medium for such processes [21], we can print out electronically produced documents and work on them with a digital pen, while all the information is kept in computers. Search capability on such hand-annotated documents will then be an important tool.

A mobile device with text-in-scene recognition will be a necessary gadget for travelers in foreign countries. It should be able to help with understanding second and third languages, or more. Color image processing, geometric perspective normalization, text segmentation, adaptive thresholding, and similar techniques need to be studied.

What is common throughout these applications is that no single recognition algorithm may realize them. Higher-order image processing before recognition will be mandatory. The solution should be comprehensive.

5. Conclusions

For a bright future for this technological community, both vision and fundamental technology are in demand. Vision will show applications with new value propositions that require new technology. But technology creates demand as well.

6. References

[1] F. Kimura, K. Takashina, S. Tsuruoka, and Y. Miyake, “Modified Quadratic Discriminant Functions and the Application to Chinese Character Recognition,” IEEE Trans. PAMI, Vol. 9, No. 1, 1987, pp. 149-153.
[2] L. Xu, A. Krzyzak, and C. Y. Suen, “Methods of Combining Multiple Classifiers and Their Applications to Handwriting Recognition,” IEEE Trans. SMC, Vol. 22, No. 3, 1992, pp. 418-435.
[3] T. K. Ho, J. J. Hull, and S. N. Srihari, “Decision Combination in Multiple Classifier Systems,” IEEE Trans. PAMI, Vol. 16, No. 1, 1994, pp. 66-75.
[4] F. Kimura, M. Sridhar, and Z. Chen, “Improvements of Lexicon-Directed Algorithm for Recognition of Unconstrained Hand-Written Words,” Proc. 2nd ICDAR, Tsukuba, Japan, Oct. 1993, pp. 18-22.
[5] C. H. Chen, “Lexicon-Driven Word Recognition,” Proc. 3rd ICDAR, Montreal, Canada, Aug. 1995, pp. 919-922.
[6] M. Koga, R. Mine, H. Sako, and H. Fujisawa, “Lexical Search Approach for Character-String Recognition,” Proc. 3rd DAS, Nagano, Japan, Nov. 1998, pp. 237-251.
[7] C.-L. Liu, M. Koga, and H. Fujisawa, “Lexicon-driven Segmentation and Recognition of Handwritten Character Strings for Japanese Address Reading,” IEEE Trans. PAMI, Vol. 24, No. 11, 2002, pp. 1425-1437.
[8] R. Casey and G. Nagy, “Recognition of printed Chinese characters,” IEEE Trans. Electronic Computers, Vol. EC-15, No. 1, 1966, pp. 91-101.
[9] S. Yamamoto, A. Nakajima, and K. Nakata, “Chinese character recognition by hierarchical pattern matching,” Proc. 1st IJCPR, Washington DC, 1973, pp. 183-194.
[10] H. Fujisawa, Y. Nakano, Y. Kitazume, and M. Yasuda, “Development of a Kanji OCR: An Optical Chinese Character Reader,” Proc. 4th IJCPR, Kyoto, Nov. 1978, pp. 815-820.
[11] M. Yasuda and H. Fujisawa, “An Improvement of Correlation Method for Character Recognition,” Systems, Computers, Controls, Scripta Publishing Co., Vol. 10, No. 2, 1979, pp. 29-38.
[12] H. Fujisawa and C.-L. Liu, “Directional Pattern Matching for Character Recognition Revisited,” Proc. 7th ICDAR, Edinburgh, Aug. 2003, pp. 794-798.
[13] J. Tsukumo and H. Tanaka, “Classification of Handprinted Chinese Characters Using Non-linear Normalization and Correlation Methods,” Proc. 9th ICPR, Rome, Italy, 1988, pp. 168-171.
[14] C.-L. Liu, “Normalization-Cooperated Gradient Feature Extraction for Handwritten Character Recognition,” IEEE Trans. PAMI, Vol. 29, No. 6, 2007, pp. 1465-1469.
[15] C.-L. Liu, “Handwritten Chinese Character Recognition: Effects of Shape Normalization and Feature Extraction,” Proc. Summit on Arabic and Chinese Handwriting, College Park, Sep. 2006.
[16] H. Fujisawa, Y. Nakano, and K. Kurino, “Segmentation Methods for Character Recognition: From Segmentation to Document Structure Analysis,” Proc. IEEE, Vol. 80, No. 7, 1992, pp. 1079-1092.
[17] K. Marukawa, M. Koga, Y. Shima, and H. Fujisawa, “An Error Correction Algorithm for Handwritten Chinese Character Address Recognition,” Proc. 1st ICDAR, Saint-Malo, Sep. 1991, pp. 916-924.
[18] T. Kagehiro, M. Koga, H. Sako, and H. Fujisawa, “Segmentation of Handwritten Kanji Numerals Integrating Peripheral Information by Bayesian Rule,” Proc. IAPR MVA’98, Chiba, Japan, Nov. 1998, pp. 439-442.
[19] H. Fujisawa, “How to Deal with Uncertainty and Variability: Experience and Solutions,” Proc. Summit on Arabic and Chinese Handwriting, College Park, Sep. 2006.
[20] H. Fujisawa, “Robustness Design of Industrial Strength Recognition Systems,” Digital Document Processing: Major Directions and Recent Advances, B. B. Chaudhuri (Ed.), Springer-Verlag, London, 2007, pp. 185-212.
[21] A. J. Sellen and R. H. Harper, “The Myth of the Paperless Office,” The MIT Press, 2001.