1. Project Proposal Form
CS791A – Machine Learning
Program Name: Fused Optical Character Recognition System (FOCRS)
Program Participants: Nick Bartlow, Nathan Kalka New:_X_ Continuation:____
Description: According to [1], “Optical character recognition (OCR), as understood in the following, is the whole process of
transforming a document image (machine printed or handwritten) into a corresponding ASCII text. Many steps are necessary to
perform this task, e.g. layout analysis, image preprocessing, line segmentation, character recognition, contextual
postprocessing... Modifying one of them may lead to completely different results.” Much research has gone into the
development of applications that provide (semi-)automatic OCR. As the technology has matured, the performance of such
applications has improved dramatically. That said, measured performance necessarily depends on the testing environment and
the data repositories used. Additionally, applications often approach the problem of OCR with (semi-)orthogonal methodologies
to reach a final solution. Because their errors are therefore unlikely to coincide, data fusion methodologies may lead to
promising results if multiple OCR packages are combined appropriately.
Experimental Plan: We intend to take a series of freely distributed OCR packages (gocr, tesseract, ocrad, ocropus,
etc.) and develop a framework for combining their output, with the aim of achieving higher accuracy than any of the
individual packages alone. Techniques such as boosting, cascading, and adaptive fusion frameworks may be
investigated to this end. Besides applying the chosen techniques to machine-generated datasets and samples from paper
documents scanned electronically, we will also collect a dataset of handwriting samples gathered electronically
through a tablet PC. Formal comparisons of performance will include recognition of individual characters as well as passages
of text.
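One common way to score recognition of a passage of text is character accuracy derived from edit distance. The sketch below is an illustration only, not a metric this proposal commits to; the function names edit_distance and character_accuracy are assumptions introduced here.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """1 - (edit distance / reference length), clamped to [0, 1]."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    return max(0.0, 1.0 - edit_distance(reference, hypothesis) / len(reference))
```

For single-character recognition the same score reduces to 1.0 for a correct character and 0.0 for an incorrect one, so one function can serve both comparisons.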
Related Work Elsewhere: [1] Incorporates
geometric criteria to prevent incorrect character
segmentations and improves performance through
classical combination rules such as Borda Count or
Plurality Vote. [2] Focuses on obtaining a tradeoff
between speed and recognition accuracy through a
cascade of classifiers. [3] Investigates the utility of string
alignment algorithms in merging outputs from multiple
OCR classifiers.
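The two classical combination rules named above can be sketched for a single character position as follows. This is a minimal illustration, assuming each OCR engine can return a ranked list of candidate characters; the function names and the tie-breaking rule are assumptions made here and are not taken from [1].

```python
from collections import Counter, defaultdict


def plurality_vote(top_choices):
    """Return the candidate that is the first choice of the most engines.

    Ties are broken in favor of the earliest engine in the input order.
    """
    if not top_choices:
        raise ValueError("at least one engine output is required")
    counts = Counter(top_choices)
    best = max(counts.values())
    for candidate in top_choices:
        if counts[candidate] == best:
            return candidate


def borda_count(rankings):
    """Return the candidate with the highest Borda score.

    Each engine ranks n candidates; rank 0 earns n - 1 points, the last
    rank earns 0. A candidate missing from a ranking earns nothing there.
    Ties go to the candidate that was seen first.
    """
    if not rankings:
        raise ValueError("at least one ranking is required")
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] += n - 1 - position
    return max(scores, key=scores.get)
```

The two rules can disagree: plurality looks only at first choices, while Borda Count rewards a candidate that is consistently ranked near the top even when it is rarely ranked first.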
How ours is Different: To the best of our knowledge, no
prior experiments have applied OCR technologies to
handwriting databases collected electronically through a tablet PC.
Although the results of collection in this format may arguably be
similar to scanned handwriting samples, we anticipate different
recognition challenges with data acquired in this manner. Besides
natural differences in quality related to the capture device (tablet
PC vs. paper), the dynamics of the writing process also change.
We expect this change in “writing style” to be observed in varying
degrees from individual to individual.
Related Work in:
[1] E. Wilczok, W. Lellmann, “Adaptive Combination of
Commercial OCR Systems,” Lecture Notes in
Computer Science, Vol. 2956, 2004.
[2] K. Chellapilla, M. Shilman, P. Simard, “Optimally
combining a cascade of classifiers,” Proceedings of SPIE, 2006.
[3] J.C. Handley, “Improving OCR accuracy through
combination: A survey,” Proc. IEEE Int. Conf. Syst. Man
Cybern., Vol. 5, pp. 4330-4333, 1998.
Milestones:
(1) Construct / collect a database of testing images including
machine generated text, scanned handwritten text, and
electronically gathered text.
(2) Construct a cascade of classifiers and/or a fusion framework for
individual OCR packages.
(3) Analyze results on data collected from (1).
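A confidence-thresholded cascade, one possible form of milestone (2), might look like the sketch below. It assumes each OCR package can be wrapped as a function returning a (label, confidence) pair; that interface, the threshold value, and the stub stages are hypothetical and do not reflect the actual output of gocr, tesseract, ocrad, or ocropus.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical stage interface: takes a sample identifier and returns
# (predicted label, confidence in [0, 1]).
Stage = Callable[[str], Tuple[str, float]]


def cascade_classify(sample: str,
                     stages: Sequence[Stage],
                     threshold: float = 0.9) -> Tuple[str, float, int]:
    """Run stages in order (cheapest first) and stop at the first one
    whose confidence reaches the threshold.

    Returns (label, confidence, index of the stage that answered). If no
    stage is confident enough, the last stage's answer is returned.
    """
    if not stages:
        raise ValueError("at least one stage is required")
    for index, stage in enumerate(stages):
        label, confidence = stage(sample)
        if confidence >= threshold:
            break
    return label, confidence, index


# Stub stages for illustration only; real stages would wrap OCR packages.
def fast_stub(sample: str) -> Tuple[str, float]:
    return ("O", 0.60)


def slow_stub(sample: str) -> Tuple[str, float]:
    return ("0", 0.97)
```

With the stubs above, cascade_classify("glyph", [fast_stub, slow_stub]) falls through to the second stage because the first stage's confidence is below 0.9. Lowering the threshold trades accuracy for speed, which is the tradeoff studied in [2].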
Deliverables: Milestones will result in a technical
report / presentation.
Budget: Total: ~$40,000; Students: $30,000 (get us while
we’re still cheap); Travel: $5,000; Other (software, office supplies):
$5,000 (we need new machines).
Progress to Date: Various individual OCR programs have been installed and sanity-tested.
Knowledge Transfer Target Date: 2 months, Fall 2007.