OCR using Tesseract


This presentation explains how an OCR engine recognizes characters. It demonstrates the workings of Tesseract, the open source OCR engine sponsored by Google.

Published in: Engineering

  1. Real-time OCR using Tesseract. 12BCE094 SHOBHIT CHITTORA
  2. Brief History of Tesseract: Open source OCR engine sponsored by Google since 2006. One of the most accurate open source OCR engines currently available. Originally developed by HP between 1985 and 1994. Much of it is written in C and C++.
  3. TessOCR Architecture
  4. Adaptive Thresholding is Essential
  5. Baselines are rarely perfectly straight
  6. Spaces between words are tricky too: Italics, digits, and punctuation all create special-case, font-dependent spacing. Fully justified text in narrow columns can have widely varying spacing on different lines.
  7. Tesseract Word Recognizer
  8. Outline Approximation: Polygonal approximation is a double-edged sword. It removes noise, but some pertinent information is lost along with it.
  9. Why is it called Tesseract? Elements of the polygonal approximation, clustered within a character/font combination. Each element has an x, y position, a direction, and a length (as a multiple of feature length).
  10. Character Classifier (Features and Matching): The static classifier uses outline fragments as features. Broken characters are easily recognized by a small-to-large matching process in the classifier. (This is slow.) The adaptive classifier uses the same technique!
  11. Classifier as Histogram of Gradients: Quantize the character area. Compute the gradients within it. Histograms of gradients map to a fixed-dimension feature vector.
  12. Character Segmentation: Segmentation Graphs
  13. Rating and Certainty: Rating = Distance * Outline length (the total rating over a word, or a line if you prefer, is normalized, so transcriptions of different lengths are fairly comparable). Certainty = -20 * Distance (measures the absolute classification confidence; it is a surrogate for log probability and is used to decide what needs more work).
  14. Tesseract Training
  15. Implementation using Tess-two (Tesseract port for Android): The Tess-two library is an open source port of the Tesseract engine for Android. Only the most basic and popular functionality is ported. Things such as deep neural nets are not ported. A lot of tweaking is required to produce the desired results.
  16. DEMO
  17. Implementing Real-Time OCR: Challenges: Image processing on memory-limited devices is difficult. Clock speeds are limited for processing huge matrices. The camera SurfaceHolder runs on the main UI thread, while preprocessing and OCR run on user threads. Huge Bitmaps must be maintained for preprocessing and sent to multiple threads. Garbage collection of important preprocessed data must be avoided.
  18. Thank You
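
The three steps on slide 11 (quantize the character area, compute gradients, build histograms that map to a fixed-dimension vector) can be illustrated with a small sketch. This is a generic histogram-of-gradients example in plain Python, not Tesseract's actual classifier code. The function name `hog_features`, the list-of-rows image format, and the `cells` and `bins` parameters are all assumptions made for illustration.

```python
import math


def hog_features(image, cells=2, bins=4):
    """Return a fixed-length histogram-of-gradients vector (illustrative only).

    image: list of equal-length rows of grayscale values (assumed format).
    cells: the character area is quantized into cells x cells regions.
    bins:  number of unsigned orientation bins per region.
    """
    height, width = len(image), len(image[0])
    hist = [0.0] * (cells * cells * bins)
    for y in range(height):
        for x in range(width):
            # Central differences, clamped at the image border.
            gx = image[y][min(x + 1, width - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, height - 1)][x] - image[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            # Unsigned orientation in [0, pi), quantized into `bins` buckets.
            angle = math.atan2(gy, gx) % math.pi
            b = min(int(angle / math.pi * bins), bins - 1)
            # Region of the quantized character area containing this pixel.
            cy = y * cells // height
            cx = x * cells // width
            hist[(cy * cells + cx) * bins + b] += magnitude
    total = sum(hist)
    # Normalize so that the vector does not depend on image size or contrast.
    return [v / total for v in hist] if total else hist


# A 4x4 image with a single vertical edge down the middle.
features = hog_features([[0, 0, 1, 1]] * 4)
```

The vector length is `cells * cells * bins` regardless of the input size, which is the property the slide points to: characters of any size map to feature vectors of one fixed dimension.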
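
The two formulas on slide 13 can be worked through with a short sketch. It compares two candidate transcriptions of the same word image whose characters cover the same total outline length. The distances and outline lengths are made-up numbers, the function names are hypothetical, and taking the worst character certainty as the word certainty is an assumption of this sketch rather than something the slides state.

```python
def rating(distance, outline_length):
    """Rating = Distance * Outline length (lower is better)."""
    return distance * outline_length


def certainty(distance):
    """Certainty = -20 * Distance (closer to zero means more confident)."""
    return -20.0 * distance


def score_transcription(chars):
    """Score one candidate given (distance, outline_length) per character.

    Summing ratings weights each character by its outline length, so
    candidates that split the same outline differently stay comparable.
    The word certainty is taken as the worst character certainty
    (an assumption of this sketch).
    """
    total_rating = sum(rating(d, n) for d, n in chars)
    worst_certainty = min(certainty(d) for d, _ in chars)
    return total_rating, worst_certainty


# Two hypothetical readings of the same outline (total length 100).
candidates = {
    "m": [(0.25, 100)],
    "rn": [(0.125, 60), (0.25, 40)],
}
scores = {text: score_transcription(chars) for text, chars in candidates.items()}
best = min(scores, key=lambda text: scores[text][0])
```

Here the two-character reading gets the lower total rating even though it contains more characters, because each character's distance is weighted by the share of the outline it explains.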