OCR on Android
Vladimir Kulyukin

www.vkedco.blogspot.com
MobAppDev: OCR on Android

Published in: Technology, Business
Transcript of "MobAppDev: OCR on Android"

  1. OCR on Android. Vladimir Kulyukin, www.vkedco.blogspot.com
  2. Outline
     ● OCR Background
         ● Application Domains
         ● Approaches
     ● Zone Vector Matching
     ● Tesseract on Android
  3. OCR Background
  4. What is OCR
     ● OCR stands for optical character recognition
     ● OCR is the automated conversion of printed, typewritten, or handwritten text into digital format
     ● OCR lies at the intersection of three areas: pattern recognition, AI, and computer vision
  5. OCR Application Domains
     ● Electronic document conversion: businesses use OCR for data entry
     ● Conversion of many handwritten documents into electronic format
     ● Conversion of images with text into searchable documents
     ● Translation of street signs on smartphones
  6. OCR Approaches
     ● Character Recognition: segment images into blobs (glyphs) and recognize blobs as characters one at a time
     ● Word Recognition: segment images into blobs and recognize each blob as a word
     ● Intelligent Recognition: use patterns and AI techniques (e.g., artificial neural networks) to recognize hard text (e.g., handwritten)
  7. OCR Techniques
     ● Deskewing: realignment of skewed text segments
     ● Binarization: conversion of color or grayscale images into black-and-white images
     ● Line and word detection: detection of text lines and segmentation of detected lines into word blobs
     ● Layout analysis: detection of columns, paragraphs, headings, etc.
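Of the techniques on this slide, binarization is the easiest to illustrate. The following is a minimal sketch, assuming a grayscale image stored as a list of rows with values from 0 to 255 and a fixed global threshold of 128. The function name `binarize` and the threshold value are illustrative assumptions, not something the slides specify; practical OCR pipelines often choose the threshold adaptively.

```python
def binarize(gray, threshold=128):
    """Map each gray value (0-255) to 1 (black ink) if it is darker
    than the threshold, otherwise to 0 (white background).

    The fixed threshold is an assumption made for this sketch.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


# A tiny 3x4 grayscale image: small values are dark, large values are light.
gray = [
    [250, 240, 30, 245],
    [248, 20, 25, 250],
    [10, 15, 240, 251],
]

binary = binarize(gray)
for row in binary:
    print(row)
```

Representing black as 1 is a convenience: the number of black pixels in any region is then just the sum of its values.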
  8. Zone Matching
  9. Zone Matching
     ● Zone matching is a technique used in computer vision for image matching
     ● An image is divided into several sub-images, called zones
     ● For each zone, a specific statistic or a set of statistics is computed (e.g., number of horizontal lines, number of pixels of a specific color, etc.)
     ● Those statistics are placed into feature vectors and the feature vectors are matched against each other
  10. Example: Zone Stats & Feature Vectors
      1) Image is divided into four zones (1, 2, 3, 4 moving clockwise)
      2) Statistic is the number of black pixels
      3) Resulting feature vector is [2, 3, 5, 4]
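The slide's image is not part of this transcript, so the sketch below uses a made-up 6x6 binary image chosen to reproduce the feature vector [2, 3, 5, 4]. It assumes that zone 1 is the top-left quadrant and that the zones proceed clockwise (top-left, top-right, bottom-right, bottom-left); the starting quadrant and the name `quadrant_feature_vector` are assumptions.

```python
def quadrant_feature_vector(img):
    """Count black pixels (value 1) in four quadrants, visited clockwise
    starting from the top-left quadrant (the starting point is assumed)."""
    height, width = len(img), len(img[0])
    mid_row, mid_col = height // 2, width // 2

    def count(r0, r1, c0, c1):
        # Sum of pixel values in rows r0..r1-1 and columns c0..c1-1.
        return sum(img[r][c] for r in range(r0, r1) for c in range(c0, c1))

    return [
        count(0, mid_row, 0, mid_col),           # zone 1: top-left
        count(0, mid_row, mid_col, width),       # zone 2: top-right
        count(mid_row, height, mid_col, width),  # zone 3: bottom-right
        count(mid_row, height, 0, mid_col),      # zone 4: bottom-left
    ]


# Hypothetical image, 1 = black pixel, 0 = white pixel.
image = [
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 0, 0, 1, 0, 0],
]

vector = quadrant_feature_vector(image)
print(vector)  # [2, 3, 5, 4]
```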
  11. Example: a 20x20 image divided into 16 zones of 5x5 pixels
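A 20x20 image split into 5x5 zones yields a 16-element feature vector. The sketch below assumes a square binary image whose side is a multiple of the zone size, uses the black-pixel count as the statistic, and visits zones in row-major order (left to right, top to bottom). The ordering and the name `zone_feature_vector` are assumptions; any fixed order works as long as every image is processed the same way.

```python
def zone_feature_vector(img, zone_size=5):
    """Split a square binary image into zone_size x zone_size zones and
    return the black-pixel count of each zone in row-major order."""
    side = len(img)
    if side % zone_size != 0 or any(len(row) != side for row in img):
        raise ValueError("expected a square image divisible into zones")
    vector = []
    for top in range(0, side, zone_size):
        for left in range(0, side, zone_size):
            vector.append(sum(
                img[r][c]
                for r in range(top, top + zone_size)
                for c in range(left, left + zone_size)
            ))
    return vector


# Hypothetical 20x20 test image: black pixels only on the main diagonal.
image = [[1 if r == c else 0 for c in range(20)] for r in range(20)]

features = zone_feature_vector(image, zone_size=5)
print(len(features))  # 16
print(features)
```

With the diagonal test image, only the four zones lying on the diagonal contain black pixels, five each.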
  12. Example: Cosine Vector Similarity
      1) A and B are feature vectors
      2) Ai and Bi are the i-th elements of A and B
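The formula on this slide was an image and did not survive in the transcript. The standard definition of cosine similarity is the sum of the products Ai * Bi divided by the product of the lengths of A and B, where the length of a vector is the square root of the sum of its squared elements. A minimal sketch of that definition follows; returning 0.0 for an all-zero vector is a convention chosen here, not something the slides state.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between feature vectors a and b."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for an empty (all-white) image
    return dot / (norm_a * norm_b)


same = cosine_similarity([2, 3, 5, 4], [2, 3, 5, 4])
different = cosine_similarity([2, 3, 5, 4], [4, 5, 3, 2])
print(round(same, 3), round(different, 3))
```

Identical vectors score 1, and the score drops as the distribution of black pixels across zones diverges. For vectors of non-negative counts the result always lies between 0 and 1.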
  13. Zone Vector Matching in OCR: Steps
      ● Create a library of character images
      ● For each character image, compute its zone feature vector and save it in a table of feature vectors
      ● Given an input character image, compute its zone feature vector and match it with all feature vectors saved in the table of feature vectors
      ● Return the top N characters whose feature vectors are closest to the feature vector of the input character
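The four steps can be sketched end to end on a toy scale. Everything concrete here is an assumption made for illustration: the 4x4 glyphs, the 2x2 zones, the black-pixel statistic, cosine similarity as the closeness measure, and the function names.

```python
import math


def zone_vector(img, zone=2):
    """Black-pixel count per zone x zone block, in row-major order."""
    side = len(img)
    return [
        sum(img[r][c] for r in range(top, top + zone)
            for c in range(left, left + zone))
        for top in range(0, side, zone)
        for left in range(0, side, zone)
    ]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def build_table(library, zone=2):
    """Step 2: one feature vector per library character."""
    return {ch: zone_vector(img, zone) for ch, img in library.items()}


def top_matches(img, table, n=2, zone=2):
    """Steps 3 and 4: rank library characters by similarity to the input."""
    query = zone_vector(img, zone)
    ranked = sorted(table, key=lambda ch: cosine(query, table[ch]), reverse=True)
    return ranked[:n]


# Step 1: a hypothetical library of 4x4 character images (1 = black).
library = {
    "I": [[0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0]],
    "L": [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 1, 1, 1]],
    "-": [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]],
}
table = build_table(library)

# An input glyph: an "L" with one pixel missing from its foot.
noisy_l = [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0]]
print(top_matches(noisy_l, table, n=2))  # ['L', 'I']
```

Returning the top N candidates rather than a single best match leaves room for a later stage, such as a dictionary lookup, to pick among similar-looking characters.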
  14. Sample Character Images: Archives with character images are here
  15. Tesseract on Android
  16. Tesseract
      ● Tesseract is an optical character recognition (OCR) engine for various operating systems
      ● Tesseract is released under the Apache License, Version 2.0, and its development has been sponsored by Google since 2006
      ● Source: http://en.wikipedia.org/wiki/Tesseract_(software)
  17. Making Android Applications OCR-capable with Tesseract
      ● Two ways of integrating Tesseract into Android apps:
          1. Use a pre-built Java Tesseract library
          2. Download all Tesseract packages and build the same library
      ● Follow the steps at this blogpost to build a Tesseract-enabled Android application
  18. Reading & References
      ● http://en.wikipedia.org/wiki/Optical_character_recognition
      ● Making Android Apps OCR-Capable with Tesseract
      ● http://vkedco.blogspot.com/2012/03/exploring-basics-of-zone-vector.html