BL Demo Day - July2011 - (4) OCR for IMPACT Part 1


Clemens Neudecker's presentation describing how IMPACT has improved the quality of OCR in conjunction with ABBYY FineReader Engine.

Delivered at the IMPACT BL Demo Day on the 12th of July 2011.


  1. IMPACT OCR in a nutshell. Clemens Neudecker, National Library of the Netherlands. IMPACT Demo Day, British Library, 12/11/11
  2. OCR Process
     • Binarisation = transform greyscale or colour images to bitonal (b/w) in order to separate foreground (text) from background
     • Segmentation = detection of layout elements in hierarchical order (blocks/regions, lines, words, glyphs)
     • Pattern Matching (Recognition) = matching of character shapes with internal font database (classifiers)
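The binarisation step on slide 2 can be illustrated with a global threshold. The sketch below is a minimal, pure-Python version of Otsu's method, a standard textbook technique; the slides do not say which algorithm FineReader or IMPACT uses, so the function names `otsu_threshold` and `binarise` and the toy image are assumptions made only for illustration.

```python
def otsu_threshold(pixels):
    """Return the grey level (0..255) that maximises between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0  # pixel count and intensity sum of the dark class
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue  # no pixels at or below t yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # every pixel is already in the dark class
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarise(image, threshold):
    """Map a greyscale image (rows of ints) to bitonal: 1 = text, 0 = background."""
    return [[1 if px <= threshold else 0 for px in row] for row in image]


# Hypothetical 2x3 greyscale patch: dark ink (20, 30) on a light page (200, 210).
image = [[200, 20, 210],
         [30, 200, 210]]
threshold = otsu_threshold([px for row in image for px in row])
bitonal = binarise(image, threshold)
```

A single global threshold like this works only when the page is evenly lit; degraded historical scans often are not, which is why the later slides focus on adaptive binarisation.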
  3. ABBYY FineReader
     • Main OCR technology provider in IMPACT
     • OCR technology experts for 30 years
     • IMPACT uses FineReader Engine (SDK)
  4. Binarisation
  5. Adaptive Binarisation (image panels: original scan, previous binarisation, new binarisation)
  6. IMPACT Binarisation (image panels: Original, State of the Art, IMPACT)
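Slides 5 and 6 compare binarisation results, but the transcript does not describe the algorithm behind them. As a rough illustration of what "adaptive" means in general, the sketch below thresholds each pixel against the mean of its local neighbourhood. This is a generic local-mean method, not the ABBYY or IMPACT implementation; `adaptive_binarise`, `radius`, `offset`, and the sample row are all hypothetical.

```python
def adaptive_binarise(image, radius=1, offset=10):
    """Mark a pixel as text (1) if it is darker than its local mean minus offset."""
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        out_row = []
        for x in range(width):
            # Clip the (2*radius+1)-sized window at the image borders.
            ys = range(max(0, y - radius), min(height, y + radius + 1))
            xs = range(max(0, x - radius), min(width, x + radius + 1))
            window = [image[j][i] for j in ys for i in xs]
            local_mean = sum(window) / len(window)
            out_row.append(1 if image[y][x] < local_mean - offset else 0)
        result.append(out_row)
    return result


# Hypothetical scan line whose background fades from 200 (left) to 92 (right).
# Ink sits at x=1 (value 118) and x=7 (value 46).
row = [200, 118, 176, 164, 152, 140, 128, 46, 104, 92]
bitonal = adaptive_binarise([row], radius=1, offset=10)
```

In this sample the ink at x=1 (118) is brighter than the background at the right edge (104 and 92), so no single global threshold can separate text from page, while the local comparison marks exactly the two ink pixels.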
  7. Segmentation (image panels: blocks/regions, words, glyphs)
  8. IMPACT Segmentation example (pre-IMPACT FR Engine 9 vs. FR Engine 10): part of a column was misclassified as an image
  9. IMPACT Segmentation example (v. 9 vs. v. 10): linear word order errors
  10. IMPACT Segmentation example (v. 9 vs. v. 10): lost text
  11. Recognition
  12. Languages and Dictionaries
     • Goal:
        • Develop an interface so that external dictionaries can be integrated into the FineReader Engine
     • 2008 - 2009:
        • External Dictionary beta interface
        • Same quality as with internal dictionaries possible
     • 2010 - 2011:
        • Make interface work reliably
        • Teach partners how to use it
        • Support for any language, any time period
  13. ALTO: New native export format
     • Available since FRE 10 R2
     • Supports most recent schema: ALTO v. 2.0
     • Line coordinates available
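To show what "line coordinates available" makes possible, the sketch below reads text lines and their bounding boxes from an ALTO v. 2.0 document using Python's standard `xml.etree.ElementTree`. The `SAMPLE` fragment is hand-written and trimmed for illustration (it is not FineReader Engine output and is not schema-complete), and the helper name `text_lines` is an assumption.

```python
import xml.etree.ElementTree as ET

# Namespace of the ALTO v. 2.0 schema.
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v2#"}

# Hand-written, trimmed ALTO fragment (illustrative only).
SAMPLE = """
<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="B1">
          <TextLine ID="L1" HPOS="100" VPOS="200" WIDTH="800" HEIGHT="40">
            <String CONTENT="IMPACT" HPOS="100" VPOS="200" WIDTH="300" HEIGHT="40"/>
            <SP/>
            <String CONTENT="OCR" HPOS="420" VPOS="200" WIDTH="150" HEIGHT="40"/>
          </TextLine>
          <TextLine ID="L2" HPOS="100" VPOS="260" WIDTH="500" HEIGHT="40">
            <String CONTENT="nutshell" HPOS="100" VPOS="260" WIDTH="500" HEIGHT="40"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def text_lines(alto_xml):
    """Return a list of (line text, (HPOS, VPOS, WIDTH, HEIGHT)) tuples."""
    root = ET.fromstring(alto_xml)
    lines = []
    for line in root.iterfind(".//alto:TextLine", ALTO_NS):
        # Each String element carries one recognised word in CONTENT.
        words = [s.get("CONTENT", "") for s in line.iterfind("alto:String", ALTO_NS)]
        box = tuple(float(line.get(key)) for key in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))
        lines.append((" ".join(words), box))
    return lines


lines = text_lines(SAMPLE)
```

With the line boxes in hand, a viewer could, for example, highlight a search hit on the page image at the position of the matching line.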
  14. Thank you! Questions?
