IMPACT OCR in a nutshell. Clemens Neudecker

Presented at the "IMPACT Demonstration Session at the BNE", October, Biblioteca Nacional de España.


Transcript

  • 1. IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. IMPACT OCR in a nutshell. Clemens Neudecker, National Library of the Netherlands. IMPACT Demo Day, Biblioteca Nacional de España.
  • 2. OCR process. Binarisation = transform greyscale or colour images to bitonal (black/white) in order to separate the foreground (text) from the background. Segmentation = detection of layout elements in hierarchical order (blocks/regions, lines, words, glyphs). Pattern matching (recognition) = matching character shapes against an internal font database (classifiers).
  • 3. ABBYY FineReader. Main OCR technology provider in IMPACT. OCR technology experts for 30 years. IMPACT uses FineReader Engine (SDK).
  • 4. Binarisation
  • 5. Adaptive binarisation (figure panels: original scan; previous binarisation; new binarisation)
  • 6. IMPACT binarisation (figure panels: original; state of the art; IMPACT)
  • 7. Segmentation (figure panels: blocks/regions; words; glyphs)
  • 8. IMPACT segmentation example (figure panels: pre-IMPACT FineReader Engine 9; FineReader Engine 10). Part of a column was misclassified as an image.
  • 9. IMPACT segmentation example (figure panels: v. 9; v. 10). Linear word order errors.
  • 10. IMPACT segmentation example (figure panels: v. 9; v. 10). Lost text.
  • 11. Fraktur recognition
  • 12. Languages and dictionaries. Goal: develop an interface so that external dictionaries can be integrated into the FineReader Engine. 2008-2009: external dictionary beta interface; the same quality as with internal dictionaries is possible. 2010-2011: make the interface work reliably; teach partners how to use it; support for any language and any time period.
  • 13. ALTO: new native export format. Available since FineReader Engine (FRE) 10 R2. Supports the most recent schema, ALTO v. 2.0. Line coordinates are available.
  • 14. Thank you! Questions?
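
Slide 2 describes recognition as matching character shapes against a font database. As a toy illustration only, the sketch below classifies a glyph bitmap by choosing the template with the fewest differing pixels (nearest-template matching by Hamming distance). The 5x3 templates and the names `TEMPLATES`, `hamming` and `classify` are hypothetical; FineReader's real classifiers are trained and far more elaborate, and are not shown here.

```python
from typing import Dict, List

# A glyph bitmap: one string per row, '#' = ink, '.' = background.
Bitmap = List[str]

# Hypothetical 5x3 templates standing in for an OCR engine's font database.
TEMPLATES: Dict[str, Bitmap] = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def hamming(a: Bitmap, b: Bitmap) -> int:
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def classify(glyph: Bitmap) -> str:
    """Return the label of the template with the fewest mismatching pixels."""
    return min(TEMPLATES, key=lambda label: hamming(glyph, TEMPLATES[label]))


# A 'T' with one speck of noise in the fourth row.
noisy_t = ["###", ".#.", ".#.", "##.", ".#."]
print(classify(noisy_t))  # T
```

The noisy glyph differs from the "T" template by one pixel, from "I" by three and from "L" by nine, so the nearest template wins despite the noise.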
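
Slides 4 to 6 contrast adaptive binarisation with earlier approaches. The sketch below shows why a local threshold helps on unevenly lit scans: a single global threshold misses faint ink on bright paper and turns dark paper into ink, while comparing each pixel with its local mean does not. This is simple local-mean thresholding, assumed here for illustration; it is not necessarily the algorithm IMPACT or ABBYY used, and `radius`, `offset` and the pixel values are hypothetical.

```python
from typing import List

Image = List[List[int]]  # greyscale, 0 = black, 255 = white


def global_binarise(img: Image, threshold: int = 128) -> List[List[int]]:
    """1 = ink (foreground), 0 = background, with one threshold for the whole page."""
    return [[1 if p < threshold else 0 for p in row] for row in img]


def adaptive_binarise(img: Image, radius: int = 1, offset: int = 10) -> List[List[int]]:
    """Mark a pixel as ink if it is darker than its local mean minus `offset`.

    The neighbourhood is a (2*radius+1) x (2*radius+1) window clipped at the borders.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            local_mean = sum(window) / len(window)
            row.append(1 if img[y][x] < local_mean - offset else 0)
        out.append(row)
    return out


# Paper darkens from left to right; the real ink pixels are at x = 1 and x = 7.
row = [230, 160, 230, 200, 170, 140, 110, 50, 110]
page = [list(row), list(row)]
print(global_binarise(page)[0])    # [0, 0, 0, 0, 0, 0, 1, 1, 1]
print(adaptive_binarise(page)[0])  # [0, 1, 0, 0, 0, 0, 0, 1, 0]
```

The global threshold loses the faint stroke at x = 1 and misreads the dark paper at x = 6 and x = 8 as ink; the local version recovers exactly the two strokes.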
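
Slides 2 and 7 describe segmentation as finding layout elements in hierarchical order. A classic way to sketch the line, word and glyph levels on a clean binary image is with projection profiles: empty rows separate lines, wide runs of empty columns separate words, and single empty columns separate glyphs. This is an assumed, simplified technique (it skips the block/region level and fails on touching or skewed characters) and is not FineReader's segmentation; `runs`, `segment` and `word_gap` are hypothetical names.

```python
from typing import List, Optional, Tuple

Binary = List[List[int]]  # 1 = ink, 0 = background
Span = Tuple[int, int]    # half-open index range [start, end)


def runs(profile: List[int], min_gap: int = 1) -> List[Span]:
    """Split a projection profile into spans of non-zero values.

    A span is closed once `min_gap` consecutive zeros follow it.
    """
    spans: List[Span] = []
    start: Optional[int] = None
    end = 0
    for i, value in enumerate(profile):
        if value:
            if start is None:
                start = i
            end = i + 1
        elif start is not None and i - end + 1 >= min_gap:
            spans.append((start, end))
            start = None
    if start is not None:
        spans.append((start, end))
    return spans


def segment(img: Binary, word_gap: int = 2):
    """Return [(line_row_span, [[glyph_column_span, ...] per word]), ...]."""
    lines = []
    for top, bottom in runs([sum(row) for row in img]):  # lines: ink rows
        band = img[top:bottom]
        columns = [sum(r[x] for r in band) for x in range(len(img[0]))]
        words = []
        for left, right in runs(columns, min_gap=word_gap):  # words: wide gaps
            # Glyphs: any single empty column separates two glyphs.
            glyphs = [(left + a, left + b) for a, b in runs(columns[left:right])]
            words.append(glyphs)
        lines.append(((top, bottom), words))
    return lines


rows = ["##.#...##",
        "##.#...##",
        ".........",
        "#.#......"]
page = [[1 if c == "#" else 0 for c in r] for r in rows]
for line in segment(page):
    print(line)
# ((0, 2), [[(0, 2), (3, 4)], [(7, 9)]])
# ((3, 4), [[(0, 1), (2, 3)]])
```

Raising `word_gap` merges neighbouring words; lowering it to 1 makes every glyph its own word.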
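
Slide 12 explains the goal of plugging external dictionaries into the engine for any language and time period. The sketch below shows one reason a period lexicon helps: when the recogniser offers several readings, giving a bonus to words found in a historical word list can overturn a confident but wrong reading, such as a Fraktur long s ("ſ") misread as "f". The candidate format, the `bonus` scoring rule and the lexicon are all hypothetical; this is not the FineReader Engine external dictionary interface.

```python
from typing import Iterable, List, Tuple

# (reading, recogniser confidence in [0, 1]); a hypothetical candidate format.
Candidate = Tuple[str, float]

# Hypothetical period lexicon with historical German spellings.
HISTORICAL_LEXICON = ["seyn", "Theil", "und"]


def choose_reading(candidates: List[Candidate],
                   lexicon: Iterable[str],
                   bonus: float = 0.2) -> str:
    """Pick the highest-scoring reading; lexicon words get `bonus` added to their confidence."""
    words = {w.lower() for w in lexicon}

    def score(candidate: Candidate) -> float:
        reading, confidence = candidate
        return confidence + (bonus if reading.lower() in words else 0.0)

    return max(candidates, key=score)[0]


# A long s in Fraktur is easily misread as 'f'.
candidates = [("feyn", 0.62), ("seyn", 0.55)]
print(choose_reading(candidates, []))                  # feyn
print(choose_reading(candidates, HISTORICAL_LEXICON))  # seyn
```

Without a lexicon the raw confidence decides; with the period word list, "seyn" scores 0.75 against 0.62 and is chosen.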
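
Slide 13 notes that ALTO v. 2.0 export includes line coordinates. A minimal sketch of reading those coordinates with Python's standard library follows. The namespace URI and the element and attribute names (`TextLine`, `String`, `CONTENT`, `HPOS`, `VPOS`, `WIDTH`, `HEIGHT`) follow the published ALTO v2 schema; the sample document is invented for illustration and is not actual FineReader Engine output, and `read_lines` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Namespace URI used by ALTO v2 documents.
NS = {"alto": "http://www.loc.gov/standards/alto/ns-v2#"}

Box = Tuple[float, float, float, float]  # (HPOS, VPOS, WIDTH, HEIGHT)


def read_lines(alto_xml: str) -> List[Tuple[Box, str]]:
    """Return (bounding box, text) for every TextLine in an ALTO v2 document."""
    root = ET.fromstring(alto_xml)
    lines = []
    for line in root.iterfind(".//alto:TextLine", NS):
        hpos, vpos, width, height = (float(line.get(k, "0"))
                                     for k in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))
        # Join the CONTENT of the line's String children into the line text.
        words = [s.get("CONTENT", "") for s in line.iterfind("alto:String", NS)]
        lines.append(((hpos, vpos, width, height), " ".join(words)))
    return lines


SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1" HEIGHT="3000" WIDTH="2000" PHYSICAL_IMG_NR="1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine ID="TL1" HPOS="120" VPOS="200" WIDTH="640" HEIGHT="48">
            <String CONTENT="IMPACT" HPOS="120" VPOS="200" WIDTH="210" HEIGHT="48"/>
            <SP HPOS="330" VPOS="200" WIDTH="20"/>
            <String CONTENT="OCR" HPOS="350" VPOS="200" WIDTH="120" HEIGHT="48"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""

print(read_lines(SAMPLE))  # [((120.0, 200.0, 640.0, 48.0), 'IMPACT OCR')]
```

Coordinates are read as floats because the ALTO v2 schema types them as floats; round them if integer pixel positions are needed.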