Presentation introducing general OCR features in relation to the IMPACT project, presented by Clemens Neudecker during the demo session held at the BNE on 5 October 2011.
1.
IMPACT OCR in a nutshell
Clemens Neudecker, National Library of the Netherlands
IMPACT Demo Day, Biblioteca Nacional de España
2.
OCR Process <ul><li>Binarisation </li></ul><ul><ul><li>= transform greyscale or colour images to bitonal (b/w) in order to separate foreground (text) from background </li></ul></ul><ul><li>Segmentation </li></ul><ul><ul><li>= detection of layout elements in hierarchical order (blocks/regions, lines, words, glyphs) </li></ul></ul><ul><li>Pattern Matching (Recognition) </li></ul><ul><ul><li>= matching of character shapes with internal font database (classifiers) </li></ul></ul>
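The three stages above can be illustrated with a toy pipeline. This is a minimal sketch under stated assumptions, not how FineReader works internally: images are tiny in-memory greyscale grids (lists of ints, 0 to 255), binarisation uses Otsu's global threshold, segmentation uses simple projection profiles (lines, then glyphs), and recognition is a nearest-template match against a hypothetical three-letter "font database" (`TEMPLATES`). All function names are illustrative.

```python
from typing import Dict, List, Tuple

Bitmap = List[List[int]]  # rows of pixels


def otsu_threshold(img: Bitmap) -> int:
    """Grey level (0-255) maximising between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    n_dark, sum_dark = 0, 0
    for t in range(256):
        n_dark += hist[t]            # pixels <= t (candidate ink)
        sum_dark += t * hist[t]
        n_light = total - n_dark     # pixels > t (candidate paper)
        if n_dark == 0:
            continue
        if n_light == 0:
            break
        mean_dark = sum_dark / n_dark
        mean_light = (sum_all - sum_dark) / n_light
        var = n_dark * n_light * (mean_dark - mean_light) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def binarise(img: Bitmap) -> Bitmap:
    """Bitonal image: 1 = foreground (text), 0 = background."""
    t = otsu_threshold(img)
    return [[1 if p <= t else 0 for p in row] for row in img]


def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """[start, end) ranges where a projection profile is non-zero."""
    spans, start = [], None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment(bw: Bitmap) -> List[List[Bitmap]]:
    """Lines from the horizontal projection, glyphs from the vertical one."""
    lines = []
    for top, bottom in runs([sum(row) for row in bw]):
        band = bw[top:bottom]
        cols = [sum(row[c] for row in band) for c in range(len(band[0]))]
        lines.append([[row[left:right] for row in band]
                      for left, right in runs(cols)])
    return lines


def recognise(glyph: Bitmap, templates: Dict[str, Bitmap]) -> str:
    """Nearest-template classifier: the template with fewest differing pixels."""
    def dist(a: Bitmap, b: Bitmap) -> float:
        if len(a) != len(b) or len(a[0]) != len(b[0]):
            return float("inf")
        return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    best = min(templates, key=lambda ch: dist(glyph, templates[ch]))
    return best if dist(glyph, templates[best]) != float("inf") else "?"


def ocr(img: Bitmap, templates: Dict[str, Bitmap]) -> List[str]:
    """Binarise, segment and recognise; returns one string per text line."""
    return ["".join(recognise(g, templates) for g in line)
            for line in segment(binarise(img))]


def render(rows: List[str], ink: int = 20, paper: int = 230) -> Bitmap:
    """Build a greyscale test image: '#' = ink, '.' = paper."""
    return [[ink if ch == "#" else paper for ch in r] for r in rows]


# Hypothetical 3x3 "font database" used only for this illustration.
TEMPLATES = {
    "I": [[1, 1, 1], [0, 1, 0], [1, 1, 1]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
PAGE = render(["###.###", ".#...#.", "###..#.", ".......",
               "#......", "#......", "###...."])
print(ocr(PAGE, TEMPLATES))  # ['IT', 'L']
```

A production engine replaces each stage with far more robust techniques (adaptive binarisation, layout analysis for blocks and columns, trained classifiers), but the division of labour is the same as in the slide.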
3.
ABBYY FineReader <ul><li>Main OCR technology provider in IMPACT </li></ul><ul><li>OCR technology experts for 30 years </li></ul><ul><li>IMPACT uses FineReader Engine (SDK) </li></ul>
12.
Languages and Dictionaries <ul><li>Goal: </li></ul><ul><ul><li>Develop an interface so that external dictionaries can be integrated into the FineReader Engine </li></ul></ul><ul><li>2008 - 2009: </li></ul><ul><ul><li>External Dictionary beta interface </li></ul></ul><ul><ul><li>Same quality as with internal dictionaries possible </li></ul></ul><ul><li>2010 - 2011: </li></ul><ul><ul><li>Make interface work reliably </li></ul></ul><ul><ul><li>Teach partners how to use it </li></ul></ul><ul><ul><li>Support for any language, any time period </li></ul></ul>
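The value of an external dictionary can be sketched generically. This does not use or describe the FineReader Engine external dictionary interface; it assumes, hypothetically, that the engine reports several candidate readings per word with a confidence score, and shows how a period-specific lexicon could rerank them. The names `pick_reading`, `correct_line`, the `bonus` weight and the sample lexicon are all illustrative.

```python
from typing import Iterable, List, Set, Tuple

Candidate = Tuple[str, float]  # (reading, engine confidence in [0, 1])


def pick_reading(candidates: List[Candidate], lexicon: Set[str],
                 bonus: float = 0.5) -> str:
    """Rescore OCR alternatives, boosting readings found in the lexicon.

    `bonus` is an illustrative weight, not a FineReader parameter.
    """
    def score(c: Candidate) -> float:
        word, conf = c
        return conf + (bonus if word.lower() in lexicon else 0.0)
    return max(candidates, key=score)[0]


def correct_line(words: Iterable[List[Candidate]], lexicon: Set[str]) -> str:
    """Pick the best reading for every word position and join them."""
    return " ".join(pick_reading(c, lexicon) for c in words)


# Hypothetical historical lexicon (note the period spelling "publick").
LEXICON = {"the", "publick", "shew"}
LINE = [[("tbe", 0.7), ("the", 0.3)],
        [("pub1ick", 0.6), ("publick", 0.4)]]
print(correct_line(LINE, LEXICON))  # the publick
```

A modern-language dictionary would reject "publick", which is why support for historical dictionaries matters for digitised heritage collections.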
13.
ALTO: New native export format <ul><li>Available since FineReader Engine (FRE) 10 R2 </li></ul><ul><li>Supports the most recent schema: ALTO v2.0 </li></ul><ul><li>Line coordinates available </li></ul>
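Because ALTO carries line coordinates, downstream tools can recover the position of every text line. The sketch below reads them with Python's standard `xml.etree.ElementTree`, assuming the ALTO v2 namespace `http://www.loc.gov/standards/alto/ns-v2#` and the usual `TextLine`/`String` elements with `HPOS`, `VPOS`, `WIDTH`, `HEIGHT` and `CONTENT` attributes. The sample document is hand-written for illustration and is not actual FineReader output; `line_boxes` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

ALTO_NS = {"a": "http://www.loc.gov/standards/alto/ns-v2#"}

# Hand-written illustrative ALTO v2 fragment (not real engine output).
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1" HEIGHT="3000" WIDTH="2000" PHYSICAL_IMG_NR="1">
      <PrintSpace>
        <TextBlock ID="TB1" HPOS="100" VPOS="200" WIDTH="800" HEIGHT="120">
          <TextLine ID="TL1" HPOS="100" VPOS="200" WIDTH="640" HEIGHT="40">
            <String CONTENT="IMPACT" HPOS="100" VPOS="200" WIDTH="300" HEIGHT="40"/>
            <SP HPOS="400" VPOS="200" WIDTH="20"/>
            <String CONTENT="project" HPOS="420" VPOS="200" WIDTH="320" HEIGHT="40"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def line_boxes(alto_xml: str) -> List[Tuple[str, int, int, int, int]]:
    """Return (text, HPOS, VPOS, WIDTH, HEIGHT) for every TextLine."""
    root = ET.fromstring(alto_xml)
    out = []
    for line in root.iterfind(".//a:TextLine", ALTO_NS):
        # Join the words of the line; SP elements mark the spaces.
        text = " ".join(s.get("CONTENT", "")
                        for s in line.iterfind("a:String", ALTO_NS))
        box = tuple(int(float(line.get(k, "0")))
                    for k in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))
        out.append((text, *box))
    return out


print(line_boxes(SAMPLE))  # [('IMPACT project', 100, 200, 640, 40)]
```

Coordinates are expressed in the measurement unit declared by the ALTO file, so a consumer should check that declaration before mapping boxes onto the page image.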