
IMPACT Final Conference - Michael Fuchs

ABBYY FineReader: IMPACT Improvements with Michael Fuchs from ABBYY Europe

Published in: Education, Technology, Business

  1. IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. ABBYY & OCR Improvements for IMPACT. Michael Fuchs, Senior Product Marketing Manager, ABBYY Europe, fuchs@abbyy.com
  2. Agenda
       • Who is ABBYY?
       • Company Overview
       • (Short) Product Overview
       • ABBYY Technology in the IMPACT project
       • OCR & Processing: IMPACT improvements
       • Binarisation, Segmentation, Recognition
       • Dictionary API, Export Formats
       • Lessons Learned, Pricing, Pre-Announcement, Q&A
  3. ABBYY & IMPACT (ABBYY & OCR for IMPACT)
  4. ABBYY Group Overview
     ABBYY Group
       • Founded in 1989 as BIT Software
       • > 1000 employees in 14 offices worldwide
       • Headquarters/R&D in Moscow, Russia
  5. ABBYY OCR Products: Usage View
     Product area: OCR & Document Conversion
     Desktop/Workgroup (user-driven processing, ready to use)
       • FineReader (Professional, Corporate, Site Licence Edition). Note: no Gothic/Fraktur OCR!
       • PDF Transformer, FotoReader, ScreenshotReader
       • Users are: end users, companies, (libraries)
     Server/Backend (automated processing, ready to use)
       • Recognition Server (Professional, Extended Edition). Gothic/Fraktur OCR & XML export support!
       • Users are: companies, scan service providers, libraries
     SDK/Integration (automated processing, development needed)
       • FineReader Engines (Windows, Linux, Mac OS X, FreeBSD, Embedded Systems)
       • Mobile OCR Engine (Android, Symbian, Linux, Windows, Windows Mobile, iOS)
       • Users are: developers, scan service providers, IMPACT research
  6. What (ABBYY) OCR can read...
     Recognition Languages
       • Almost 200 OCR languages
       • 34 languages with dictionary support and spell check
       • Alphabets: Cyrillic, Latin, Greek, Armenian, Hebrew, Thai
       • Chinese, Japanese, Korean (CJK): 4 sets of hieroglyphs (Chinese (traditional and simplified), Japanese, Korean)
       • Arabic (Technical Preview in the SDK)
     Font Types
       • Recognition of mixed font types (dot-matrix printer, typewriter, Gothic, etc.)
       • OCR-A
       • OCR-B
       • MICR (E13B)
       • CMC-7
  7. IMPACT & ABBYY
       • ABBYY is the OCR technology provider for IMPACT members.
       • ABBYY also improved the core technologies for the recognition of old documents in IMPACT. Focus areas are/were: image pre-processing, segmentation, character recognition, export.
       • IMPACT members work with the Software Development Kit (SDK) FineReader Engine, not the desktop application.
       • The IMPACT focus is/was on research, not on setting up a production system ;o)
       • Improved technologies are/will be added to current/future products.
  8. Designed not to be OCRed
  9. Why ABBYY? - OCR ... Images: original image [perfect quality :o) ], standard OCR*, ABBYY Fraktur OCR*. (*Recognition Server 3.0 R1, with Gothic/Fraktur disabled and enabled)
  10. ABBYY “History” and Old Fonts Recognition
       • 2003: FineReader XIX (V7 Technology; METAe result 2000-2003)
       • 2008: FineReader Engine 9.0 (Release 1; pre-IMPACT “state of the art”)
       • 2010: FineReader Engine 10 (IMPACT project optimizations)
  11. ABBYY and Old European Fonts. Accuracy comparison (2003, 2008, 2010): up to 98.2% on good quality images. ABBYY Technology Version 10 recognition of old European fonts is 25% more accurate than FRE 9.0 and 38% more accurate than FR XIX.
  12. OCR Processing Steps & ABBYY Improvements for IMPACT
  13. Processing Steps
       • Step 1. Scanning, Image Loading, Pre-Processing and Modification: compensating for image defects and making the document suitable for automatic OCR
       • Step 2. Document Layout Analysis: layout analysis, detection of document sections like text, images and barcodes
       • Step 3. (Optical) Character Recognition: automatic recognition of characters, applying the selected recognition languages & dictionaries
       • Step 4. (optional) Verification, by operators or automated post-correction: manual validation of suspicious characters and words
       • Step 5. Document Synthesis and Export: generating an output document in the selected format
  14. Step 1: Image pre-processing
  15. Step 1: Image pre-processing. Image Loading, Pre-Processing and Modification
       • Intelligent background filtering
       • Adaptive binarisation
       • General binarisation on an image level cannot deliver good results for OCR
  16. Step 1: Image pre-processing. New V10: binarisation, textured background optimisations. Images: original scan, V9 binarisation, new V10 binarisation.
  17. Step 1: Image pre-processing. New V10: binarisation, textured background optimisations. Images: original scan, V9 binarisation, V10 binarisation.
  18. Step 1: Image pre-processing. New V10: binarisation for the IMPACT project. Images: original, state of the art (V9), new (V10). No text from the other page!
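The statement on slide 15 that one image-wide threshold cannot deliver good OCR input can be illustrated with a small sketch. This is not ABBYY's algorithm, which the slides do not describe. It is a generic local-mean threshold, and the synthetic page, the `radius` and the `offset` values are assumptions chosen purely for illustration.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, 0 = black, 255 = white


def global_binarise(img: Image, threshold: int) -> Image:
    """Mark a pixel as ink (1) when it is darker than one image-wide threshold."""
    return [[1 if px < threshold else 0 for px in row] for row in img]


def adaptive_binarise(img: Image, radius: int = 2, offset: int = 10) -> Image:
    """Mark a pixel as ink (1) when it is darker than its local mean minus offset."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            window = [img[j][i] for j in ys for i in xs]
            local_mean = sum(window) / len(window)
            out[y][x] = 1 if img[y][x] < local_mean - offset else 0
    return out


def make_page(width: int = 12, height: int = 5) -> Image:
    """Hypothetical page: background darkens left to right, with two ink strokes."""
    page = [[230 - 12 * x for x in range(width)] for _ in range(height)]
    for y in range(height):
        page[y][2] -= 60  # stroke on the bright side
        page[y][9] -= 60  # stroke on the dark side
    return page


page = make_page()
mean = sum(map(sum, page)) // (len(page) * len(page[0]))  # image-wide mean grey value
global_mask = global_binarise(page, mean)
adaptive_mask = adaptive_binarise(page)

print("global  :", "".join(map(str, global_mask[0])))
print("adaptive:", "".join(map(str, adaptive_mask[0])))
```

On this synthetic page the global mask marks the whole darker right-hand side as ink, so the second stroke merges with the background, while the local-mean mask keeps both strokes (columns 2 and 9) separate.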
  19. Step 2: Document Layout Analysis
  20. Step 2: Document Layout Analysis. Analyze the layout and find text, images, tables and barcodes.
  21. Step 2: Document Layout Analysis (old newspapers). Segmentation improvements: image/text detection (example 1/3). Images: V9 technology, V10 technology. Part of the column was detected as an image.
  22. Step 2: Document Layout Analysis (old newspapers). Segmentation improvements: word order detection (example 2/3). Images: V9 technology, V10 technology. Fewer linear word order errors.
  23. Step 2: Document Layout Analysis (old newspapers). Segmentation improvements: lost text, no detection (example 3/3). Images: V9 technology, V10 technology. Less lost text.
  24. Step 2: Document Layout Analysis. Segmentation improvements: IMPACT results over time
     Before IMPACT:
       • Overall segmentation improvements: better picture detection, better separators, better page layout reconstruction
       • Only a random set of old newspapers available
     After IMPACT:
       • IMPACT segmentation ground truth available
       • New (internal) DA model for historic newspapers
       • New segmentation evaluation methodology
       • Evaluation results on newspapers: 40% fewer split/merge errors, 25% less garbage and lost text
  25. Step 3: Text/Character Recognition
  26. Step 3: Text/Character Recognition. Samples for classifiers used in ABBYY technologies. After line detection, character recognition is applied with different classifiers:
       • Raster classifier
       • Contour classifier
       • Structure classifier
       • Feature differentiating classifier
  27. Step 3: Text/Character Recognition. Optimization and new developments
     Improved Gothic classifiers
       • A significant amount of time was invested in Gothic classifier training
       • The libraries' selection of ground truth material (historical relevance) was used
       • New Gothic graphemes were added
     Results
       • Good quality images: 2.8% (total) error rate on the test set used, about a 20% improvement over the “state of the art” (V9) and almost comparable to modern documents
       • Bad quality images: 7% (total) error rate on the test set used, about a 30% improvement over the “state of the art” (V9)
       • Most of the improvements are available in current ABBYY products: ABBYY FineReader Engine 10 (SDK) & Recognition Server 3.0
     Quality optimization will be continued in future releases and technology cycles.
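The results on slide 27 give the V10 error rates and the improvement over V9, but not the V9 error rates themselves. A rough back-calculation is sketched below. It assumes that "improvement" means a relative reduction of the error rate, which the slide does not state explicitly, so the implied V9 figures are estimates and not published numbers.

```python
def implied_baseline(new_error_pct: float, relative_improvement_pct: float) -> float:
    """Error rate before the improvement, assuming a relative error reduction."""
    return new_error_pct / (1 - relative_improvement_pct / 100)


# Figures taken from the slide: (V10 error rate in %, improvement over V9 in %)
good_quality_v9 = implied_baseline(2.8, 20)
bad_quality_v9 = implied_baseline(7.0, 30)

print(f"implied V9 error, good quality images: {good_quality_v9:.1f}%")
print(f"implied V9 error, bad quality images:  {bad_quality_v9:.1f}%")
```

Under that assumption the V9 error rates would have been about 3.5% on good quality images and about 10% on bad quality images.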
  28. Step 3: Text/Character Recognition. Optimization and new developments: Old Slavonic as a new OCR language (new development). Images: before, now.
  29. Quality-Test-Comparison: Binarisation & Recognition Improvements
  30. Binarisation & Recognition Improvements. How to evaluate the recognition improvements of binarisation? Binarisation & recognition quality go hand in hand! Number of errors by binarisation and recognition technology:
       • V9 binarisation & V9 recognition: 100% (baseline)
       • V9 binarisation & V10 recognition: -5%
       • V10 binarisation & V9 recognition: -11%
       • V10 binarisation & V10 recognition: -15%
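One way to read the four figures on slide 30 is to compare the measured combined gain with what the two individual gains would predict. The sketch below does that under two simple models, plain addition and independent (multiplicative) reductions. Both models are assumptions introduced here for illustration and are not part of the ABBYY evaluation.

```python
# Relative error reductions versus the V9/V9 baseline, as given on the slide (in %).
recognition_only = 5.0    # V9 binarisation & V10 recognition
binarisation_only = 11.0  # V10 binarisation & V9 recognition
both_measured = 15.0      # V10 binarisation & V10 recognition

# Model 1: the two gains simply add up.
additive = recognition_only + binarisation_only

# Model 2: each improvement removes its share of the errors left by the other.
independent = 100 * (1 - (1 - recognition_only / 100) * (1 - binarisation_only / 100))

print(f"additive model:    -{additive:.2f}%")
print(f"independent model: -{independent:.2f}%")
print(f"measured:          -{both_measured:.2f}%")
```

The measured 15% sits slightly below both predictions, which suggests that the two improvements partly fix the same errors, in line with the slide's remark that binarisation and recognition quality go hand in hand.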
  31. Step 3-5: Dictionaries & Export
  32. Step 3-5: Other Optimizations
     External Dictionary API tuning
       • The External Dictionary API was available in the FineReader Engine (SDK)
       • Support for any language, any time period
       • The API was/is heavily used by IMPACT language partners to run quality tests
     New ALTO XML export formats
       • FineReader Engine 10 R2, December 2010
       • Recognition Server 3.0, July 2011
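To show what an ALTO XML export makes possible downstream, here is a minimal sketch that reads words and word confidences from an ALTO document with the Python standard library. The sample document is invented for illustration and is not FineReader output. The element and attribute names (`TextLine`, `String`, `CONTENT`, `WC`) follow the ALTO schema, while the 0.8 confidence cut-off is an arbitrary assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical ALTO fragment, hand-written for this example.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Die" WC="0.98" HPOS="100" VPOS="200" WIDTH="60" HEIGHT="30"/>
            <SP/>
            <String CONTENT="Zeitung" WC="0.61" HPOS="170" VPOS="200" WIDTH="140" HEIGHT="30"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def read_lines(alto_xml: str):
    """Return one list of (word, confidence) pairs per TextLine element."""
    root = ET.fromstring(alto_xml)
    lines = []
    for element in root.iter():
        if local_name(element.tag) == "TextLine":
            words = [
                (child.get("CONTENT", ""), float(child.get("WC", "nan")))
                for child in element
                if local_name(child.tag) == "String"
            ]
            lines.append(words)
    return lines


lines = read_lines(SAMPLE_ALTO)
text = [" ".join(word for word, _ in line) for line in lines]
suspicious = [word for line in lines for word, wc in line if wc < 0.8]

print(text)
print(suspicious)
```

A list like `suspicious` is the kind of input an optional verification step (Step 4 in the processing steps) could work from, since it points operators at the words the engine was least sure about.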
  33. Additional Notes
  34. Further Information & Trial Versions. The ABBYY Gothic/Fraktur OCR Portal: www.frakturschrift.com
  35. What IMPACT taught ABBYY about Libraries & Mass Digitization projects...
     The reality
       • Masses of books/documents are available & already scanned
       • It is unclear whether Antiqua and/or Gothic/Fraktur fonts are used in the documents
       • Pre-sorting is impossible; it would cost too much time and money
     ABBYY Europe's answer
       • Reduced the pricing for mixed “Old” + “Modern” font OCR projects
       • The pricing is now ready for “mass processing”
     Examples: Recognition Server 3.0 with “Gothic” enabled
       • 10,000 pages: 299 Euro, available online
       • 500,000 pages*: 5,000 Euro = 1 Euro cent per page = ca. 2,000 books of 250 pages
       • Over 3 million pages*: ca. 0.52 Euro cent per page = 12,000 books at 1.25 € each (250 pages)
       • Over 10 million pages*: ca. 40,000 books = ca. 0.5 € per book
       • ...
     No more excuses for not OCRing :o)
     * Page size is A4; bigger formats are counted as multiple pages.
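The per-page and per-book figures on slide 35 can be checked with a few lines of arithmetic. The sketch below uses only the prices quoted on the slide and the slide's own assumption of 250 pages per book. The function names are made up for this example.

```python
PAGES_PER_BOOK = 250  # book size used on the slide


def cents_per_page(total_eur: float, pages: int) -> float:
    """Price per page in Euro cents for a package of `pages` pages."""
    return total_eur * 100 / pages


def eur_per_book(cents_pp: float, pages_per_book: int = PAGES_PER_BOOK) -> float:
    """Price per book in Euro, given a per-page price in Euro cents."""
    return cents_pp * pages_per_book / 100


small_package = cents_per_page(299, 10_000)      # 10,000 pages for 299 Euro
large_package = cents_per_page(5_000, 500_000)   # 500,000 pages for 5,000 Euro
books_in_large_package = 500_000 // PAGES_PER_BOOK
book_cost_large_package = eur_per_book(large_package)
book_cost_at_052 = eur_per_book(0.52)            # "over 3 million pages" tier

print(f"10,000-page package:  {small_package:.2f} cent/page")
print(f"500,000-page package: {large_package:.2f} cent/page, "
      f"{books_in_large_package} books at {book_cost_large_package:.2f} Euro each")
print(f"0.52 cent/page tier:  {book_cost_at_052:.2f} Euro per book")
```

At 0.52 cent per page a 250-page book comes to about 1.30 Euro, so the 1.25 Euro quoted on the slide appears to be rounded from roughly 0.5 cent per page. Per the footnote, pages larger than A4 count as multiple pages, which this sketch does not model.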
  36. Pre-Announcement: ABBYY Online OCR Services with Gothic/Fraktur
     The ABBYY Gothic/Fraktur OCR Portal: finereader.abbyyonline.com
       • Historic OCR added just last week
       • Web GUI to upload documents and get results
       • Simple to use
       • Low volume, ad hoc usage
       • Instant results, quality evaluation
       • Pay as you go
     ABBYY Online OCR SDK
       • OCR service with API and XML output
       • Runs on Windows Azure
       • Currently in closed beta test
       • Public beta test Q1/2012
  37. Summary
  38. The whole is greater than the sum of its parts (Aristotle)
  39. Thank you for your attention! Questions? Michael Fuchs, Senior Product Marketing Manager, ABBYY Europe, fuchs@abbyy.com
