Document Recognition Market Landscape

History and overview of the document automation and recognition technology vendor landscape.

Published in: Technology, Business

  1. 1. Document Recognition: A Technology Overview. Presented by: Chris Riley
  2. 2. What we will cover: <ul><li>Why Chris? </li></ul><ul><li>What Are the Document Recognition Technologies </li></ul><ul><li>Who Makes Them </li></ul><ul><li>News Flash </li></ul><ul><li>The Future </li></ul><ul><li>Q & A </li></ul>
  3. 3. Why Chris? <ul><li>Professional Experience </li></ul><ul><ul><li>LivingAnalytics, Inc. </li></ul></ul><ul><ul><li>Artsyl Technologies, Inc. </li></ul></ul><ul><ul><li>Visioneer, Inc. </li></ul></ul><ul><ul><li>ABBYY </li></ul></ul><ul><ul><li>IntelliKey Solutions, Inc. </li></ul></ul><ul><ul><li>Regis University </li></ul></ul><ul><ul><ul><li>Deep Study in Genetic Algorithms and Real-Time Analytics </li></ul></ul></ul><ul><li>What qualifies Chris to talk to me? </li></ul><ul><ul><li>Subject Matter Expert for: AIIM, TAWPI, DIR, Business Solution Mag. </li></ul></ul><ul><ul><li>Obtained Distinguished Services Award for Market Education </li></ul></ul><ul><ul><li>When a developer turns to sales and marketing </li></ul></ul>
  4. 4. What we will cover: <ul><li>Why Chris? </li></ul><ul><li>What Are the Document Recognition Technologies </li></ul><ul><li>Who Makes Them </li></ul><ul><li>News Flash </li></ul><ul><li>The Future </li></ul><ul><li>Q & A </li></ul>
  5. 5. The Technologies <ul><li>OCR – Optical Character Recognition </li></ul><ul><li>ICR – Intelligent Character Recognition </li></ul><ul><li>OMR – Optical Mark Recognition </li></ul><ul><li>IDR – Intelligent Document Recognition </li></ul><ul><li>Barcode </li></ul><ul><li>Handwriting </li></ul><ul><li>All the other ones made up for marketing purposes </li></ul><ul><li>CAR/LAR ( Check21 ) – Courtesy and Legal Amount Recognition </li></ul><ul><li>Assisted Capture </li></ul><ul><li>Fixed Form Process </li></ul><ul><li>Semi-Structured Forms Processing </li></ul><ul><li>Unstructured Document Processing </li></ul>
  6. 6. The Technologies: OCR Ship To:
  7. 7. The Technologies: ICR Ilya
  8. 8. The Technologies: OMR Card Account
  9. 9. The Technologies: IDR Check Invoice Bill of Lading EOB
  10. 10. The Technologies: Barcode 1889094476620
  11. 11. The Technologies: Handwriting * Critical *
  12. 12. The Technologies: Acronym Heaven
  13. 13. The Technologies: CAR/LAR 2 hundred dollars & no cents
  14. 14. The Technologies: Assisted Capture
  15. 15. The Technologies: Fixed Form Processing Name: Ilya Date: 12/21/2982
  16. 16. The Technologies: Fixed Form Processing Name: Ilya Date: 12/21/2982
  17. 17. The Technologies: Semi-Structured Forms <ul><li>Semi-Structured Forms Processing – Complexity is Underestimated </li></ul>Invoice No: 99044 Date: 06/09/04 Invoice No: 24567 Date: 06/09/04
  18. 18. The Technologies: Semi-Structured Forms Invoice No: 99044 Date: 06/09/04 Invoice No: 24567 Date: 06/09/04 (06/09/2004) Note: many people mistake these documents for fixed forms.
  19. 19. The Technologies: Semi-Structured Forms Consignee Consignor Date Term
  20. 20. The Technologies: Common Processes <ul><li>Full page conversion </li></ul><ul><li>Classification </li></ul><ul><li>Index level extraction </li></ul><ul><li>Redaction </li></ul><ul><li>Routing </li></ul><ul><li>Auto Filing </li></ul><ul><li>Re-Purposing </li></ul><ul><li>Image Rotation </li></ul>
  21. 21. The Technologies: Full page conversion <ul><li>Image file to electronic data file </li></ul><ul><li>ALL text on the page </li></ul><ul><li>Includes: </li></ul><ul><ul><li>Image Pre-processing </li></ul></ul><ul><ul><li>Document Analysis/Zoning </li></ul></ul><ul><ul><li>Extraction </li></ul></ul><ul><ul><li>Export ( Commonly PDF, DOC ) </li></ul></ul>
  22. 22. The Technologies: Classification <ul><li>Software tells you the document type </li></ul><ul><li>Several Modes of document classification </li></ul><ul><ul><li>Image Based </li></ul></ul><ul><ul><li>Contextual </li></ul></ul><ul><li>Scan batches of mixed documents </li></ul>Bill of Lading Invoice Check PO
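As a rough illustration of the "Contextual" mode above, here is a minimal Python sketch that guesses a document type from keywords in already-recognized text. The keyword lists and the `classify` function are hypothetical, chosen only for this example; commercial classifiers are far more elaborate, and image-based classification is not shown.

```python
import re

# Hypothetical keyword lists for illustration only; a real system would
# learn or tune these from sample documents.
KEYWORDS = {
    "Invoice": ["invoice no", "amount due", "remit to"],
    "Bill of Lading": ["consignee", "consignor", "carrier"],
    "Check": ["pay to the order of", "dollars"],
    "PO": ["purchase order", "po number"],
}


def classify(text: str) -> str:
    """Return the document type whose keywords occur most often, else 'Unknown'."""
    # Lowercase and collapse whitespace so OCR line breaks do not split phrases.
    normalized = re.sub(r"\s+", " ", text.lower())
    scores = {
        doc_type: sum(normalized.count(keyword) for keyword in keywords)
        for doc_type, keywords in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "Unknown"


if __name__ == "__main__":
    # A mixed batch, as in "Scan batches of mixed documents".
    batch = [
        "Invoice No: 99044 Date: 06/09/04 Total Amount Due: 120.00",
        "Consignee: ACME Consignor: Widgets Ltd Date Term",
        "Pay to the order of Ilya two hundred dollars",
    ]
    for page in batch:
        print(classify(page))
```

Counting keyword hits is the simplest possible scoring rule; it is used here only to show how recognized text can drive the sorting of a mixed batch.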
  23. 23. The Technologies: Index Level Extraction <ul><li>Just certain required fields extracted </li></ul><ul><li>Normalization of data </li></ul><ul><li>Export usually to a database </li></ul>Invoice Number Invoice Date Total Amt Due Term
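A minimal Python sketch of index-level extraction with normalization, assuming the page text has already been recognized and that fields are labeled "Invoice No:" and "Date:" as in the sample invoices earlier in the deck. The function names, the regular expressions, and the month-first date assumption are illustrative, not part of the presentation.

```python
import re
from datetime import datetime

# Patterns for the two labeled fields seen in the sample invoices.
INVOICE_NO = re.compile(r"Invoice\s+No:\s*(\d+)")
DATE = re.compile(r"Date:\s*(\d{2}/\d{2}/\d{2,4})")


def normalize_date(raw: str) -> str:
    """Normalize MM/DD/YY or MM/DD/YYYY to MM/DD/YYYY (assumes month-first order)."""
    # Try the four-digit year first so "2004" is never read as a two-digit year.
    for fmt in ("%m/%d/%Y", "%m/%d/%y"):
        try:
            return datetime.strptime(raw, fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")


def extract_index_fields(text: str) -> dict:
    """Pull only the required fields from the text; missing fields become None."""
    number = INVOICE_NO.search(text)
    date = DATE.search(text)
    return {
        "invoice_number": number.group(1) if number else None,
        "invoice_date": normalize_date(date.group(1)) if date else None,
    }


if __name__ == "__main__":
    print(extract_index_fields("Invoice No: 99044 Date: 06/09/04"))
```

The resulting dictionary is the kind of record that would then be exported to a database, one row per document.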
  24. 24. The Technologies: How Accurate <ul><li>Better question is how do you determine accuracy </li></ul><ul><li>Document Type Accuracy </li></ul><ul><li>Field/Zone Location Accuracy </li></ul><ul><li>Data Type Accuracy </li></ul><ul><li>Character Accuracy </li></ul>
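To make the difference between these accuracy measures concrete, here is a small Python sketch comparing character accuracy with field-level accuracy. Using edit distance against a ground-truth string is one common way to score characters and is an assumption of this example, as are all function names; the presentation does not prescribe a formula.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def character_accuracy(truth: str, recognized: str) -> float:
    """Share of ground-truth characters recognized correctly, floored at 0."""
    if not truth:
        return 1.0 if not recognized else 0.0
    return max(0.0, 1 - edit_distance(truth, recognized) / len(truth))


def field_accuracy(truth: dict, recognized: dict) -> float:
    """Share of ground-truth fields whose recognized value matches exactly."""
    if not truth:
        return 1.0
    correct = sum(1 for key, value in truth.items() if recognized.get(key) == value)
    return correct / len(truth)


if __name__ == "__main__":
    truth = {"invoice_number": "99044", "invoice_date": "06/09/2004"}
    # The digit 0 misread as the letter O in one field.
    recognized = {"invoice_number": "99O44", "invoice_date": "06/09/2004"}
    print(character_accuracy(truth["invoice_number"], recognized["invoice_number"]))
    print(field_accuracy(truth, recognized))
```

A single misread character leaves the character score for that field high while the field itself counts as wrong, which is why the method of measurement matters more than the headline number.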
  25. 25. The Technologies: Document Complexities <ul><li>By Data Capture Complexity – Hardest to Easiest </li></ul><ul><li>EOB – Marginal success </li></ul><ul><ul><li>BancTec </li></ul></ul><ul><ul><li>HOV </li></ul></ul><ul><ul><li>ECS </li></ul></ul><ul><li>Student Transcripts – no success, no money </li></ul><ul><li>Invoices ( not a vertical ) </li></ul><ul><ul><li>Telecom Bills </li></ul></ul><ul><ul><li>Legal Invoices </li></ul></ul><ul><ul><li>Aggregate Invoices </li></ul></ul><ul><ul><li>Traditional Invoice </li></ul></ul><ul><li>Checks </li></ul><ul><li>Bill of Lading </li></ul><ul><li>Prescriptions </li></ul><ul><li>and Transportation Documents </li></ul><ul><li>HCFA UB </li></ul><ul><li>Fax Cover Sheets </li></ul><ul><li>Other typographic </li></ul><ul><li>Fixed Forms </li></ul>
  26. 26. What we will cover: <ul><li>Why Chris? </li></ul><ul><li>What Are the Document Recognition Technologies </li></ul><ul><li>Who Makes Them </li></ul><ul><li>News Flash </li></ul><ul><li>The Future </li></ul><ul><li>Q & A </li></ul>
  27. 27. There really are only 4 core technology providers. It takes 50 man-years to develop OCR using the traditional approach.
  28. 28. Who Makes Them: Core Engines <ul><li>Traditional OCR Approach ( sorted by market share ) – All European Engines </li></ul><ul><li>Nuance ( formerly ScanSoft ), derivative of the Caere engine </li></ul><ul><ul><li>Middle of the road cost, accuracy, speed </li></ul></ul><ul><li>ABBYY </li></ul><ul><ul><li>Most accurate, slowest, most expensive </li></ul></ul><ul><li>Océ </li></ul><ul><ul><li>Very fast, moderately expensive </li></ul></ul><ul><li>ReadI.R.I.S </li></ul><ul><ul><li>Fastest, not very accurate </li></ul></ul><ul><li>Specialized Engines </li></ul><ul><li>CharacTell </li></ul><ul><li>ParaScript </li></ul><ul><li>A2iA </li></ul><ul><li>Mitek </li></ul><ul><li>Non-Traditional Approach </li></ul><ul><li>NovoDynamics </li></ul><ul><li>TIS </li></ul><ul><li>Paledon </li></ul><ul><li>Other </li></ul><ul><li>Handful of open source engines: Tesseract, OCRopus </li></ul><ul><li>Two handfuls of OLD engines: Expervision, Caere </li></ul>
  29. 29. Who Makes Them: History <ul><li>Ray Kurzweil, father of OCR – 1974 </li></ul><ul><ul><li>Arguably in universities for some time </li></ul></ul><ul><li>Caere Founded - 1976 </li></ul><ul><li>Ray sells his engine to Xerox PAC; it becomes TextBridge – 1978 </li></ul><ul><li>ReadI.R.I.S formed by Belgian grant – 1981 </li></ul><ul><li>Tesseract Created by HP research – 1985 </li></ul><ul><li>Expervision Founded - 1987 </li></ul><ul><li>UNLV becomes standards organization for OCR </li></ul><ul><li>ABBYY Founded - 1989 </li></ul><ul><ul><li>Russian MIT equivalent: MIPT, Moscow Institute of Physics and Technology </li></ul></ul><ul><li>Luc Vincent Invents Document Analysis ( now at Google, lead on Tesseract project ) - 1994 </li></ul><ul><li>ScanSoft Splits from Xerox - 1998 </li></ul><ul><li>ScanSoft acquires Caere - 2000 </li></ul><ul><li>ScanSoft Becomes Nuance – 2005 </li></ul><ul><ul><li>OCR Business Takes Backseat </li></ul></ul>
  30. 30. Who Makes Them: Who Licenses Them <ul><li>EVERYONE ELSE! </li></ul><ul><li>AnaComp </li></ul><ul><li>Anydoc </li></ul><ul><li>BancTec </li></ul><ul><li>BrainWare </li></ul><ul><li>Captaris </li></ul><ul><li>Captivation </li></ul><ul><li>Cardiff </li></ul><ul><li>CVision </li></ul><ul><li>DataCap </li></ul><ul><li>DigiTech </li></ul><ul><li>eCopy </li></ul><ul><li>EMC Documentum </li></ul><ul><li>Kofax </li></ul><ul><li>LaserFiche </li></ul><ul><li>LeadTools </li></ul><ul><li>Microsoft </li></ul><ul><li>NSi AutoStore </li></ul><ul><li>OnBase </li></ul><ul><li>Perceptive Imaging </li></ul><ul><li>ReadSoft </li></ul><ul><li>SER </li></ul><ul><li>Top Image Systems </li></ul><ul><li>Tower </li></ul><ul><li>Westbrook </li></ul><ul><li>Xerox </li></ul><ul><li>Etc. </li></ul>
  31. 31. What we will cover: <ul><li>Why Chris? </li></ul><ul><li>What Are the Document Recognition Technologies </li></ul><ul><li>Who Makes Them </li></ul><ul><li>News Flash </li></ul><ul><li>The Future </li></ul><ul><li>Q & A </li></ul>
  32. 32. News Flash <ul><li>Purchase Consolidation </li></ul><ul><ul><li>OpenText bought Captaris, which had bought Océ </li></ul></ul><ul><li>Legal </li></ul><ul><ul><li>Nuance sues ABBYY over core OCR algorithms </li></ul></ul><ul><ul><ul><li>If they win, there is only one OCR engine; the only other option is new approaches </li></ul></ul></ul>
  33. 33. What we will cover: <ul><li>Why Chris? </li></ul><ul><li>What Are the Document Recognition Technologies </li></ul><ul><li>Who Makes Them </li></ul><ul><li>Buyer Beware </li></ul><ul><li>News Flash </li></ul><ul><li>The Future </li></ul><ul><li>Q & A </li></ul>
  34. 34. The Future <ul><li>More lawsuits </li></ul><ul><li>More consolidation </li></ul><ul><li>Full-page OCR will be a commodity </li></ul><ul><li>Advanced document processing will become mainstream but less required </li></ul><ul><li>Document classification will be the next big area of research and product solutions, the new big-ticket item </li></ul><ul><li>There will be a new approach to OCR </li></ul><ul><li>Think about what to do now that you will be gathering data rapidly </li></ul>
  35. 35. What we will cover: <ul><li>Why Chris? </li></ul><ul><li>What Are the Document Recognition Technologies </li></ul><ul><li>Who Makes Them </li></ul><ul><li>Buyer Beware </li></ul><ul><li>News Flash </li></ul><ul><li>The Future </li></ul><ul><li>Q & A </li></ul>
  36. 36. Questions and Answers
