Document Recognition Technologies

Learn the basics of what document recognition is and how it works, along with some best practices, from Chris Riley, a leading expert in document automation technologies.

Published in: Technology, Business

  1. Document Recognition: a technology overview. Presented by: Chris Riley, ECMp, IOAp, President, AIIM Golden Gate
  2. What we will cover:
       • Why Chris?
       • What Are the Document Recognition Technologies
       • Who Makes Them
       • Buyer Beware
       • The future
       • Q & A
       • Free Stuff!
  3. Why Chris?
       • What qualifies Chris to talk to me?
           • When a developer turns to sales
           • Leading expert in document automation technologies
  5. Who knows what OCR is?
  6. The Technologies
       • OCR – Optical Character Recognition
       • ICR – Intelligent Character Recognition
       • OMR – Optical Mark Recognition
       • Barcode
       • Handwriting
       • All the other ones created for marketing purposes
       • CAR/LAR (Check 21) – Courtesy and Legal Amount Recognition
       • Assisted Capture
       • Fixed Form Processing
       • Semi-Structured Forms Processing
       • Unstructured Document Processing
  7. The Technologies: OCR (sample text: "Ship To:")
  8. The Technologies: ICR (sample text: "Ilya")
  9. The Technologies: OMR (sample text: "Card Account")
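The slide only names OMR, so the following is a minimal sketch of one common way mark detection can work, not the method of any particular engine. It assumes the checkbox region has already been cropped to a small grid of grayscale pixels; `is_marked`, `dark_threshold`, and `fill_ratio` are hypothetical names and values chosen for illustration.

```python
def is_marked(region, dark_threshold=128, fill_ratio=0.4):
    """Decide whether a checkbox region is filled in.

    region: 2D list of grayscale pixels (0 = black, 255 = white).
    The thresholds are illustrative assumptions, not tuned values.
    """
    pixels = [p for row in region for p in row]
    dark = sum(1 for p in pixels if p < dark_threshold)
    # A box counts as marked when enough of its pixels are dark.
    return dark / len(pixels) >= fill_ratio


# Two tiny made-up regions: one blank, one mostly filled.
empty_box = [[255, 255, 255], [255, 250, 255], [255, 255, 255]]
filled_box = [[20, 30, 255], [10, 0, 40], [255, 25, 15]]

print(is_marked(empty_box), is_marked(filled_box))
```

OMR never tries to read characters; it only answers "is there ink here?", which is why it is usually the most reliable of the recognition types on a well-designed form.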
  10. The Technologies: Barcode (sample value: "1889094476620")
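Barcodes are dependable partly because many symbologies carry a check digit. As a sketch, and assuming the 13-digit sample on the slide is an EAN-13 value (the slide does not name the symbology), the standard EAN-13 check can be written as follows. The function names are illustrative.

```python
def ean13_check_digit(first12: str) -> int:
    """Compute the EAN-13 check digit from the first 12 digits.

    Digits are weighted 1, 3, 1, 3, ... from the left.
    """
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_ean13(code: str) -> bool:
    """Return True when a 13-digit string has a consistent check digit."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    return ean13_check_digit(code[:12]) == int(code[12])


print(is_valid_ean13("1889094476620"))
```

A recognition engine can use the same test to reject a misread on its own, which is something plain OCR text cannot do.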
  11. The Technologies: Handwriting (sample text: "* Critical *")
  12. The Technologies: Acronym Heaven
  13. The Technologies: CAR/LAR (sample text: "2 hundred dollars & no cents")
  14. The Technologies: Assisted Capture
  15. The Technologies: Fixed Form Processing (sample text: "Name: Ilya", "Date: 12/21/2982")
  16. The Technologies: Fixed Form Processing (sample text: "Name: Ilya", "Date: 12/21/2982")
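Fixed form processing relies on every field sitting at the same place on every page. A minimal sketch of that idea, assuming recognized words arrive as `(text, x, y)` tuples and that zones are hand-drawn rectangles; the coordinates and the `ZONES` template below are made up for illustration.

```python
# Hypothetical template: field name -> (x0, y0, x1, y1) rectangle on the page.
ZONES = {
    "name": (100, 40, 300, 60),
    "date": (400, 40, 560, 60),
}


def extract_fields(words, zones=ZONES):
    """Collect the words whose position falls inside each fixed zone."""
    fields = {}
    for field, (x0, y0, x1, y1) in zones.items():
        inside = [text for text, x, y in words if x0 <= x <= x1 and y0 <= y <= y1]
        fields[field] = " ".join(inside)
    return fields


# Recognized words with assumed positions, using the sample values from the slide.
words = [
    ("Name:", 50, 50), ("Ilya", 110, 50),
    ("Date:", 350, 50), ("12/21/2982", 410, 50),
]

print(extract_fields(words))
```

The labels "Name:" and "Date:" are ignored because they fall outside the zones. The approach breaks as soon as a field moves, which is the problem semi-structured processing addresses.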
  17. 80% of business end-user documents are semi-structured
  18. The Technologies: Semi-Structured Forms (sample text: "Invoice No: 99044 Date: 06/09/04" and "Invoice No: 24567 Date: 06/09/04")
  19. The Technologies: Semi-Structured Forms (sample text: "Invoice No: 99044 Date: 06/09/04" and "Invoice No: 24567 Date: 06/09/04", with the date shown normalized as 06/09/2004)
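On a semi-structured document the same fields appear, but not at the same place, so extraction has to search for them. A rough sketch of the keyword-anchor idea using the sample values from the slides; the regular expressions, the `extract_invoice` name, and the month-first date order are assumptions, and commercial products use far richer rules.

```python
import re
from datetime import datetime

# Anchor on the label, then capture the value that follows it.
INVOICE_RE = re.compile(r"Invoice\s+No:\s*(\d+)")
DATE_RE = re.compile(r"Date:\s*(\d{2}/\d{2}/\d{2})\b")


def extract_invoice(text):
    """Find invoice number and date wherever they occur in the text."""
    number = INVOICE_RE.search(text)
    date = DATE_RE.search(text)
    return {
        "invoice_no": number.group(1) if number else None,
        # Normalize the two-digit year, assuming MM/DD/YY order.
        "date": (
            datetime.strptime(date.group(1), "%m/%d/%y").strftime("%m/%d/%Y")
            if date else None
        ),
    }


# Two made-up layouts: fields in different order and position.
layout_a = "Invoice No: 99044\nShip To: Warehouse 4\nDate: 06/09/04"
layout_b = "Date: 06/09/04        Invoice No: 24567"

print(extract_invoice(layout_a))
print(extract_invoice(layout_b))
```

Both layouts yield the same field names and the date normalized to 06/09/2004, even though the fields sit in different places.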
  20. The Technologies: Semi-Structured Forms (sample fields: Consignee, Consignor, Date, Term)
  21. The Technologies: Common Processes
       • Full page conversion
       • Classification
       • Index level extraction
       • Redaction
       • Routing
       • Auto Filing
       • Re-Purposing
       • Image Rotation
  22. The Technologies: Full page conversion
       • Image file to electronic data file
       • ALL text on the page
       • Includes:
           • Image Pre-processing
           • Document Analysis/Zoning
           • Extraction
           • Export (commonly PDF, DOC)
  23. The Technologies: Classification
       • Software tells you the document type
       • Scan batches of mixed documents
       • Document types shown: Bill of Lading, Invoice, Check, PO
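Classification can be pictured as scoring each page against what is known about each document type. The sketch below swaps in a deliberately simple keyword count over recognized text; real products typically also use layout and image features. The keyword lists are assumptions made up for this example.

```python
# Hypothetical keyword lists per document type named on the slide.
KEYWORDS = {
    "Bill of Lading": ["bill of lading", "consignee", "consignor"],
    "Invoice": ["invoice no", "invoice date", "total amt due"],
    "Check": ["pay to the order of"],
    "PO": ["purchase order", "po number"],
}


def classify(text):
    """Return the document type whose keywords appear most often."""
    lowered = text.lower()
    scores = {
        doc_type: sum(kw in lowered for kw in kws)
        for doc_type, kws in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # No keyword hit at all means the page should go to a person.
    return best if scores[best] > 0 else "Unknown"


batch = [
    "Invoice No: 99044 Date: 06/09/04 Total Amt Due: 120.00",
    "Consignee: Acme Consignor: Globex Term: FOB",
    "Meeting notes from Tuesday",
]
print([classify(page) for page in batch])
```

Returning "Unknown" rather than the closest guess matters in a mixed batch, because a misrouted document is usually more expensive than one sent for manual review.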
  24. The Technologies: Index Level Extraction
       • Just certain required fields extracted
       • Normalization of data
       • Export usually to a database
       • Fields shown: Invoice Number, Invoice Date, Total Amt Due, Term
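The normalize-then-export step can be sketched with Python's built-in `sqlite3` module. The table layout, the `normalize` helper, and the raw values for amount and term are hypothetical; the four columns simply mirror the fields named on the slide, and the date is assumed to be month-first.

```python
import sqlite3
from datetime import datetime
from decimal import Decimal


def normalize(raw):
    """Turn raw recognized strings into consistent, storable values."""
    amount = raw["total_amt_due"].replace("$", "").replace(",", "").strip()
    return (
        raw["invoice_number"].strip(),
        # Store dates in ISO form, assuming MM/DD/YY on the document.
        datetime.strptime(raw["invoice_date"], "%m/%d/%y").date().isoformat(),
        str(Decimal(amount)),
        raw["term"].strip(),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices "
    "(invoice_number TEXT, invoice_date TEXT, total_amt_due TEXT, term TEXT)"
)

# Made-up raw extraction result for one invoice.
raw = {
    "invoice_number": " 99044 ",
    "invoice_date": "06/09/04",
    "total_amt_due": "$1,250.00",
    "term": "Net 30",
}
conn.execute("INSERT INTO invoices VALUES (?, ?, ?, ?)", normalize(raw))
row = conn.execute("SELECT * FROM invoices").fetchone()
print(row)
```

Normalizing before export means every downstream query sees one date format and one number format, regardless of how each vendor prints its invoices.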
  25. The Technologies: How Accurate
       • Better question is how do you determine accuracy
       • Document Type Accuracy
       • Field/Zone Location Accuracy
       • Data Type Accuracy
       • Character Accuracy
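To see why the way accuracy is measured matters, here is a sketch that scores the same result at two of the levels listed above. Character accuracy is computed here from edit distance against a hand-keyed "truth" value, which is one common convention rather than a definition given in the slides; the function names and sample values are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits from a to b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def character_accuracy(truth, ocr):
    """Share of truth characters that did not need correcting."""
    if not truth:
        return 1.0 if not ocr else 0.0
    return max(0.0, 1 - edit_distance(truth, ocr) / len(truth))


def field_accuracy(truth_fields, ocr_fields):
    """Share of fields recognized with no error at all."""
    correct = sum(ocr_fields.get(k) == v for k, v in truth_fields.items())
    return correct / len(truth_fields)


truth = {"invoice_no": "99044", "date": "06/09/04"}
ocr = {"invoice_no": "99O44", "date": "06/09/04"}  # letter O read for zero

chars = character_accuracy("".join(truth.values()), "".join(ocr.values()))
fields = field_accuracy(truth, ocr)
print(f"character accuracy {chars:.0%}, field accuracy {fields:.0%}")
```

One wrong character out of thirteen still looks good at the character level, yet half of the fields are unusable. This is why a single accuracy percentage means little until you know which level it was measured at.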
  26. The Technologies: Common usage scenarios
       • Document Conversion
       • Document Archival / Retrieval
       • Invoice Processing
       • Insurance Processing (medical, mortgage)
       • Waybill processing
       • Survey processing
  28. There really are only 4 core technology providers. It takes 50 man-years to develop OCR using current computing abilities.
  29. Who Makes Them: Core Engines
       • ABBYY
       • Nuance (formerly ScanSoft)
       • ReadI.R.I.S
       • Océ
       • CharacTell
       • ParaScript
       • A2iA
       • Handful of Open Source
       • Handful of Other Vendors
       • Two handfuls of OLD engines
  30. Who Makes Them: Who Licenses Them
       • EVERYONE ELSE!
       • AnaComp
       • Anydoc
       • BancTec
       • BrainWare
       • Captaris
       • Captivation
       • Cardiff
       • CVision
       • DataCap
       • DigiTech
       • eCopy
       • EMC Documentum
       • Kofax
       • LaserFiche
       • LeadTools
       • Microsoft
       • NSi AutoStore
       • OnBase
       • Perceptive Imaging
       • ReadSoft
       • SER
       • Top Image Systems
       • Tower
       • Westbrook
       • Xerox
       • Hundreds More
  32. 30% of organizations that purchase, purchase the wrong thing. Over 50% of organizations that purchase never use it properly.
  33. Buyer Beware
       • If OCR is the reason for buying a solution, know what engine it is!
       • Talk about the WHOLE solution, not the pieces
       • Get past marketing gimmicks
       • Trust, love, and be certain of your reseller / vendor
  34. Buyer Beware: Know your engine
       • What version?
       • Will they upgrade?
  35. Buyer Beware: Talk about Whole Solution
       • Scanner / Input
       • Capture
       • Storage
       • Have a requirements list beforehand
  36. Buyer Beware: Get past Gimmicks
       • NOTHING is 100%!
       • All canned demos work perfectly
       • Always see a test on your own documents
       • Version numbers are really arbitrary
  37. Buyer Beware: Trust your vendor / reseller
       • Support after sale (test them)
       • Where to get professional services
       • Do they understand the solution and not just the pieces?
  39. The Future
       • Full-page OCR will be a commodity
       • Advanced document processing will become mainstream but less required
       • Think about what to do now that you will be gathering data rapidly
       • There will be a new approach to OCR
  41. Questions and Answers
       • Before you ask
  43. Free Stuff
       • Copy of ABBYY FineReader Pro 9.0
       • Copy of Nuance OmniPage 16
       • Copy of ReadI.R.I.S Pro 11
       • 4 Hour Consulting Session with ME!
