Document Recognition Technologies


Learn from Chris Riley, a leading expert in document automation technologies, the basics of what document recognition is and how it works, as well as some best practices.

Published in: Technology, Business

  1. Document Recognition: A Technology Overview
     Presented by: Chris Riley, ECMp, IOAp, President, AIIM Golden Gate
  2. What we will cover:
       • Why Chris?
       • What Are the Document Recognition Technologies
       • Who Makes Them
       • Buyer Beware
       • The future
       • Q & A
       • Free Stuff!
  3. Why Chris?
       • What qualifies Chris to talk to me?
           ◦ When a developer turns to sales
           ◦ Leading expert in document automation technologies
  5. Who knows what OCR is?
  6. The Technologies
       • OCR – Optical Character Recognition
       • ICR – Intelligent Character Recognition
       • OMR – Optical Mark Recognition
       • Barcode
       • Handwriting
       • All the other ones created for marketing purposes
       • CAR/LAR (Check21) – Courtesy and Legal Amount Recognition
       • Assisted Capture
       • Fixed Form Process
       • Semi-Structured Forms Processing
       • Unstructured Document Processing
  7. The Technologies: OCR
     Sample: Ship To:
  8. The Technologies: ICR
     Sample: Ilya
  9. The Technologies: OMR
     Sample: Card Account
  10. The Technologies: Barcode
     Sample: 1889094476620
  11. The Technologies: Handwriting
     Sample: * Critical *
  12. The Technologies: Acronym Heaven
  13. The Technologies: CAR/LAR
     Sample: 2 hundred dollars & no cents
  14. The Technologies: Assisted Capture
  15. The Technologies: Fixed Form Processing
     Sample: Name: Ilya Date: 12/21/2982
  16. The Technologies: Fixed Form Processing
     Sample: Name: Ilya Date: 12/21/2982
  17. 80% of business end-user documents are semi-structured
  18. The Technologies: Semi-Structured Forms
     Sample: Invoice No: 99044 Date: 06/09/04; Invoice No: 24567 Date: 06/09/04
  19. The Technologies: Semi-Structured Forms
     Sample: Invoice No: 99044 Date: 06/09/04; Invoice No: 24567 Date: 06/09/04 (06/09/2004)
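
The two invoice samples on this slide carry the same fields in different positions, and the date is normalized from 06/09/04 to 06/09/2004. A minimal Python sketch of that idea follows. It assumes the page text has already been recognized, that fields can be found by their printed labels, and that dates are in MM/DD order. The regular expressions, function names, and sample layouts are illustrative assumptions, not how any particular engine works.

```python
import re
from datetime import datetime

# Hypothetical label patterns; commercial products use far richer rules.
INVOICE_NO = re.compile(r"Invoice\s*No\s*[:#]?\s*(\d+)", re.IGNORECASE)
DATE = re.compile(r"Date\s*:?\s*(\d{2}/\d{2}/\d{2,4})", re.IGNORECASE)


def normalize_date(raw: str) -> str:
    """Expand a two-digit year to four digits (assumes MM/DD order)."""
    year_part = raw.split("/")[-1]
    fmt = "%m/%d/%y" if len(year_part) == 2 else "%m/%d/%Y"
    return datetime.strptime(raw, fmt).strftime("%m/%d/%Y")


def extract_fields(page_text: str) -> dict:
    """Find fields by their labels, wherever they sit on the page."""
    number = INVOICE_NO.search(page_text)
    date = DATE.search(page_text)
    return {
        "invoice_no": number.group(1) if number else None,
        "date": normalize_date(date.group(1)) if date else None,
    }


# Two made-up layouts holding the slide's sample values.
layout_a = "Invoice No: 99044\nDate: 06/09/04\nItem list follows"
layout_b = "Item list follows\nDate: 06/09/04    Invoice No: 24567"

print(extract_fields(layout_a))
print(extract_fields(layout_b))
```

Searching by label rather than by fixed coordinates is what lets the same rule handle both layouts. A fixed-form approach would instead read each field from a known position on the page.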
  20. The Technologies: Semi-Structured Forms
     Sample: Consignee, Consignor, Date, Term
  21. The Technologies: Common Processes
       • Full page conversion
       • Classification
       • Index level extraction
       • Redaction
       • Routing
       • Auto Filing
       • Re-Purposing
       • Image Rotation
  22. The Technologies: Full page conversion
       • Image file to electronic data file
       • ALL text on the page
       • Includes:
           ◦ Image Pre-processing
           ◦ Document Analysis/Zoning
           ◦ Extraction
           ◦ Export (commonly PDF, DOC)
  23. The Technologies: Classification
       • Software tells you the document type
       • Scan batches of mixed documents
     Sample: Bill of Lading, Invoice, Check, PO
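
As a rough illustration of classification, the Python sketch below assigns each page in a mixed batch to one of the four document types named on the slide by counting keywords in the recognized text. The keyword lists and sample pages are invented for illustration. Real classifiers typically also use layout and image features, so treat this as a toy under those assumptions.

```python
# Hypothetical keyword lists per document type (illustrative only).
KEYWORDS = {
    "Invoice": ["invoice", "amount due", "remit"],
    "Bill of Lading": ["bill of lading", "consignee", "consignor", "carrier"],
    "Check": ["pay to the order of", "dollars"],
    "PO": ["purchase order", "ship to", "po number"],
}


def classify(page_text: str) -> str:
    """Return the type whose keywords appear most often, or 'Unknown'."""
    text = page_text.lower()
    scores = {
        doc_type: sum(text.count(word) for word in words)
        for doc_type, words in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "Unknown"


# A made-up mixed batch of recognized page text.
batch = [
    "INVOICE 99044  Total amount due: 120.00",
    "Pay to the order of Ilya  two hundred dollars",
    "Bill of Lading  Consignee: A  Consignor: B",
    "Meeting notes",
]
for page in batch:
    print(classify(page))
```

Returning "Unknown" when nothing matches gives a natural hook for sending a page to a person instead of guessing its type.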
  24. The Technologies: Index Level Extraction
       • Just certain required fields extracted
       • Normalization of data
       • Export usually to a database
     Sample: Invoice Number, Invoice Date, Total Amt Due, Term
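
The normalization and database export steps can be sketched in Python as follows, using the four field names from the slide. The raw values, the table layout, and the choice of an in-memory SQLite database are all assumptions made for the example. The slide only says that export usually goes to a database.

```python
import sqlite3
from datetime import datetime


def normalize(raw: dict) -> dict:
    """Bring raw recognized strings into consistent, typed values."""
    return {
        "invoice_number": raw["invoice_number"].strip(),
        # Assumes MM/DD/YY on the page; stored as an ISO 8601 date.
        "invoice_date": datetime.strptime(
            raw["invoice_date"].strip(), "%m/%d/%y"
        ).date().isoformat(),
        # Strip the currency symbol and thousands separators.
        "total_amt_due": float(
            raw["total_amt_due"].replace("$", "").replace(",", "")
        ),
        "term": raw["term"].strip().upper(),
    }


def export(conn: sqlite3.Connection, record: dict) -> None:
    """Write one normalized record to a hypothetical invoices table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices ("
        "invoice_number TEXT, invoice_date TEXT, "
        "total_amt_due REAL, term TEXT)"
    )
    conn.execute(
        "INSERT INTO invoices VALUES "
        "(:invoice_number, :invoice_date, :total_amt_due, :term)",
        record,
    )
    conn.commit()


# Made-up raw field values as they might come off a page.
raw_fields = {
    "invoice_number": " 99044 ",
    "invoice_date": "06/09/04",
    "total_amt_due": "$1,250.00",
    "term": "net 30",
}
conn = sqlite3.connect(":memory:")
export(conn, normalize(raw_fields))
row = conn.execute("SELECT * FROM invoices").fetchone()
print(row)
```

Normalizing before export means every record in the table uses the same date and amount format, whatever the original document looked like.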
  25. The Technologies: How Accurate
       • Better question is how do you determine accuracy
       • Document Type Accuracy
       • Field/Zone Location Accuracy
       • Data Type Accuracy
       • Character Accuracy
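
Of the four measures listed, character accuracy is the easiest to make concrete. One common way to compute it, assumed here rather than taken from the slides, is to compare the recognized text with a hand-verified reference using edit distance. The Python sketch below does that; the sample strings are invented.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character into a
                prev[j - 1] + cost,  # substitute, or keep a match
            ))
        prev = curr
    return prev[-1]


def character_accuracy(truth: str, recognized: str) -> float:
    """Share of reference characters recognized correctly, from 0.0 to 1.0."""
    if not truth:
        return 1.0 if not recognized else 0.0
    return max(0.0, 1 - edit_distance(truth, recognized) / len(truth))


truth = "Invoice No: 99044"
recognized = "lnvoice No: 99O44"  # 'I' read as 'l', '0' read as 'O'
print(edit_distance(truth, recognized))
print(round(character_accuracy(truth, recognized), 3))
```

The other three measures on the slide are judged at a different level, such as whether the right document type or field was found, so a single character accuracy figure does not cover them.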
  26. The Technologies: Common usage scenarios
       • Document Conversion
       • Document Archival / Retrieval
       • Invoice Processing
       • Insurance Processing (medical, mortgage)
       • Waybill processing
       • Survey processing
  28. There really are only 4 core technology providers.
     It takes 50 man-years to develop OCR using current computing abilities.
  29. Who Makes Them: Core Engines
       • ABBYY
       • Nuance (formerly ScanSoft)
       • ReadI.R.I.S
       • Océ
       • CharacTell
       • ParaScript
       • A2iA
       • Handful of Open Source
       • Handful of Other Vendors
       • Two handfuls of OLD engines
  30. Who Makes Them: Who Licenses Them
       • EVERYONE ELSE!
       • AnaComp
       • Anydoc
       • BancTec
       • BrainWare
       • Captaris
       • Captivation
       • Cardiff
       • CVision
       • DataCap
       • DigiTech
       • eCopy
       • EMC Documentum
       • Kofax
       • LaserFiche
       • LeadTools
       • Microsoft
       • NSi AutoStore
       • OnBase
       • Perceptive Imaging
       • ReadSoft
       • SER
       • Top Image Systems
       • Tower
       • Westbrook
       • Xerox
       • Hundreds More
  32. 30% of organizations that purchase, purchase the wrong thing.
     Over 50% of organizations that purchase never use it properly.
  33. Buyer Beware
       • If OCR is the reason for buying a solution, know what engine it uses!
       • Talk about the WHOLE solution, not the pieces
       • Get past marketing gimmicks
       • Trust, love, and be certain of your reseller / vendor
  34. Buyer Beware: Know your engine
       • What version?
       • Will they upgrade?
  35. Buyer Beware: Talk about Whole Solution
       • Scanner / Input
       • Capture
       • Storage
       • Have a requirements list beforehand
  36. Buyer Beware: Get past Gimmicks
       • NOTHING is 100%!
       • All canned demos work perfectly
       • Always see a test on your own documents
       • Version numbers are really arbitrary
  37. Buyer Beware: Trust your vendor / reseller
       • Support after sale (test them)
       • Where to get professional services
       • Do they understand the solution and not just the pieces?
  39. The Future
       • Full-page OCR will be a commodity
       • Advanced document processing will become mainstream but less required
       • Think about what to do now that you will be gathering data rapidly
       • There will be a new approach to OCR
  41. Questions and Answers
       • Before you ask
  43. Free Stuff
       • Copy of ABBYY FineReader Pro 9.0
       • Copy of Nuance OmniPage 16
       • Copy of ReadI.R.I.S Pro 11
       • 4 Hour Consulting Session with ME!
