Document Recognition
       a technology overview

Presented by:
Chris Riley of Artsyl Technologies, Inc.

December 2007 Document Recognition Technology Overview Presentation

932

Published on

Published in: Technology, Business
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
932
On Slideshare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
81
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

December 2007 Document Recognition Technology Overview Presentation

  1. Document Recognition: a technology overview. Presented by: Chris Riley of Artsyl Technologies, Inc.
  2. But First: Your new AIIM Board! Exciting new events: golf, networking, more education sessions.
  3. What we will cover: Why Chris? What are the document recognition technologies? Who makes them? Buyer beware. The future. Q&A. Free stuff!
  4. Why Chris? Who is Artsyl? What qualifies Chris to talk to me? When a developer turns to sales.
  5. What we will cover: (agenda repeated from slide 3)
  6. Who knows what OCR is?
  7. The Technologies: OCR – Optical Character Recognition; ICR – Intelligent Character Recognition; OMR – Optical Mark Recognition; Barcode; Handwriting. All the other ones, made up for marketing purposes: CAR/LAR (Check21) – Courtesy and Legal Amount Recognition; Assisted Capture; Fixed Form Processing; Semi-Structured Forms Processing; Unstructured Document Processing.
  8. The Technologies: OCR (technology list as on slide 7; example image text: "Ship To:")
  9. The Technologies: ICR (technology list as on slide 7; example image text: "Ilya")
  10. The Technologies: OMR (technology list as on slide 7; example image text: "Card Account")
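OMR does not read characters. It only decides whether a printed box or bubble has been filled in. The following is a minimal sketch. It assumes the page has already been binarized into a grid of 0/1 pixels (1 = dark) and that the box coordinates are known from the form layout. The function names and the 0.3 fill threshold are hypothetical choices for illustration and are not taken from any engine.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom and right are exclusive

def fill_ratio(image: List[List[int]], box: Box) -> float:
    """Fraction of dark pixels (value 1) inside the box."""
    top, left, bottom, right = box
    pixels = [image[r][c] for r in range(top, bottom) for c in range(left, right)]
    return sum(pixels) / len(pixels)

def is_marked(image: List[List[int]], box: Box, threshold: float = 0.3) -> bool:
    """Hypothetical rule: the box counts as marked when enough of it is dark."""
    return fill_ratio(image, box) >= threshold

# A tiny binarized strip with two 3x3 answer boxes: left filled in, right empty.
page = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
]
print(is_marked(page, (0, 0, 3, 3)))  # True
print(is_marked(page, (0, 3, 3, 6)))  # False
```

On a real form, the pre-printed box outline would also have to be discounted before the ratio is computed.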
  11. The Technologies: Barcode (technology list as on slide 7; example barcode value: 1889094476620)
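The slide's example is a 13-digit barcode value. The slide does not name the symbology, but if we assume it is EAN-13, the last digit is a check digit. That check digit lets software verify a barcode read in a way that plain OCR of the same digits usually cannot. The sketch below validates only the check digit. Decoding the bars themselves is left out.

```python
def ean13_check_digit(first12: str) -> int:
    """EAN-13 check digit: weight digits 1,3,1,3,... from the left, then round up to a multiple of 10."""
    if len(first12) != 12 or not first12.isdigit():
        raise ValueError("expected exactly 12 digits")
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10

def is_valid_ean13(code: str) -> bool:
    """True when the 13th digit matches the check digit computed from the first 12."""
    return len(code) == 13 and code.isdigit() and ean13_check_digit(code[:12]) == int(code[12])

print(is_valid_ean13("1889094476620"))  # True  (the slide's example value)
print(is_valid_ean13("1889094476621"))  # False (one digit misread)
```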
  12. The Technologies: Handwriting (technology list as on slide 7; example image text: "* Critical *")
  13. The Technologies: Acronym Heaven (technology list as on slide 7)
  14. The Technologies: CAR/LAR (technology list as on slide 7; example legal amount: "2 hundred dollars & no cents")
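On a check, the courtesy amount (CAR) is the numeric box and the legal amount (LAR) is the written line. Reading both lets software cross-check one against the other and send any disagreement to a person. The following is a minimal sketch. It assumes OCR/ICR has already produced both strings. The small word parser handles only simple amounts written like the slide's example. It is illustrative and is not a production LAR grammar.

```python
from decimal import Decimal
from typing import List

# Number words the sketch understands; "no" covers the slide's "no cents".
WORDS = {
    "no": 0, "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
    "twelve": 12, "thirteen": 13, "fourteen": 14, "fifteen": 15, "sixteen": 16,
    "seventeen": 17, "eighteen": 18, "nineteen": 19, "twenty": 20, "thirty": 30,
    "forty": 40, "fifty": 50, "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}

def words_to_int(tokens: List[str]) -> int:
    """Convert tokens such as ['2', 'hundred'] or ['one', 'thousand', 'five'] to an int."""
    total, current = 0, 0
    for tok in tokens:
        if tok == "and":
            continue  # "one hundred and five" reads the same as "one hundred five"
        if tok.isdigit():
            current += int(tok)
        elif tok in WORDS:
            current += WORDS[tok]
        elif tok == "hundred":
            current *= 100
        elif tok == "thousand":
            total += current * 1000
            current = 0
        else:
            raise ValueError(f"unrecognized word in legal amount: {tok!r}")
    return total + current

def parse_legal_amount(text: str) -> Decimal:
    """Parse a written amount like '2 hundred dollars & no cents'."""
    tokens = text.lower().replace("-", " ").replace("&", " and ").split()
    if "dollars" not in tokens:
        raise ValueError("no 'dollars' keyword found")
    cut = tokens.index("dollars")
    dollars = words_to_int(tokens[:cut])
    cents = words_to_int([t for t in tokens[cut + 1:] if t != "cents"])
    return Decimal(dollars) + Decimal(cents) / 100

def amounts_agree(courtesy: str, legal: str) -> bool:
    """True when the numeric courtesy amount matches the written legal amount."""
    car = Decimal(courtesy.replace("$", "").replace(",", "").strip())
    return car == parse_legal_amount(legal)

print(amounts_agree("$200.00", "2 hundred dollars & no cents"))    # True
print(amounts_agree("$2,000.00", "2 hundred dollars & no cents"))  # False
```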
  15. The Technologies: Assisted Capture (technology list as on slide 7)
  16. The Technologies: Fixed Form Processing (technology list as on slide 7; example fields: "Name: Ilya", "Date: 12/21/2982")
  17. The Technologies: Fixed Form Processing. Example fields: Name: Ilya; Date: 12/21/2982.
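On a fixed form, every field is printed in the same place, so extraction can be driven by a template of zones. The following is a minimal sketch. It assumes the recognition engine returns each word with pixel coordinates and that the scanned page is already aligned to the template. The `Word` type, the zone coordinates and the template itself are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Word:
    text: str
    x: int  # left edge, in pixels
    y: int  # top edge, in pixels

Zone = Tuple[int, int, int, int]  # (left, top, right, bottom); right and bottom exclusive

# Hypothetical template: on a fixed form each field always sits in the same zone.
TEMPLATE: Dict[str, Zone] = {
    "Name": (100, 50, 400, 80),
    "Date": (100, 90, 400, 120),
}

def extract_fixed_fields(words: List[Word], template: Dict[str, Zone]) -> Dict[str, str]:
    """Collect the recognized words that fall inside each field's zone."""
    fields = {}
    for name, (left, top, right, bottom) in template.items():
        inside = [w for w in words if left <= w.x < right and top <= w.y < bottom]
        inside.sort(key=lambda w: (w.y, w.x))  # simple top-to-bottom, left-to-right order
        fields[name] = " ".join(w.text for w in inside)
    return fields

page = [Word("Ilya", 120, 55), Word("12/21/2982", 120, 95), Word("Signature", 500, 300)]
print(extract_fixed_fields(page, TEMPLATE))  # {'Name': 'Ilya', 'Date': '12/21/2982'}
```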
  18. 80% of business end-user documents are semi-structured.
  19. The Technologies: Semi-Structured Forms (technology list as on slide 7; example invoices: "Invoice No: 99044, Date: 06/09/04" and "Invoice No: 24567, Date: 06/09/04")
  20. The Technologies: Semi-Structured Forms. Example invoices: Invoice No: 99044, Date: 06/09/04; Invoice No: 24567, Date: 06/09/04. Normalized date: 06/09/2004.
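Semi-structured documents carry the same fields in different places, so a template of fixed zones no longer works. One simple approach is to anchor on the field label ("Invoice No", "Date") wherever it appears in the recognized text. The value is then normalized, as in the slide's 06/09/04 to 06/09/2004 example. The following is a minimal sketch. The regular expressions, the two-digit-year pivot of 50, and the sample vendor names are assumptions for illustration.

```python
import re
from typing import Dict, Optional

# Hypothetical anchors: the label text is assumed stable even though its position is not.
PATTERNS = {
    "invoice_no": re.compile(r"Invoice\s+No\.?:?\s*(\d+)", re.IGNORECASE),
    "date": re.compile(r"Date:?\s*(\d{1,2}/\d{1,2}/\d{2,4})", re.IGNORECASE),
}

def normalize_date(raw: str, pivot: int = 50) -> str:
    """Zero-pad and expand a two-digit year (below pivot -> 20xx, else 19xx; assumed rule).
    The order of the first two parts is kept as-is; MM/DD vs DD/MM is not resolved here."""
    first, second, year = raw.split("/")
    if len(year) == 2:
        yy = int(year)
        year = str(2000 + yy if yy < pivot else 1900 + yy)
    return f"{int(first):02d}/{int(second):02d}/{year}"

def extract(text: str) -> Dict[str, Optional[str]]:
    """Find each labelled field anywhere in the text, then normalize the date."""
    result: Dict[str, Optional[str]] = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        result[field] = match.group(1) if match else None
    if result["date"]:
        result["date"] = normalize_date(result["date"])
    return result

# Same labels, different layouts (made-up sample text from two vendors).
a = "ACME Corp\nInvoice No: 99044\nDate: 06/09/04"
b = "Date: 06/09/04    Widgets Ltd    Invoice No: 24567"
print(extract(a))  # {'invoice_no': '99044', 'date': '06/09/2004'}
print(extract(b))  # {'invoice_no': '24567', 'date': '06/09/2004'}
```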
  21. The Technologies: Semi-Structured Forms (technology list as on slide 7; example fields: Consignee, Consignor, Date, Term)
  22. The Technologies: Common Processes: full page conversion; classification; index level extraction; redaction; routing; auto filing; re-purposing; image rotation.
  23. The Technologies: Full page conversion: image file to electronic data file; ALL text on the page. Includes: image pre-processing; document analysis/zoning; extraction; export (commonly PDF, DOC).
  24. The Technologies: Classification: software tells you the document type; scan batches of mixed documents. Example document types: Invoice, Bill of Lading, Check, PO.
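Classification assigns a document type to each page so that a mixed batch can be separated. The following is a minimal keyword-scoring sketch that uses the slide's example types (Invoice, Bill of Lading, Check, PO). The keyword lists and sample text are hypothetical. Real classifiers are usually trained on sample documents and may use layout as well as text.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical keyword lists for the slide's example document types.
KEYWORDS: Dict[str, List[str]] = {
    "Invoice": ["invoice", "amount due", "remit to"],
    "Bill of Lading": ["bill of lading", "consignee", "consignor", "carrier"],
    "Check": ["pay to the order of", "dollars", "memo"],
    "PO": ["purchase order", "po number", "ship to"],
}

def classify(text: str) -> str:
    """Return the document type whose keywords occur most often, or 'Unknown'."""
    lowered = text.lower()
    scores = Counter({doc_type: sum(lowered.count(k) for k in kws)
                      for doc_type, kws in KEYWORDS.items()})
    doc_type, score = scores.most_common(1)[0]
    return doc_type if score > 0 else "Unknown"

# Recognized text from a scanned batch of mixed documents (made-up samples).
batch = [
    "INVOICE No: 99044  Amount Due: $200.00",
    "Bill of Lading  Consignee: Widgets Ltd  Consignor: ACME Corp",
    "Pay to the order of  Two hundred dollars",
]
print([classify(page) for page in batch])  # ['Invoice', 'Bill of Lading', 'Check']
```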
  25. The Technologies: Index Level Extraction: just certain required fields extracted; normalization of data; export usually to a database. Example fields: Invoice Number, Invoice Date, Total Amt Due, Term.
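Index level extraction keeps only the fields needed to file or process the document, normalizes them, and exports them, usually to a database. The following is a minimal sketch that uses Python's built-in `sqlite3` module and the slide's example fields. The table name, the column names and the normalization rules (amounts to two decimal places, terms upper-cased) are hypothetical.

```python
import sqlite3
from decimal import Decimal
from typing import Dict

def normalize_amount(raw: str) -> str:
    """'$1,234.5' -> '1234.50'. Kept as text so no float rounding creeps in."""
    value = Decimal(raw.replace("$", "").replace(",", "").strip())
    return str(value.quantize(Decimal("0.01")))

def export_index(conn: sqlite3.Connection, fields: Dict[str, str]) -> None:
    """Write one document's index fields as a row (hypothetical schema)."""
    conn.execute(
        "INSERT INTO invoice_index VALUES (?, ?, ?, ?)",
        (
            fields["Invoice Number"].strip(),
            fields["Invoice Date"].strip(),
            normalize_amount(fields["Total Amt Due"]),
            fields["Term"].strip().upper(),  # e.g. 'net 30' -> 'NET 30'
        ),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoice_index ("
    "invoice_number TEXT PRIMARY KEY, invoice_date TEXT, "
    "total_amt_due TEXT, term TEXT)"
)
export_index(conn, {"Invoice Number": "99044", "Invoice Date": "06/09/2004",
                    "Total Amt Due": "$1,234.5", "Term": "net 30"})
print(conn.execute("SELECT * FROM invoice_index").fetchall())
# [('99044', '06/09/2004', '1234.50', 'NET 30')]
```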
  26. The Technologies: How Accurate? A better question is how you determine accuracy: document type accuracy; field/zone location accuracy; data type accuracy; character accuracy.
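Accuracy can be measured at each of these levels. The sketch below covers two of them against hand-keyed ground truth. The first is character accuracy, defined here as 1 minus the edit (Levenshtein) distance divided by the length of the true text. The second is field accuracy, the share of fields extracted exactly right. Both definitions are common choices but are assumptions here. Vendors do not all compute "accuracy" the same way, which is why it matters how it is determined.

```python
from typing import Dict

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

def character_accuracy(truth: str, ocr: str) -> float:
    """Assumed definition: 1 - edit distance / length of the ground truth, floored at 0."""
    if not truth:
        return 1.0 if not ocr else 0.0
    return max(0.0, 1 - levenshtein(truth, ocr) / len(truth))

def field_accuracy(truth: Dict[str, str], extracted: Dict[str, str]) -> float:
    """Share of ground-truth fields whose extracted value matches exactly."""
    return sum(extracted.get(k) == v for k, v in truth.items()) / len(truth)

truth = {"Invoice Number": "99044", "Invoice Date": "06/09/2004"}
extracted = {"Invoice Number": "99O44", "Invoice Date": "06/09/2004"}  # letter O instead of zero
print(round(character_accuracy("Invoice No: 99044", "lnvoice No: 99O44"), 3))  # 0.882
print(field_accuracy(truth, extracted))  # 0.5
```

One misread character in the invoice number leaves character accuracy high but makes the whole field wrong, which is why the levels are worth measuring separately.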
  27. The Technologies: Common usage scenarios: document conversion; document archival/retrieval; invoice processing; insurance processing (medical, mortgage); waybill processing; survey processing.
  28. What we will cover: (agenda repeated from slide 3)
  29. There really are only 3 core technology providers. It takes 50 man-years to develop OCR using current computing abilities.
  30. Who Makes Them: Core Engines: ABBYY; Nuance (formerly ScanSoft); ReadI.R.I.S; Océ; CharacTell; ParaScript; A2iA; a handful of open source engines; a handful of other vendors; two handfuls of OLD engines.
  31. Who Makes Them: Who Licenses Them: EVERYONE ELSE! AnaComp, Anydoc, BancTec, BrainWare, Captaris, Captivation, Cardiff, CVision, DataCap, DigiTech, eCopy, EMC Documentum, Kofax, LaserFiche, LeadTools, Microsoft, NSi AutoStore, OnBase, Perceptive Imaging, ReadSoft, SER, Top Image Systems, Tower, Westbrook, Xerox, and hundreds more.
  32. What we will cover: (agenda repeated from slide 3)
  33. 30% of organizations that purchase buy the wrong thing. Over 50% of organizations that purchase never use it properly.
  34. Buyer Beware: If OCR is the reason for buying a solution, know what engine it is! Talk about the WHOLE solution, not the pieces. Get past marketing gimmicks. Trust, love, be certain of your reseller/vendor.
  35. Buyer Beware: Know your engine. What version? Will they upgrade?
  36. Buyer Beware: Talk about the whole solution: scanner/input; capture; storage. Have a requirements list beforehand.
  37. Buyer Beware: Get past gimmicks. NOTHING is 100%! All canned demos work perfectly. Always see a test on your own documents. Version numbers are really arbitrary.
  38. Buyer Beware: Trust your vendor/reseller. Support after the sale (test them). Where to get professional services? Do they understand the solution and not just the pieces?
  39. What we will cover: (agenda repeated from slide 3)
  40. The Future: Full-page OCR will be a commodity. Advanced document processing will become mainstream but less required. Think about what to do now that you will be gathering data rapidly. There will be a new approach to OCR.
  41. What we will cover: (agenda repeated from slide 3)
  42. Questions and Answers. Before you ask
  43. What we will cover: (agenda repeated from slide 3)
  44. Free Stuff: copy of ABBYY FineReader Pro 9.0; copy of Nuance OmniPage 16; copy of ReadI.R.I.S Pro 11; 4-hour consulting session with ME!