Document Recognition: A Technology Overview. Presented by: Chris Riley, ECMp, IOAp, President, AIIM Golden Gate
What we will cover: Why Chris? What Are the Document Recognition Technologies? Who Makes Them? Buyer Beware. The Future. Q & A. Free Stuff!
Why Chris? What qualifies Chris to talk to me? When a developer turns to sales. Leading expert in document automation technologies.
Who knows what OCR is?
The Technologies: OCR – Optical Character Recognition; ICR – Intelligent Character Recognition; OMR – Optical Mark Recognition; Barcode; Handwriting. All the other ones created for marketing purposes: CAR/LAR (Check 21) – Courtesy and Legal Amount Recognition; Assisted Capture; Fixed Form Processing; Semi-Structured Forms Processing; Unstructured Document Processing.
The Technologies: OCR – Optical Character Recognition. Example: "Ship To:"
The Technologies: ICR – Intelligent Character Recognition. Example: "Ilya"
The Technologies: OMR – Optical Mark Recognition. Example: Card Account
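A minimal sketch of how a mark might be detected, assuming the checkbox zone has already been cropped and binarized into a grid of 0/1 pixels (1 = dark). The function name `is_marked` and the 0.3 fill threshold are illustrative assumptions, not values from any particular engine.

```python
def is_marked(zone, threshold=0.3):
    """Return True if the share of dark pixels in the zone meets the threshold.

    zone: list of rows, each a list of 0/1 pixels (1 = dark).
    threshold: hypothetical fill ratio above which the box counts as marked.
    """
    total = sum(len(row) for row in zone)
    if total == 0:
        return False
    dark = sum(sum(row) for row in zone)
    return dark / total >= threshold


# Two made-up 4x4 checkbox zones: one empty, one with a pen mark.
empty_box = [[0, 0, 0, 0]] * 4
ticked_box = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

print(is_marked(empty_box), is_marked(ticked_box))
```

In practice the threshold needs tuning per form, since box borders and scanner noise also contribute dark pixels.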
The Technologies: Barcode. Example: 1889094476620
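One reason barcodes read so reliably is that many symbologies carry a check digit. The sketch below assumes the 13-digit value on the slide is an EAN-13 (the slide does not say which symbology it is) and verifies its check digit with the standard alternating 1/3 weighting. The function names are illustrative.

```python
def ean13_check_digit(first12: str) -> int:
    """Compute the EAN-13 check digit for the first 12 digits."""
    if len(first12) != 12 or not (first12.isascii() and first12.isdigit()):
        raise ValueError("expected exactly 12 ASCII digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_ean13(code: str) -> bool:
    """Return True if code is 13 digits and its last digit is the check digit."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    return ean13_check_digit(code[:12]) == int(code[12])


print(is_valid_ean13("1889094476620"))
```

A decoder can reject a misread immediately when the check digit does not match, which is something plain OCR text fields cannot do.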
The Technologies: Handwriting. Example: * Critical *
The Technologies: Acronym Heaven. All the other ones made up for marketing purposes.
The Technologies: CAR/LAR (Check 21) – Courtesy and Legal Amount Recognition. Example: "2 hundred dollars & no cents"
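On a check, the courtesy amount is the numeric box and the legal amount is the written-out line, so the two readings can be cross-checked. The following is a rough sketch of that idea, assuming both strings have already been recognized. The helper names and the small vocabulary (up to thousands, with "no" meaning zero) are assumptions for illustration only.

```python
import re
from decimal import Decimal

UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
UNITS["no"] = 0  # as in "no cents"
TENS = {w: (i + 2) * 10 for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split())}


def words_to_int(tokens):
    """Convert tokens such as ['2', 'hundred'] or ['twenty', 'five'] to an int."""
    total = current = 0
    for t in tokens:
        if t in UNITS:
            current += UNITS[t]
        elif t in TENS:
            current += TENS[t]
        elif t.isdigit():
            current += int(t)
        elif t == "hundred":
            current = max(current, 1) * 100
        elif t == "thousand":
            total += max(current, 1) * 1000
            current = 0
        else:
            raise ValueError(f"unrecognized amount word: {t!r}")
    return total + current


def parse_legal_amount(text):
    """Parse a written amount like '2 hundred dollars & no cents'."""
    # The regex keeps words and digit runs; '&' and hyphens are dropped.
    tokens = [t for t in re.findall(r"[a-z]+|\d+", text.lower()) if t != "and"]
    tokens = ["dollars" if t == "dollar" else "cents" if t == "cent" else t
              for t in tokens]
    if "dollars" not in tokens:
        raise ValueError("no 'dollars' keyword found")
    split = tokens.index("dollars")
    cent_tokens = [t for t in tokens[split + 1:] if t != "cents"]
    cents = words_to_int(cent_tokens) if cent_tokens else 0
    return Decimal(words_to_int(tokens[:split])) + Decimal(cents) / 100


def amounts_agree(courtesy, legal):
    """Compare the numeric (courtesy) amount with the written (legal) amount."""
    courtesy_value = Decimal(re.sub(r"[^\d.]", "", courtesy))
    return courtesy_value == parse_legal_amount(legal)


print(amounts_agree("$200.00", "2 hundred dollars & no cents"))
```

When the two amounts disagree, a capture workflow would typically route the item to a person rather than pick one value automatically.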
The Technologies: Assisted Capture
The Technologies: Fixed Form Processing. Example: Name: Ilya; Date: 12/21/2982
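Fixed form processing relies on each field always sitting at the same place on the page. A minimal sketch of that idea follows, assuming a recognition step has already produced words with page coordinates. The `Word` and `Zone` types, the template coordinates, and the word positions are all hypothetical.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    x: int  # left edge of the word on the page
    y: int  # top edge of the word on the page


class Zone(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


# Hypothetical template: where each field's value sits on this one form.
TEMPLATE = {
    "name": Zone(100, 40, 300, 70),
    "date": Zone(100, 80, 300, 110),
}


def extract_fixed(words, template):
    """Collect the words that fall inside each field's fixed zone."""
    result = {}
    for field, z in template.items():
        inside = [w for w in words
                  if z.left <= w.x < z.right and z.top <= w.y < z.bottom]
        inside.sort(key=lambda w: (w.y, w.x))  # reading order
        result[field] = " ".join(w.text for w in inside)
    return result


page = [
    Word("Name:", 20, 50), Word("Ilya", 110, 50),
    Word("Date:", 20, 90), Word("12/21/2982", 110, 90),
]
print(extract_fixed(page, TEMPLATE))
```

The approach breaks as soon as the layout shifts, which is the gap semi-structured processing is meant to fill.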
80% of business end-user documents are semi-structured
The Technologies: Semi-Structured Forms Processing. Example: Invoice No: 99044, Date: 06/09/04; Invoice No: 24567, Date: 06/09/04 (06/09/2004)
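With semi-structured documents the same fields appear on every invoice but not in the same place, so extraction keys off labels rather than coordinates. The sketch below assumes plain recognized text as input and uses two made-up layouts. The regular expressions, function names, and the month/day/year reading of the date are assumptions.

```python
import re
from datetime import datetime

INVOICE_NO = re.compile(r"Invoice\s*No\.?\s*:?\s*(\d+)", re.IGNORECASE)
DATE = re.compile(r"Date\s*:?\s*(\d{1,2}/\d{1,2}/\d{2,4})", re.IGNORECASE)


def normalize_date(raw):
    """Normalize MM/DD/YY or MM/DD/YYYY to MM/DD/YYYY (assumes month first)."""
    for fmt in ("%m/%d/%Y", "%m/%d/%y"):
        try:
            return datetime.strptime(raw, fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")


def extract_invoice_fields(text):
    """Find fields by their labels, wherever they appear in the text."""
    number = INVOICE_NO.search(text)
    date = DATE.search(text)
    return {
        "invoice_no": number.group(1) if number else None,
        "invoice_date": normalize_date(date.group(1)) if date else None,
    }


# Two hypothetical layouts carrying the values shown on the slide.
layout_a = "ACME Corp\nInvoice No: 99044    Date: 06/09/04\n"
layout_b = "Date: 06/09/04\nBill To: ...\nInvoice No: 24567\n"

print(extract_invoice_fields(layout_a))
print(extract_invoice_fields(layout_b))
```

Commercial products generally combine label search with position and format rules; this shows only the label-anchoring part.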
The Technologies: Semi-Structured Forms Processing. Example fields: Consignee, Consignor, Date, Term
The Technologies: Common Processes. Full page conversion; Classification; Index level extraction; Redaction; Routing; Auto Filing; Re-Purposing; Image Rotation
The Technologies: Full page conversion. Image file to electronic data file; ALL text on the page. Includes: Image Pre-processing; Document Analysis/Zoning; Extraction; Export (commonly PDF, DOC)
The Technologies: Classification. Software tells you the document type; scan batches of mixed documents. Examples: Bill of Lading, Invoice, Check, PO
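One simple way to picture classification is keyword voting over the recognized text. The sketch below is only that: the keyword lists and the `classify` function are invented for illustration, and real classifiers usually also weigh layout and image features.

```python
# Hypothetical keyword lists for the four document types named on the slide.
KEYWORDS = {
    "Bill of Lading": ["bill of lading", "consignee", "consignor", "carrier"],
    "Invoice": ["invoice", "amount due", "remit to"],
    "Check": ["pay to the order of", "memo"],
    "PO": ["purchase order", "po number", "ship to"],
}


def classify(text):
    """Return the document type with the most keyword hits, or None."""
    lowered = text.lower()
    scores = {doc_type: sum(lowered.count(k) for k in keywords)
              for doc_type, keywords in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


batch = [
    "Invoice No: 99044  Total Amount Due: $200.00",
    "Pay to the order of Ilya",
    "BILL OF LADING  Consignee: X  Consignor: Y",
]
for page in batch:
    print(classify(page))
```

Returning `None` for pages with no hits lets a mixed batch fall through to manual sorting instead of being forced into a type.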
The Technologies: Index Level Extraction. Just certain required fields extracted; normalization of data; export usually to a database. Example fields: Invoice Number, Invoice Date, Total Amt Due, Term
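The normalize-then-export step can be sketched with Python's standard `sqlite3` module. The raw field values, the table layout, and the function names below are assumptions made for illustration. The date is assumed to be month/day/two-digit year.

```python
import sqlite3
from datetime import datetime
from decimal import Decimal


def normalize_record(raw):
    """Clean up raw extracted strings into consistent, comparable values."""
    return {
        "invoice_number": raw["invoice_number"].strip(),
        # Store dates in ISO form so they sort correctly as text.
        "invoice_date": datetime.strptime(
            raw["invoice_date"].strip(), "%m/%d/%y").date().isoformat(),
        # Drop currency symbols and thousands separators.
        "total_due": str(Decimal(
            raw["total_due"].replace("$", "").replace(",", "").strip())),
        "term": raw["term"].strip().upper(),
    }


def export(conn, record):
    """Insert one normalized record into a hypothetical invoices table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices ("
        "invoice_number TEXT PRIMARY KEY, invoice_date TEXT, "
        "total_due TEXT, term TEXT)")
    conn.execute(
        "INSERT INTO invoices VALUES "
        "(:invoice_number, :invoice_date, :total_due, :term)", record)
    conn.commit()


raw_fields = {  # made-up values as an engine might return them
    "invoice_number": " 99044 ",
    "invoice_date": "06/09/04",
    "total_due": "$1,250.00",
    "term": "net 30",
}
conn = sqlite3.connect(":memory:")
export(conn, normalize_record(raw_fields))
print(conn.execute("SELECT * FROM invoices").fetchone())
```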
The Technologies: How Accurate? A better question is how you determine accuracy: Document Type Accuracy; Field/Zone Location Accuracy; Data Type Accuracy; Character Accuracy
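To see why the level of measurement matters, here is a small sketch comparing character accuracy (based on edit distance) with field accuracy (exact match per field) on the same made-up result. The formulas are one common way to define these measures, not a vendor standard, and the sample values are invented.

```python
def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def character_accuracy(truth, recognized):
    """1 minus edit distance over truth length, floored at 0."""
    if not truth:
        return 1.0 if not recognized else 0.0
    return max(0.0, 1 - levenshtein(truth, recognized) / len(truth))


def field_accuracy(truth_fields, recognized_fields):
    """Share of fields whose recognized value matches the truth exactly."""
    if not truth_fields:
        return 1.0
    correct = sum(1 for name, value in truth_fields.items()
                  if recognized_fields.get(name) == value)
    return correct / len(truth_fields)


truth = {"invoice_no": "99044", "invoice_date": "06/09/2004"}
result = {"invoice_no": "99O44", "invoice_date": "06/09/2004"}  # letter O for zero

print(character_accuracy(truth["invoice_no"], result["invoice_no"]))
print(field_accuracy(truth, result))
```

A single misread character leaves the invoice number mostly right by character count but entirely wrong as a field, so a quoted accuracy figure means little without knowing which measure it is.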
The Technologies: Common usage scenarios. Document Conversion; Document Archival / Retrieval; Invoice Processing; Insurance Processing (medical, mortgage); Waybill Processing; Survey Processing
There really are only 4 core technology providers. It takes 50 man-years to develop OCR using current computing abilities.
Who Makes Them: Core Engines. ABBYY; Nuance (formerly ScanSoft); ReadI.R.I.S; Océ; CharacTell; ParaScript; A2iA; Handful of Open Source; Handful of Other Vendors; Two handfuls of OLD engines
Who Makes Them: Who Licenses Them. EVERYONE ELSE! AnaComp, Anydoc, BancTec, BrainWare, Captaris, Captivation, Cardiff, CVision, DataCap, DigiTech, eCopy, EMC Documentum, Kofax, LaserFiche, LeadTools, Microsoft, NSi AutoStore, OnBase, Perceptive Imaging, ReadSoft, SER, Top Image Systems, Tower, Westbrook, Xerox, Hundreds More
30% of organizations that purchase, purchase the wrong thing. Over 50% of organizations that purchase never use it properly.
Buyer Beware. If OCR is the reason for buying a solution, know what engine it is! Talk about the WHOLE solution, not the pieces. Get past marketing gimmicks. Trust, love, and be certain of your reseller / vendor.
Buyer Beware: Know your engine. What version? Will they upgrade?
Buyer Beware: Talk about the Whole Solution. Scanner / Input; Capture; Storage. Have a requirements list beforehand.
Buyer Beware: Get past Gimmicks. NOTHING is 100%! All canned demos work perfectly. Always see a test on your documents. Version numbers are really arbitrary.
Buyer Beware: Trust your vendor / reseller. Support after sale (test them). Where to get professional services. Do they understand the solution and not just the pieces?
The Future. Full-page OCR will be a commodity. Advanced Document Processing will become mainstream but less required. Think about what to do now that you will be gathering data rapidly. There will be a new approach to OCR.
Questions and Answers Before you ask
Free Stuff: Copy of ABBYY FineReader Pro 9.0; Copy of Nuance OmniPage 16; Copy of ReadI.R.I.S Pro 11; 4 Hour Consulting Session with ME!
