Document Recognition Market Landscape

History and overview of the document automation and recognition technology vendor landscape.

Transcript

  • 1. Document Recognition: A Technology Overview. Presented by: Chris Riley
  • 2. What we will cover:
    • Why Chris?
    • What Are the Document Recognition Technologies
    • Who Makes Them
    • News Flash
    • The Future
    • Q & A
  • 3. Why Chris?
    • Professional Experience
      • LivingAnalytics, Inc.
      • Artsyl Technologies, Inc.
      • Visioneer, Inc.
      • ABBYY
      • IntelliKey Solutions, Inc.
      • Regis University
        • Deep Study in Genetic Algorithms and Real-Time Analytics
    • What qualifies Chris to talk to me?
      • Subject Matter Expert for: AIIM, TAWPI, DIR, Business Solutions Magazine
      • Obtained Distinguished Services Award for Market Education
      • When a developer turns to sales and marketing
  • 4. What we will cover:
    • Why Chris?
    • What Are the Document Recognition Technologies
    • Who Makes Them
    • News Flash
    • The Future
    • Q & A
  • 5. The Technologies
    • OCR – Optical Character Recognition
    • ICR – Intelligent Character Recognition
    • OMR – Optical Mark Recognition
    • IDR – Intelligent Document Recognition
    • Barcode
    • Handwriting
    • All the other ones made up for marketing purposes
    • CAR/LAR (Check 21) – Courtesy and Legal Amount Recognition
    • Assisted Capture
    • Fixed Form Processing
    • Semi-Structured Forms Processing
    • Unstructured Document Processing
  • 6. The Technologies: OCR
    Ship To:
  • 7. The Technologies: ICR
    Ilya
  • 8. The Technologies: OMR
    Card Account
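Optical mark recognition decides whether a printed box or bubble has been filled in, rather than reading characters. As a minimal sketch, assuming the page is already binarized (1 = ink, 0 = paper) and aligned, and using hypothetical bubble coordinates and a hypothetical 50% fill threshold, the decision comes down to the share of dark pixels inside each bubble:

```python
# Minimal OMR sketch: decide whether each bubble on a binarized image is marked.
# Assumptions (hypothetical): 1 = dark pixel, 0 = paper, the page is aligned,
# bubbles are given as (top, left, height, width), and 50% fill counts as marked.


def fill_ratio(image, top, left, height, width):
    """Fraction of dark pixels inside a rectangular bubble region."""
    dark = sum(
        image[r][c]
        for r in range(top, top + height)
        for c in range(left, left + width)
    )
    return dark / (height * width)


def read_marks(image, bubbles, threshold=0.5):
    """Return {bubble_name: True/False} for each named bubble region."""
    return {
        name: fill_ratio(image, *box) >= threshold
        for name, box in bubbles.items()
    }


# A tiny 4x8 "page" with two 4x4 bubbles: the left one filled, the right one nearly empty.
page = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
]
bubbles = {"yes": (0, 0, 4, 4), "no": (0, 4, 4, 4)}
print(read_marks(page, bubbles))  # {'yes': True, 'no': False}
```

Production OMR also has to cope with skew, faint marks, and erasures; the fixed threshold here is only for illustration.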
  • 9. The Technologies: IDR
    Check Invoice Bill of Lading EOB
  • 10. The Technologies: Barcode
    1889094476620
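The sample number on the Barcode slide has 13 digits, the length of an EAN-13 code. Assuming that is what it is, software can catch misreads with the check digit: the first twelve digits are weighted alternately 1 and 3, and the last digit brings the weighted sum up to a multiple of 10. A small sketch of that validation step (it does not decode bars from an image):

```python
def ean13_check_digit(first12: str) -> int:
    """Compute the EAN-13 check digit from the first 12 digits."""
    if len(first12) != 12 or not first12.isdigit():
        raise ValueError("expected exactly 12 digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_ean13(code: str) -> bool:
    """True if a 13-digit string carries a correct EAN-13 check digit."""
    return (
        len(code) == 13
        and code.isdigit()
        and ean13_check_digit(code[:12]) == int(code[12])
    )


print(is_valid_ean13("1889094476620"))  # True
```

A single misread digit changes the weighted sum, so a decoder that validates the check digit can reject that read and try again.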
  • 11. The Technologies: Handwriting
    * Critical *
  • 12. The Technologies: Acronym Heaven
  • 13. The Technologies: CAR/LAR
    2 hundred dollars & no cents
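CAR reads the numeric (courtesy) amount on a check and LAR reads the written-out (legal) amount, such as the "2 hundred dollars & no cents" sample above; agreement between the two raises confidence in the result. A minimal sketch of the cross-check, assuming both amounts have already been recognized as text and using a small, hypothetical English vocabulary:

```python
# Sketch of a CAR/LAR cross-check. Assumes both amounts are already recognized
# as text; the word list below is a small, hypothetical English vocabulary.

NUMBER_WORDS = {
    "no": 0, "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19, "twenty": 20, "thirty": 30, "forty": 40,
    "fifty": 50, "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def words_to_number(tokens):
    """Convert tokens such as ['2', 'hundred'] or ['forty', 'five'] to an int."""
    total, current = 0, 0
    for tok in tokens:
        if tok.isdigit():
            current += int(tok)
        elif tok in NUMBER_WORDS:
            current += NUMBER_WORDS[tok]
        elif tok == "hundred":
            current *= 100
        elif tok == "thousand":
            total += current * 1000
            current = 0
        else:
            raise ValueError(f"unrecognized word: {tok!r}")
    return total + current


def legal_amount_to_cents(text):
    """Parse a legal amount like '2 hundred dollars & no cents' into cents."""
    tokens = text.lower().replace("&", " ").replace("-", " ").split()
    if "dollars" not in tokens:
        raise ValueError("legal amount has no 'dollars' keyword")
    split = tokens.index("dollars")
    dollar_part = [t for t in tokens[:split] if t != "and"]
    cent_part = [t for t in tokens[split + 1:] if t not in ("and", "cents")]
    return words_to_number(dollar_part) * 100 + words_to_number(cent_part)


def courtesy_amount_to_cents(text):
    """Parse a numeric courtesy amount such as '$200.00' or '$1,250.5' into cents."""
    dollars, _, cents = text.replace("$", "").replace(",", "").strip().partition(".")
    return int(dollars) * 100 + int((cents + "00")[:2])


def amounts_agree(courtesy, legal):
    """True when the courtesy (CAR) and legal (LAR) amounts match to the cent."""
    return courtesy_amount_to_cents(courtesy) == legal_amount_to_cents(legal)


print(amounts_agree("$200.00", "2 hundred dollars & no cents"))  # True
```

Working in integer cents avoids floating-point rounding when comparing the two amounts. Recognizing the handwritten words themselves is the hard part and is not shown here.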
  • 14. The Technologies: Assisted Capture
  • 15. The Technologies: Fixed Form Processing
    Name: Ilya Date: 12/21/2982
  • 16. The Technologies: Fixed Form Processing
    Name: Ilya Date: 12/21/2982
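In fixed form processing every field sits at a known position on the page, so extraction can be driven purely by coordinates. A sketch, assuming OCR has already produced words with pixel bounding boxes, the page is registered to the template, and the template zones below are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: int  # left edge in pixels
    y: int  # top edge in pixels
    w: int  # width in pixels
    h: int  # height in pixels


# Hypothetical template: each field is a fixed zone (left, top, right, bottom).
TEMPLATE = {
    "name": (100, 50, 400, 80),
    "date": (500, 50, 700, 80),
}


def extract_fixed_fields(words, template):
    """Collect the words whose center point falls inside each field's zone."""
    fields = {}
    for field, (left, top, right, bottom) in template.items():
        inside = [
            wd for wd in words
            if left <= wd.x + wd.w / 2 <= right and top <= wd.y + wd.h / 2 <= bottom
        ]
        inside.sort(key=lambda wd: wd.x)  # single-line zones: read left to right
        fields[field] = " ".join(wd.text for wd in inside)
    return fields


ocr_words = [
    Word("Name:", 20, 55, 70, 20),
    Word("Ilya", 110, 55, 50, 20),
    Word("Date:", 420, 55, 60, 20),
    Word("12/21/2982", 510, 55, 120, 20),
]
print(extract_fixed_fields(ocr_words, TEMPLATE))  # {'name': 'Ilya', 'date': '12/21/2982'}
```

The printed labels ("Name:", "Date:") fall outside the zones and are ignored; the same coordinates only work while every form has an identical layout.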
  • 17. The Technologies: Semi-Structured Forms
    • Semi-Structured Forms Processing – Complexity is Underestimated
    Invoice No: 99044 Date: 06/09/04 Invoice No: 24567 Date: 06/09/04
  • 18. The Technologies: Semi-Structured Forms
    Invoice No: 99044 Date: 06/09/04 | Invoice No: 24567 Date: 06/09/04 (06/09/2004)
    Note: many people mistake these documents for fixed forms.
  • 19. The Technologies: Semi-Structured Forms
    Consignee Consignor Date Term
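Semi-structured documents such as the two invoices on the earlier slides carry the same fields in different places, so fixed coordinates break. One common alternative is to find a label and read the value next to it. A sketch, assuming the OCR output already groups a label like "Invoice No:" into a single text item with top-left coordinates; the 10-pixel line tolerance is a hypothetical parameter:

```python
from dataclasses import dataclass


@dataclass
class TextItem:
    text: str
    x: int  # left edge in pixels
    y: int  # top edge in pixels


def value_right_of(items, label, line_tolerance=10):
    """Return the nearest item to the right of `label` on the same text line, or None."""
    for anchor in items:
        if anchor.text != label:
            continue
        same_line = [
            it for it in items
            if it.x > anchor.x and abs(it.y - anchor.y) <= line_tolerance
        ]
        if same_line:
            return min(same_line, key=lambda it: it.x).text
    return None


# Two invoices from different vendors: same fields, different positions.
invoice_a = [
    TextItem("Invoice No:", 50, 40), TextItem("99044", 160, 42),
    TextItem("Date:", 50, 70), TextItem("06/09/04", 110, 70),
]
invoice_b = [
    TextItem("Date:", 400, 300), TextItem("06/09/04", 460, 298),
    TextItem("Invoice No:", 400, 330), TextItem("24567", 510, 331),
]
for inv in (invoice_a, invoice_b):
    print(value_right_of(inv, "Invoice No:"), value_right_of(inv, "Date:"))
# 99044 06/09/04
# 24567 06/09/04
```

Real semi-structured extraction also has to handle values below labels, multi-line values, synonyms for labels, and missing labels, which is part of why its complexity is underestimated.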
  • 20. The Technologies: Common Processes
    • Full page conversion
    • Classification
    • Index level extraction
    • Redaction
    • Routing
    • Auto Filing
    • Re-Purposing
    • Image Rotation
  • 21. The Technologies: Full page conversion
    • Image file to electronic data file
    • ALL text on the page
    • Includes:
      • Image Pre-processing
      • Document Analysis/Zoning
      • Extraction
      • Export (Commonly PDF, DOC)
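Image pre-processing commonly includes binarization, turning a grayscale scan into ink and paper pixels before zoning and recognition. Otsu's method is one well-known way to choose the threshold automatically; the sketch below is a plain-Python illustration of it, not a claim about what any particular engine does:

```python
def otsu_threshold(pixels):
    """Pick the gray level (0-255) that best separates dark ink from light paper.

    Otsu's method chooses the threshold t that maximizes the between-class
    variance of the two groups (pixels <= t and pixels > t); the quantity
    computed below is proportional to it, which gives the same best t.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    count_low, sum_low = 0, 0
    for t in range(256):
        count_low += hist[t]
        sum_low += t * hist[t]
        count_high = total - count_low
        if count_low == 0:
            continue
        if count_high == 0:
            break
        mean_low = sum_low / count_low
        mean_high = (sum_all - sum_low) / count_high
        between = count_low * count_high * (mean_low - mean_high) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(rows):
    """Map a grayscale image (rows of 0-255 ints) to 1 = ink, 0 = paper."""
    t = otsu_threshold([p for row in rows for p in row])
    return [[1 if p <= t else 0 for p in row] for row in rows]


scan = [
    [30, 35, 200, 210],
    [25, 220, 215, 40],
]
print(binarize(scan))  # [[1, 1, 0, 0], [1, 0, 0, 1]]
```

A single global threshold struggles with uneven lighting or colored backgrounds; adaptive (local) thresholding is the usual next step in those cases.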
  • 22. The Technologies: Classification
    • Software tells you the document type
    • Several Modes of document classification
      • Image Based
      • Contextual
    • Scan batches of mixed documents
    Bill of Lading Invoice Check PO
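Contextual classification decides the document type from the recognized text, whereas image-based classification compares visual layout. A minimal sketch of the contextual mode using keyword counts; the keyword lists and the two-hit minimum are hypothetical, and real products typically learn such features from sample documents:

```python
import re

# Hypothetical keyword lists per document type.
KEYWORDS = {
    "Invoice": {"invoice", "amount", "due", "terms", "remit"},
    "Bill of Lading": {"consignee", "consignor", "shipper", "carrier", "lading"},
    "Check": {"pay", "order", "dollars", "memo"},
    "PO": {"purchase", "order", "po", "ship", "vendor"},
}


def classify(text, keywords=KEYWORDS, min_hits=2):
    """Return the document type whose keywords appear most often, or None if unsure."""
    tokens = re.findall(r"[a-z]+", text.lower())
    scores = {
        doc_type: sum(1 for tok in tokens if tok in words)
        for doc_type, words in keywords.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_hits else None


print(classify("Consignee: ACME Corp  Consignor: Widget Co  Date  Term"))  # Bill of Lading
```

Returning None below the minimum lets a mixed batch route uncertain pages to a person instead of guessing.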
  • 23. The Technologies: Index Level Extraction
    • Only certain required fields are extracted
    • Normalization of data
    • Export usually to a database
    Invoice Number Invoice Date Total Amt Due Term
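Normalization turns captured field text into consistent values before export, for example reading "06/09/04" as 06/09/2004. A sketch, assuming US month/day/year dates and "Net N" payment terms; the amount and terms values are illustrative, not from the slides:

```python
import re
from datetime import datetime
from decimal import Decimal


def normalize_date(raw, fmt="%m/%d/%y"):
    """Convert a captured date to ISO 8601; assumes US month/day/year order."""
    return datetime.strptime(raw.strip(), fmt).date().isoformat()


def normalize_amount(raw):
    """Drop currency symbols and thousands separators: '$1,250.00' -> Decimal('1250.00')."""
    return Decimal(re.sub(r"[^\d.\-]", "", raw))


def normalize_terms(raw):
    """Turn payment terms such as 'Net 30' into a number of days, or None."""
    match = re.search(r"net\s*(\d+)", raw, re.IGNORECASE)
    return int(match.group(1)) if match else None


# Hypothetical captured values for the four index fields on the slide.
record = {
    "Invoice Number": "99044",
    "Invoice Date": normalize_date("06/09/04"),
    "Total Amt Due": normalize_amount("$1,250.00"),
    "Term": normalize_terms("Net 30"),
}
print(record)
```

Python's `%y` maps two-digit years 00 to 68 onto 2000 to 2068, so "04" becomes 2004; day/month order is the assumption most likely to need changing per document source.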
  • 24. The Technologies: How Accurate
    • A better question is: how do you determine accuracy?
    • Document Type Accuracy
    • Field/Zone Location Accuracy
    • Data Type Accuracy
    • Character Accuracy
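Character accuracy and field accuracy can tell very different stories. A sketch, assuming character accuracy is measured as 1 minus the Levenshtein edit distance divided by the ground-truth length (one common definition, not necessarily the one vendors quote) and field accuracy as the share of exactly matching fields:

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(truth, ocr):
    """1 - (edit errors / characters in the ground truth), floored at 0."""
    if not truth:
        return 1.0 if not ocr else 0.0
    return max(0.0, 1 - edit_distance(truth, ocr) / len(truth))


def field_accuracy(truth_fields, ocr_fields):
    """Share of fields whose captured value exactly matches the ground truth."""
    hits = sum(1 for key, value in truth_fields.items() if ocr_fields.get(key) == value)
    return hits / len(truth_fields)


print(round(character_accuracy("Invoice No: 99044", "Invoice N0: 99O44"), 3))  # 0.882
print(field_accuracy(
    {"Invoice No": "99044", "Date": "06/09/04"},
    {"Invoice No": "99O44", "Date": "06/09/04"},
))  # 0.5
```

Two misread characters leave character accuracy near 88%, yet one of the two fields is wrong, so field accuracy is only 50%.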
  • 25. The Technologies: Document Complexities
    • By Data Capture Complexity – Hardest to Easiest
    • EOB – Marginal success
      • BancTec
      • HOV
      • ECS
    • Student Transcripts – no success, no money
    • Invoices (not a vertical)
      • Telecom Bills
      • Legal Invoices
      • Aggregate Invoices
      • Traditional Invoice
    • Checks
    • Bill of Lading
    • Prescriptions
    • Transportation Documents
    • HCFA and UB Claim Forms
    • Fax Cover Sheets
    • Other typographic
    • Fixed Forms
  • 26. What we will cover:
    • Why Chris?
    • What Are the Document Recognition Technologies
    • Who Makes Them
    • News Flash
    • The Future
    • Q & A
  • 27. There really are only 4 core technology providers. It takes 50 man-years to develop OCR using the traditional approach.
  • 28. Who Makes Them: Core Engines
    • Traditional OCR Approach (sorted by market share) – All European Engines
    • Nuance (formerly ScanSoft), derivative of the Caere Engine
      • Middle-of-the-road cost, accuracy, speed
    • ABBYY
      • Most accurate, slowest, most expensive
    • Océ
      • Very fast, moderately expensive
    • I.R.I.S. (Readiris)
      • Fastest, not very accurate
    • Specialized Engines
    • CharacTell
    • Parascript
    • A2iA
    • Mitek
    • Non-Traditional Approach
    • NovoDynamics
    • TIS
    • Paledon
    • Other
    • A handful of open-source engines: Tesseract, OCRopus
    • Two handfuls of old engines: Expervision, Caere
  • 29. Who Makes Them: History
    • Ray Kurzweil, Father of OCR – 1974
      • Arguably, OCR had been in universities for some time before that
    • Caere Founded – 1976
    • Ray Sells His Engine to Xerox PAC; It Becomes TextBridge – 1978
    • I.R.I.S. Formed with a Belgian Grant – 1981
    • Tesseract Created by HP Research – 1985
    • Expervision Founded – 1987
    • UNLV Becomes the Standards Organization for OCR
    • ABBYY Founded – 1989
      • Russian MIT equivalent: MIPT (Moscow Institute of Physics and Technology)
    • Luc Vincent Invents Document Analysis (now at Google, lead on the Tesseract project) – 1994
    • ScanSoft Splits from Xerox – 1998
    • ScanSoft Acquires Caere – 2000
    • ScanSoft Becomes Nuance – 2005
      • OCR Business Takes a Back Seat
  • 30. Who Makes Them: Who Licenses Them
    • EVERYONE ELSE!
    • AnaComp
    • Anydoc
    • BancTec
    • BrainWare
    • Captaris
    • Captivation
    • Cardiff
    • CVision
    • DataCap
    • DigiTech
    • eCopy
    • EMC Documentum
    • Kofax
    • LaserFiche
    • LeadTools
    • Microsoft
    • NSi AutoStore
    • OnBase
    • Perceptive Imaging
    • ReadSoft
    • SER
    • Top Image Systems
    • Tower
    • Westbrook
    • Xerox
    • Etc.
  • 31. What we will cover:
    • Why Chris?
    • What Are the Document Recognition Technologies
    • Who Makes Them
    • News Flash
    • The Future
    • Q & A
  • 32. News Flash
    • Purchase Consolidation
      • OpenText bought Captaris, which had bought Océ Document Technologies
    • Legal
      • Nuance sues ABBYY over core OCR algorithms
        • If Nuance wins, only one traditional OCR engine may remain, leaving new approaches as the only alternative
  • 33. What we will cover:
    • Why Chris?
    • What Are the Document Recognition Technologies
    • Who Makes Them
    • Buyer Beware
    • News Flash
    • The Future
    • Q & A
  • 34. The Future
    • More lawsuits
    • More consolidation
    • Full-page OCR will be a commodity
    • Advanced Document Processing will become mainstream but less necessary
    • Document classification will be next big area of research and product solutions, the new big ticket item
    • There will be a new approach to OCR
    • Think now about what you will do with the data you will be gathering rapidly
  • 35. What we will cover:
    • Why Chris?
    • What Are the Document Recognition Technologies
    • Who Makes Them
    • Buyer Beware
    • News Flash
    • The Future
    • Q & A
  • 36. Questions and Answers
