OCR and Content Management with SAP and Imaging

OCR and Content Management

Published in: Technology

  1. ASUG 2008 Speaker Development
      John Walls, Senior Principal, VerbellaCMG, LLC
      John.Walls@VerbellaCMG.com, 484-888-2199
      Slide banner: Gretchen Lindquist (ASUG installation member since 1999), Greg Capps (ASUG installation member since 1998), Israel Olivkovich (SAP employee, member since 2004)
  2. Learning Objectives
      As a result of this workshop, you will be able to:
        • Explain what OCR is and is not, what it is capable of, and how it works
        • Describe how it can be used to streamline current SAP or non-SAP processes
        • Make use of the data and documents that are captured
        • Identify what is required for an OCR project
  3. Agenda
        • As-Is / To-Be Processing
        • What is OCR
        • Why should we use it
        • Document Structure, Types of Recognition
        • Template vs. Rules Based
        • OCR Flow, Classification, Extraction, and VRS Technology
        • Integration and release
        • Case Study
        • Wrap up and Questions
  4. What are we trying to achieve?
  5. Invoice processing
  6. Traditional Indexing Method for A/P Solutions
  8. OCR Definitions
      Simple
        • Optical character recognition (OCR) is the electronic translation of images of handwritten or typewritten text (usually captured by a scanner) into machine-editable text.
      Complex
        • OCR analyzes the shape of a bitmapped character and assigns a value to it based on a template system, mathematical feature analysis, or feature extraction. This analysis produces a likely result along with a range of possible alternative characters. Each result is supported by a likelihood percentage.*
      * Source: Océ Document Technologies
  9. OCR Definitions
      Voting
        • Two or more OCR recognition engines are used and their results are compared, voting on the most likely result. It is designed to eliminate errors (false positives) and increase accuracy.
        • All OCR engines provide multiple results, each with a percentage of accuracy or likelihood.
        • When scanning forms with handwriting, voting is a very attractive scenario.*
      * Source: Océ Document Technologies
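
A minimal sketch of the voting idea, assuming each engine returns one candidate string with a confidence between 0 and 1. The `vote` function, the tie-break rule, and the sample results are illustrative assumptions, not the behavior of any particular OCR product.

```python
from collections import Counter


def vote(candidates):
    """Pick the value most engines agree on.

    candidates: list of (text, confidence) pairs, one per OCR engine.
    Returns (winning_text, agreement_ratio, best_confidence_for_winner).
    """
    if not candidates:
        raise ValueError("need at least one engine result")
    counts = Counter(text for text, _ in candidates)
    top = max(counts.values())
    # Ties on vote count are broken by the highest single-engine confidence.
    tied = [text for text, count in counts.items() if count == top]
    best_conf = {
        t: max(conf for text, conf in candidates if text == t) for t in tied
    }
    winner = max(tied, key=lambda t: best_conf[t])
    return winner, top / len(candidates), best_conf[winner]


# Hypothetical results from three engines reading the same PO number field.
results = [("4512345678", 0.91), ("4512345678", 0.84), ("4S12345678", 0.88)]
winner, agreement, confidence = vote(results)
print(winner, round(agreement, 2), confidence)
```

A low agreement ratio is a natural signal for routing the field to a human in the validation step rather than accepting it automatically.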
  10. Typical Setup: Process Flow
      Scan
        • Scan documents
        • VRS 4.1 Pro for image cleanup and improved OCR accuracy
      Extract
        • Classification of document type
        • Extraction of data fields from each document
      Validate
        • Validate and correct data
        • Look up data from databases and other sources
      Release
        • Export data and images to a directory as text or XML
        • Images can be released as fully searchable PDFs or TIFF images
  12. Accounting Department
      Business Process Challenges
        • Time-consuming A/P process
        • High headcount and overtime
        • Unused discounts
        • Incorrect data
        • Finance charges
        • Process status unknown
      Document Management Challenges
        • Lost invoices
        • High document transportation cost
        • Misrouted documents
        • Lengthy invoice research
      Key Issues
        • Reduce processing cost
        • Speed of processing
        • Accuracy of information
        • Receiving invoices and preparation for processing
  13. OCR: What's in it for me?
      Better utilization of human capital: knowledge workers can focus on value-adding tasks.
        • Increased automation
        • Faster processing
        • Reduced paper handling
        • Reduced labor
        • Say good-bye to retrieval time
        • Never miss a discount again
        • Gain an auditable business process
  14. Costly facts about invoice processing
        • It takes an average of 12 days to process an invoice
        • 1 out of 5 invoices has anomalies
        • It takes more than 17 seconds to manually enter an invoice (excluding manual handling)
        • 96% of invoice processing involves keying data from paper
        • A 1,000 to 5,000 employee organization handles on average 24,500 invoices per month
        • Direct labor cost alone for payables processing averages $3.31 per invoice
        • Direct labor represents 30% of fully allocated payables costs*
        • Average payables cost of $11.03 per invoice
      Sources: Kofax; * Cass Information Systems
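
The per-invoice figures on this slide are consistent with each other, and scaling them to the quoted monthly volume gives a rough sense of the spend involved. The sketch below only multiplies and divides the slide's own numbers; the monthly totals are derived here and are not figures stated in the deck.

```python
# Figures quoted on the slide.
invoices_per_month = 24_500
direct_labor_per_invoice = 3.31   # USD
direct_labor_share = 0.30         # direct labor as a share of fully allocated cost

# Fully allocated cost implied by the 30% share: 3.31 / 0.30, about 11.03.
full_cost_per_invoice = direct_labor_per_invoice / direct_labor_share

# Derived monthly totals (not stated on the slide).
monthly_direct_labor = invoices_per_month * direct_labor_per_invoice
monthly_full_cost = invoices_per_month * full_cost_per_invoice

print(f"Implied full cost per invoice: ${full_cost_per_invoice:.2f}")
print(f"Monthly direct labor: ${monthly_direct_labor:,.0f}")
print(f"Monthly fully allocated cost: ${monthly_full_cost:,.0f}")
```

At that volume the direct labor alone comes to roughly $81,000 per month, which is the portion OCR-driven data entry reduction targets most directly.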
  15. Typical AP Process
      Receipt & Sorting with AP
        • Receive invoices at AP department
        • Sort invoices based on AP departmental structure (State, Cost Center, Division, etc.)
        • Send invoices to the AP Representative responsible for processing invoices
        • Perform more sorting or batching. Invoices may be stamped with date received, annotated with GL codes, etc.
      Manual AP Process
        • AP Rep keys invoices into the ERP system. Line-item matching may be performed on P.O.-based invoices
        • Invoices may be parked pending approval
        • Correct entries as needed or proceed directly to releasing batches for payment
        • Release batches for payment
        • Perform check run
      Paper Filing & Retrieval
        • Send invoices to temporary on-site storage
        • Send invoices to long-term off-site storage
        • Pull invoices for audits and various other business requirements
      Diagram labels: Cost Center, Vendor
  16. Typical AP Process with OCR
      The slide repeats the three lanes of the typical AP process, renamed and with most steps stamped AUTOMATED:
        • Receipt & Sorting with AP becomes Automated Sorting & Routing
        • Manual AP Process becomes Automated AP Process
        • Paper Filing & Retrieval becomes Automated Archive/Retrieval
  17. Information Extracted
      Standard Header and Footer Data
        • Purchase order number
        • Invoice number and date
        • Subtotal
        • Taxes
        • Freight
        • Discount
        • Grand total
        • Supplier details
      Standard Line Item Data
        • PO line item position
        • Quantity
        • Description
        • Unit price
        • Total price
        • Discount
        • Unit measure
        • Material number
        • Order number
        • Delivery note number
        • Tax rate
      Any other data
        • Using customized extraction schemes
  19. Information comes in many forms…
      Structured Content
        • Information is predictable
        • Location of information is predictable
      Examples: Waybill, Delivery Documents, Tax Forms, Mail Order Forms, Applications, Insurance Claims
      Source: Kofax
  20. Information comes in many forms…
      Semi-Structured Content
        • Information is predictable
        • Location of information is NOT predictable
      Examples: Accounts Payable, Accounts Receivable, Transportation, Bills of Lading, Medical Billing
      Source: Kofax
  21. Information comes in many forms…
      Unstructured Content
        • Information is NOT predictable
        • Location of information is NOT predictable
      Examples: Mortgage Folders, Medical Records, Litigation Support
      Source: Kofax
  22. Types of Recognition
        • OCR (Optical Character Recognition): used to read machine print within images
        • OMR (Optical Mark Recognition): used to identify checked boxes and other "selected options"
        • ICR (Intelligent Character Recognition): used for identifying handwriting or hand print on a document; could be used to pull information from forms
        • IWR (Intelligent Word Recognition): used to read cursive writing, for example on checks or prescriptions
  24. Rules-Based vs. Template-Based OCR
      Rules-based
        • Entire document is scanned and processed via Optical Character Recognition (OCR)
        • The OCR engine is used to search for a key (key words, phrases, or expression) and find the corresponding value (specific to general)
        • Once configured, new invoices can most likely be read
        • OCR rates are 0.5 to 5 seconds
      Template-based (logo ID): learn, memorize, teach
        • Each vendor invoice must be maintained as a template for each resolution (DPI)
        • A new invoice might not be read until the system learns it
        • A template database must be maintained, which is not good for large numbers of vendors
        • OCR rates are 8 to 12 seconds
  25. How Does Rules-Based OCR Work?
      Configuration
        • Create classification rules
            • Features and index fields that classify the document as PO, non-PO, credit memo, or statement
        • Assign index fields based on classification
            • A document classified as a PO invoice gets PO number, invoice date, invoice amount, and invoice number
        • Assign rules and logic to the key values (index fields)
            • Keywords: PO Number, PO #, P.O. Num = (45########)
            • Logical expression 45[0-9]{8} validates 4512345678
      Dictionaries
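
The key-then-value rule above can be sketched with a regular expression. The keyword list and the `45[0-9]{8}` pattern come from the slide; the function name, the optional separator handling, and the sample text are assumptions made for illustration, and a real rules engine would also weigh position and OCR confidence.

```python
import re

# Keyword variants and value pattern as listed on the slide.
KEYWORDS = ["PO Number", "PO #", "P.O. Num"]
VALUE_PATTERN = r"45[0-9]{8}"


def find_po_number(ocr_text):
    """Return the first PO number that follows one of the keywords, or None."""
    keys = "|".join(re.escape(k) for k in KEYWORDS)
    # Keyword, optional separator, then exactly ten digits starting with 45.
    pattern = re.compile(
        rf"(?:{keys})\s*[:.]?\s*({VALUE_PATTERN})(?![0-9])", re.IGNORECASE
    )
    match = pattern.search(ocr_text)
    return match.group(1) if match else None


sample = "Invoice 9001\nPO #: 4512345678\nTotal 100.00"
print(find_po_number(sample))
```

Anchoring the value to a keyword is what lets the same rule read an invoice layout it has never seen, which is the main contrast with the template approach.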
  26. Breakdown: A Single Rule
  27. Overview of all the Rules
  28. Performance
  29. Template: Form Identification
      Page-level form identification
        • A scanned or imported image is compared against the sample pages already "learned" by the form identification engine. Each comparison returns a confidence and a difference.
      Form identification zone
        • A form identification zone can be used to assist the page-level form identification feature. It is typically used to help the form identification engine distinguish between forms that are very similar.
      Diagram label: Sample Pages
  30. Template: Registration
      Page-level registration
        • Attempts to offset all zones based on how far large features on the page are offset from the same features on the sample page
      Registration zones: "Text" and "Shape"
        • A text registration zone can be used to augment or replace page-level registration. It is used if your images are different from the sample pages
        • A shape registration zone uses geometric patterns that are "fixed" in relation to the data on a form
      Diagram label: Sample Pages
  31. Extracting Index Values Based on Document Type (Forms)
      Diagram labels: Index Values Here, Form Identification Here
  32. Classification and Extraction of Index Fields
        • Documents are first classified by document type
        • Once a document is classified, the extraction of data begins
      Diagram label: Extraction
  33. Validation
  34. Release: Text or XML and Images
        • Extracted index fields are released as an .xml or .txt file to a network share
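
As a rough illustration of the release step, the sketch below writes a set of extracted index fields to an XML file named after the scanned image. The element and attribute names (`document`, `field`, `name`, `image`), the file naming, and the sample values are assumptions; the deck does not show the actual release schema, and a temporary directory stands in for the network share.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def release_index_fields(fields, image_name, out_dir):
    """Write index fields to <out_dir>/<image stem>.xml and return the path."""
    root = ET.Element("document", {"image": image_name})
    for name, value in fields.items():
        ET.SubElement(root, "field", {"name": name}).text = str(value)
    out_path = Path(out_dir) / (Path(image_name).stem + ".xml")
    ET.ElementTree(root).write(str(out_path), encoding="utf-8", xml_declaration=True)
    return out_path


# Sample values for illustration only.
fields = {"po_number": "4512345678", "invoice_number": "INV-1001", "invoice_amount": "1250.00"}
with tempfile.TemporaryDirectory() as share:  # stands in for the network share
    path = release_index_fields(fields, "invoice_0001.tif", share)
    released = ET.parse(str(path)).getroot()
    print(released.get("image"), [(f.get("name"), f.text) for f in released])
```

Keeping the image name inside the data file lets a downstream program pair each set of index values with its archived image.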
  36. OCR Processing Solution Steps
      Diagram: Scan, Extraction, Validate, Release, with the labels Invoice Images, OCR Rules, Templates, SAP Database, Document and Data, SAP Content Server, SAP Workflow, and Continuous learning
  37. Classification Technologies
      Layout/Image Classification
        • Image content
        • Patterns, "Examples"
        • Page layout
        • X,Y coordinates
        • Logos or graphics
        • Small smeared thumbprint
        • No OCR
      Instruction Classification
        • Keywords or phrases
        • Boolean (true or false)
        • Requires OCR
      Adaptive Feature Classification
        • Textual content
        • Patterns, "Examples"
        • Unstructured data
        • Words, individual tokens
        • Mailroom application: separates documents that a person would normally have to read
      Source: Kofax
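
Instruction classification, as described above, amounts to Boolean tests on keywords found in the OCR text. The sketch below uses the four document types named earlier in the deck (PO, non-PO, credit memo, statement); the specific keywords, their order, and the `classify` function are hypothetical and would be tuned per project.

```python
# Each rule is a Boolean test on lower-cased OCR text; the first match wins,
# so the more specific rules are listed first. Keywords are illustrative.
RULES = {
    "credit memo": lambda t: "credit memo" in t or "credit note" in t,
    "statement": lambda t: "statement" in t and "invoice number" not in t,
    "PO invoice": lambda t: "invoice" in t and ("po number" in t or "purchase order" in t),
    "non-PO invoice": lambda t: "invoice" in t,
}


def classify(ocr_text):
    """Return the first document type whose rule is true, else 'unclassified'."""
    text = ocr_text.lower()
    for doc_type, rule in RULES.items():
        if rule(text):
            return doc_type
    return "unclassified"


print(classify("INVOICE\nPurchase Order 4512345678"))
print(classify("Delivery note 88"))
```

Documents that fall through to "unclassified" are the ones a person still has to look at, so the fallback rate is a useful measure of rule coverage.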
  38. Extraction Technology
        • Locators: extraction engines
            • Learn-by-example locators for rapid setup on key fields
            • Additional pre-built locators or rules for other fields
            • Format, zones, tables, database, barcodes, etc.
        • Multi-language OCR/ICR
            • 140+ languages
            • Chinese/Japanese/Korean
        • Multi-engine voting
      Source: Kofax
  39. Learn-By-Example
        • Classic learn-by-example approach: you start with a fixed model of the world as we see it and then present real examples to teach the system how things vary in real life
        • An analogy would be teaching a child to read
        • The variation between each example, known as the semantics, is learned by presenting lots of examples of real-world invoices
      Source: Kofax
  40. Database Locator
        • Matching of database fields to document data
        • Fast, associative, fault-tolerant search
        • Works even with large databases (more than 1 million records)
        • Returns the record with the best match
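
The fault-tolerant matching idea can be illustrated with the standard library's `difflib`. This is only a sketch of best-match lookup over a handful of made-up vendor records: it scans every record linearly, whereas the slide describes an associative search that scales past a million records, and the 0.6 threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher

# Made-up vendor master records for illustration.
VENDORS = [
    {"vendor_no": "100001", "name": "Acme Industrial Supply", "city": "Houston"},
    {"vendor_no": "100002", "name": "Baltic Chemical Works", "city": "Newark"},
    {"vendor_no": "100003", "name": "Northwind Freight Lines", "city": "Chicago"},
]


def best_vendor_match(ocr_text, records, min_score=0.6):
    """Return (record, score) for the closest record, or (None, score) if too weak."""
    needle = ocr_text.lower()

    def score(record):
        target = f"{record['name']} {record['city']}".lower()
        return SequenceMatcher(None, needle, target).ratio()

    best = max(records, key=score)
    best_score = score(best)
    return (best, best_score) if best_score >= min_score else (None, best_score)


# OCR output with typical misreads ("rn" for "m", "l" for "I", "1" for "l").
record, similarity = best_vendor_match("Acrne lndustrial Supp1y Houston", VENDORS)
print(record["vendor_no"] if record else None, round(similarity, 2))
```

Matching against several fields at once (name plus city here) is what makes the lookup tolerant of a misread in any single field.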
  41. Format Locator
      Finds and reads data based on regular expressions and keywords, e.g.:
        • "d" = all single digits
        • "d{4-8}" = any number from 4 to 8 digits in length
      Multiple regular expressions can be defined to cover all alternatives, e.g. for multiple number formats
      Useful for
        • Invoice numbers
        • Dates
        • Account numbers
      Source: Kofax
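
A format locator of this kind can be approximated with a list of alternative regular expressions per field. The sketch assumes the slide's digit patterns correspond to `\d` and `\d{4,8}` in standard regex syntax; the keyword variants, date formats, and the `locate` function are illustrative assumptions rather than the product's actual configuration.

```python
import re

# Several alternative patterns per field cover different layouts and formats.
FORMATS = {
    # A keyword followed by a number of 4 to 8 digits.
    "invoice_number": [r"(?i)invoice\s*(?:no\.?|number|#)\s*:?\s*(\d{4,8})\b"],
    # Two assumed date notations: MM/DD/YYYY and YYYY-MM-DD.
    "date": [r"\b\d{2}/\d{2}/\d{4}\b", r"\b\d{4}-\d{2}-\d{2}\b"],
}


def locate(field, text):
    """Return every value in text that matches any pattern defined for field."""
    hits = []
    for pattern in FORMATS[field]:
        hits.extend(re.findall(pattern, text))
    return hits


text = "Invoice No. 123456 dated 03/15/2008, due 2008-04-14"
print(locate("invoice_number", text))
print(locate("date", text))
```

Without the keyword, a bare 4-to-8-digit pattern would also pick up the year inside the date, which is why the slide pairs regular expressions with keywords.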
  42. Direct SAP Data Validation
        • Finds and reads data using a customized VB-compatible script, such as VB .NET 2.0
        • Calls remote-enabled RFCs
        • For example: check for the existence of a PO, or validate vendor information by looking up the vendor number
      Source: Kofax
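
The validation logic itself is simple once the SAP lookups are available. The sketch below is written in Python rather than the VB-compatible scripting the slide mentions, and it takes the two lookups as plain callables: `po_exists` and `find_vendor_number` are hypothetical stand-ins for remote-enabled function calls, backed here by in-memory sample data so the example runs on its own.

```python
def validate_against_sap(fields, po_exists, find_vendor_number):
    """Check extracted fields against SAP lookups.

    po_exists and find_vendor_number are injected callables (hypothetical
    stand-ins for remote function calls). Returns (enriched_fields, errors).
    """
    checked = dict(fields)
    errors = []
    if not po_exists(checked.get("po_number", "")):
        errors.append("PO not found")
    vendor_no = find_vendor_number(checked.get("vendor_name", ""))
    if vendor_no is None:
        errors.append("vendor not found")
    else:
        checked["vendor_no"] = vendor_no
    return checked, errors


# In-memory sample data standing in for the SAP system.
KNOWN_POS = {"4512345678"}
VENDOR_NUMBERS = {"acme industrial supply": "100001"}

checked, errors = validate_against_sap(
    {"po_number": "4512345678", "vendor_name": "Acme Industrial Supply"},
    po_exists=lambda po: po in KNOWN_POS,
    find_vendor_number=lambda name: VENDOR_NUMBERS.get(name.strip().lower()),
)
print(checked, errors)
```

Running these checks during validation means a bad PO or unknown vendor is caught while an operator still has the image on screen, not after release.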
  43. Validation
  44. Virtual Re-Scan (VRS) Eliminates Rescanning
      Examples shown: low-contrast logo, dot matrix text, highlighter, carbon copy, handprint, coffee cup stain, shaded background
      Source: Kofax
  45. VirtualReScan™ (VRS)
      Images shown: scanned in color (image file size = 213 KB); scanned in 1-bit B/W; image further processed with VRS
  47. What happens after Release?
        • Release into pre-defined A/P solutions
            • IXOS Vendor Invoice Management
            • Norikkon APay Center
            • 170 MarkView Financial Suite
            • Ebydos
            • Basware
            • Brainware
            • SAPERION InBound Center
            • Custom Ledger solution
        • Custom programming
            • Park invoices and route workitems for further processing
            • Post invoices and route exceptions
  48. Content Management: Direct Release
      Diagram: Release feeds the ARCHIVE (a.k.a. Content Server), backed by Jukebox or CAS storage (Centera)
  49. Standard Release: Using XML or TXT Files and ABAP
      Diagram: Release writes XML to a network directory that is processed by ABAPs; Release also feeds the ARCHIVE (a.k.a. Content Server), backed by Jukebox or CAS storage (Centera)
  50. Integrated Release: RFCs and Function Modules
        1. RFC calls a function module (FM) to create a URL
        2. Document is stored
        3. Extracted data is passed back into specific FMs to create workitems or post documents
      Diagram labels: Release, RFC, Function Modules, HTTP/HTTPS, ARCHIVE (a.k.a. Content Server), Jukebox, EMC or NetApp
  52. Template Case Study: Communications Industry
      Situation:
        • Customer receives 10,000 returned billing statements from the USPS per month. 1.5 FTEs were 4 months behind in processing the returned mail and flagging the wrong billing addresses in the system, so statements kept going out as undeliverable.
      Pain:
        • Postage costs alone were running at $4,000 a month
        • Labor cost to process the returned mail: 1.5 FTEs
        • Missed opportunities for Customer Service to obtain current address information from customers while they had them on the phone
      Solution: Template OCR
        • The first page of each statement is scanned, the customer account number is read, and this information is released in a text file. A process then reads this file and updates the billing system (non-SAP)
        • Returned mail is processed daily, within 15 minutes on average
  53. Overview of the Validation Station
  54. Rules Based Case Study: Chemical Industry
      Situation:
        • 2 FTEs manually enter invoice header "indexing" information, and invoices are manually posted
        • 27 FTEs, averaging 923 invoices per FTE per month
      Pain:
        • Wanted to reduce processing time from 6 days, reduce resources, and reduce the potential for ergonomic injuries
        • Manual data entry, missed discounts
      Goal:
        • Increase discounts taken, increase productivity, reduce headcount by 3 FTEs, and create the ability to further automate workflow exception handling using OCR
      Solution: Rules Based OCR
        • Vendor invoices are scanned and run through the OCR software. The extracted information is then used to automatically post invoices from the top 20 vendors and to route all other invoices to AP processors.
  55. Rules Based Case Study: Chemical Industry
      Results
        • Reduced headcount by 3 FTEs in the month of implementation, from 27 FTEs to 24
            • 1 FTE in document control
            • 2 FTEs in invoice posting
        • Improved ability to take term discounts
            • % taken YTD: Jan 38%, Feb 45%, March 35%, April 40%, May 53%, June 46%, July 55%
        • Invoices processed per FTE reached 1,550 in July, up from 923 pre-implementation
        • Reduced data entry by exploiting OCR and creating programming to post invoices to purchase orders automatically
        • Targeted high-volume suppliers for auto-post, then worked with suppliers to ensure invoice criteria were met, then created templates
        • 7% of invoices are auto-posted without human intervention, approximately 1,600 invoices a month. Invoices with errors are routed to processors.
        • Post implementation, additional suppliers have been identified for auto-posting and will be added
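
The headline results can be turned into percentages with a few lines of arithmetic. The inputs below are the figures quoted on the slide; the derived percentages are computed here and do not appear in the deck.

```python
# Figures quoted on the slide.
invoices_per_fte_before = 923
invoices_per_fte_after = 1550
ftes_before, ftes_after = 27, 24
discount_pct_taken = {"Jan": 38, "Feb": 45, "Mar": 35, "Apr": 40,
                      "May": 53, "Jun": 46, "Jul": 55}

# Derived values (not stated on the slide).
productivity_gain = (invoices_per_fte_after - invoices_per_fte_before) / invoices_per_fte_before
fte_reduction = (ftes_before - ftes_after) / ftes_before
avg_discount_taken = sum(discount_pct_taken.values()) / len(discount_pct_taken)

print(f"Invoices per FTE: +{productivity_gain:.1%}")
print(f"Headcount: -{fte_reduction:.1%}")
print(f"Average discount taken, Jan to Jul: {avg_discount_taken:.1f}%")
```

The per-FTE throughput gain of roughly two thirds is far larger than the headcount reduction, which suggests most of the benefit showed up as capacity rather than as eliminated positions.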
  56. OCR Benefits in Accounts Payable
      Processing Costs
        • Reduced labor
            • Invoice processing
            • Invoice sorting
            • Filing
        • Reduced paper handling
            • No lost invoices
            • Documents can be accessed by multiple users and locations simultaneously
            • Storage space for physical documents is reduced
        • Increased processing speed
            • Invoices can be scanned and processed on the day they are received, increasing visibility for management
            • Early payment discounts can be utilized
        • Increased data accuracy
            • Fewer data entry mistakes are made
        • Increased accessibility/availability
      Productivity and Efficiencies
        • Access to documents is instant
        • Information sharing is enhanced
  58. Most OCR solutions include
      Administration Client
        • Set up batch structure and process
        • Set up release details
        • Document separations
        • Scanner configuration
        • User authorizations
        • User: Solution Developer/Administrator
      Design or Project Builder Client
        • Set up project structure
        • Set up classification and extraction schemes
        • Train the project
        • Define validation rules
        • Design validation screen layout
        • Test the project
        • User: Solution Developer
  59. Leading Practice
        • Start slowly: minimum disruption to the existing process
            • Some companies started with 1 vendor a day
            • Some companies start with the top 20% of vendors
            • You still need to pay your bills
        • New technology: employees need time to adjust and embrace it. Using the light switch approach may turn employees off.
        • Start imaging before OCR and process automation
        • Constant improvement: the process will need continual monitoring and tweaking
  60. Leading Practice
        • Notify your vendors
            • Tell them that their documents will be processed via OCR
            • Get them to work with you
            • Single-line line items
            • Stop sending invoices on blue paper with balloons
            • Clearly identify the information that you need to see
        • Consolidate vendors
            • Use this as an opportunity to consolidate vendors; if they are not going to help with the above, then consolidate
        • Clean up vendor master records
            • Vendor address and phone numbers will be used daily for vendor lookup, so make sure the information is correct
        • Set up a process to correct errors efficiently
  61. What do I need to get started?
        • Scanner for document imaging
            • Scanner that supports VRS (VirtualReScan)
        • OCR software solution
            • Rules-based OCR solution
        • Content management solution
            • SAP Content Server
            • PBS ContentLink with EMC Centera, NetApp Filer, and DR
            • OpenText (IXOS), OnBase, IBM, Documentum, FileNet, etc.
        • Release to SAP system
            • Automatic posting or further processing within SAP Workflow
  62. Questions?
  63. Thank you for participating.
      Please remember to complete and return your evaluation form following this session. For ongoing education on this area of focus, visit the Year-Round Community page at www.asug.com/yrc
      Session code: 0403
      John Walls, Verbella CMG, LLC
      John.walls@verbellacmg.com, 484-888-2199, www.verbellacmg.com
