OCR and Content Management with SAP and Imaging

  1. 1. ASUG 2008 Speaker Development. John Walls, Senior Principal, VerbellaCMG, LLC, John.Walls@VerbellaCMG.com, 484-888-2199. [Slide banner: Gretchen Lindquist, ASUG installation member since 1999; Greg Capps, ASUG installation member since 1998; Israel Olivkovich, SAP employee, member since 2004]
  2. 2. Learning Objectives. As a result of this workshop, you will understand: • What OCR is and what it is not, what it is capable of, and how it works • How it can be used to streamline current SAP or non-SAP processes • How to make use of the data and documents that are captured • What is required for an OCR project
  3. 3. As-Is and To-Be Processing • What is OCR • Why should we use it • Document Structure, Types of Recognition • Template vs. Rules Based • OCR Flow, Classification, Extraction, and VRS Technology • Integration and release • Case Study • Wrap up and Questions
  4. 4. What are we trying to achieve?
  5. 5. Invoice processing
  6. 6. Traditional Indexing Method for A/P Solutions
  8. 8. OCR Definitions. Simple: Optical character recognition (OCR) is the electronic translation of images of handwritten or typewritten text (usually captured by a scanner) into machine-editable text. Complex: OCR analyzes the shape of a bitmapped character and assigns a value to it based on a template system, mathematical feature analysis, or feature extraction. This analysis produces a likely result along with a range of possible alternative characters. Each result is supported by a likelihood percentage. (Source: Océ Document Technologies)
  9. 9. OCR Definitions. Voting: • Two or more OCR recognition engines are used and their results compared, voting on the most likely result. It is designed to eliminate errors (false positives) and increase accuracy. • All OCR engines provide multiple results with a percentage of accuracy or likelihood. • When scanning forms with handwriting, voting is a very attractive scenario. (Source: Océ Document Technologies)
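The voting idea on slide 9 can be sketched in a few lines. This is only an illustration, assuming each engine returns one (character, confidence) guess per character position; the function names and the sum-of-confidences rule are assumptions made here, not how any particular OCR product combines its engines.

```python
from collections import defaultdict


def vote(candidates):
    """Pick one character from several engines' (char, confidence) guesses."""
    totals = defaultdict(float)
    for char, confidence in candidates:
        totals[char] += confidence
    # Highest summed confidence wins; ties fall back to character order
    # so the result is deterministic.
    best_char, _ = max(totals.items(), key=lambda item: (item[1], item[0]))
    return best_char


def vote_string(engine_results):
    """engine_results: one equal-length list of (char, confidence) per engine."""
    return "".join(vote(position) for position in zip(*engine_results))


# Three hypothetical engines reading the same three characters.
engine_a = [("4", 0.98), ("5", 0.90), ("O", 0.55)]
engine_b = [("4", 0.97), ("5", 0.88), ("0", 0.80)]
engine_c = [("4", 0.99), ("S", 0.60), ("0", 0.75)]
print(vote_string([engine_a, engine_b, engine_c]))
```

Here two engines outvote the one that misread the digit 0 as the letter O, and the one that misread 5 as S.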
  10. 10. Typical Setup: Process Flow. Scan: scan documents; VRS 4.1 Pro for image cleanup and improved OCR accuracy. Extract: classification of document type; extraction of data fields from each document. Validate: validate and correct data; look up data from databases and other sources. Release: export data and images to a directory as text or XML; images can be released as fully searchable PDFs or TIFF images.
  12. 12. Accounting Department. Business Process Challenges: time-consuming A/P process; high headcount and overtime; unused discounts; incorrect data; finance charges; process status unknown. Document Management Challenges: lost invoices; high document transportation cost; misrouted documents; lengthy invoice research. Key Issues: reduce processing cost; speed of processing; accuracy of information; receiving invoices and preparation for processing.
  13. 13. OCR: What's in it for me? Better utilization of human capital: knowledge workers can focus on value-adding tasks. • Increased automation • Faster processing • Reduced paper handling • Reduced labor. Say good-bye to retrieval time • Never miss a discount again • Gain an auditable business process
  14. 14. Costly facts about invoice processing: • It takes an average of 12 days to process an invoice • 1 out of 5 invoices has anomalies • It takes >17 seconds to manually enter an invoice (excluding manual handling) • 96% of invoice processing involves keying data from paper • A 1,000 to 5,000 employee organization handles on average 24,500 invoices per month • Direct labor cost alone for payables processing averages $3.31 per invoice • Direct labor represents 30% of fully allocated payables costs • Average payables cost is $11.03 per invoice. (Sources: Kofax; Cass Information Systems)
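As a rough worked example, the per-invoice figures on slide 14 can be scaled to the monthly volume quoted on the same slide. Combining them this way is an assumption made here for illustration (the slide cites more than one source), and the variable names are hypothetical. Amounts are kept in cents so the arithmetic stays exact.

```python
# Figures quoted on the slide.
INVOICES_PER_MONTH = 24_500
DIRECT_LABOR_CENTS = 331    # $3.31 direct labor per invoice
TOTAL_COST_CENTS = 1_103    # $11.03 fully allocated cost per invoice

monthly_direct_labor = INVOICES_PER_MONTH * DIRECT_LABOR_CENTS / 100
monthly_total_cost = INVOICES_PER_MONTH * TOTAL_COST_CENTS / 100
labor_share = DIRECT_LABOR_CENTS / TOTAL_COST_CENTS

print(f"Direct labor per month: ${monthly_direct_labor:,.2f}")
print(f"Total payables cost per month: ${monthly_total_cost:,.2f}")
print(f"Direct labor share of total cost: {labor_share:.0%}")
```

The computed labor share agrees with the 30% figure stated on the slide, which suggests the three numbers belong together.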
  15. 15. Typical AP Process. Cost Center Receipt & Sorting with AP: (1) Receive invoices from the vendor at the AP department. (2) Sort invoices based on AP departmental structure (state, cost center, division, etc.). (3) Send invoices to the AP representative responsible for processing invoices. (4) Perform more sorting or batching; invoices may be stamped with the date received, annotated with GL codes, etc. Cost Center Manual AP Process: (1) AP rep keys invoices into the ERP system; line-item matching may be performed on P.O.-based invoices. (2) Invoices may be parked pending approval. (3) Correct entries as needed or proceed directly to releasing batches for payment. (4) Release batches for payment. (5) Perform check run. Paper Filing & Retrieval: (1) Send invoices to temporary on-site storage. (2) Send invoices to long-term off-site storage. (3) Pull invoices for audits and various other business requirements.
  16. 16. Typical AP Process with OCR: the same flow as the previous slide, with most steps marked AUTOMATED. Automated Sorting & Routing replaces Receipt & Sorting; Automated AP Process replaces Manual AP Process; Automated Archive/Retrieval replaces Paper Filing & Retrieval.
  17. 17. Information Extracted. Standard header and footer data: purchase order number; invoice number and date; subtotal; taxes; freight; discount; grand total; supplier details; order number; delivery note number; tax rate. Standard line item data: PO line item position; quantity; description; unit price; total price; discount; unit measure; material number. Any other data: using customized extraction schemes.
  19. 19. Information comes in many forms: Structured Content. • Information is predictable • Location of information is predictable. Examples: • Waybill • Delivery Documents • Tax Forms • Mail Order Forms • Applications • Insurance Claims. (Source: Kofax)
  20. 20. Information comes in many forms: Semi-Structured Content. • Information is predictable • Location of information is NOT predictable. Examples: • Accounts Payable • Accounts Receivable • Transportation • Bills of Lading • Medical Billing. (Source: Kofax)
  21. 21. Information comes in many forms: Unstructured Content. • Information is NOT predictable • Location of information is NOT predictable. Examples: • Mortgage Folders • Medical Records • Litigation Support. (Source: Kofax)
  22. 22. Types of Recognition. • OCR (Optical Character Recognition): used to read machine print within images • OMR (Optical Mark Recognition): used to identify checked boxes and other "selected options" • ICR (Intelligent Character Recognition): used for identifying handwriting or hand print on a document; could be used to pull information from forms • IWR (Intelligent Word Recognition): used to read cursive writing, for example on checks or prescriptions
  24. 24. Rules-Based vs. Template-Based OCR. Rules-based: • The entire document is scanned and processed via optical character recognition (OCR) • The OCR engine searches for a key (key words, phrases, or expressions) and finds the corresponding value (specific to general) • Once configured, new invoices can most likely be read • OCR rates are 0.5 to 5 seconds. Template-based (logo ID; learn, memorize, teach): • Each vendor invoice must be maintained as a template for each resolution (DPI) • A new invoice might not be read; the system first has to learn the invoice • A template database must be maintained, which is not good for large numbers of vendors • OCR rates are 8 to 12 seconds
  25. 25. How Does Rules-Based OCR Work? Configuration: • Create classification rules: features and index fields that classify the document as PO, non-PO, credit memo, or statement • Assign index fields based on classification: for a document classified as a PO invoice, these are PO number, invoice date, invoice amount, and invoice number • Assign rules and logic to the key values (index fields): the keywords PO Number, PO #, or P.O. Num are followed by a value of the form 45######## • The logical expression 45[0-9]{8} validates 4512345678 • Dictionaries
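The key/value rule on slide 25 can be sketched with an ordinary regular expression. The keywords and the expression 45[0-9]{8} come from the slide; the function name, the optional colon or equals sign after the keyword, and the sample text are assumptions made for illustration, and the sketch uses Python's re module rather than any OCR product's rule editor.

```python
import re

# Keywords from the slide that may label the PO number on an invoice.
PO_KEYWORDS = r"(?:PO\s*Number|PO\s*#|P\.O\.\s*Num)"
# Value format from the slide: "45" followed by exactly eight digits.
PO_VALUE = r"45[0-9]{8}"

PO_RULE = re.compile(
    PO_KEYWORDS + r"\s*[:=]?\s*(" + PO_VALUE + r")\b",
    re.IGNORECASE,
)


def find_po_number(ocr_text):
    """Return the PO number that follows a known keyword, or None."""
    match = PO_RULE.search(ocr_text)
    return match.group(1) if match else None


sample = "ACME Corp  Invoice No. 99812  P.O. Num: 4512345678  Total 1,250.00"
print(find_po_number(sample))
```

Because the rule looks for the keyword first and the value second, it does not depend on where the PO number sits on the page, which is the point of the rules-based approach.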
  26. 26. Breakdown: A Single Rule
  27. 27. Overview of all the Rules
  28. 28. Performance
  29. 29. Template: Form Identification. [Figure: sample pages] • Page-level form identification: a scanned or imported image is compared against the sample pages already "learned" by the form identification engine. Each comparison returns a confidence and a difference. • Form identification zone: can be used to assist the page-level form identification feature. It is typically used to help the form identification engine distinguish between forms that are very similar.
  30. 30. Template: Registration. [Figure: sample pages] • Page-level registration: attempts to offset all zones based on how far large features on the page are offset from the same features on the sample page • Registration zones ("text" and "shape"): a text registration zone can be used to augment or replace page-level registration, and is used if your images are different from the sample pages; a shape registration zone uses geometric patterns that are "fixed" in relation to the data on a form
  31. 31. Extracting Index Values Based on Document Type (Forms). [Figure callouts: index values here; form identification here]
  32. 32. Classification and Extraction of Index Fields. • Documents are classified by document type • Once classified, the extraction of data begins
  33. 33. Validation
  34. 34. Release: Text or XML and Images. • Extracted index fields are released as an .xml or .txt file to a network share
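The release step on slide 34 can be sketched as writing one small XML file per document. The element names, attribute names, and file-naming scheme below are hypothetical; a real capture product defines its own release format, and a temporary directory stands in for the network share.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def release_index_fields(fields, image_name, release_dir):
    """Write extracted index fields as XML next to the image name; return the path."""
    root = ET.Element("Document", attrib={"image": image_name})
    for name, value in fields.items():
        field = ET.SubElement(root, "Field", attrib={"name": name})
        field.text = value
    target = Path(release_dir) / (Path(image_name).stem + ".xml")
    ET.ElementTree(root).write(str(target), encoding="utf-8", xml_declaration=True)
    return target


# A temporary directory stands in for the network share in this sketch.
with tempfile.TemporaryDirectory() as share:
    released = release_index_fields(
        {"PONumber": "4512345678", "InvoiceNumber": "99812"},
        "invoice_0001.tif",
        share,
    )
    print(released.read_text(encoding="utf-8"))
```

A downstream program (for example an ABAP job, as on the later release slides) would then pick up these files from the share.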
  36. 36. OCR Processing Solution Steps: Scan, Extraction, Validate, Release. [Diagram labels: Invoice Images; OCR Rules and Templates; SAP Database; Document Data; SAP Content Server; SAP Workflow; Continuous learning]
  37. 37. Classification Technologies. Layout/Image Classification: • Image content • Patterns ("examples") • Page layout • Logos or graphics • Small smeared thumbprint • No OCR. Instruction Classification: • Keywords or phrases • Boolean (true or false) • Requires OCR • X,Y coordinates. Adaptive Feature Classification: • Textual content • Patterns ("examples") • Unstructured data • Words, individual tokens • Mailroom application: separates documents that a person would normally have to read. (Source: Kofax)
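Instruction classification, as described on slide 37, boils down to Boolean tests on keywords in the OCR text. The sketch below uses the four document types named on slide 25 (PO, non-PO, credit memo, statement); the specific keywords and the order of the rules are assumptions chosen for illustration only.

```python
def classify(ocr_text):
    """Return a document type using ordered keyword rules (first match wins)."""
    text = ocr_text.lower()
    rules = [
        ("credit memo", lambda t: "credit memo" in t or "credit note" in t),
        ("statement", lambda t: "statement" in t and "invoice number" not in t),
        ("PO invoice", lambda t: "invoice" in t and ("po number" in t or "p.o." in t)),
        ("non-PO invoice", lambda t: "invoice" in t),
    ]
    for doc_type, rule in rules:
        if rule(text):
            return doc_type
    return "unclassified"


print(classify("INVOICE  PO Number 4512345678  Total 1,250.00"))
print(classify("Credit Memo for invoice 99812"))
```

Each rule is either true or false for a page, which is why this style of classification requires OCR text but no training examples.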
  38. 38. Extraction Technology. • Locators (extraction engines): learn-by-example locators for rapid setup on key fields; additional pre-built locators or rules for other fields (format, zones, tables, database, barcodes, etc.) • Multi-language OCR/ICR: 140+ languages; Chinese/Japanese/Korean • Multi-engine voting. (Source: Kofax)
  39. 39. Learn-By-Example. • Classic learn-by-example approach: you start with a fixed model of the world as we see it and then present real examples to teach the system how things vary in real life. An analogy would be teaching a child to read. • The variation between each example, known as the semantics, is learnt by presenting lots of examples of real-world invoices. (Source: Kofax)
  40. 40. Database Locator. • Matching of database fields to document data • Fast, associative, fault-tolerant search • Works even with large databases (>1 million records) • Returns the record with the best match
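The fault-tolerant matching on slide 40 can be approximated with Python's standard difflib module. This is a stand-in technique, not the search algorithm any capture product actually uses, and it would not scale to the million-record databases the slide mentions. The vendor records, field weights, and threshold are all made up for the example.

```python
from difflib import SequenceMatcher

# Hypothetical vendor master records.
VENDORS = [
    {"vendor_no": "100234", "name": "Acme Industrial Supply", "city": "Philadelphia"},
    {"vendor_no": "100871", "name": "Apex Chemical Corp", "city": "Houston"},
    {"vendor_no": "101455", "name": "Keystone Freight Lines", "city": "Pittsburgh"},
]


def similarity(a, b):
    """Case-insensitive similarity between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_vendor(ocr_name, ocr_city, records=VENDORS, threshold=0.6):
    """Return the record that best matches the OCR text, or None if too weak."""
    def score(record):
        # The name counts more than the city; the weights are arbitrary.
        return (0.7 * similarity(ocr_name, record["name"])
                + 0.3 * similarity(ocr_city, record["city"]))

    best = max(records, key=score)
    return best if score(best) >= threshold else None


# Typical OCR confusions: "rn" for "m", "l" for "i", "1" for "l".
print(best_vendor("Acrne lndustrial Supp1y", "Philade1phia"))
```

Even with several misread characters, the closest vendor record is still found, which is what lets the validation step fill in a vendor number from imperfect OCR output.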
  41. 41. Format Locator. • Finds and reads data based on regular expressions and keywords, e.g.: “\d” = any single digit; “\d{4-8}” = any number from 4 to 8 digits in length • Multiple regular expressions can be defined to cover all alternatives, e.g. for multiple number formats • Useful for invoice numbers, dates, and account numbers. (Source: Kofax)
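A format locator as described on slide 41 pairs keywords with one or more regular expressions per field. The sketch below uses Python's re module, where "4 to 8 digits" is written \d{4,8}; the field names, keywords, and date formats are assumptions for illustration and do not reproduce any product's locator configuration.

```python
import re

# Several alternative expressions per field, tried in order.
FORMATS = {
    "invoice_number": [r"\b\d{4,8}\b"],
    "date": [r"\b\d{2}/\d{2}/\d{4}\b", r"\b\d{4}-\d{2}-\d{2}\b"],
}
# Keywords that announce each field on the page (lowercase).
KEYWORDS = {
    "invoice_number": ["invoice no", "invoice #"],
    "date": ["date"],
}


def locate(field, ocr_text):
    """Return the first format match found after one of the field's keywords."""
    lowered = ocr_text.lower()
    for keyword in KEYWORDS[field]:
        start = lowered.find(keyword)
        if start == -1:
            continue
        tail = ocr_text[start + len(keyword):]
        for pattern in FORMATS[field]:
            match = re.search(pattern, tail)
            if match:
                return match.group(0)
    return None


text = "Invoice No. 99812   Date: 03/14/2008   Account 12-3456"
print(locate("invoice_number", text))
print(locate("date", text))
```

Listing two date expressions shows how multiple formats for the same field can be covered without changing the lookup logic.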
  42. 42. Direct SAP Data Validation. • Finds and reads data using a customized VB-compatible script such as VB .NET 2.0 • Calls remote-enabled RFCs • For example: check for the existence of a PO, or validate vendor information by looking up the vendor number. (Source: Kofax)
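The validation logic on slide 42 can be sketched independently of the scripting language. The slide describes a VB-compatible script; the sketch below uses Python instead and takes the RFC call as an injected function so it can run without an SAP system. The function module name Z_CHECK_PO, its parameters, and its return fields are hypothetical, as is the in-memory stand-in for the backend.

```python
def validate_invoice(fields, rfc_call):
    """Check extracted fields against the backend; return a list of error messages."""
    errors = []
    # Z_CHECK_PO is a made-up remote-enabled function module name.
    po = rfc_call("Z_CHECK_PO", {"PO_NUMBER": fields["po_number"]})
    if not po.get("EXISTS"):
        errors.append("PO not found: " + fields["po_number"])
    elif po.get("VENDOR") != fields["vendor_no"]:
        errors.append("Vendor does not match PO")
    return errors


def fake_rfc_call(function_name, params):
    """In-memory stand-in for a real RFC connection (illustration only)."""
    known_pos = {"4512345678": "100234"}
    vendor = known_pos.get(params["PO_NUMBER"])
    return {"EXISTS": vendor is not None, "VENDOR": vendor}


print(validate_invoice({"po_number": "4512345678", "vendor_no": "100234"}, fake_rfc_call))
print(validate_invoice({"po_number": "4599999999", "vendor_no": "100234"}, fake_rfc_call))
```

Passing the call in as a parameter keeps the validation rules testable; in production the same function would receive a wrapper around a real RFC connection.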
  43. 43. Validation
  44. 44. Virtual ReScan (VRS) Eliminates Rescanning. [Figure examples: low-contrast logo; dot matrix text; highlighter; carbon copy; handprint; coffee cup stain; shaded background] (Source: Kofax)
  45. 45. VirtualReScan™ (VRS). [Figure panels: scanned in color (image file size = 213 KB); scanned in 1-bit B/W; image further processed with VRS]
  47. 47. What happens after Release? • Release into pre-defined A/P solutions: IXOS Vendor Invoice Management; Norikkon APay Center; 170 MarkView Financial Suite; Ebydos; Basware; Brainware; SAPERION InBound Center; custom ledger solution • Custom programming: park invoices and route workitems for further processing; post invoices and route exceptions
  48. 48. Content Management: Direct Release. [Diagram: Release goes directly to the ARCHIVE, a.k.a. Content Server, backed by Jukebox, CAS Storage, or Centera]
  49. 49. Standard Release: Using XML or TXT files and ABAP. [Diagram: Release writes XML to a network directory, where ABAPs pick it up; images go to the ARCHIVE, a.k.a. Content Server, backed by Jukebox, CAS Storage, or Centera]
  50. 50. Integrated Release: RFCs and Function Modules. (1) RFC calls an FM to create a URL. (2) The document is stored. (3) Extracted data is passed back into specific FMs to create workitems or post documents. [Diagram labels: Release; RFC; Function Modules; HTTPS; ARCHIVE, a.k.a. Content Server; Jukebox; EMC or NetApp]
  52. 52. Template Case Study: Communications Industry. Situation: • The customer receives 10,000 returned billing statements from the USPS per month. 1.5 FTEs were 4 months behind in processing the return mail and updating the system to flag wrong billing addresses, so statements kept going out as undeliverable. Pain: • Postage costs alone were running at $4,000 a month • Labor cost to process the return mail: 1.5 FTEs • Missed opportunities for Customer Service to obtain current address information from customers while they had them on the phone. Solution: Template OCR. • The first page of each statement is scanned, the customer account number is read, and this information is released in a text file. A process then reads this file and updates the billing system (non-SAP) • Return mail is processed daily, within 15 minutes on average
  53. 53. Overview of the Validation Station
  54. 54. Rules-Based Case Study: Chemical Industry. Situation: • 2 FTEs manually enter invoice header "indexing" information, and invoices are posted manually • 27 FTEs; average of 923 invoices per FTE per month. Pain: • Wanted to bring the number of days down from 6, reduce resources, and reduce the potential for ergonomic injuries • Manual data entry, missed discounts. Goal: • Increase discounts taken, increase productivity, reduce headcount by 3 FTEs, and create the ability to further automate workflow exception handling using OCR. Solution: Rules-Based OCR. • Vendor invoices are scanned and run through the OCR software; this information is then used to automatically post invoices from the top 20 vendors and route all other invoices to AP processors.
  55. 55. Rules-Based Case Study: Chemical Industry. Results: • Reduced by 3 FTEs in the month of implementation, from 27 FTEs to 24 FTEs: 1 FTE in document control and 2 FTEs in invoice posting • Improved ability to take term discounts; % taken YTD: Jan. 38%, Feb. 45%, March 35%, April 40%, May 53%, June 46%, July 55% • Invoices processed per FTE reached 1,550 in July, up from 923 pre-implementation • Reduced data entry by exploiting OCR and creating programming to post invoices to purchase orders automatically • Targeted high-volume suppliers for auto-post, then worked with suppliers to ensure invoice criteria were met; then templates were created • 7% of invoices are auto-posted without human intervention, approximately 1,600 invoices a month. Invoices with errors are routed to processors • Post-implementation, additional suppliers have been identified for auto-posting and will be added
  56. 56. OCR Benefits in Accounts Payable. Processing Costs: • Reduced labor: invoice processing, invoice sorting, filing • Reduced paper handling: no lost invoices; documents can be accessed by multiple users and locations simultaneously; storage space for physical documents is reduced. Productivity and Efficiencies: • Increased processing speed: invoices can be scanned and processed on the day they are received, increasing visibility for management; early payment discounts can be utilized • Increased data accuracy: fewer data entry mistakes are made • Increased accessibility/availability: access to documents is instant; information sharing is enhanced
  58. 58. Most OCR solutions include: Administration client (user: solution developer/administrator): • Set up batch structure and process • Set up release details • Document separation • Scanner configuration • User authorizations. Design or Project Builder client (user: solution developer): • Set up project structure • Set up classification and extraction schemes • Train the project • Define validation rules • Design validation screen layout • Test the project
  59. 59. Leading Practice. • Start slowly, with minimum disruption to the existing process: some companies started with 1 vendor a day; some companies start with the top 20% of vendors; you still need to pay your bills • New technology: employees need time to adjust and embrace it. Using the light-switch approach may turn employees off • Start imaging before OCR and process automation • Constant improvement: the process will constantly need to be monitored and tweaked
  60. 60. Leading Practice. • Notify your vendors: tell them that their documents will be processed via OCR and get them to work with you (single-line line items; stop sending invoices on blue paper with balloons; clearly identify the information that you need to see) • Consolidate vendors: use this as an opportunity to consolidate vendors; if they are not going to help with the above, then consolidate • Clean up vendor master records: vendor addresses and phone numbers will be used daily for vendor lookup, so make sure the information is correct • Set up a process to correct errors efficiently
  61. 61. What do I need to get started? • Scanner for document imaging: a scanner that supports VRS (VirtualReScan) • OCR software solution: rules-based OCR solution • Content management solution: SAP Content Server; PBS ContentLink with EMC Centera, NetApp Filer, and DR; OpenText (IXOS), OnBase, IBM, Documentum, FileNet, etc. • Release to SAP system: automatic posting or further processing within SAP Workflow
  62. 62. Questions?
  63. 63. Thank you for participating. Please remember to complete and return your evaluation form following this session. For ongoing education on this area of focus, visit the Year-Round Community page at www.asug.com/yrc. Session code: 0403. John Walls, Verbella CMG, LLC, John.walls@verbellacmg.com, 484-888-2199, www.verbellacmg.com

Editor's Notes

  • The system also includes a tightly integrated connection to your ERP and host systems. This lets you apply business rules and complex validation routines, and make your captured data available immediately.
  • For each group, data is extracted using a learn-by-example technique. This is the classic learn-by-example approach, where you start with a fixed model of the world as we see it and then present real examples to teach the system how things vary in real life. In this case we use a model of the syntax of invoice data within a field group. The syntax applies to all invoices we will see in the real world (e.g., 3 examples). The variation between each example, known as the semantics, is learnt by presenting lots of examples of real-world invoices; if you present enough, you'll cover all possible variations. Once the semantics are learnt, they're held in a configuration file called a knowledgebase.
  • Document imaging technology can vastly improve operations in accounts payable departments. The benefits are substantial.
