The document discusses optical character recognition (OCR) technology and its use in invoice processing workflows. It provides an overview of rules-based OCR, describing how rules are configured to extract key fields from invoices. It also compares rules-based OCR to template-based OCR, noting that rules-based OCR can usually read new invoices without a per-vendor template and quoting faster OCR rates for it (0.5 to 5 seconds versus 8 to 12 seconds). The document outlines typical steps in an OCR invoice processing solution, including scanning, extraction, validation, and releasing data to SAP systems.
1. OCR for Invoice Processing
ASUG 2008 Speaker Development
John Walls, Senior Principal
VerbellaCMG, LLC
John.Walls@VerbellaCMG.com
484-888-2199
GRETCHEN LINDQUIST
ASUG INSTALLATION MEMBER
MEMBER SINCE: 1999
GREG CAPPS
ASUG INSTALLATION MEMBER
MEMBER SINCE: 1998
ISRAEL OLIVKOVICH
SAP EMPLOYEE
MEMBER SINCE: 2004
2. Learning Objectives
As a result of this workshop, you will be able to:
Explain what OCR is and what it is not, what it is capable of, and how it works
Describe how it can be used to streamline a current SAP or non-SAP process
Make use of the data and documents that are captured
Identify what is required for an OCR project
3. Agenda
As-Is / To-Be Processing
What is OCR
Why should we use it
Document Structure, Types of Recognition
Template vs. Rules Based OCR
Flow, Classification, Extraction, and VRS Technology
Integration and release
Case Study
Wrap up and Questions
8. OCR Definitions
Simple
Optical character recognition (OCR) is the electronic translation of images of handwritten or typewritten text (usually captured by a scanner) into machine-editable text.
Complex
OCR analyzes the shape of a bitmapped character and assigns a value to it based on a template system or mathematical feature analysis or feature extraction. This analysis produces a likely result along with a range of possible alternative characters. Each result is supported by a likelihood percentage. *
* Océ Document Technologies
9. OCR Definitions
Voting
Two or more OCR recognition engines are used and their results compared, voting on the most likely result. It is designed to eliminate errors (false positives) and increase accuracy.
All OCR engines provide multiple results, each with a percentage of accuracy or likelihood.
Voting is a very attractive option when scanning forms with handwriting. *
* Océ Document Technologies
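The voting idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `vote`, the (text, confidence) input shape, and the tie-break on summed confidence are all hypothetical choices, not how any particular OCR product combines its engines.

```python
from collections import defaultdict


def vote(candidates):
    """Pick the reading that most engines agree on.

    candidates: list of (text, confidence) pairs, one per engine,
    with confidence between 0.0 and 1.0 (assumed input shape).
    Returns (winning_text, number_of_votes, mean_confidence_of_winner).
    """
    if not candidates:
        raise ValueError("at least one engine result is required")
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text, confidence in candidates:
        totals[text] += confidence
        counts[text] += 1
    # Most votes wins; ties are broken by the summed confidence.
    winner = max(totals, key=lambda text: (counts[text], totals[text]))
    return winner, counts[winner], totals[winner] / counts[winner]


# Three hypothetical engines read the same PO number field.
results = [("4512345678", 0.91), ("4512345678", 0.84), ("4512845678", 0.88)]
best, votes, mean_confidence = vote(results)
```

Here two engines agree, so their reading wins even though the third engine reports a higher confidence than one of them.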
10. Typical Set up – Process flow
Scan
Scan documents
VRS 4.1 Pro for image cleanup and improved OCR accuracy
Extract
Classification of document type
Extraction of data fields from each document
Validate
Validate and correct data
Look up data from databases and other sources
Release
Export data and images to a directory as text or XML
Images can be released as searchable PDFs or TIFF images
12. Accounting Department
Business Process Challenges
Time-consuming A/P process
High headcount and overtime
Unused discounts
Incorrect data
Finance charges
Process status unknown
Document Management Challenges
Lost invoices
High document transportation cost
Misrouted documents
Lengthy invoice research
Key Issues
Reduce processing cost
Speed of processing
Accuracy of information
Receiving invoices and preparation for processing
13. OCR- What’s in it for me?
Better utilization of human capital: knowledge workers can focus on value-adding tasks
Increased automation
Faster processing
Reduced paper handling
Reduced labor
Say good-bye to retrieval time
Never miss a discount again
Gain an auditable business process
14. Costly facts about Invoice processing
It takes an average of 12 days to process an invoice
1 out of 5 invoices has anomalies
It takes more than 17 seconds to manually enter an invoice (excluding manual handling)
96% of invoice processing involves keying data from paper
A 1,000 to 5,000 employee organization handles on average 24,500 invoices per month
Direct labor cost alone for payables processing averages $3.31 per invoice
Direct labor represents 30% of fully allocated payables costs *
Average payables cost of $11.03 per invoice *
Source: Kofax; * Cass Information Systems
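As a rough worked example, the per-invoice figures can be checked against each other and scaled to a monthly volume. Combining the 24,500 invoices per month with the per-invoice costs is an assumption made here for illustration only, since the slide attributes the figures to different sources.

```python
# Figures quoted on the slide.
INVOICES_PER_MONTH = 24_500        # 1,000 to 5,000 employee organization
DIRECT_LABOR_PER_INVOICE = 3.31    # USD
DIRECT_LABOR_SHARE = 0.30          # share of fully allocated payables cost
PAYABLES_COST_PER_INVOICE = 11.03  # USD, fully allocated

# Consistency check: 3.31 / 0.30 should land on the quoted 11.03.
implied_cost_per_invoice = DIRECT_LABOR_PER_INVOICE / DIRECT_LABOR_SHARE

# Assumption: apply the per-invoice averages to the monthly volume.
monthly_direct_labor = INVOICES_PER_MONTH * DIRECT_LABOR_PER_INVOICE
monthly_payables_cost = INVOICES_PER_MONTH * PAYABLES_COST_PER_INVOICE

print(f"Implied cost per invoice: ${implied_cost_per_invoice:.2f}")
print(f"Monthly direct labor:     ${monthly_direct_labor:,.2f}")
print(f"Monthly payables cost:    ${monthly_payables_cost:,.2f}")
```

Under that assumption, an organization at the quoted volume would spend roughly $81,000 a month on direct labor and about $270,000 a month on payables processing overall.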
15. Typical AP Process
Parties shown: Vendor, Cost Center, AP
Receipt & Sorting
Receive invoices at AP department
Sort invoices based on AP departmental structure (State, Cost Center, Division, etc.)
Send invoices to the AP Representative responsible for processing invoices
Perform more sorting or batching. Invoices may be stamped with date received, annotated with GL codes, etc.
Manual AP Process
AP Rep keys invoices into the ERP system. Line-item matching may be performed on P.O.-based invoices
Invoices may be Parked Pending Approval
Correct entries as needed or proceed directly to releasing batches for payment
Release batches for payment
Perform check run
Paper Filing & Retrieval
Send invoices to temporary on-site storage
Send invoices to long-term off-site storage
Pull invoices for audits and various other business requirements
16. Typical AP Process with OCR
Parties shown: Vendor, Cost Center, AP
Automated Sorting & Routing (replaces Receipt & Sorting)
Receive invoices at AP department
AUTOMATED: Sort invoices based on AP departmental structure (State, Cost Center, Division, etc.)
AUTOMATED: Send invoices to the AP Representative responsible for processing invoices
AUTOMATED: Perform more sorting or batching. Invoices may be stamped with date received, annotated with GL codes, etc.
Automated Process (replaces Manual AP Process; the slide marks four of these steps AUTOMATED)
AP Rep keys invoices into the ERP system. Line-item matching may be performed on P.O.-based invoices
Invoices may be Parked Pending Approval
Correct entries as needed or proceed directly to releasing batches for payment
Release batches for payment
Perform check run
Automated Archive/Retrieval (replaces Paper Filing & Retrieval)
AUTOMATED: Send invoices to temporary on-site storage
AUTOMATED: Send invoices to long-term off-site storage
AUTOMATED: Pull invoices for audits and various other business requirements
17. Information Extracted
Standard Header and Footer Data
Purchase order number
Invoice number and date
Subtotal
Taxes
Freight
Discount
Grand total
Supplier details
Any other data, using customized extraction schemes
Standard Line Item Data
PO line item position
Quantity
Description
Unit price
Total price
Discount
Unit measure
Material number
Order number
Delivery note number
Tax rate
19. Information comes in many forms…
Structured Content
Information is predictable
Location of information is
predictable
Examples
• Waybill
• Delivery Documents
• Tax Forms
• Mail Order Forms
• Applications
• Insurance Claims
Source: Kofax
20. Information comes in many forms…
Semi-Structured Content
Information is predictable
Location of information is
NOT predictable
Examples
• Accounts Payable
• Accounts Receivable
• Transportation
• Bills of Lading
• Medical Billing
Source: Kofax
21. Information comes in many forms…
Unstructured Content
Information is NOT predictable
Location of information is
NOT predictable
Examples
• Mortgage Folders
• Medical Records
• Litigation Support
Source: Kofax
22. Types of Recognition
OCR: Optical Character Recognition
Used to read machine print within images
OMR: Optical Mark Recognition
Used to identify checked boxes and other "selected options"
ICR: Intelligent Character Recognition
Used to identify handwriting or hand print on a document. Could be used to pull information from forms
IWR: Intelligent Word Recognition
Used to read cursive writing, for example on checks or prescriptions
24. Rules-Based vs. Template-Based OCR
Rules-based
Entire document is scanned and processed via Optical Character Recognition (OCR)
The OCR engine is used to search for a key (key words, phrases, or expression) and find the corresponding value (specific to general)
Once configured, it is most likely that new invoices can be read
OCR rates are 0.5 to 5 seconds
Template-based (logo ID): learn, memorize, teach
Each vendor invoice must be maintained as a template for each resolution (DPI)
A new invoice might not be read until the system learns the invoice
A template database must be maintained, which is not good for large numbers of vendors
OCR rates are 8 to 12 seconds
25. How Does Rules-Based OCR Work?
Configuration
Create classification rules
Features and index fields that classify the document as:
PO, non-PO, credit memo, statement
Assign index fields based on classification
PO invoice classified — PO number, invoice date, invoice amount,
and invoice number
Assign rules and logic to the key values (index fields)
PO Number, PO #, P.O. Num = (45########)
Logical expression 45[0-9]{8} validates 4512345678
Dictionaries
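The key and value rule above can be sketched with Python's `re` module. This is a minimal illustration, not the configuration syntax of any OCR product: the key phrases and the `45[0-9]{8}` expression come from the slide, while the function name `find_po_number`, the optional `:` or `=` separator, and the sample text are assumptions.

```python
import re

# Key phrases from the slide: "PO Number", "PO #", "P.O. Num".
PO_KEY = r"(?:PO\s*Number|PO\s*#|P\.O\.\s*Num)"
# Value rule from the slide: 45 followed by exactly eight digits.
PO_VALUE = r"45[0-9]{8}"
# Assumption: the key may be followed by an optional ':' or '=' separator.
PO_RULE = re.compile(PO_KEY + r"\s*[:=]?\s*(" + PO_VALUE + r")\b", re.IGNORECASE)


def find_po_number(ocr_text):
    """Return the PO number that follows a known key phrase, or None."""
    match = PO_RULE.search(ocr_text)
    return match.group(1) if match else None


sample = "Invoice 7781   PO #: 4512345678   Total 120.00"
po_number = find_po_number(sample)
```

Values that do not start with 45 or that have the wrong length are rejected, which is the validation behavior the slide describes for 4512345678.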
29. Template – Form Identification
Sample Pages
Page-level form identification
A scanned or imported image is compared against the sample pages
already "learned" by the form identification engine. Each comparison
returns a confidence and a difference.
Form identification zone
A form identification zone can be used to assist page-level form
identification feature. It is typically used to help the form
identification engine distinguish between forms that are very similar.
30. Template – Registration
Sample Pages
Page-level registration-
Attempts to offset all zones based on how far large features on
the page are offset from the same features on the sample page
Registration Zones- “Text” and “Shape”
A text registration zone can be used to augment or replace
page-level registration. Used if your images are different from
the sample pages
Shape registration zone uses
geometric patterns that are “fixed”
in relation to the data on a form
32. Classification and Extraction of Index Fields
Documents are first classified by document type
Once classified, the extraction of data begins
34. Release – Text or XML and Images
Extracted index fields are released as an .xml or .txt file to a
network share
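A release step of this kind might look like the following sketch, which writes extracted index fields to an XML file next to the image name. The element names (`document`, `field`), the function name `release_index_fields`, and the use of a temporary directory in place of a network share are all assumptions for illustration; a real capture product defines its own release format.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def release_index_fields(fields, image_name, release_dir):
    """Write index fields to <release_dir>/<image stem>.xml and return the path."""
    root = ET.Element("document", attrib={"image": image_name})
    for name, value in fields.items():
        field = ET.SubElement(root, "field", attrib={"name": name})
        field.text = str(value)
    target = Path(release_dir) / (Path(image_name).stem + ".xml")
    ET.ElementTree(root).write(str(target), encoding="utf-8", xml_declaration=True)
    return target


# A temporary directory stands in for the network share.
with tempfile.TemporaryDirectory() as release_dir:
    released = release_index_fields(
        {"po_number": "4512345678", "invoice_amount": "120.00"},
        "inv_0001.tif",
        release_dir,
    )
    released_name = released.name
    released_root = ET.parse(str(released)).getroot()
```

A downstream program can then pick up the file from the directory and read the fields by name.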
36. OCR Processing Solution Steps
Flow: Scan, Extraction, Validate, Release
Input: Invoice
Supporting elements: OCR rules and templates, SAP database
Outputs: document images to SAP Content Server, data to SAP Workflow
Continuous learning
37. Classification Technologies
Layout/Image Classification
Image content
Patterns "Examples"
Page layout
X,Y coordinates
Logos or graphics
Small smeared thumbprint
No OCR
Instruction Classification
Keywords or phrases
Boolean (true or false)
Requires OCR
Adaptive Feature Classification
Textual content
Patterns "Examples"
Unstructured data
Words, individual tokens
Mailroom application: separates documents that a person would normally have to read
Source: Kofax
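Instruction classification, as described above, boils down to Boolean tests on keywords found in the OCR text. The sketch below shows the idea under stated assumptions: the document types are the ones named earlier in the deck (PO, non-PO, credit memo, statement), but the specific keywords, the rule order, and the function name `classify` are hypothetical.

```python
def classify(ocr_text):
    """Return a document type based on ordered keyword rules (first match wins)."""
    text = ocr_text.lower()
    # Each rule is a Boolean (true or false) test over the OCR text.
    rules = [
        ("credit memo", lambda t: "credit memo" in t or "credit note" in t),
        ("statement", lambda t: "statement" in t and "invoice" not in t),
        ("PO invoice", lambda t: "invoice" in t
            and ("po number" in t or "po #" in t or "p.o." in t)),
        ("non-PO invoice", lambda t: "invoice" in t),
    ]
    for doc_type, rule in rules:
        if rule(text):
            return doc_type
    return "unclassified"


doc_type = classify("INVOICE  PO # 4512345678")
```

Rule order matters in this sketch: the more specific types are tested first so that a credit memo mentioning an invoice number is not classified as an invoice.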
38. Extraction Technology
Locators – extraction engines
Learn-by-example locators for rapid setup on key fields
Additional pre-built locators or Rules for other fields
Format, zones, tables, database, barcodes etc
Multi-language OCR/ICR
140+ languages
Chinese/Japanese/Korean
Multi-engine voting
Source: Kofax
39. Learn-By-Example
Classic learn-by-example approach where you start with a fixed model
of the world as we see it and then present real examples to teach the
system how things vary in real-life
An analogy would be teaching a child to read
Examples: 1, 2, 3
The variation between each example, known as the semantics, is learnt
by presenting lots of examples of real world invoices
Source: Kofax
40. Database Locator
Matching of database fields to document data
Fast, associative, fault-tolerant search
Works even with large databases (>1 million records)
Returns the record with the best match
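Fault-tolerant matching can be illustrated with Python's standard `difflib` module. This is a swapped-in technique for illustration, not the search engine the slide refers to: a linear scan with `SequenceMatcher` would not scale to a million records, and the vendor records, the 0.6 threshold, and the function name `best_vendor_match` are assumptions.

```python
from difflib import SequenceMatcher


def best_vendor_match(ocr_text, vendors, threshold=0.6):
    """Return the vendor record whose name best matches the OCR text, or None.

    vendors: list of dicts with 'number' and 'name' keys (assumed shape).
    """
    def score(record):
        return SequenceMatcher(None, ocr_text.lower(), record["name"].lower()).ratio()

    best = max(vendors, key=score, default=None)
    if best is None or score(best) < threshold:
        return None
    return best


# Hypothetical vendor master records.
vendors = [
    {"number": "100234", "name": "Acme Chemical Supply"},
    {"number": "100871", "name": "Apex Industrial Gases"},
]
# OCR output with typical misreads ("rn" for "m", "1" for "l").
match = best_vendor_match("Acrne Chernical Supp1y", vendors)
```

The misread name still resolves to the right vendor number, which is the point of a fault-tolerant lookup: validation can fill in clean master data even when the OCR text is imperfect.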
41. Format Locator
Finds and reads data based on regular expressions and keywords, e.g.:
“\d” = any single digit
“\d{4,8}” = any number from 4 to 8 digits in length
Multiple regular expressions can be defined to cover all alternatives, e.g. for multiple number formats
Useful for
Invoice numbers
Dates
Account numbers
Source: Kofax
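The idea of defining several expressions to cover alternative formats can be sketched with Python's `re` module. The date formats, the invoice-number key phrases, and the function name `locate` are assumptions chosen for illustration; they are not a product's built-in formats.

```python
import re

# Several alternative expressions for the same field type (dates).
DATE_FORMATS = [
    re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),    # e.g. 7/15/2008
    re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),        # e.g. 2008-07-15
    re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b"),  # e.g. 15.07.2008
]
# Keyword plus expression: an invoice number of 4 to 8 digits.
INVOICE_NUMBER = re.compile(
    r"Invoice\s*(?:No\.?|Number|#)\s*:?\s*(\d{4,8})\b", re.IGNORECASE
)


def locate(ocr_text):
    """Return the invoice number (or None) and every date found in the text."""
    dates = [m.group(0) for pattern in DATE_FORMATS for m in pattern.finditer(ocr_text)]
    number = INVOICE_NUMBER.search(ocr_text)
    return {"invoice_number": number.group(1) if number else None, "dates": dates}


found = locate("Invoice No: 004512  Date 7/15/2008  Due 2008-08-14")
```

Dates are returned grouped by pattern, in the order the patterns are listed, so a later validation step can decide which one is the invoice date.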
42. Direct SAP Data Validation
Finds and reads data using a customized VB-compatible script, such as VB .NET 2.0
Calls remote-enabled RFCs
For example:
Check for the existence of a PO
Validate vendor information or look up a vendor number
Source: Kofax
44. Virtual Re-Scan (VRS) Eliminates Rescanning
Low Contrast
Logo
Dot Matrix Text
Highlighter
Carbon Copy
Handprint
Coffee Cup Stain
Shaded background
Source: Kofax
45. VirtualReScan™ (VRS)
Scanned in color
Image File Size = 213 KB
Scanned in 1 bit B/W
Image further
processed with VRS
47. What happens after Release?
Release into pre-defined A/P solutions
IXOS Vendor Invoice Management
Norikkon APay Center
170 Systems MarkView Financial Suite
Ebydos
Basware
Brainware
SAPERION InBound Center
Custom Ledger solution
Custom programming
Park invoices and route work items for further processing
Post invoices and route exceptions
48. Content Management - Direct Release
Release
ARCHIVE a.k.a.
Content Server
Jukebox CAS Storage
Centera
49. Standard Release – Using XML or TXT files and ABAP
Data: Release writes XML files to a network directory, which ABAPs read
Images: Release stores images in the ARCHIVE, a.k.a. Content Server
Jukebox, CAS Storage, Centera
50. Integrated Release – RFCs and Function Modules
1 RFC calls FM to create URL
2 Document is stored
3 Extracted data is passed back into specific FMs to create work items or post documents
Diagram labels: Release, RFC, Function Modules, HTTPS, ARCHIVE a.k.a. Content Server, Jukebox, EMC or NetApp
52. Template Case Study – Communications Industry
Situation:
Customer receives 10,000 returned billing statements from the USPS per month. 1.5 FTEs were 4 months behind in processing the returned mail and updating the system to record that the billing address was wrong, so statements kept going out as undeliverable.
Pain:
Postage costs alone were running at $4,000 a month
Labor cost to process the returned mail: 1.5 FTE
Missed opportunities for Customer Service to obtain current address information from customers while they had them on the phone
Solution: Template OCR
The first page of each statement is scanned, the customer account number is read, and this information is released in a text file. A process then reads this file and updates the billing system (non-SAP)
Returned mail is processed daily, within 15 minutes on average
54. Rules Based Case Study – Chemical Industry
Situation:
2 FTEs manually enter invoice header "indexing" information, and invoices are manually posted (27 FTEs, averaging 923 invoices per FTE per month)
Pain:
Wanted to reduce the number of processing days from 6, reduce resources, and reduce the potential for ergonomic injuries
Manual data entry, missed discounts
Goal:
Increase discounts taken, increase productivity, reduce headcount by 3 FTEs, and create the ability to further automate workflow exception handling using OCR
Solution: Rules Based OCR
Vendor invoices are scanned and run through the OCR software. This information is then used to automatically post invoices from the top 20 vendors and to route all other invoices to AP processors.
55. Rules Based Case Study – Chemical Industry
Results
Reduced headcount by 3 FTEs in the month of implementation, from 27 FTEs to 24 FTEs
1 FTE in document control
2 FTEs in invoice posting
Improved ability to take term discounts
% taken YTD: Jan 38%, Feb 45%, March 35%, April 40%, May 53%, June 46%, July 55%
Invoices processed per FTE reached 1,550 in July, up from 923 pre-implementation
Reduced data entry by exploiting OCR and creating programs to post invoices to purchase orders automatically
Targeted high-volume suppliers for auto-post, then worked with suppliers to ensure invoice criteria were met, then created templates
7% of invoices are auto-posted without human intervention, approximately 1,600 invoices a month. Invoices with errors are routed to processors.
Post-implementation, additional suppliers have been identified for auto-posting and will be added
56. OCR Benefits in Accounts Payable
Processing Costs
Reduced labor
Invoice processing
Invoice sorting
Filing
Reduced Paper Handling
No lost invoices
Documents can be accessed by multiple users and locations simultaneously
Storage space for physical documents is reduced
Increased Processing Speed
Invoices can be scanned and processed on the day they are received, increasing visibility for management
Early payment discounts can be utilized
Increased Data Accuracy
Fewer data entry mistakes are made
Increased Accessibility/Availability, Productivity, and Efficiencies
Access to documents is instant
Information sharing is enhanced
58. Most OCR solutions include
Administration Client
Set up batch structure and process
Set up release details
Document separations
Scanner configuration
User authorizations
User: Solution Developer/Administrator
Design or Project Builder client
Set up project structure
Set up classification and extraction schemes
Train the project
Define validation rules
Design validation screen layout
Test the project
User: Solution Developer
59. Leading Practice
Start slowly: minimum disruption to the existing process
Some companies started with 1 vendor a day
Some companies start with the top 20% of vendors
You still need to pay your bills
New technology: employees need time to adjust and embrace it. Using the light-switch approach may turn employees off.
Start imaging before OCR and process automation
Constant improvement: the process will need constant monitoring and tweaking
60. Leading Practice
Notify your vendors
Tell them that their documents will be processed via OCR
Get them to work with you
Single-line line items
Stop sending invoices on blue paper with balloons
Clearly identify the information that you need to see
Consolidate vendors
Use this as an opportunity to consolidate vendors; if they are not going to help with the above, then consolidate
Clean up vendor master records
Vendor addresses and phone numbers will be used daily for vendor lookup, so make sure the information is correct
Set up a process to correct errors efficiently
61. What do I need to get started?
Scanner for document imaging
Scanner that supports VRS (VirtualReScan)
OCR software solution
Rules-based OCR solution
Content management solution
SAP Content Server
PBS ContentLink with EMC Centera, NetApp Filer, and DR
OpenText (IXOS), OnBase, IBM, Documentum, FileNet, etc.
Release to SAP system
Automatic posting or further processing within SAP Workflow
63. Thank you for participating.
Please remember to complete and return your evaluation form following this session.
For ongoing education on this area of focus, visit the Year-Round Community page at www.asug.com/yrc
SESSION CODE: 0403
John Walls
Verbella CMG, LLC
John.walls@verbellacmg.com
484-888-2199
www.verbellacmg.com
Editor's Notes
The system also includes a tightly integrated connection to your ERP and host systems. This lets you apply business rules and complex validation routines, and make your captured data available immediately.
For each group, data is extracted using a learn-by-example technique. This is the classic learn-by-example approach, where you start with a fixed model of the world as we see it and then present real examples to teach the system how things vary in real life. In this case, we use a model of the syntax of invoice data within a field group. Syntax applies to all invoices we will see in the real world (e.g. the 3 examples). The variation between each example, known as the semantics, is learnt by presenting lots of examples of real-world invoices. If you present enough, you will cover all possible variations. Once semantics are learnt, they are held in a configuration file called a knowledgebase.
Document imaging technology can vastly improve operations in accounts payable departments. The benefits are substantial.