EMC Captiva - The Power of Intelligent Document Recognition

Captiva presentation

Published in: Technology
  • Captiva’s intelligent capture solutions capture information from a wide variety of file formats and document types. The Captiva family helps you capture business-critical information from paper, fax, and electronic data sources and transform it into business-ready content suitable for processing by enterprise applications. You can automate the processing of billions of documents annually, quickly and accurately converting their contents into information that is usable for all enterprise business processes in a timely and cost-effective manner. At a high level, most intelligent capture processes have five steps: capturing a document, identifying what type of document was captured, extracting key information from the document based on its type, ensuring that the extracted data is correct and accurate, and delivering it to business processes or content repositories. (Note to Presenter: Click now in Slide Show mode for animation.) Capture involves much more than simply digitizing paper documents with a high-speed scanner. Increasingly, documents must be captured from a variety of sources and from anywhere within an enterprise: branch offices, regional scanning centers, or ad hoc capture by field agents. (Note to Presenter: Click now in Slide Show mode for animation.) After documents are scanned and enhanced, advanced technologies are applied to identify document types. In some cases, documents are readily recognized by their physical appearance, especially structured forms. In other cases, documents are identified by their common text, as with legal contracts. (Note to Presenter: Click now in Slide Show mode for animation.) Different document types have very different requirements for data extraction.
Some documents require only simple indexing so that they can be found quickly within repositories. Other documents have much more advanced requirements, essentially transforming all of the information on a paper document into electronic data; examples include forms, invoices, and other transactional documents. (Note to Presenter: Click now in Slide Show mode for animation.) It is especially critical to validate the data in documents that drive important business processes. Inaccurate data can cause extremely costly problems if it is not found until mistakes are made or incorrect business decisions are executed. Captiva capture solutions feature database lookups and business rules to ensure that data is accurate before it is passed to the next step in the business process. (Note to Presenter: Click now in Slide Show mode for animation.) Finally, Captiva provides integrated delivery to content repositories and business applications, ensuring that both images and extracted data are successfully delivered to the back-end system, in the right location, with the right business processes triggered. The intelligent capture process can be fully customized to meet individual customer requirements and includes options for distributed capture, sophisticated intelligent document recognition, and integration into larger enterprise applications that leverage service-oriented architectures. For example, by scanning all documents within its legal department, a leading pharmaceutical company has achieved significant labor productivity improvements among its high-value staff and reduced costs for physical storage space, shipping, and paper suppliers. The company achieved a return on investment of 113 percent with a payback period of 18 months. After having the system in place for three years, the company has saved an estimated $17.9 million by scanning in all documents.
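The five steps described in these notes (capture, classify, extract, validate, deliver) can be pictured as a chain of small functions. The sketch below is illustrative Python only and is not Captiva's API: every name in it (`Document`, `capture`, `classify`, `run_pipeline`, and so on) is hypothetical, and classification and extraction are reduced to trivial string checks on text that stands in for a scanned page.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    text: str                      # stand-in for the OCR text of a scanned page
    doc_type: str = "unknown"
    data: dict = field(default_factory=dict)
    errors: list = field(default_factory=list)


def capture(raw: str) -> Document:
    # Step 1: bring the raw input into the system.
    return Document(text=raw.strip())


def classify(doc: Document) -> Document:
    # Step 2: decide the document type (toy rule: look for the word "invoice").
    doc.doc_type = "invoice" if "invoice" in doc.text.lower() else "correspondence"
    return doc


def extract(doc: Document) -> Document:
    # Step 3: extraction depends on the type; invoices yield "key: value" fields.
    if doc.doc_type == "invoice":
        for line in doc.text.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                doc.data[key.strip()] = value.strip()
    return doc


def validate(doc: Document) -> Document:
    # Step 4: record problems instead of passing bad data downstream.
    if doc.doc_type == "invoice" and "Invoice Number" not in doc.data:
        doc.errors.append("missing invoice number")
    return doc


def deliver(doc: Document, repository: list) -> None:
    # Step 5: only error-free documents reach the (hypothetical) repository.
    if not doc.errors:
        repository.append(doc)


def run_pipeline(raw: str, repository: list) -> Document:
    doc = validate(extract(classify(capture(raw))))
    deliver(doc, repository)
    return doc
```

Calling `run_pipeline("Invoice\nInvoice Number: 10010", repo)` with an empty list `repo` would classify the text as an invoice, extract one field, and append the document to `repo`; an invoice without a number would be held back with an error recorded on it.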
  • A key element of Captiva’s Intelligent Capture suite is its Intelligent Document Recognition technologies, or IDR. IDR adds advanced techniques that dramatically enhance the ability to capture, organize, and transform any type of document into usable business data. Captiva advances the technology in three key areas: classification, extraction, and validation. [click] Classify: As document capture is increasingly implemented across large organizations, multiple types of documents are being captured. With traditional capture, individual capture processes or time-consuming document preparation are required to capture multiple document types. Bar code labels must be added to documents, or separator sheets must be inserted between documents, to help the capture system understand when a new document is starting. With Captiva’s Intelligent Classification technologies, documents can be identified by their physical appearance, which works well for structured forms, or by the text content within the document, which is required for less structured documents such as invoices or even legal documents. Captiva’s IDR technologies dramatically reduce the work required to prepare documents for capture, significantly reducing the associated costs and speeding the transformation from paper documents into business-ready information. [click] Extract: Once documents are classified, data is typically extracted from them. In some cases, this is achievable using traditional capture techniques, either with data keyers or by extracting data from known areas of the document, such as the upper right-hand corner. As more and different types of documents are captured, the exact location of the information isn’t always known. It could be an invoice, where the data may be in different locations depending on the vendor, or it could be a piece of correspondence, where the address block could be anywhere.
Captiva’s IDR technologies allow customers to use both high-speed, accurate zonal extraction techniques to extract data from known, structured document types and highly flexible freeform extraction technology to extract information regardless of where it is located. Table-reading extraction also enables extraction of data from complex billing documents. [click] Validation: The third important component of IDR is validation. Our customers tell us that the costs of finding inaccurate information later in the process are prohibitive, so as a best practice we encourage them to take advantage of several forms of validation to ensure that accurate information is captured and delivered to back-end systems. This makes documents more findable, makes business processes more reliable, and saves time and money. Captiva can validate data against other data sources, such as ERP systems or other databases, and it enables organizations to check data against business rules to ensure that it is captured accurately and meets expectations. Together, Captiva’s IDR technologies advance document capture far beyond traditional capture solutions, providing more value to customers by addressing many more applications and providing better transformation of documents into usable information.
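The two forms of validation named in these notes, a lookup against another data source and a check against business rules, can be sketched as follows. This is a hedged illustration, not Captiva functionality: `KNOWN_VENDORS` is a hypothetical stand-in for an ERP vendor table, `validate_invoice` is an invented name, and the single business rule (a subtotal may not exceed the grand total) is an assumption chosen to fit the invoice fields shown on the slides.

```python
from decimal import Decimal

# Hypothetical stand-in for a vendor table in an ERP system or other database.
KNOWN_VENDORS = {"Acme Products"}


def validate_invoice(fields: dict) -> list:
    """Return a list of problems; an empty list means the data passed."""
    problems = []

    # Validation against another data source: the vendor must be known.
    if fields.get("Vendor Name") not in KNOWN_VENDORS:
        problems.append("vendor not found in lookup table")

    # Validation against a business rule: the subtotal cannot exceed the total.
    try:
        subtotal = Decimal(fields["Subtotal"])
        total = Decimal(fields["Grand Total"])
    except (KeyError, ArithmeticError):
        # A missing field or an unparseable amount (e.g. an OCR misread).
        problems.append("amount missing or unreadable")
    else:
        if subtotal > total:
            problems.append("subtotal exceeds grand total")

    return problems
```

Catching problems at this point is what lets a record be routed to an operator for correction before it reaches a back-end system, rather than being discovered later in the process.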
  • Captiva Dispatcher benefits: Streamline the flow of data: Captiva Dispatcher is a technical module that plugs into the InputAccel document capture platform. Dispatcher processes scanned images coming from InputAccel and can handle forms, invoices, checks, explanations of benefits (EOBs), and loan documents, whether single-page or multi-page. Using multiple innovative technologies, Dispatcher automatically sorts the documents and extracts key business information. EMC Captiva also delivers Dispatcher as an API module that can be called from any other document capture or third-party system (for example, Dispatcher can be called by Web Services through Global 360). Reduce scanning preparation time and cost: Some document capture systems still require bar codes or separators to detect folders, document types, and single- and multi-page documents, so preparation time is spent manually inserting the correct separators. Manual pre-sorting is an inefficient process prone to human error, whereas Dispatcher, which relies on multiple classification thresholds, can automatically detect the correct document type. Dispatcher can also automatically detect portrait or landscape orientation as well as upside-down or back-to-front pages. Reduce manual data entry and typing errors: Once Dispatcher detects the document type, it extracts business information, such as invoice data for Accounts Payable automation or patient data for EOB applications. Business data is validated with specific business rules and is therefore ready to be imported into an Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), or third-party system with limited human intervention and risk of error. Automatic routing and data extraction help ensure that documents are not lost, damaged, or forgotten, as can happen in a manual process. This in turn supports compliance and enhances customer relationships.
Improve return on investment for transactional processes: Companies that have invested in workflow, Business Process Management (BPM), ERP, or CRM systems have realized that they can increase productivity and improve their overall return on investment by adding an intelligent document capture system that can classify and route their document flow. Scanning and classifying documents at the beginning of the process and delivering the images immediately to the appropriate repository is probably the standard example of Captiva Dispatcher’s impact on an Enterprise Content Management (ECM) system.
  • EMC Transactional Content Management July 2008
Structured documents: document types where data is always in the same area or region of the page. This document type usually requires zonal OCR, or forms processing for highly complex forms such as mortgage applications and credit applications. Examples are address forms, health claim forms, benefit forms, and tax forms. A typical product mix for this document type would be InputAccel, or InputAccel with FormWare for highly complex forms, plus Dispatcher to identify them.
Semi-structured documents: document types where the data required from the page is the same but its location varies from one vendor to another. This document type usually requires freeform technology to find the data in question, extract it, and validate it against other systems, eventually triggering transactions. Examples are invoices, purchase orders, shipping documents, bills of lading, and phone bills. A typical product or configuration would be InputAccel for Invoices.
Unstructured documents: document types where the data or information is on the page but not always in the same area. This document type usually requires conversion of the text into an electronic format such as PDF, or text recognition can be used to identify what the document is about. Examples are correspondence and letters.
Techniques:
Global Image Analysis - Dispatcher™ uses a completely automatic learning process (“fuzzy logic” approach) for unlimited document types, dynamically building a knowledge base. This method does not rely on reading text from the document; instead it analyses the significant structural elements of the document, which makes it completely language independent.
HPA - An HPA is defined manually by placing anchors on the graphical zones that are specific to a document in order to discriminate between documents. This technology should be applied when documents within the same template vary widely. For example, with documents such as cheques, it is not useful to create one template per bank if it is only necessary to identify that the documents are cheques, regardless of the issuing bank.
Keyword - Classifies documents based on the text they contain rather than on their visual appearance or similarity to a template. Using dictionaries of keywords, often tied to the company’s document reference set, Dispatcher™ reads the information on the document with specific OCR engines and identifies the type of incoming mail.
Text Matching - A new classification technology dedicated to unstructured documents. It is easy to implement and set up, and unstructured document classification can be managed and controlled on the fly. The objective is to extract the complete text and compare sentences and character sequences between documents. You can therefore easily classify legal documents that have different layouts or designs but exactly the same legal text. This approach is unique on the market today and helps our customers optimize their unstructured information processes. Mortgage, legal, HR, and even financial services applications can benefit from Text Matching. This feature will be included in Dispatcher for the 5.0 release Q2 08.
Handwritten - A handwritten document is very different from other documents. Thanks to the “fuzzy logic” algorithms and the learning base, it is quite easy to distinguish the layout of a handwritten document.
  • Note to Presenter: View in Slide Show mode for animation, and then slowly click three times. Batch/Doc folders classification: Documents are separated automatically based on layout analysis or specific keywords. Building on its classification technologies, Dispatcher can separate images into document folders without separators or bar codes. The benefit is that users do not have to manually sort and prepare documents prior to scanning. Dispatcher combines graphical analysis and text analysis to define a “master” document type as the “document breaker” of the batch. Graphical analysis: Dispatcher refers to its learning base (i.e., the graphical analysis of the recurring information). Some documents are defined as natural separators. For example, when Dispatcher detects page 1 of Form 1, a new document set begins. Text analysis: A patient folder or invoice number is detected during classification. Dispatcher can break a batch into document sets that each include multiple documents. As soon as Dispatcher detects a new patient folder or invoice number, it creates a new document set. In the example above, the batch is broken into logical document sets whenever a document is recognized as a given template. Doc Set 1 and 2 are from the same patient, and the pages that follow the top page are attachments associated with the identified template.
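The "document breaker" idea in these notes can be sketched as a single pass over a batch: each page that is recognized as a breaker opens a new document set, and every other page is attached to the current set. The code below is an assumption-laden illustration rather than Dispatcher's implementation; `split_batch`, the `is_breaker` callback, and the page dictionaries with a `template` key are all hypothetical.

```python
def split_batch(pages, is_breaker):
    """Group a flat list of pages into document sets.

    A page for which is_breaker(page) is true starts a new set; any other
    page is treated as an attachment of the set that is currently open.
    """
    doc_sets = []
    for page in pages:
        if is_breaker(page) or not doc_sets:
            # A recognized breaker page (or the very first page of the
            # batch) opens a new document set.
            doc_sets.append([page])
        else:
            doc_sets[-1].append(page)
    return doc_sets


def is_form1_first_page(page):
    # Hypothetical rule: page 1 of "Form 1" acts as the natural separator.
    return page.get("template") == "Form 1, page 1"
```

The same function covers the text-analysis case from the notes if `is_breaker` instead reports that a new patient folder or invoice number was detected on the page.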
  • Introduce Dispatcher’s classification methods. Full-Page Image-Based Analysis: for recurring information; looks at the general layout design (fuzzy logic approach). High Precision Anchors: for recurring information; looks at local layout details such as logos or a specific document area. Handwritten Analysis: automatic detection of correspondence. Full-Page Text-Based Analysis: for non-recurring information; looks for keywords to classify the document type. Other: introduce the upcoming fifth classification technology, Text Matching, a powerful method for unstructured information that works by comparing wording. It needs no thesaurus and no business knowledge to handle unstructured documents.
  • Dispatcher™ uses a completely automatic learning process (“fuzzy logic” approach) for unlimited document types, dynamically building a knowledge base. This method does not rely on being able to read text data from the document but instead analyses the significant structural elements of the document, making it completely language independent! New in 6.0: More image capacity for auto-learning up to 40,000 images for best accuracy
  • An HPA is defined manually by placing anchors on the graphical zones that are specific to a document in order to discriminate between documents. This technology should be applied when there is a high variability of documents within the same template. For example, in the case of documents such as cheques, it is not useful to discriminate too much by creating one template per bank if it is only necessary to identify that these documents are cheques, regardless of the issuing banks.
  • Keyword classification classifies documents based on the text they contain rather than on their visual appearance or similarity to a template. Using dictionaries of keywords, often tied to the company’s document reference set, Dispatcher™ reads the information on the document with specific OCR engines and identifies the type of incoming mail. New in 6.0: Faster keyword classification when using “fast mode”
  • Text Matching is a new classification technology dedicated to unstructured documents. It is easy to implement and set up, and unstructured document classification can be managed and controlled on the fly. The objective is to extract the complete text and compare sentences and character sequences between documents. You can therefore easily classify legal documents that have different layouts or designs but exactly the same legal text. This approach is unique on the market today and helps our customers optimize their unstructured information processes. Mortgage, legal, HR, and even financial services applications can benefit from Text Matching. This feature will be included in Dispatcher for the 5.0 release Q4 07.
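The idea behind Text Matching, comparing character sequences so that documents with the same wording match regardless of layout, can be approximated with Python's standard `difflib` module. This is a swapped-in technique for illustration, not the algorithm Dispatcher uses; the function names, the reference dictionary, and the 0.8 threshold are all assumptions.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    # Remove layout differences: case, line breaks, and runs of whitespace.
    return " ".join(text.lower().split())


def classify_by_text(text: str, references: dict, threshold: float = 0.8):
    """Return (doc_type, score) for the best-matching reference text.

    doc_type is None when no reference reaches the threshold.
    """
    best_type, best_score = None, 0.0
    candidate = normalize(text)
    for doc_type, reference in references.items():
        score = SequenceMatcher(None, candidate, normalize(reference)).ratio()
        if score > best_score:
            best_type, best_score = doc_type, score
    if best_score < threshold:
        return None, best_score
    return best_type, best_score


# One learned sample per document type, echoing the slide's sample clause.
REFERENCES = {
    "mortgage_clause": (
        "Property Insurance. Borrower shall keep the improvements now "
        "existing or hereafter erected on the Property insured against "
        "loss by fire"
    ),
}
```

Because only the normalized text is compared, a clause that is reflowed across different line breaks or set in capitals still matches its reference, which mirrors the point made in the notes about legal documents with different layouts but identical wording.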
  • Two Major Technologies:
Template: locates which fields to capture; works well when the layout of forms is the same or where clear identifiers define the format. Used for recurring information.
Free Form Approach: uses keywords and text analysis to find the data. You extract the same information as with a template, but without any layout analysis. Used for non-recurring information.
IA data extraction: At a basic level, images are scanned and index operators key information into index fields based on image data. IA provides more advanced techniques, which include the following.
Zonal OCR – At setup time, an admin can specify where on a document to apply OCR (Optical Character Recognition). For example, a customer may want to extract a loan document number from a page. Rather than keying this information, IA applies OCR to read the loan number and pre-populate an index field. Dispatcher supports zonal OCR as well.
OCR Rubber Banding – IA supports full-page OCR. As a document is being indexed, an operator can select a location on the document image and extract the OCR results. For example, rubber banding around the SSN on a page takes the OCR results and inserts them into the SSN index field on screen. This provides a quick and easy way to extract data from a document without manual keying.
Dispatcher extraction capabilities: Dispatcher performs both zonal OCR and free form OCR extraction.
Free form OCR – Looks for keywords on a document image and, once it locates the word, applies the extraction rules. For example, “look for the keyword P.O. and once located look below P.O. to find the purchase order number”. This provides flexibility in extracting data from a semi-structured document.
Table Extraction – Supports the extraction of line item details on a document, for example an invoice. Dispatcher Table Extraction will OCR the data and, based on the setup rules defined, extract the line item details (e.g. Quantity, Description, Amount) into Dispatcher index fields.
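The free form rule quoted in these notes ("look for the keyword P.O. and once located look below P.O. to find the purchase order number") can be sketched over OCR output represented as a list of text lines. This is a simplified illustration, not Dispatcher's rule engine: `extract_near_keyword` is a hypothetical name, the default digit pattern is an assumption, and checking to the right of the keyword before looking on the line below is a design choice added here.

```python
import re


def extract_near_keyword(lines, keyword, value_pattern=r"\d+"):
    """Find `keyword`, then return the first value matching `value_pattern`
    to its right on the same line or, failing that, on the line below."""
    for i, line in enumerate(lines):
        pos = line.find(keyword)
        if pos == -1:
            continue
        right_of_keyword = line[pos + len(keyword):]
        line_below = lines[i + 1:i + 2]   # empty list if this is the last line
        for candidate in (right_of_keyword, *line_below):
            match = re.search(value_pattern, candidate)
            if match:
                return match.group()
    return None
```

A production rule would typically use word coordinates from the OCR engine, so that "below" means the same column rather than simply the next text line; the line-based version keeps the sketch short.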
  • New in 6.0: New 2D barcode recognition for PDF-417 and DataMatrix
  • New in 6.0: Updated Nuance Scansoft OCR engine improves classification and extraction accuracy
  • InputAccel compatibility enhancements: The major theme of the Dispatcher 6.0 release is compatibility with InputAccel 6.0. An additional theme is new and updated recognition engines, which we will talk about later.
Common sample Dispatcher reports accessible from within InputAccel Admin Console: Dispatcher statistics can now be reported on from within the IA Admin Console, instead of having to run a separate program. A few commonly used Dispatcher reports are provided, which pull from the InputAccel database. This allows for easier reporting of both IA and Dispatcher statistics.
Custom Dispatcher reports can be developed from within InputAccel Admin Console using Crystal Reports: Since Dispatcher statistics are now stored in the IA database in addition to the separate Dispatcher database that exists today, you can use this data to develop custom Dispatcher reports using the Crystal Reports report generator included with IA 6.0. This allows for ultimate flexibility for reporting on exactly what you want.
  • Classification Edit and Validation user interfaces mimic IndexPlus user interface for logging in and selecting batches: The user interfaces for logging in and selecting batches for Classification Edit and Validation now look very similar to those used by Scan and Index in InputAccel.
  • New check reading engine for U.S. and France provides recognition of CAR, LAR, MICR/CMC7 codeline, signature presence, payee name, check number, and check date: Dispatcher also now provides a new check reading engine that reads various fields from U.S. and French checks, including CAR, LAR, MICR/CMC7 codeline, signature presence, payee name, check number, and check date. With this new check reading engine, you no longer have to define zones, fields, and keyword rules for checks, nor do you have to spend time testing different recognition engines for best results. This engine does it all for you because it specializes in reading checks.
New in 6.0: User productivity. Improvements in character repair behavior in Dispatcher Validation. Classification Edit pre-indexing interface provides consistent feel with Validation interface, including addition of character repair.
  • Note to Presenter: View in Slide Show mode for animation. I’d like to wrap up with a summary of what we’ve covered today.
First, we talked about the key business drivers for organizations taking on initiatives to eliminate paper and manual processes: paper is difficult to store and manage; manual processes are slow, expensive, and error-prone; information silos create compliance risk; and legacy imaging solutions are not meeting business requirements. Note to Presenter: Click now in Slide Show mode for animation.
Second, we’ve covered the four capabilities within intelligent capture. Capture: capture from anywhere within the enterprise using a variety of input methods (scanners, MFPs, e-mail). Classify: automatically classify all documents using sophisticated document recognition technologies. Extract and validate: automatically extract and validate data from all documents. Delivery: integrate with all systems throughout the enterprise. Note to Presenter: Click now in Slide Show mode for animation.
There are five reasons why customers have selected EMC for their needs. EMC has the industry’s only complete, end-to-end offering, including document capture and classification, a complete business process suite, collaboration, enterprise report management, content archiving services, records/retention management, information rights management, and much more. EMC is recognized by IDC, Gartner, and The 451 Group as the market leader in enterprise content management. EMC provides a proven, scalable, fully unified architecture that has been utilized by more than 15,000 customers. The architecture allows EMC to process all content types and processes. EMC provides both a platform and solutions approach in the areas of… Solution examples (loan origination, new account enrollment, etc.), partner applications (Accounts Payable, contract management, etc.), and partner extensions (Adobe, iLog, etc.).

    1. The Power of Intelligent Document Recognition Using EMC Captiva Dispatcher
    2. Agenda: Dispatcher Overview
    3. EMC Captiva Intelligent Capture
         • Capture all of your paper documents and transform these documents into electronic images and business data
             • Support centralized and distributed scanning environments
             • Enable digital offices throughout your enterprise
             • Identify all documents and automate data capture from business documents
             • Provide immediate access to your documents to both individuals and processes
         • Example extracted fields: Invoice Number: 10010; Vendor Name: Acme Products; Purchase Date: 30 January 2008; Subtotal: $6,014.81; Grand Total: $6,025.88; Payment Terms: Net 30 Days
         • Process steps: Capture, Classify, Extract, Validate, Deliver
    4. EMC Captiva Intelligent Capture
         • Example extracted fields: Invoice Number: 10010; Vendor Name: Acme Products; Purchase Date: 30 January 2008; Subtotal: $6,014.81; Grand Total: $6,025.88; Payment Terms: Net 30 Days
         • Process steps: Capture, Classify, Extract, Validate, Deliver
         • Intelligent Document Recognition
             • Classify: Sophisticated image- and text-based classification tools to identify documents without manual preparation
             • Extract: Zonal and intelligent freeform data extraction to transform all documents into electronic data
             • Validate: Effectively control business processes by validating data for correct recognition and accuracy
    5. Captiva Dispatcher Benefits
         • Streamline the flow of data into enterprise applications
         • Reduce scanning preparation time and cost
             • Eliminate manual document preparation and data entry
             • No need for bar code or page separators
         • Reduce manual data entry errors
             • Automated data validation
         • Significant return on investment
             • Accelerate document routing
             • Accelerate resolution of disputes
             • Increase electronic document management productivity
         Automated process to capture, classify, route, index, and extract information to provide data for business transactions and images for archiving/storage
    6. Advanced Document Identification
         • Key Benefits
         • Reduce document preparation time
         • Index and route document to the appropriate business process
         Document types shown: Structured Documents (forms, tax returns); Semi-Structured Documents (invoices, checks, POs); Unstructured Documents (legal contracts, patient records)
         Techniques shown: Global Image Analysis, High Precision Anchors, Keyword Analysis, Handwritten detection, Text Matching Analysis
    7. Batch Management – Innovative Techniques
         Diagram labels: Doc Set 1, Doc Set 2, Doc Set 3, Doc Set 4; Claim folder: 0045128; Claim folder: 0045670
         Advanced document identification for batch processing
    8. Agenda
    9. Classification Technologies
         • Global Image Analysis
             • Automatic learning and identification of documents using graphical templates
         • Local Image Analysis
             • Zonal, graphical identification of documents
         • Keywords Analysis
             • Identification of documents based on keyword
         • Text Matching Analysis
             • Identification of documents based on text blocks
         • Handwritten Detection
             • Identification of documents based on handwriting
    10. Classification Technologies
    11. Standard Classification: Global Image Analysis
         • Layout/graphical analysis to determine document type
         • “Fuzzy-Logic” algorithm independent of language and format
         • Automatic learning system to dynamically build knowledge base
         • Feed Dispatcher with recurring images, and document families (templates) are automatically created
         • Provide large image samples to increase Dispatcher efficiency
    12. High Precision Anchors: Local Image Analysis
         • Specify local area (such as a logo or title) to determine document type
         • High Precision Anchors concept
         • Split document families into sub-families to define a specific process
         • Local Image Analysis complements Global Image Analysis when documents vary within same family/template
    13. Keyword Classification
         • Keyword match to determine document type
         • Use a full text engine to extract document information
         • Match the text extraction with business dictionaries to classify your information
         • Tune your own keyword rules using regular expressions
         • Classification method for free-form/non-templatized, non-recurring documents
    14. Text Matching Classification
         • Determine document type when documents have no unique layout or keywords
         • Use a full text OCR engine to extract and match document information
         • Learn a new document the first time – one image needed
         • Minimal configuration settings required
         • Can increase the classification rate on unstructured documents in Dispatcher by up to 40%
         Sample OCR text: Property Insurance. Borrower shall keep the improvements now existing or hereafter erected on the Property insured against loss by fire, hazards included within the term "extended …
    15. Technology Flow in Dispatcher
         • 1. Global/local image classification (55% to 90%)
             • Recognizes a document which looks like another one seen before (global) or that contains a specific pattern, like a logo (local)
             • Unique software to automatically build up to 10,000 templates
             • Speed of classification: 20 to 50 pages/sec
         • 2. Keywords text classification (5% to 20%)
             • Recognizes a document which contains a specific set of keywords
             • Multi-engine OCR on header and footer
             • Optimized reading zone: up to 2 pages/sec
         • 3. Text Matching text classification (15% to 40%)
             • Recognizes a document containing a similar sequence of characters, e.g. a standard letter
             • Automatic learning on the fly
             • More CPU intensive: 0.5 pages/sec
         • 4. Business rules
         Diagram labels: each stage passes the document on to the next “If not” classified; sample OCR text (“Property Insurance. Borrower shall keep the improvements now existing or hereafter erected on the Property insured against loss by fire, hazards included within the term "extended …”) is matched against a library entry (“Lender may require Borrower”); “Enhanced with”
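The technology flow on this slide, in which fast image classification runs first and the slower text-based methods run only "if not" classified, is a fallback cascade. The sketch below shows that control flow with toy stand-ins for each stage. Nothing here reflects Dispatcher's internals: all names, the dictionaries, and the page representation are hypothetical.

```python
def classify_cascade(page, stages):
    """Try each (name, classifier) pair in order; the first answer wins."""
    for name, classifier in stages:
        doc_type = classifier(page)
        if doc_type is not None:
            return doc_type, name
    # Nothing matched: leave the page for business rules or manual review.
    return None, None


# Toy knowledge bases; a real system would analyse pixels and OCR output.
KNOWN_LAYOUTS = {"layout-A": "tax_form"}
KEYWORDS = {"invoice": "invoice", "purchase order": "purchase_order"}
LEARNED_TEXTS = {"lender may require borrower": "mortgage_letter"}


def by_image(page):
    # Cheapest stage: the page looks like a layout seen before.
    return KNOWN_LAYOUTS.get(page.get("layout"))


def by_keyword(page):
    # Slower stage: keywords found in the OCR text of the header.
    header = page.get("header", "").lower()
    return next((t for k, t in KEYWORDS.items() if k in header), None)


def by_text(page):
    # Slowest stage: a learned character sequence appears in the full text.
    body = " ".join(page.get("body", "").lower().split())
    return next((t for s, t in LEARNED_TEXTS.items() if s in body), None)


STAGES = [("image", by_image), ("keyword", by_keyword), ("text", by_text)]
```

Ordering the stages from cheapest to most expensive matches the throughput figures on the slide (20 to 50 pages/sec for image classification down to 0.5 pages/sec for Text Matching): most pages are resolved early, and only the remainder pays for full-text OCR.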
    16. 16. Agenda
    17. 17. Intelligent Data Extraction <ul><li>Extract critical business data based on document type </li></ul><ul><ul><li>Simple indexing to extensive data extraction </li></ul></ul><ul><li>Extract using zonal and free-form techniques </li></ul><ul><ul><li>Enhanced extraction reduces manual costs and enables faster, more accurate business processes </li></ul></ul>Zonal Extraction Freeform Extraction
    18. 18. Extraction Technologies <ul><li>Recognition technologies </li></ul><ul><ul><li>OCR, ICR, mark sense, 1D barcoding </li></ul></ul><ul><ul><li>New in Dispatcher 6: 2D barcoding and check reading </li></ul></ul><ul><ul><li>Nuance Scansoft, Transym TOCR, Abbyy FineReader, Oce Recostar, Parascript CheckPlus, Pegasus Barcode Xpress </li></ul></ul><ul><li>Zonal recognition </li></ul><ul><ul><li>Use zones and anchors to locate and extract data that occurs in a known area of the page </li></ul></ul><ul><li>Freeform recognition </li></ul><ul><ul><li>Use keywords to locate and extract data when you don’t know where it occurs on the page </li></ul></ul><ul><li>Table recognition </li></ul><ul><ul><li>Use freeform recognition for “array fields” or tables typically found in invoices and EOBs </li></ul></ul>
    19. 19. Recognition Technologies <ul><li>Technologies </li></ul><ul><ul><li>Machine print (OCR) </li></ul></ul><ul><ul><li>Hand print (ICR) </li></ul></ul><ul><ul><li>Checkboxes/bubbles </li></ul></ul><ul><ul><li>Barcodes </li></ul></ul><ul><li>Engines </li></ul><ul><ul><li>Scansoft OCR / ICR / Barcode </li></ul></ul><ul><ul><li>Transym TOCR </li></ul></ul><ul><ul><li>Abbyy FineReader </li></ul></ul><ul><ul><li>Oce Recostar Zonal </li></ul></ul><ul><ul><li>Oce Recostar Full Page </li></ul></ul><ul><ul><li>Parascript CheckPlus </li></ul></ul><ul><ul><li>Pegasus Barcode Xpress </li></ul></ul><ul><ul><li>EMC Engines </li></ul></ul><ul><ul><li>Multi-engine voters </li></ul></ul>
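The slides do not say how the multi-engine voters combine results, so the following is only one plausible reading: a plain majority vote over the strings returned by several engines, with disagreement flagged for manual review. The function name and the `min_agreement` parameter are assumptions.

```python
from collections import Counter


def vote(readings, min_agreement=2):
    """Return the value most engines agree on, or None to flag for review."""
    # Ignore engines that returned nothing for this field.
    counts = Counter(r for r in readings if r)
    if not counts:
        return None
    value, votes = counts.most_common(1)[0]
    return value if votes >= min_agreement else None
```

For example, if two engines read `227628` and a third reads `22762B`, the vote settles on `227628`; if all three disagree, the field is left for an operator.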
    20. 20. Zonal Recognition <ul><li>Use zones and anchors to locate and extract data that occurs in a known area of the page </li></ul>
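A minimal sketch of zonal extraction with an anchor, assuming the OCR engine returns words with page coordinates (y growing downward). The zone is defined relative to the anchor word, so the same definition still works when the scan is shifted. All names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float
    y: float


def find_anchor(words, anchor_text):
    # The anchor is a word known to appear at a fixed spot on the form.
    for w in words:
        if w.text == anchor_text:
            return w
    return None


def read_zone(words, anchor_text, dx, dy, width, height):
    """Collect words whose origin lies in a zone placed relative to the anchor."""
    anchor = find_anchor(words, anchor_text)
    if anchor is None:
        return None
    left, top = anchor.x + dx, anchor.y + dy
    inside = [w for w in words
              if left <= w.x < left + width and top <= w.y < top + height]
    # Reading order: top to bottom, then left to right.
    inside.sort(key=lambda w: (w.y, w.x))
    return " ".join(w.text for w in inside)
```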
    21. 21. Freeform Recognition <ul><li>Search for keywords </li></ul><ul><li>Search for targets </li></ul><ul><li>Select the correct targets </li></ul>
    22. 22. Freeform Recognition – Keywords <ul><li>Keywords you want to find in a document can be designed as </li></ul><ul><ul><li>Constants </li></ul></ul><ul><ul><li>Regular expressions </li></ul></ul><ul><ul><li>Field-specific types </li></ul></ul><ul><ul><ul><li>These are compound types containing multiple constants and regular expressions, stored in a portable file. </li></ul></ul></ul><ul><ul><ul><li>Benefit: portable file is easily re-usable in other Dispatcher projects. </li></ul></ul></ul><ul><li>Auto-generation of keywords </li></ul><ul><li>Auto-generation of regular expressions for formats </li></ul>
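The three keyword styles (constants, regular expressions, and compound field-specific types) can be illustrated with Python's `re` module. This is a sketch of the concept only; the helper names and the `INVOICE_NUMBER_KEY` definition are hypothetical and do not reflect Dispatcher's file format.

```python
import re


def constant(text):
    # A constant keyword matches literally, ignoring case.
    return re.compile(re.escape(text), re.IGNORECASE)


def compound(*patterns):
    # A compound "field-specific type": any member pattern may match.
    joined = "|".join(f"(?:{p.pattern})" for p in patterns)
    return re.compile(joined, re.IGNORECASE)


INVOICE_NUMBER_KEY = compound(
    constant("Invoice Number"),
    constant("Invoice No."),
    re.compile(r"Invoice\s*#"),
)


def find_keyword(pattern, text):
    # Return the (start, end) span of the first keyword hit, or None.
    m = pattern.search(text)
    return m.span() if m else None
```

Bundling several spellings of the same label into one compound pattern is what makes the definition reusable across projects that see differently worded invoices.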
    23. 23. Freeform Recognition – Targets <ul><li>Identify targets by direction/orientation </li></ul>Key Target
    24. 24. Freeform Recognition – Targets <ul><li>Identify targets by distance </li></ul>
    25. 25. Freeform Recognition – Targets <ul><li>Identify targets by priority </li></ul><ul><ul><li>All targets have same priority </li></ul></ul><ul><ul><li>Higher priorities for some targets </li></ul></ul>
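Target selection by direction, distance, and priority can be sketched as a ranking over candidate values near a keyword. The coordinate convention (y growing downward), the `max_distance` cut-off, and all names are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    text: str
    x: float
    y: float


def direction(key, target):
    # Classify where the target sits relative to the keyword.
    dx, dy = target.x - key.x, target.y - key.y
    if abs(dx) >= abs(dy):
        return "right" if dx >= 0 else "left"
    return "below" if dy >= 0 else "above"


def pick_target(key, candidates, priorities, max_distance=400.0):
    """priorities maps a direction to a rank; lower rank wins, then distance."""
    scored = []
    for c in candidates:
        d = direction(key, c)
        dist = math.hypot(c.x - key.x, c.y - key.y)
        if d in priorities and dist <= max_distance:
            scored.append((priorities[d], dist, c.text))
    return min(scored)[2] if scored else None
```

With equal priorities the nearest candidate wins; giving "right" a better rank than "below" makes the value beside the keyword win even when a value underneath is closer.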
    26. 26. Freeform Recognition Matching Keywords/Targets <ul><li>Search for keywords and targets </li></ul><ul><ul><li>Constants: “Invoice”, “Invoice #”, “Invoice Number”, “Invoice No.”, “Bill To”, “Bill To Address”, “Ship To”, “Ship To Address”, “Customer ID”, “Customer #”, “Customer No” </li></ul></ul><ul><ul><li>Regular Expressions: \( ?\d{3} ?\) ?-? ?\d{3} ?- ?\d{4} </li></ul></ul><ul><li>Targets are ordered relative to the keywords </li></ul><ul><ul><li>By target orientation </li></ul></ul><ul><ul><li>By priority </li></ul></ul><ul><ul><li>By distance </li></ul></ul>OCR Content <ul><li>WREN INVOICE </li></ul><ul><li>DOWN CORP. </li></ul><ul><li> 58 ROUTE 66 WEST – TOTOWA, NEW JERSEY 07522 </li></ul><ul><ul><ul><li>TEL: (998) 815-8100 – FAX (981) 8181-8101 </li></ul></ul></ul><ul><li>COMMERCIAL INVOICE Invoice No. 227628 </li></ul><ul><li>Customer No. CAPTIVA </li></ul><ul><li>Bill To: Ship To: </li></ul><ul><li>CAPTIVA WHOLESALE CAPTIVA WHOLESALE </li></ul><ul><li>VENDOR #6535-00 11600 MIRA LOMA DR </li></ul><ul><li>PO BOX 60622 SAN DIMAS, CA 92175 </li></ul><ul><li>SEATTLE, WA 98124-1622 </li></ul>Data Elements <ul><li>“Invoice No.”: 227628 </li></ul><ul><li>“Bill To”: CAPTIVA WHOLESALE </li></ul><ul><li>“Bill To Address”: PO BOX 60622 </li></ul><ul><li>“Ship To”: CAPTIVA WHOLESALE </li></ul><ul><li>“Ship To Address”: 11600 MIRA LOMA DR </li></ul><ul><li>“Customer No”: CAPTIVA </li></ul><ul><li>All other keywords and the regular expression: --- </li></ul>
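The matching step on this slide can be tried directly against two lines of the sample invoice. The sketch below assumes the slide's regular expression is a phone-number pattern whose backslashes were lost in the transcript, and restores them; `INVOICE_NO` is a hypothetical pattern that pairs the “Invoice No.” keyword with the digits to its right.

```python
import re

# Phone pattern from the slide, with the backslashes assumed restored.
PHONE = re.compile(r"\( ?\d{3} ?\) ?-? ?\d{3} ?- ?\d{4}")
# Hypothetical keyword-plus-target pattern for the invoice number.
INVOICE_NO = re.compile(r"Invoice No\.\s*(\d+)")

ocr_line_1 = "TEL: (998) 815-8100 – FAX (981) 8181-8101"
ocr_line_2 = "COMMERCIAL INVOICE Invoice No. 227628"

phones = PHONE.findall(ocr_line_1)
invoice_match = INVOICE_NO.search(ocr_line_2)
invoice_number = invoice_match.group(1) if invoice_match else None
```

Here `phones` contains only the TEL number, because the FAX number on the sample has a four-digit middle group that the pattern does not accept, and `invoice_number` is `227628`.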
    27. 27. Table Recognition <ul><li>Automatically extracts data from columns in invoices and EOBs </li></ul><ul><li>Recognizes columns with or without vertical lines </li></ul><ul><li>Recognizes relationships between column data </li></ul><ul><ul><li>Quantity * Unit Price = Extended Amount </li></ul></ul><ul><li>Supports single and multi-line rows </li></ul><ul><li>Extracts data using regular expressions and keywords </li></ul>
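The column relationship Quantity * Unit Price = Extended Amount gives a cheap consistency check on extracted table rows. A minimal sketch, assuming each row arrives as three numeric strings; the function names and the one-cent tolerance are assumptions.

```python
from decimal import Decimal


def row_is_consistent(quantity, unit_price, extended, tolerance=Decimal("0.01")):
    """Check Quantity * Unit Price = Extended Amount within a rounding tolerance."""
    expected = Decimal(quantity) * Decimal(unit_price)
    return abs(expected - Decimal(extended)) <= tolerance


def flag_rows(rows):
    # Return indexes of rows whose extracted values break the relationship.
    return [i for i, (q, p, e) in enumerate(rows)
            if not row_is_consistent(q, p, e)]
```

Rows that fail the check are good candidates for operator review, since at least one of the three values was probably misread.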
    28. 28. Agenda
    29. 29. Snapshot of Dispatcher 6
    30. 30. Dispatcher 6 Highlights: Tighter Integration into InputAccel Platform <ul><li>Integrated with new InputAccel 6 Reporting </li></ul><ul><li>Ability to create custom Dispatcher Reports </li></ul><ul><li>Key Benefits </li></ul><ul><li>Reporting enables system monitoring and provides greater visibility into the system </li></ul>
    31. 31. Dispatcher 6 Highlights: Tighter Integration into InputAccel Platform <ul><li>Log-in screen and batch selection consistent with new InputAccel 6 user interfaces </li></ul><ul><li>Key Benefits </li></ul><ul><li>Common user interfaces provide ease of use </li></ul>
    32. 32. Dispatcher 6 Highlights: Check Reading Recognition <ul><li>Check reading - courtesy and legal amount recognition (CAR/LAR) </li></ul><ul><li>Simple setup – zones, fields, and keywords for checks are not needed </li></ul><ul><li>Key Benefits </li></ul><ul><li>Enables classification and data extraction for check-reading applications </li></ul><ul><li>E.g. mailroom check sorting and routing, accounts receivable, and bank lockbox </li></ul>Example check fields <ul><li>Check #: 0025036 </li></ul><ul><li>Courtesy Amount: $47,721.39 </li></ul><ul><li>Legal Amount: FORTY-FIVE THOUSAND…. </li></ul><ul><li>Payee Name: CAPTIVA SOFTWARE CORPORATION… </li></ul><ul><li>MICR/CMC7: 00250360… </li></ul><ul><li>Signature Present?: YES </li></ul>
    33. 33. Agenda
    34. 34. Key Intelligent Document Recognition Highlights <ul><li>EMC approach provides intelligent document recognition for all documents </li></ul><ul><li>High-speed graphic classification and zonal extraction for highly structured documents </li></ul><ul><li>Flexible, accurate, text-based classification and freeform data extraction for less structured documents </li></ul><ul><li>Unified development and administration simplifies development and maintenance </li></ul><ul><li>Significant cost reduction and process efficiency </li></ul><ul><ul><li>Eliminate manual document sorting </li></ul></ul><ul><ul><li>Increase automated data extraction </li></ul></ul><ul><li>Benefits of document classification and routing </li></ul><ul><ul><li>Organizing complex documents </li></ul></ul><ul><ul><li>Enabling routing for digital mailrooms </li></ul></ul><ul><li>Benefits of data extraction </li></ul><ul><ul><li>Reduced costs associated with data keying and document indexing </li></ul></ul><ul><ul><li>Increased value for business process </li></ul></ul>
    35. 35. Intelligent Capture Recap <ul><li>Reasons to Choose EMC </li></ul><ul><li>Industry’s only complete enterprise solution </li></ul><ul><li>Fiscal strength and viability </li></ul><ul><li>Recognized enterprise market leader </li></ul><ul><li>Proven, scalable, unified architecture </li></ul><ul><li>Platform and solutions approach </li></ul><ul><li>Key Benefits </li></ul><ul><ul><li>Complete ROI delivered within 12 months </li></ul></ul><ul><ul><li>Reduce document sorting and data entry labor costs by up to 90% </li></ul></ul><ul><ul><li>Reduce cycle times by over 75% </li></ul></ul><ul><ul><li>Save $1 per document to store paper documents electronically </li></ul></ul><ul><li>Intelligent document recognition capabilities </li></ul><ul><ul><li>Automatically classify all document types within an organization </li></ul></ul><ul><ul><li>Extract data from structured and unstructured document types </li></ul></ul><ul><ul><li>Validate data to ensure accurate processing </li></ul></ul>
    36. 36. Get Involved with EMC CMA Communities <ul><li>Why should you join? </li></ul><ul><li>Collaborate and share best practices </li></ul><ul><li>Shape the direction of future EMC products </li></ul><ul><li>Network with innovators across the globe, 24/7 </li></ul>Join now by going to: community.EMC.com/go/Documentum, community.EMC.com/go/SourceOne, developer.EMC.com/Documentum, developer.EMC.com/XMLtech, community.EMC.com/community/labs/d65
