Scanning and OCR the Open Source Way

Ian Pope - Ephesoft

“Our goal is to allow everyone to enjoy a fully
featured document and data capture system that
rivals the current industry leaders at a fraction of
the cost”

By 2016, Open Source Software (OSS) will be
included in mission-critical software portfolios
within 99% of Global 2000 enterprises.*

*Source: Gartner Predicts 2011: Open Source Software, the Power Behind the
Throne, 23rd Nov, 2010

Legacy Advanced Capture Systems:
•  high costs – software and services
•  click charges (priced by volume)
•  confusing pricing options
•  difficult to configure and implement
•  thick client deployment
•  “bloatware”
•  closed standards / vendor lock-in

Ephesoft is the industry’s only Java and
100% browser-based Advanced Capture
Solution that is Cloud ready out of the box.

...no need to install software on every
workstation. Ephesoft supports Firefox,
Chrome, Safari and Explorer.

Ephesoft is based on open standards and is
the first Advanced Capture Solution that can
run on Linux.
Scan or import documents from virtually any
source such as fax and e-mail with one
application

Process documents that are stored in the
repositories

Utilize any existing hardware whether it is a
departmental document scanner, high-speed
production scanner or MFP

Invoices, medical, mortgage, insurance, government, energy, telecommunication, HR ...

Advanced classification and separation technologies included

•  Learns hundreds of documents within minutes
   and sorts the documents more efficiently
•  Separates documents by identifying where a
   document starts and ends
•  Allows operators to focus on only the exceptions

•  Bar Code Reading
•  Fixed Forms Extraction
•  Mark Sense and Handprint Recognition
•  PDF + text output
•  Free Form “unstructured” extraction
•  Fuzzy Database document matching
•  Cursive Handwriting Recognition

invoice date      6/17/2008
account number    2-1006-475-1
reference code    22177991
service code      954381
amount            29.34
due date          7/7/2008
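
As a rough illustration of what the extracted index fields above look like as data, the following sketch pulls the same label/value pairs out of a block of OCR text with regular expressions. It is not Ephesoft's extraction engine; the `FIELD_PATTERNS` table, the `extract_fields` function and the sample text are assumptions made up for this example.

```python
import re

# Hypothetical label -> value patterns; real capture systems use far richer rules.
FIELD_PATTERNS = {
    "invoice date": r"\d{1,2}/\d{1,2}/\d{4}",
    "account number": r"\d-\d{4}-\d{3}-\d",
    "reference code": r"\d+",
    "service code": r"\d+",
    "amount": r"\d+\.\d{2}",
    "due date": r"\d{1,2}/\d{1,2}/\d{4}",
}


def extract_fields(ocr_text):
    """Return a dict of field name -> value found right after each label."""
    fields = {}
    for label, value_pattern in FIELD_PATTERNS.items():
        # Label, optional colon, whitespace, then the value; case-insensitive.
        pattern = re.escape(label) + r"\s*:?\s*(" + value_pattern + r")"
        match = re.search(pattern, ocr_text, flags=re.IGNORECASE)
        if match:
            fields[label] = match.group(1)
    return fields


# Sample OCR text modelled on the fields shown on the slide.
SAMPLE_TEXT = """
Invoice Date   6/17/2008
Account Number 2-1006-475-1
Reference Code 22177991
Service Code   954381
Amount         29.34
Due Date       7/7/2008
"""

if __name__ == "__main__":
    for name, value in extract_fields(SAMPLE_TEXT).items():
        print(f"{name}: {value}")
```

Labels that are missing from the text are simply left out of the result, which is one simple way to flag a document as an exception for an operator.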
  
Case Study Travelcard

The challenges
•  Due to moving to another office building, the paper archive
   was reconsidered
•  Digital archive should replace paper archive:
     –  Reduce space (costs)
     –  Documents are instantly accessible
•  Solution should be embedded within current organization and
   current staff
•  Solution should be scalable without any extra costs
•  50 documents (200 pages) per day, about 40 different
   document types – invoices (90%), contracts, application forms
   (request forms for ordering fuel cards), bank statements and a
   couple of standard forms including a form to authorize
   payments.

The Solution
•  Incoming documents are scanned with a
   multifunction device and picked up by Ephesoft
•  Ephesoft classifies the documents and extracts
   metadata
•  Output from Ephesoft (PDF and metadata file) is used
   to save document to DMS
•  Saved documents go into workflow based on their
   document classification
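
The hand-off described above (a PDF plus a metadata file, then a workflow chosen by document classification) could be sketched as follows. The XML layout, the element names (`Document`, `Type`, `File`, `Field`), the file name and the `WORKFLOWS` table are all hypothetical; the slides do not show Ephesoft's actual output schema or name the DMS involved.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata file; the real export schema is not shown in the slides.
SAMPLE_METADATA = """
<Document>
  <Type>invoice</Type>
  <File>scan-0001.pdf</File>
  <Field name="amount">29.34</Field>
  <Field name="due date">7/7/2008</Field>
</Document>
"""

# Hypothetical mapping from document classification to a DMS workflow name.
WORKFLOWS = {
    "invoice": "invoice-approval",
    "contract": "contract-review",
}
DEFAULT_WORKFLOW = "manual-review"


def route_document(metadata_xml):
    """Parse one metadata file and decide which workflow the PDF enters."""
    root = ET.fromstring(metadata_xml)
    doc_type = root.findtext("Type", default="").strip().lower()
    fields = {
        field.get("name"): (field.text or "").strip()
        for field in root.findall("Field")
    }
    return {
        "file": root.findtext("File", default="").strip(),
        "type": doc_type,
        "fields": fields,
        # Unknown types fall back to manual handling (the exceptions).
        "workflow": WORKFLOWS.get(doc_type, DEFAULT_WORKFLOW),
    }


if __name__ == "__main__":
    print(route_document(SAMPLE_METADATA))
```

Keeping the classification-to-workflow mapping in a table means that a new document type only needs a new table entry, not new code.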
  
The Benefits
•  Over 70% of documents are handled automatically
•  Remaining 30% (mostly variants) are handled by
   employees after only a few hours of training
•  No additional investment in employees or
   peripherals
•  All documents are digital, enabling:
    –  Quick and easy access
    –  Searching documents by metadata
    –  Workflows (without documents getting lost)

invoice date      6/17/2008
account number    2-1006-475-1
reference code    22177991
service code      954381
amount            29.34
due date          7/7/2008

CMIS
Multipage PDF/TIFF
XML

The Challenges
•  Australia’s biggest satellite company, also providing
   technical services to the telecoms industry.
•  1,000 staff and 1,400 contractors – 8 branches.
•  Multiple document-driven business processes.
•  Looking for an ECM solution – for now and the future – as
   well as a company intranet and specifically an accounts
   payable invoice approval solution.
•  Had already evaluated Microsoft SharePoint and other
   proprietary products.

The Solution
•  Zia was selected to provide an Ephesoft & Alfresco solution
   for Accounts Payable.
•  Ephesoft does invoice capture, creating PDFs and
   metadata tags – scanning & OCR.
•  Using the CMIS standard, the data and all metadata tags
   are exported to an Alfresco repository.
•  Once in the Alfresco system, a workflow begins by
   triggering an email in Outlook with the document URL to the
   AP Dept. The invoice is then reviewed and commented upon
   before final approval.
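
To make the CMIS export step above more concrete, the sketch below maps extracted invoice fields onto a CMIS-style property dictionary. `cmis:name` and `cmis:objectTypeId` are standard CMIS property IDs; the `ap:` property names, the `D:ap:invoice` type and the `build_cmis_properties` helper are hypothetical and would depend on the content model defined in the target Alfresco repository. No repository call is made here.

```python
# Standard CMIS property IDs.
CMIS_NAME = "cmis:name"
CMIS_OBJECT_TYPE_ID = "cmis:objectTypeId"

# Hypothetical mapping from extracted field labels to custom repository properties.
FIELD_TO_PROPERTY = {
    "invoice date": "ap:invoiceDate",
    "account number": "ap:accountNumber",
    "amount": "ap:amount",
    "due date": "ap:dueDate",
}


def build_cmis_properties(file_name, fields, type_id="D:ap:invoice"):
    """Build the property dict that would accompany the PDF on upload."""
    properties = {CMIS_NAME: file_name, CMIS_OBJECT_TYPE_ID: type_id}
    for label, value in fields.items():
        prop = FIELD_TO_PROPERTY.get(label)
        if prop is not None:  # skip fields the (assumed) content model lacks
            properties[prop] = value
    return properties


if __name__ == "__main__":
    extracted = {"invoice date": "6/17/2008", "amount": "29.34", "service code": "954381"}
    print(build_cmis_properties("invoice-0001.pdf", extracted))
```

A dictionary of this shape is what a CMIS client library would typically be handed together with the PDF content stream when creating the document in the repository.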
  	
  
The Results
•  Automated the accounts payable invoice
   approval process
•  Developed flexible workflows
•  Integrated Alfresco, Ephesoft, Pronto ERP
•  Incorporated OCR extraction of invoice data
•  Reduced invoice processing time
•  Improved employee productivity

Ephesoft includes:
•    feature rich, complete Advanced Capture Solution
•    easy to use and implement
•    browser based and cloud ready
•    one application for paper, email, fax documents
•    Web-based scanning application
•    document separation and classification
•    data extraction – fixed and unstructured
•    document and data release via XML, CMIS, and more
•    no volume counts on users or images

What is the cost?	
  
•  Zero License Cost
•  All features are included
•  Only cost is an annual Support/Maintenance Subscription
Ephesoft follows the trend started by Red Hat,
Apache, Android, MySQL, Alfresco, ...

Ephesoft is the only open source Advanced
Document Capture solution available.

It is a true Enterprise Solution with full 24/7
enterprise support.
Ephesoft Enterprise Edition includes:

•    disaster recovery (if more than one Ephesoft server in use)
•    load balancing (if more than one Ephesoft server in use)
•    high availability (if more than one Ephesoft server in use)
•    image enhancement
•    auto rotation based on text alignment
•    blank page deletion
•    professional level OCR
•    Browser-based scanning
Mountain West Financial Services Inc.

•    225 financial documents ‘trained’
•    95% classification accuracy
•    Reduced staff by half
•    System implemented in weeks
•    4 million pages a year and growing
•    ROI in less than 6 months

•  Susan Hartsock, IT Supervisor, says “We’re saving time,
   labor, and money. The solution is very intuitive and our
   staff loves it because it is fast and reliable. Ephesoft has
   been a sea change for us.”

•  A sample of joint Alfresco and Ephesoft customers
   includes BSA, Trax, Heifer, Gilt Groupe, City and County
   of Denver, Colorado, etc.

Questions?

Thank you


Ian Pope
ian.pope@ephesoft.com
Skype: ianpopeportishead
+44 7411 461804

CASE-7 Scanning and OCR the Open Source Way
