An Introduction to Document Scanning, Understanding Your Requirements

Learn about the basic decisions required for business document scanning: indexing, file formats, document resolution, color space, and more. Also learn about estimating volumes and automated capture technology such as barcode recognition, OCR, and batch document processing.

  1. An Introduction to Document Scanning. Business Document Scanning 101: From the Data Capture Perspective
  2. So you have a lot of this?
  3. And you’ve decided this is the answer.
  4. So you need a crash course in scanning.
  5. Lessons: Lesson 1: Simplex or Duplex; Lesson 2: Resolution; Lesson 3: Color Depth; Lesson 4: File Formats; Lesson 5: Indexing; Lesson 6: Document Prep and Estimating Volumes; Homework: Learn More About Data Capture and Document Management
  6. Lesson 1: Simplex or Duplex. Are the documents single- or double-sided? This may seem obvious, but…
  7. You may not want documents such as purchase invoices scanned in duplex when the back of the document contains only terms and conditions. On the other hand, if the documents are of high legal importance, you may want every conceivable item of information captured, such as small signatures or notes on the back.
  8. Duplex scanning requires more scanning and processing time and results in larger files.
  9. And you don’t have to be a genius to know that is more costly.
  10. Lesson 2: Resolution
  11. So what is resolution, and why does it matter?
  12. What is Resolution? Resolution is expressed as the number of dots per inch (dpi) or, less frequently, pixels per inch (ppi). A pixel (“picture element”) is one of the dots that make up the image, so resolution really describes how finely the page was sampled.
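To make “dots per inch” concrete, here is a minimal Python sketch that converts a page size in inches into pixel dimensions. The function name `pixel_dimensions` and the US Letter (8.5 x 11 in) example page are our own illustrative choices.

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Return (width, height) in pixels for a page scanned at `dpi`.

    Each inch of paper is sampled `dpi` times in each direction, so the
    pixel count along an edge is inches x dpi, rounded to whole pixels.
    """
    return round(width_in * dpi), round(height_in * dpi)


# US Letter at two common document resolutions.
for dpi in (200, 300):
    w, h = pixel_dimensions(8.5, 11, dpi)
    print(f"{dpi} dpi: {w} x {h} pixels")  # 1700 x 2200, then 2550 x 3300
```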
  13. Implications of Resolution. This graphic contains two images: a “0” as a grayscale image and an “x” as black and white.
  14. Implications of Resolution • If we halved the size of the grid squares horizontally and vertically (doubling the resolution), the characters would appear smoother and the image quality would improve; the inverse would be true if we doubled the size of the squares. • If we kept the squares the same size but reduced the size of the characters significantly, the resolution would become insufficient to capture them.
  15. Implications of Resolution. So: • The higher the resolution, the better the image quality. • For small characters, increase the resolution to capture them effectively.
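The small-character point can be checked with a little arithmetic. Type sizes are given in points (1 point = 1/72 inch), so a character's nominal height in pixels is point size x dpi / 72. This sketch assumes the nominal (em) size; real letter shapes, especially lowercase ones, are smaller, so treat the numbers as an upper bound.

```python
POINTS_PER_INCH = 72  # typographic points: 1 pt = 1/72 inch


def char_height_px(point_size: float, dpi: int) -> float:
    """Approximate pixel height of a character's nominal (em) size at `dpi`."""
    return point_size * dpi / POINTS_PER_INCH


for dpi in (200, 300):
    print(f"6 pt text at {dpi} dpi: about {char_height_px(6, dpi):.1f} px tall")
# about 16.7 px at 200 dpi versus 25.0 px at 300 dpi
```

Those extra pixels per character are one reason 300 dpi is the usual recommendation for OCR.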
  17. And the higher the resolution, the slower the scan and the larger the file, which means higher scanning and file storage costs, Einstein.
  18. Typical Scanning Resolutions. Resolution is generally determined by intended use. • Web graphic: 96 dpi • Standard archive document: 200 dpi • Document required for optical character recognition (OCR): 300 dpi • Plans/drawings for vectorization: 400 dpi • Documents required for historical archiving: 600 dpi
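Why resolution drives cost: pixel count grows with the square of the dpi, so doubling the resolution quadruples the data captured per page. This sketch applies that to the typical resolutions listed above for an assumed US Letter page; actual scan time and compressed file size also depend on the scanner, the content, and the compression used.

```python
# Intended uses and resolutions as listed on the slide.
TYPICAL_DPI = {
    "Web graphic": 96,
    "Standard archive document": 200,
    "OCR": 300,
    "Plans/drawings for vectorization": 400,
    "Historical archiving": 600,
}


def pixels_per_page(width_in: float, height_in: float, dpi: int) -> int:
    """Total pixels sampled on one side of a page: (width x dpi) x (height x dpi)."""
    return round(width_in * dpi) * round(height_in * dpi)


baseline = pixels_per_page(8.5, 11, 200)  # standard archive resolution as the reference
for use, dpi in TYPICAL_DPI.items():
    px = pixels_per_page(8.5, 11, dpi)
    print(f"{use:<35} {dpi:>3} dpi  {px / 1e6:6.2f} MP  ({px / baseline:.2f}x the 200 dpi count)")
```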
  19. Lesson 3: Color Depth
  20. Understanding Black and White. Documents scanned in black and white are always scanned as grayscale within the scanner. The scanner then applies a process known as thresholding to produce the black-and-white image: for each pixel, thresholding decides whether it should be black or white.
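A minimal sketch of thresholding, assuming 8-bit grayscale values (0 = black, 255 = white) and a single fixed cutoff of 128 chosen for illustration. Scanners and capture software often use more elaborate, adaptive methods, so this shows only the basic idea.

```python
def threshold(gray_rows, cutoff=128):
    """Convert 8-bit grayscale rows (0 = black, 255 = white) to black and white.

    Pixels darker than `cutoff` become black (0); all others become white (1).
    """
    return [[0 if value < cutoff else 1 for value in row] for row in gray_rows]


# A tiny 2 x 4 grayscale patch: dark ink on light paper, plus one mid-gray smudge (130).
patch = [
    [250, 40, 35, 245],
    [240, 130, 30, 250],
]
print(threshold(patch))              # [[1, 0, 0, 1], [1, 1, 0, 1]]
print(threshold(patch, cutoff=140))  # [[1, 0, 0, 1], [1, 0, 0, 1]]
```

Raising the cutoff turns the mid-gray pixel black, which is why threshold settings matter for faint text and light stamps.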
  21. Understanding Grayscale. Grayscale is used when the image contains color or grayscale data and the tone of the image needs to be retained, e.g. photographs or shaded graphics.
  22. Understanding Color. Color is obviously used when the image contains color data. Some users wish to retain important color information, for example land boundaries or graphical data, but not incidental color such as letterhead logos or highlighter marks.
  24. File Storage Requirements. Bits per pixel: color 24, grayscale 8, black and white 1. So, before compression, a grayscale image needs 8 times the storage of a black-and-white image, and a color image needs 24 times as much. And remember, Einstein, larger files equal higher costs.
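The 1/8/24 ratios come straight from bits per pixel. This sketch computes raw, uncompressed sizes for an assumed US Letter page at 300 dpi (2550 x 3300 pixels) and ignores row padding and file headers; compressed files will be much smaller.

```python
BITS_PER_PIXEL = {"black and white": 1, "grayscale": 8, "color": 24}


def uncompressed_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Raw image size in bytes, before any compression: pixels x bits / 8."""
    return width_px * height_px * bits_per_pixel // 8


# US Letter scanned at 300 dpi is 2550 x 3300 pixels.
for mode, bpp in BITS_PER_PIXEL.items():
    size = uncompressed_bytes(2550, 3300, bpp)
    print(f"{mode:<16} {size / 1_000_000:5.1f} MB")
# roughly 1.1 MB, 8.4 MB and 25.2 MB per page
```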
  25. Lesson 4: File Formats: TIFF, PDF, JPEG. For an in-depth look visit: PDF v. TIFF
  26. Understanding TIFF (Tagged Image File Format) • Well-established format • Most often used for black-and-white documents • Supports multiple pages • Interpreted correctly by most applications, with a caution on certain color implementations • “Group 4” refers to the compression method used on black-and-white images; it is a “lossless” compression, meaning no original data is lost in compression/decompression.
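To illustrate what “lossless” means for black-and-white images, here is run-length encoding (RLE) of a single scan line. This is not Group 4 itself: CCITT Group 4 is considerably more sophisticated (it also codes each line relative to the line above). RLE just shows the shared idea that long runs of white or black pixels can be stored compactly and decoded back exactly.

```python
from itertools import groupby


def rle_encode(row):
    """Encode a row of 0/1 pixels as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(row)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original row."""
    return [value for value, length in pairs for _ in range(length)]


# One 100-pixel scan line: mostly white paper (1) with a little black text (0).
row = [1] * 40 + [0] * 3 + [1] * 5 + [0] * 2 + [1] * 50
encoded = rle_encode(row)
print(encoded)  # [(1, 40), (0, 3), (1, 5), (0, 2), (1, 50)]
assert rle_decode(encoded) == row  # lossless: every original pixel comes back
```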
  27. Understanding PDF (Portable Document Format) • Well-established format by Adobe • Supports color, grayscale, and black and white • Supports multiple pages • Generally stored using Group 4 and JPEG compression, although other compression methods are supported too • Used when more advanced features are needed within the file, such as embedded optical character recognition (OCR) text, hyperlinking, digital signing, and other security features.
  28. Understanding PDF Variations: Searchable PDF. Many scanning applications can create searchable PDF files. Here, the software applies OCR technology to make the file text-searchable. Your application may label this as “make searchable”, “apply OCR”, “text-under-image”, or “searchable PDF.” If selected, your file will be text-searchable and text-selectable within the Acrobat viewer and many other programs that search PDF files.
  29. Understanding PDF Variations: PDF/A. PDF/A is an ISO standard for the digital preservation, or archiving, of electronic documents. It differs from standard PDF by prohibiting features unsuited to long-term archiving, such as font linking (fonts must be embedded instead). Its use is growing in international government and industry segments, including legal systems, libraries, newspapers, and regulated industries.
  30. Understanding JPEG (Joint Photographic Experts Group) • Well-established format • Most often used for photographs and graphics • Supports single pages only • A “lossy” compression format: some of the data is lost during compression. However, it provides good compression ratios for grayscale and color images.
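By contrast, a lossy step throws information away for good. JPEG itself transforms 8 x 8 pixel blocks with a discrete cosine transform and quantizes the results; the crude sketch below only quantizes raw pixel values, but it shows the core trade-off: fewer distinct values compress better, and the originals cannot be recovered.

```python
def quantize(values, step):
    """Crude lossy step: snap each 8-bit value down to a multiple of `step`."""
    return [v // step * step for v in values]


original = [12, 13, 15, 200, 201, 203]
restored = quantize(original, 8)
print(restored)              # [8, 8, 8, 200, 200, 200]
print(restored == original)  # False: the fine detail is gone for good
print(len(set(original)), "->", len(set(restored)), "distinct values")  # 6 -> 2
```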
  32. Compression and File Size. The bottom line: experiment with your images and file sizes. A middle-quality scan may meet your needs and save tremendous file space. OMG, right? (Comparison courtesy of Wikipedia.)
  33. Lesson 5: Indexing. For an in-depth look visit: What is Document Indexing?
  34. What is Indexing? Document indexing (sometimes referred to as metadata) enables users to quickly and efficiently locate their documents, whether through a folder structure, a database, or an electronic document management system.
  36. Avoid a disaster. Great care should be taken to design an efficient indexing scheme. If the design is not devised correctly at the outset, rectifying it later can be both difficult and costly. Sometimes it makes sense to replicate the current manual method for locating documents, creating a familiar but faster system.
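A sketch of what an indexing scheme might look like in code, assuming a hypothetical invoice workflow: the `IndexRecord` fields (vendor, year, invoice number) and the Vendor/Year folder layout are illustrative stand-ins for whatever mirrors your current manual filing method.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class IndexRecord:
    """Hypothetical index fields for one scanned invoice."""
    vendor: str
    year: int
    invoice_number: str
    file_name: str


def folder_path(record: IndexRecord) -> PurePosixPath:
    """Mirror a familiar manual filing scheme: Vendor / Year / file."""
    return PurePosixPath(record.vendor) / str(record.year) / record.file_name


def find(records, **criteria):
    """Return records whose fields equal every criterion, e.g. find(records, vendor="Acme")."""
    return [r for r in records if all(getattr(r, k) == v for k, v in criteria.items())]


records = [
    IndexRecord("Acme", 2013, "INV-1001", "inv1001.pdf"),
    IndexRecord("Acme", 2014, "INV-1002", "inv1002.pdf"),
    IndexRecord("Globex", 2014, "G-77", "g77.pdf"),
]
print(folder_path(records[0]))                                # Acme/2013/inv1001.pdf
print([r.invoice_number for r in find(records, year=2014)])  # ['INV-1002', 'G-77']
```

Whatever fields you pick, fixing them before scanning starts is the point of the slide: changing them later means re-indexing every record.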
  37. Don’t worry, there is automation. Technologies such as • Barcode recognition • OCR • Batch processing • Data mining and text mining can save time and money by automating indexing and more.
  38. Using Barcodes for Indexing. Intelligent data capture software can extract data from barcodes to create index information and send it to a document management system. For an in-depth look at barcodes in data capture visit: What Can Barcodes Do For Me?
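Once capture software has decoded a barcode, the value still has to be split into index fields. The sketch below assumes a hypothetical barcode format, `<DOCTYPE>-<ACCOUNT>-<YYYYMMDD>`; decoding the barcode image itself is left to your capture product and is not shown.

```python
import re

# Hypothetical barcode content format, e.g. "INV-004512-20140117".
BARCODE_PATTERN = re.compile(r"^(?P<doc_type>[A-Z]{3})-(?P<account>\d{6})-(?P<date>\d{8})$")


def index_from_barcode(value: str) -> dict:
    """Turn a decoded barcode string into index fields; raise ValueError if it doesn't match."""
    match = BARCODE_PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized barcode: {value!r}")
    fields = match.groupdict()
    raw = fields.pop("date")
    fields["date"] = f"{raw[:4]}-{raw[4:6]}-{raw[6:]}"  # reformat as YYYY-MM-DD
    return fields


print(index_from_barcode("INV-004512-20140117"))
# {'doc_type': 'INV', 'account': '004512', 'date': '2014-01-17'}
```

Rejecting unrecognized values, rather than guessing, lets those pages be routed to an operator for manual indexing.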
  39. With OCR, make your image-based file fully text-searchable or extract data from a zone for indexing.
  40. Using OCR for Indexing. With zonal OCR, document areas are identified for automatic OCR capture. Additionally, drag-and-drop OCR allows an operator to highlight document text, which is automatically OCR'd and dropped into index fields.
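Zonal OCR depends on knowing where each zone sits on the page image. A minimal sketch, assuming zones are defined in inches (a hypothetical `Zone` class) so the same template works at any resolution: each zone is converted to a pixel box for the scan's dpi. The (left, upper, right, lower) order matches what Pillow's `Image.crop` expects; the OCR call on the cropped region is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """A hypothetical OCR zone measured in inches from the page's top-left corner."""
    name: str
    left_in: float
    top_in: float
    width_in: float
    height_in: float

    def to_pixel_box(self, dpi: int) -> tuple[int, int, int, int]:
        """Return (left, upper, right, lower) pixel coordinates for an image scanned at `dpi`."""
        left = round(self.left_in * dpi)
        upper = round(self.top_in * dpi)
        right = round((self.left_in + self.width_in) * dpi)
        lower = round((self.top_in + self.height_in) * dpi)
        return left, upper, right, lower


invoice_number = Zone("invoice_number", left_in=6.0, top_in=1.0, width_in=2.0, height_in=0.5)
print(invoice_number.to_pixel_box(300))  # (1800, 300, 2400, 450)
print(invoice_number.to_pixel_box(200))  # (1200, 200, 1600, 300)
```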
  41. Tips for OCR • Scan at 300 dpi for greater accuracy and to ensure that small text is captured. • Limit the use of color on documents. • Pre-process the image with image enhancement software (available in many data capture products; learn more).
  42. What is Batch Processing? Intelligent data capture solutions often use batch processing, which lets you process a whole folder of documents at a time. Some products can “watch folders” and process files as they are scanned into the folder. Processing can include indexing, file routing, file splitting, and cleaning/enhancing the scans. For an in-depth look visit: What is Batch Document Processing?
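A “watch folder” can be approximated by polling. The sketch below is a simplified stand-in for what capture products do: the function names and the `*.pdf` pattern are assumptions, it tracks files by name only, and a production version would also need to wait until the scanner has finished writing each file.

```python
import tempfile
import time
from pathlib import Path
from typing import Callable


def process_new_files(folder: Path, seen: set, handle: Callable[[Path], None],
                      pattern: str = "*.pdf") -> list:
    """Pass each matching file not seen before to `handle`; return the names processed."""
    processed = []
    for path in sorted(folder.glob(pattern)):
        if path.name not in seen:
            handle(path)
            seen.add(path.name)
            processed.append(path.name)
    return processed


def watch(folder: Path, handle: Callable[[Path], None], interval_s: float = 5.0) -> None:
    """Poll `folder` forever, processing new scans as they arrive (stop with Ctrl+C)."""
    seen: set = set()
    while True:
        process_new_files(folder, seen, handle)
        time.sleep(interval_s)


# Demo: simulate a scanner dropping files into a watched folder, one polling pass at a time.
with tempfile.TemporaryDirectory() as tmp:
    folder, seen, handled = Path(tmp), set(), []
    (folder / "batch1.pdf").touch()
    print(process_new_files(folder, seen, handled.append))  # ['batch1.pdf']
    (folder / "batch2.pdf").touch()
    print(process_new_files(folder, seen, handled.append))  # ['batch2.pdf'] only
```

Here `handle` is where indexing, routing, splitting, or image cleanup would plug in.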
  44. Lesson 6: Document Prep and Estimating Volumes
  45. Preparation, quality control, and indexing are the most time-consuming elements of any scanning job and usually the most costly.
  46. Document Prep Throughput. Typically a good operator can prepare 750 to 1,000 documents per hour; however, a number of factors may drop throughput to between 300 and 500.
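The throughput figures translate directly into labor estimates. A minimal sketch, using the rates from the slide and a hypothetical 30,000-document job:

```python
def prep_hours(documents: int, docs_per_hour: float) -> float:
    """Operator hours needed to prepare `documents` at a given throughput."""
    return documents / docs_per_hour


docs = 30_000  # hypothetical job size
# Rates from the slide: 750-1000 docs/hour normally, dropping to 300-500 for difficult material.
for rate in (1000, 750, 500, 300):
    print(f"{rate:>4} docs/hour: {prep_hours(docs, rate):6.1f} hours")
# 30.0, 40.0, 60.0 and 100.0 hours respectively
```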
  47. Factors that Influence Document Prep
      • Odd-size document types: sales receipts, photos, plans/drawings
      • Bindings: three-ring, spiral, glue, folder
      • Fasteners: staples, paper clips, binder clips, rubber bands
      • Attachments: Post-its, tabs
  48. Estimating Volumes and Storage
      Type | Simplex (avg. #) | Duplex (avg. #)
      Paper folders | 30 to 100 | 60 to 200
      Ring binder | 200 | 400
      Lever arch folder | 500 | 1,000
      Transfer cases | 500 | 1,000
      Bankers boxes | 500 | 1,000
      Archive boxes | 2,500 | 5,000
      Filing cabinets | 3,000 per drawer | 6,000 per drawer
      Learn more about estimating volumes.
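The volume table can be turned into a rough estimate of how many images a job involves. In this sketch, paper folders use 65 pages, the midpoint of the 30 to 100 range (our assumption), and duplex simply doubles the count, matching the table's duplex column. The inventory is hypothetical.

```python
# Average simplex counts per container, taken from the slide's table.
SIMPLEX_PAGES = {
    "paper folder": 65,  # assumed midpoint of the 30-100 range
    "ring binder": 200,
    "lever arch folder": 500,
    "transfer case": 500,
    "bankers box": 500,
    "archive box": 2500,
    "filing cabinet drawer": 3000,
}


def estimate_images(inventory: dict, duplex: bool = False) -> int:
    """Estimate images to scan from a count of each container type."""
    pages = sum(SIMPLEX_PAGES[kind] * count for kind, count in inventory.items())
    return pages * 2 if duplex else pages  # duplex captures both sides


inventory = {"lever arch folder": 12, "filing cabinet drawer": 8}  # hypothetical office
print(estimate_images(inventory))               # 30000
print(estimate_images(inventory, duplex=True))  # 60000
```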
  49. Homework: Learn More About Data Capture and Document Management
  50. Document Management. Determine whether you require a full document management system or just need a simple search and retrieval system. Can a simple system serve as a stepping stone while you evaluate your document management options?
  51. Learn More
  52. Call us for information on: How to digitize medical or dental records; The best way to scan medical or dental records; Scanning paper records; Document scanning for medical or dental records; Going paperless at the medical or dental office; How to capture medical or dental records efficiently; Scanning medical or dental records with Fujitsu ScanSnap; Touchscreen scanning of medical or dental records; How to improve your medical or dental workflow with document scanning; Scanning to EMR or scanning to EDR; How to maximize your Fujitsu ScanSnap; Using your ScanSnap for a basic document management system; Using barcodes and the Fujitsu ScanSnap; Scanning with the Fujitsu ScanSnap; Automating workflow with the Fujitsu ScanSnap; Automating document management capture; Scanning into Dentrix; Indexing into Dentrix; Understanding basic Document Scanning; Things your teacher never told you about Document Scanning; An introduction to Document Scanning; Scanning Fundamentals for the average Joe. By DocuFi, makers of ImageRamp Data Capture Solutions: 30 years’ experience in the document imaging market; proven Fujitsu ISV partner. Find out more at ImageRamp and www.docufi.com.
  53. Image Credits. All images are owned or licensed by DocuFi, with acknowledgement given to: Pjohnkeane, “Requirements, requirements, requirements”, http://bit.ly/1fcULDf; Doug Waldron, “Files (85)”, http://bit.ly/1bfciII; UBC Learning Commons, “Scanner_icon-1024x671”, http://bit.ly/1eewI4P; Knile, “Lucy, you have some sorting to do!”, http://bit.ly/19bSgjF; Michael 1952, “SJSA Fifth Grade - I Fell in Love With The Teacher”, http://bit.ly/1eevu9A; Ton Haex, “Einstein show....”, http://bit.ly/LVqeBi; Loco Steve, “Sunrise under scrutiny”, http://bit.ly/1eevSVv; Tax Credits, “Coins”, http://bit.ly/1mtQj5j; j_baer, “Ubuntu Color Wheel”, http://bit.ly/1jARikx; Marcin Wichary, “Alphabetical”, http://bit.ly/1aILOku; David Erickson (e-strategyblog.com), “Hindenburg Disaster”, http://bit.ly/1jASeFF; William Warby, “Gears”, http://bit.ly/1dwtU1S; Alan Cleaver, “watching”, http://bit.ly/1h1k9k7; Zoetnet, “overflowing”, http://bit.ly/KHW9Em; Seattle Municipal Archives, “Comptroller's Office employees, 1960”, http://bit.ly/1eBvLGE; Seattle Municipal Archives, “City Light worker with office machine, 1954”, http://bit.ly/1eBw3NM; Patrick Hoesly, “Thank you”, http://bit.ly/17xKErE.
