10 Tips for Making Your SharePoint Scanning Project A Success

PSIGEN presentation at SharePoint Intelligence Anaheim. During this QuickStart event presentation, we gave an overview of success factors and planning required t

  • Where did these tips come from? They come from 10 years of ECM experience across multiple industries. They apply to scanning in general, not just SharePoint. We have seen the good, the bad, and the ugly in SharePoint. Most issues are due to planning shortfalls and a lack of long-term vision. Hopefully these tips can help.
  • First, let's talk about storage and review some critical factors in laying out a plan in this realm. It is one of the most problematic areas.
  • Images are big compared to other file types such as Word and Excel. This is a baseline chart, based on 200/300 DPI. These are good general planning numbers for estimating storage for back-scanning and day-forward projects.
  • The good news is that you can control file size through device configuration. DPI is one of the most important factors: 200, 300, 400, 600. Anything beyond 300 is a waste. Color is a killer. Most MFPs today come preconfigured for auto-sensing color, which, as you will see on the next slide, produces massive files. The file size difference between PDF and TIFF is negligible, since a PDF is just a wrapped G4 TIFF with PDF overhead. Ensure image processing is utilized.
  • A quick reference to show you the difference in file sizes across black and white, greyscale, and color, as well as DPI.
  • Imaging can create a special issue in the SharePoint world, depending on the volume and types of images you are routing into the system. You may need to consider an RBS solution.
  • There is a ton of conflicting information out there on content DB restrictions.
  • Here are some considerations for managing your content DBs. One of the biggest concerns is DR. How do you manage your backups? Can you restore in adequate time? Do you have the backup systems to manage them? Large content DBs require expertise on the SQL side. 100-200GB is recommended.
  • Here are some RBS recommendations from Microsoft. If you look at these, they all point to imaging-type content.
  • Number 2 is “Use Folder and File Names”. There is a ton of discussion on whether they are required in SharePoint.
  • Folders are dead, right? I don’t think so. Why use them?
  • Users like them. They help build a familiar interface and can help with adoption of any SharePoint solution. One big consideration is that they allow the use of 3rd-party tools to distribute and present content. Any use of WebDAV folder sharing and direct integration with Office can be affected.
  • Use both folders and filenames to help in search. They serve as a contingency for any future migration or offloading of data. From a DR perspective, file naming makes any DR or recovery effort more manageable. It “attaches” metadata to the file.
  • We scan to SharePoint for several reasons: to bridge the gap from physical to digital, bringing paper into digital workflows, and to archive for later search. Have a search frame of mind to make sure you can find your files easily.
  • Capture drives search. Use every method possible to tag your files with data for searching today and in future efforts. We discussed foldering and file naming; adding multiple dimensions will ensure findability. Start with how you want to find…. Best practices here.
  • Is everyone familiar with OCR? It is the process of converting an image into searchable text. It is critical to creating the ultimate in search through full text.


  • 1. 10 Tips To Make Your SharePoint Scanning Project a Success. Stephen Boals, 949-916-7700 x230
  • 2. Plan Your Storage
  • 3. How much storage?
      Description | Number of Pages | Storage
      1 Scanned Page – 8.5 x 11 | 1 | 30-50KB
      1 Scanned Page – 11x17 | 1 | 100KB
      1 File Cabinet – 4 drawers | 10,000 | 500MB
      1 Box | 2,500 | 125MB
      1 Linear Inch | 100 | 5MB
      1 E Size Engineering Drawing (48x36) | 16 – 8.5x11 | 800KB
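The planning numbers on this slide can be turned into a quick estimate. The following is a minimal Python sketch, assuming the slide's upper figure of 50KB per 8.5 x 11 page and decimal units (1MB = 1000KB). The function and constant names are illustrative only and are not part of any PSIGEN or SharePoint tooling.

```python
# Rough back-scanning storage estimate from the slide's planning numbers.
# Assumption: 50 KB per 8.5 x 11 page (top of the 30-50KB range), 1 MB = 1000 KB.
KB_PER_PAGE = 50

# Pages per container, taken from the slide's table.
PAGES_PER_UNIT = {
    "file_cabinet": 10_000,  # 4 drawers
    "box": 2_500,
    "linear_inch": 100,
}


def estimate_storage_mb(counts, kb_per_page=KB_PER_PAGE):
    """Return estimated storage in MB for a dict of {unit: quantity}."""
    pages = sum(PAGES_PER_UNIT[unit] * qty for unit, qty in counts.items())
    return pages * kb_per_page / 1000


if __name__ == "__main__":
    # Hypothetical backlog: 12 file cabinets and 40 boxes.
    print(estimate_storage_mb({"file_cabinet": 12, "box": 40}), "MB")
```

Passing a lower `kb_per_page` (for example 30) gives the optimistic end of the range, so running both values brackets the estimate.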
  • 4. Key Factors in Storage and Sizing • DPI Setting • Color/Black and White/Grayscale • Image Format – PDF or TIFF? • Image Processing technology can reduce file size by 10-30% – Despeckle – Border removal – 3 hole punch removal – Binarization
  • 5. File Size Comparison
      Scanning Mode/DPI | File Size
      Black and White – 200 DPI | 26K
      Black and White – 300 DPI | 38K
      Black and White – 400 DPI | 51K
      Black and White – 600 DPI | 80K
      Greyscale – 300 DPI | 301K
      Color – 300 DPI | 577K
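To see what these per-page sizes mean at volume, here is a small Python sketch built on the slide's figures. The 250 working days per year, the decimal unit conversion, and all names are assumptions for illustration.

```python
# Per-page file sizes in KB, copied from the slide's comparison table.
PAGE_SIZE_KB = {
    ("bw", 200): 26,
    ("bw", 300): 38,
    ("bw", 400): 51,
    ("bw", 600): 80,
    ("greyscale", 300): 301,
    ("color", 300): 577,
}


def size_ratio(mode, dpi, baseline=("bw", 200)):
    """How many times larger a page is than the baseline setting."""
    return PAGE_SIZE_KB[(mode, dpi)] / PAGE_SIZE_KB[baseline]


def yearly_storage_gb(pages_per_day, mode, dpi, working_days=250):
    """Day-forward storage per year in GB (assumes 1 GB = 1,000,000 KB)."""
    return pages_per_day * working_days * PAGE_SIZE_KB[(mode, dpi)] / 1_000_000


if __name__ == "__main__":
    for (mode, dpi) in PAGE_SIZE_KB:
        print(mode, dpi, round(size_ratio(mode, dpi), 1),
              round(yearly_storage_gb(1000, mode, dpi), 2))
```

With these figures, a 300 DPI color page is roughly 22 times the size of a 200 DPI black and white page, which is why the color default matters so much.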
  • 6. SharePoint Storage Architecture • Image file sizes can lead to DB issues if proper planning does not take place and storage considerations are not examined. • Consider the use of Remote BLOB Storage (RBS)
  • 7. Latest Content Database Limitation • Content databases of up to 4 TB are supported when the following requirements are met: – Disk sub-system performance of 0.25 IOPS per GB. 2 IOPS per GB is recommended for optimal performance. – You must have developed plans for high availability, disaster recovery, future capacity, and performance testing. • http://technet.microsoft.com/en-us/library/cc298801.aspx • http://sharepoint.microsoft.com/blog/pages/BlogPost.aspx?pID=988
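The per-GB figures on this slide translate directly into a disk sub-system requirement. A minimal Python sketch, with the function name chosen here purely for illustration:

```python
# IOPS-per-GB figures as quoted on the slide.
MIN_IOPS_PER_GB = 0.25
RECOMMENDED_IOPS_PER_GB = 2


def required_iops(db_size_gb):
    """Return (minimum, recommended) IOPS for a content database of this size."""
    return db_size_gb * MIN_IOPS_PER_GB, db_size_gb * RECOMMENDED_IOPS_PER_GB


if __name__ == "__main__":
    # Hypothetical 1000 GB content database.
    minimum, recommended = required_iops(1000)
    print(f"minimum {minimum:.0f} IOPS, recommended {recommended:.0f} IOPS")
```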
  • 8. Considerations on Content DB Size • Backup and restore. • Skilled administrators. • Complexity of customizations and configurations on SharePoint Server 2010 may necessitate refactoring (or splitting) of data into multiple content databases. • 100-200GB is still the best size for backup and restore, and overall manageability.
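As a planning aid, the 100-200GB guideline can be combined with a storage estimate to see how many content databases a project might need. This is a sketch only; the 200GB default is the upper end of the slide's range and the function name is hypothetical.

```python
import math


def content_dbs_needed(total_gb, target_gb=200):
    """Number of content databases needed to keep each at or under target_gb."""
    if total_gb <= 0:
        return 0
    return math.ceil(total_gb / target_gb)


if __name__ == "__main__":
    # Hypothetical 1.5 TB image archive.
    print(content_dbs_needed(1500))        # at the 200GB upper bound
    print(content_dbs_needed(1500, 100))   # at the 100GB lower bound
```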
  • 9. Microsoft RBS Recommendations • RBS provides benefits in the following: – The content databases are larger than 500 gigabytes (GB). – The BLOB data files are larger than 256 kilobytes (KB). – The BLOB data files are at least 80 KB and the database server is a performance bottleneck. In this case, RBS reduces both the I/O and processing load on the database server.
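The three conditions on this slide can be expressed as a simple checklist. The Python sketch below only encodes the thresholds quoted above; the function name and the idea of using a "typical" BLOB size are assumptions for illustration, not a Microsoft tool.

```python
def rbs_reasons(db_size_gb, typical_blob_kb, db_server_bottleneck=False):
    """Return the slide's RBS criteria that apply to this environment."""
    reasons = []
    if db_size_gb > 500:
        reasons.append("content database larger than 500 GB")
    if typical_blob_kb > 256:
        reasons.append("BLOB files larger than 256 KB")
    if typical_blob_kb >= 80 and db_server_bottleneck:
        reasons.append("BLOB files of at least 80 KB and database server is a bottleneck")
    return reasons


if __name__ == "__main__":
    # Hypothetical library of 300 DPI color scans (about 577 KB per page).
    for reason in rbs_reasons(db_size_gb=600, typical_blob_kb=577,
                              db_server_bottleneck=True):
        print("RBS may help:", reason)
```

An empty list means none of the quoted criteria are met, which is the likely outcome for small libraries of 200 DPI black and white pages.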
  • 10. Use Folder and File Names
  • 11. No More Folders!! • Maybe not • In the majority of our implementations, customers required folder naming in libraries • Why?
  • 12. Why Folders? • Users are familiar with Folder structures – Easier adoption • Use of 3rd party tools – Colligo Briefcase – Access Tools • WebDAV applications • Office
  • 13. Folders and Filenames • Search Aid • Flexibility for migration • Structured data for DR • Overall Contingencies
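One way to make folders and filenames carry metadata is to build them from the index values captured at scan time. The sketch below is a hypothetical convention (document type, then year, then index fields plus date), not a PSIGEN or SharePoint rule. It replaces every character other than letters, digits, dash, and underscore, which is a deliberately conservative assumption about what is safe in a name.

```python
import re
from datetime import date


def safe_part(value):
    """Reduce a value to letters, digits, dash, and underscore."""
    return re.sub(r"[^A-Za-z0-9_-]+", "_", str(value).strip()).strip("_")


def build_path(doc_type, doc_date, fields):
    """Build 'DocType/Year/Field1_Field2_YYYY-MM-DD.pdf' from index values."""
    folder = f"{safe_part(doc_type)}/{doc_date.year}"
    name = "_".join([safe_part(v) for v in fields] + [doc_date.isoformat()])
    return f"{folder}/{name}.pdf"


if __name__ == "__main__":
    # Hypothetical invoice with vendor and invoice number as index fields.
    print(build_path("Invoices", date(2012, 3, 15), ["ACME Corp.", "INV#1042"]))
```

Because the vendor, invoice number, and date travel with the file, the document stays identifiable after a migration or restore even if its column metadata is lost.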
  • 14. Think Search
  • 15. Capture drives Search • How do you want to find your documents? • Index fields (Columns in SharePoint) are the critical focus. • Use Term Store and Managed Metadata • Rules to live by: – 5 <= defining fields per document type – Always include dates – Steer clear of field “overdrive” • Automation and data sources can let you go beyond
  • 16. OCR for Search
  • 17. Full Text • The Insurance Policy • Adobe PDF Image + Hidden Text – Industry Standard – One “Package” for image and OCR text – Portable • Provide the ultimate in searchability with iFilter
  • 18. Define Your Scanning Model
  • 19. Scanning Models
  • 20. Scanning Models • Centralized Capture – Documents are scanned at one location and in “batches” at a particular time or times • De-centralized Capture – Documents are still scanned in batches at a particular time, but are now scanned at multiple locations • Distributed Capture – Documents are scanned at the point of transaction and at multiple locations
  • 21. Trend from Centralized to Distributed Scanning
  • 22. Choose the Correct Scanners
  • 23. Choosing your Weapon • MFPs or Scanners?
  • 24. MFPs – The Pros • Leverage your existing investment in the MFP • Most copier maintenance plans do not charge for scans • MFP manufacturers are really focusing on scanning • Network scanning functions: – Scan to email – Scan to Windows Folders – Scan to FTP • One-to-Many relationship: all workers can use one device.
  • 25. MFPs – The Cons • Contention – “line at the copier” • Poor performance with differing paper sizes • Lack of color dropout (Scanning blue or black backgrounds will result in a black page) • Small Document Feeder sizes (50 – 100 pages) • On average, file sizes are 10-20% larger • Duplex scanning/DPI increase greatly slows down rated speed • Black and White scanning only on some models
  • 26. Scanners – The Pros • Convenience – scan at your desk • Duplexing does not slow down scanner • Color dropout • Superior image quality due to enhancement features • Ease in handling differing paper sizes/types • Larger document feeder selections (up to 1000+ pages)
  • 27. Scanners – The Cons • One to One relationship – directly connected to PC • Additional Maintenance costs • Can be quite expensive to outfit your whole organization.
  • 28. When to use a Dedicated Scanner • Scanning 10+ documents per day • Workers that are constantly scanning throughout the day • Mixed paper sizes, weights and colors • Poor quality, older documents or when image enhancement is required • OCR or ICR applications • High volume copying and printing environments • Large Document scanning • High security environments
  • 29. Key Points When Purchasing • Scanning speed • Document Feeder Capacity • Daily Duty Cycle • Scanning Mode • Warranty and Service
  • 30. Correctly Configure Devices
  • 31. Too Many IT Killers
  • 32. Focus • Almost all MFPs Scan in Color by Default • DPI is always set above 200 DPI • Huge network impact • Huge DB Impact • Huge drain on resources
  • 33. Recommendations-Default • 200 DPI • Black and White • Only add color for specific departmental needs • Use TIFF and PDF (compressed) • Linearized PDF (WebFast)
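The recommended defaults can be written down as a baseline profile and used to review device settings. This is a minimal Python sketch; `ScanProfile` and `profile_warnings` are hypothetical names, and real MFPs and scanners expose these settings through their own vendor-specific configuration tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanProfile:
    """Scan settings; the defaults mirror the slide's recommendations."""
    dpi: int = 200
    color_mode: str = "bw"      # "bw", "greyscale", or "color"
    file_format: str = "pdf"    # "pdf" or "tiff" (compressed)


def profile_warnings(profile):
    """List the ways a profile departs from the recommended defaults."""
    warnings = []
    if profile.dpi > 200:
        warnings.append(f"{profile.dpi} DPI is above the recommended 200 DPI default")
    if profile.color_mode != "bw":
        warnings.append("color or greyscale should be limited to specific departmental needs")
    if profile.file_format not in ("pdf", "tiff"):
        warnings.append(f"format '{profile.file_format}' is not compressed TIFF or PDF")
    return warnings


if __name__ == "__main__":
    # Hypothetical factory-default MFP profile.
    for warning in profile_warnings(ScanProfile(dpi=300, color_mode="color")):
        print("Review:", warning)
```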
  • 34. Scan or Capture?
  • 35. Scanning Challenges • Basic capabilities • No standardization • Documents not searchable • Time intensive • Lack of integration into Enterprise Applications
  • 36. Capture vs. Scanning • Capture software can be utilized for scanning needs, but takes you to a whole new level from a "capture" perspective. These applications typically have a number of ways to "slice and dice" documents, and really focus on efficiency, and minimizing the time required to scan, index and capture data. • A scanning application is just a basic means to take paper, and quickly and easily convert it from paper to digital form. They are well suited to environments with very basic needs, and what I call "onsie-twosie" scanning, or low volume environments.
  • 37. Why capture? • Reduce the required time for scanning and indexing documents = Efficiency • Enable a standard process for scanning, capturing, indexing, naming, and processing = Standardization • Provide numerous gateways to multiple repositories = Flexibility
  • 38. Automation is Key
  • 39. Extraction Technologies Advanced Data Extraction (ADE) Zone OCR Manual Entry
  • 40. What is ADE?
  • 41. Automated Routing 12332 ATT 1232.00
  • 42. Use Barcodes and OMR
  • 43. Routing/Separator Sheets • Utilize barcodes and/or Optical Mark Recognition (OMR) • Capture reads and determines routing based on them
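The separator-sheet idea can be illustrated with a short Python sketch. It assumes the capture software has already read each page and reports a barcode value for separator sheets and nothing for ordinary pages. The `LIBRARY|FOLDER` barcode format and all names here are hypothetical, not a PSIGEN format.

```python
def split_batch(pages):
    """Split a scanned batch into routed documents.

    pages: list of (page_id, barcode_value or None) in scan order.
    A page with a barcode is a separator sheet: it starts a new document
    and supplies the route, and the sheet itself is not kept.
    Returns a list of (route, [page_ids]).
    """
    documents = []
    for page_id, barcode in pages:
        if barcode is not None:
            library, folder = barcode.split("|", 1)
            documents.append(({"library": library, "folder": folder}, []))
        elif documents:
            documents[-1][1].append(page_id)
        else:
            raise ValueError(f"page {page_id!r} appears before any separator sheet")
    return documents


if __name__ == "__main__":
    batch = [
        ("sheet1", "Finance|Invoices"),
        ("p1", None),
        ("p2", None),
        ("sheet2", "HR|Resumes"),
        ("p3", None),
    ]
    for route, page_ids in split_batch(batch):
        print(route["library"], route["folder"], page_ids)
```

The operator only has to place the right sheet in front of each document; document separation and routing then need no manual indexing.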
  • 44. Intelligent Routing
  • 45. How are they Created? • Most Capture Apps Include them • Ad Hoc and Bulk Generation • Excel and Word Macros • Custom SP Apps
  • 46. Summary: Plan, Plan, Plan
  • 47. Keys to Project Success • All items in this presentation are critical to overall planning • Focus on meeting needs and driving users to proper use of technology • Start small – POC • Learn from smaller projects • Expand
  • 48. Who is PSIGEN? • Founded 1995 • Mature capture company • Innovative Capture • Focus on Automation • Integration with 56 ECM systems
  • 49. Who uses PSIGEN products?
  • 50. Links • www.psigen.com • www.scanningwithsharepoint.com