PRESENTATION: Challenges of Digitization (November 2012)

Are you considering digitizing your paper-based assets? If so, check out this presentation, which discusses PDF/A and the challenges of digitizing and preserving paper-based documents.

Published in: Technology

  1. 1. PDF/A: Addressing the challenges of digitizing and preserving paper-based documents in GoC. Jeff Brand, October 26, 2012. © ADLIB 2012. THIS SLIDE PRESENTATION CONTAINS PROPRIETARY AND/OR CONFIDENTIAL INFORMATION.
  2. 2. Adlib – Who We Are • Software company – Burlington, Ontario, Canada • Leading expert in document-to-PDF transformation • Improve document intensive business processes • 10+ years experience • 5,000+ Customers Worldwide • 50+ Countries • 100+ Partners
  3. 3. Bringing Value To Many Industries: Financial Services, Life Sciences, Health Care, Legal, Mfg, Gov’t, Other
  4. 4. Key Partners
  5. 5. Agenda. Physical and Digital Media: • Physical and Digital Archiving – Advantages and Disadvantages • Overview of PDF and PDF/A. Approaches to Digitization: • Maximize the retention of knowledge • Consider Security Implications. Management of Digitized Artifacts: • Revisit Retention and Disposition policies • Maximize the value to Canada. Opportunities for Savings: • Time & Cost • Increased Flexibility to mitigate future costs. Summary
  6. 6. Physical Archiving • Preserving and Storing the original or exemplary specimen in original, physical form
  7. 7. Physical Archiving - Advantages. Assuming time has not deteriorated the media… • The physical archive is the original, so there is no variance from the original • Relatively little or no question about authenticity or accuracy • Technology – All you need are eyes.
  8. 8. Physical Archiving - Advantages • Sentimental Value: Certain original documents will be worth maintaining for as long as possible...
  9. 9. Physical Archiving - Disadvantages: Time. Time heals all wounds… and destroys all documents
  10. 10. Physical Archiving - Disadvantages: Cost. Elaborate and costly physical storage and preservation
  11. 11. Physical Archiving - Disadvantages: Availability. There’s only one. Options to make it available to citizens are limited and require manual effort
  12. 12. Physical Archiving - Disadvantages: Effort to retrieve. Locating relevant documents relies on appropriate and accurate taxonomy during on-boarding
  13. 13. Physical Archiving - Disadvantages: Environmental Impact. Preserving documents requires chemicals, such as 3M’s Novec 7100 Engineering Fluid
  14. 14. Digital Archiving • Preserving and Storing the original or exemplary specimen in a digitized form.
  15. 15. Digital Archiving - Advantages: Space. 128GB USB Flash Drive ($80 CDN) = 56,140,800 Pages. An entire warehouse of text can fit in a USB Thumb Drive
  16. 16. Digital Archiving - Advantages: Availability. Digital copies can be shared with an unlimited number of people with little or no effort.
  17. 17. Digital Archiving - Advantages: Effort. Technologies such as Full-Text-Searching make finding relevant documents easier and less dependent on taxonomy
  18. 18. Digital Archiving - Advantages: Automation. Automatically execute Retention and Disposition policies, Audit and more without manual intervention
  19. 19. Digital Archiving - Advantages: Flexibility. Flexibility to support changing Policy and Requirements easily
  20. 20. Digital Archiving - Advantages: Cost. Digital archives typically cost 90% less to operate and maintain
  21. 21. Digital Archiving - Challenges • Digital Dark Age • Wide variety of formats that require special technology to view • Reduced Sentimental Value
  22. 22. Digital Archiving - Considerations: Digital Dark Age. Ensuring files are accessible tomorrow…
  23. 23. Digital Archiving - TIFF: Tagged Image File Format • Used by FAX Machines (CCITT Group 4) • Very common image format • Supports multiple pages • Significant increase in file size for digitally-born content • No Search capability • Not designed for Long-Term Archiving
  24. 24. Digital Archiving – PDF/A: Portable Document Format (For Archive) • Adopted by ISO for long-term preservation of documents • Based on PDF – the most popular document format on the Web today • Highest-Quality representation of document • Smallest possible file sizes • Guaranteed to look the same forever • Universally Viewable – Hardware / Software independent
  25. 25. What is PDF? Portable Document Format. Originally created by Adobe in the 1990s, it became an open ISO Standard (ISO 32000-1) in 2008. The most popular file format on the web today (FileInfo.com)
  26. 26. What else is PDF Used For? • Contracts • Agreements • Sales Proposals • Product Literature • Publications • Reports • Standard Operating Procedures • Long-Term Archiving • Sharing documents and content with others • So Much More
  27. 27. Isn’t PDF Free? Many applications can save to PDF
  28. 28. Isn’t PDF Free? A quick Google search shows dozens of free applications for creating PDF…
  29. 29. Isn’t PDF Free? - Can your doc change? Original Excel Chart vs. Free PDF Rendition. Fidelity or quality of conversion is often the cost
  30. 30. Isn’t PDF Free? - Can your doc change? Original Word Doc vs. Free PDF Rendition. Content Re-Flow - Font Substitution - Complex Formats
  31. 31. Isn’t PDF Free? - Can you comply? PDF features that are required for compliance, or even for optimal document conversion, are often missing in free or low-cost solutions
  32. 32. Isn’t PDF Free? - Can you merge? It can be difficult and time-consuming to merge the content from multiple applications into a single document. Few if any free PDF solutions enable this.
  33. 33. Isn’t PDF Free? - Can you keep up? Workers spend far too much time dealing with low quality and manual PDF rendering technologies
  34. 34. Isn’t PDF Free? • Free or Low-Cost software can cost you: • Hours of lost productivity • Lost opportunities • Miscommunication • Business delays • Fines
  35. 35. PDF/A
  36. 36. PDF/A - What is it? • A stricter subset of the PDF specification • Specifically designed for the purpose of long-term preservation of documents • Audio, Video, JavaScript and Executables, Encryption, External references are all restricted • Designed to be 100% Self Contained – All fonts must be embedded
  37. 37. PDF/A - What is it? Based on PDF, Initially Released in 2005 (3 years ahead of PDF as an ISO Standard!) • PDF/A-1 (a/b) • Based on PDF 1.4 Specification • PDF/A-2 (a/b/u) • Based on ISO 32000-1 • JPEG2000, Transparency, Layers, OpenType Fonts, Embedded PDF/A Files • PDF/A-3 (a/b/u) • Arbitrary files can be embedded
  38. 38. PDF/A - What is it? What makes a PDF a PDF/A? • A Special metadata tag that indicates that the document presents itself as PDF/A • Compliance with the PDF/A Standard
  39. 39. PDF/A – Disadvantages: File Size – Embedding fonts in each document means file sizes are larger when compared to PDF (This is still significantly better than alternatives such as TIFF)
  40. 40. PDF/A – Summary • PDF provides many benefits over alternatives such as TIFF • Small size • High-quality • Searchable • Highly viewable • Portable • PDF/A Builds on this and ensures the long-term viability of content stored in this format
  41. 41. The AIIM Document Life Cycle Optimize with Searchable Content OCR- Searchable Content Metadata Retention - Format PDF - Enhancements & Watermarks Support for: - Document Assembly - PDF/A - Personalization - TIFF - Security & Approvals
  42. 42. PDF/A at Library and Archives Canada (architecture diagram; labels: Upload Module, FTP Module, eMail Module, Scan Module, Staging, DAM, Web Store, Services, Repositories, Local/cloud Storage, Structured Data, Templates, Metadata, Data Warehouse & Social Models)
  43. 43. PDF/A at Library and Archives Canada (same diagram with an Adlib label added)
  44. 44. PDF/A at Library and Archives Canada (same diagram with an Adlib label added)
  45. 45. PDF/A at Library and Archives Canada (same diagram with an Adlib label added)
  46. 46. PDF/A at Library and Archives Canada (same diagram with an Adlib label added)
  47. 47. PDF/A at Library and Archives Canada (same diagram with an Adlib label added)
  48. 48. PDF/A at Library and Archives Canada (same diagram with an Adlib label added)
  49. 49. Digitization
  50. 50. Preparation – Typical Document Process
  51. 51. Digitization – Processing Large Volumes • Digitizing entire libraries of content can be more than daunting but help is available: Seek out industry experts to ensure a successful transition of knowledge
  52. 52. Digitization – Processing Large Volumes • Digitizing entire libraries of content can be more than daunting but help is available: • In-Sourcing and Out-Sourcing: Build a plan of action that considers Security requirements • Is the content potentially sensitive? • Is there risk of loss? • Is there a risk of contamination / degradation of the original content?
  53. 53. Digitization – Processing Large Volumes • Digitizing entire libraries of content can be more than daunting but help is available: • Hardware & Software Investments • What do you need Today & Tomorrow? • Consider Lease for Short Term requirements • Provision for the future
  54. 54. Digitization – Processing Large Volumes • Measure Twice, Cut Once • Plan ahead and consider the future use of the content when defining requirements • Understand the entire lifecycle of the content when architecting the process • How long will we keep it? • How will we share it? • How will people find it? • How will we dispose of it? • Will we maintain the originals after digitization? • What are the specific requirements for each step in the process?
  55. 55. Digitization – Processing Large Volumes • Start with Quality • Pay special attention to the digitization process • Higher quality at the IMAGING stage pays off • Files can be reduced as necessary later; you can never ADD quality • Consider pre-processing when scanning documents of questionable quality • Ensure highly accurate OCR is applied prior to on-boarding into the system, or as a part of the on-boarding process
  56. 56. Digitization – Maintaining Taxonomy. Classification and indexes need to be maintained, but how? • Purely Physical • Index Cards, Catalogs, Within Content • Modernized Physical • Library systems & databases
  57. 57. Digitization – Maintaining Taxonomy. This is often achieved by making the classification data available on a cover sheet in front of each document. This can be extracted from the Library System / DB, pulled directly from an Index Card, or even processed from a Catalog (even if it’s physical!)
  58. 58. Digitization – Approaches. There are 2 methods of digitizing a collection: 1. Batch • Everything is performed in one or more batches, and the sequence of batching is pre-determined 2. Scan-On-Demand • More opportunistic; existing Archives are digitized as requested
  59. 59. Digitization - Security. Preventing Loss • Chain of custody • Limited transportation choices • Escorted Content. Selective Outsourcing • Assess the risk • Employ multiple tiers for Outsourcing • In-Source for the most critical artifacts
  60. 60. Management of Digitized Artifacts • Revisit Retention and Disposition Policies • Can we keep digital records longer? Indefinitely? • Maximizing the value to Canada • Making content available to Canadians • Using Search to maximize value and enhance classification paradigms in use today
  61. 61. Sharing Canada’s Digitized Artifacts Maximizing the value to Canada: • Education • Legal • Innovation
  62. 62. Cost Savings • Physical Storage • Management and Execution of Retention and Disposition Policies • Flexibility to support changing Policy and Requirements easily
  63. 63. Cost Savings (chart: Price per GB by year) • 1990: $36,659.20 • 1995: $819.20 • 2000: $1,433.60 • 2005: $10.00 • 2010: $0.10 • 2012: $0.05 • 2020: $0.02
  64. 64. Cost Savings (same Price per GB data, chart zoomed to 2010–2020) • 2010: $0.10 • 2012: $0.05 • 2020: $0.02
  65. 65. Summary • Hire an Expert – Or Become One! • Do it once and do it right • Digitize Everything • On Demand / Disposition • Physically preserve only sentimental and historic originals
  66. 66. The AIIM Document Life Cycle Optimize with Searchable Content OCR- Searchable Content Metadata Retention - Format PDF - Enhancements & Watermarks Support for: - Document Assembly - PDF/A - Personalization - TIFF - Security & Approvals
  67. 67. Adlib PDF Enterprise. Input: • MS Office • MS InfoPath • MS Project • Various CAD • Various PDF • Images • OpenOffice • HTML • Over 400 File Types. Process: • Conversion • Recognition (OCR) • Publication • Merge • TOC • Bookmarks • Headers/Footers • Digital Signatures. Output: • PDF • PDF/A • XPS • XML • TIFF/JPG/BMP/PNG • TXT • HTML
  68. 68. Adlib PDF Architecture (diagram; labels: Content Stores, Connectors (SharePoint, Folder, Generic), Management Console UI, Connector Framework (Java 1.6/.NET), WCF / SOAP Services Interface, System Managers, System Manager Database, Transformation Engines)
  69. 69. Adlib Software… …The PDF Experts! Your partner for Quality, Automated Document Transformation
  70. 70. Contact Information: Matt Woodworth, Manager, Public Sector, North America. 613 218 6778, mwoodworth@adlibsoftware.com
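
Slide 15 equates a 128GB flash drive ($80 CDN) with 56,140,800 pages. The short Python sketch below works backwards from those three figures to see what they imply per page. The function names are illustrative, and the choice of 10**9 bytes per GB is an assumption, since the slide does not say which definition of "GB" it uses.

```python
# Figures quoted on slide 15 of the deck.
DRIVE_GB = 128
DRIVE_PRICE_CAD = 80.00
PAGES = 56_140_800

# Assumption: 1 GB = 10**9 bytes. The slide does not state its definition.
BYTES_PER_GB = 10**9


def pages_per_gb(pages=PAGES, drive_gb=DRIVE_GB):
    """Pages stored per GB, as implied by the slide's totals."""
    return pages / drive_gb


def implied_bytes_per_page(pages=PAGES, drive_gb=DRIVE_GB, bytes_per_gb=BYTES_PER_GB):
    """Average page size that makes the slide's page count fit on the drive."""
    return drive_gb * bytes_per_gb / pages


def cost_per_million_pages(price=DRIVE_PRICE_CAD, pages=PAGES):
    """Media cost in CAD for one million pages, ignoring all other costs."""
    return price / pages * 1_000_000


print(f"pages per GB:           {pages_per_gb():,.0f}")
print(f"implied bytes per page: {implied_bytes_per_page():,.0f}")
print(f"CAD per million pages:  {cost_per_million_pages():.2f}")
```

The implied page size is a little over 2 KB, which is in the range of a page of plain text. Scanned page images are typically far larger, so the page count for an image-based archive on the same drive would be much lower.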
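
Slide 38 says a PDF/A file carries a special metadata tag declaring itself as PDF/A. In the PDF/A standard this declaration lives in the document's XMP metadata as the `pdfaid:part` and `pdfaid:conformance` properties. The sketch below reads those two properties from an XMP packet that has already been extracted from a PDF; extracting the packet is out of scope here. The function name and sample packet are illustrative, not part of the presentation.

```python
import xml.etree.ElementTree as ET

# Namespaces used by XMP and by the PDF/A identification schema.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"


def _pdfaid_property(description, name):
    """Read a pdfaid property stored either as an attribute or a child element."""
    key = f"{{{PDFAID_NS}}}{name}"
    value = description.get(key)
    if value is None:
        child = description.find(key)
        if child is not None and child.text:
            value = child.text
    return value.strip() if value else None


def claimed_pdfa_level(xmp_xml):
    """Return (part, conformance) if the XMP claims PDF/A, else None.

    This only reports what the file claims. It does not validate the file.
    """
    root = ET.fromstring(xmp_xml)
    for description in root.iter(f"{{{RDF_NS}}}Description"):
        part = _pdfaid_property(description, "part")
        if part:
            return part, _pdfaid_property(description, "conformance")
    return None


# Hypothetical sample packet for illustration only.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <pdfaid:part>1</pdfaid:part>
      <pdfaid:conformance>B</pdfaid:conformance>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

print(claimed_pdfa_level(SAMPLE_XMP))
```

As the slide notes, the tag is only half of the answer. A file that carries the tag but, for example, lacks embedded fonts is not compliant, so a real workflow would pair this check with a full PDF/A validator.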
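
Slide 58 contrasts two digitization approaches: batch, where the order of work is fixed up front, and scan-on-demand, where an item is digitized only when someone asks for it. The following is a minimal sketch of that difference, not a description of any Adlib product. All names (`plan_batches`, `ScanOnDemand`, `fake_scan`) are hypothetical, and the scan step is a stand-in function.

```python
def plan_batches(catalog, batch_size):
    """Batch approach: split a pre-determined sequence into fixed-size batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [catalog[i:i + batch_size] for i in range(0, len(catalog), batch_size)]


class ScanOnDemand:
    """Scan-on-demand approach: digitize an item the first time it is requested."""

    def __init__(self, scan):
        self._scan = scan        # callable that digitizes one item
        self._digitized = {}     # item id -> digitized result

    def request(self, item_id):
        # Scan only if this item has never been digitized before.
        if item_id not in self._digitized:
            self._digitized[item_id] = self._scan(item_id)
        return self._digitized[item_id]

    def digitized_ids(self):
        return sorted(self._digitized)


def fake_scan(item_id):
    """Stand-in for a real scanning step; returns a file name."""
    return f"{item_id}.pdf"


catalog = ["box-1", "box-2", "box-3", "box-4", "box-5"]
print(plan_batches(catalog, 2))

archive = ScanOnDemand(fake_scan)
print(archive.request("box-4"))
print(archive.digitized_ids())
```

The batch plan touches every item in a known order, which makes cost and schedule predictable. The on-demand archive only ever holds what was requested, so effort follows actual use, at the price of a delay on the first request for each item.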
