Are you considering digitizing your paper-based assets? If so, this presentation discusses PDF/A and the challenges of digitizing and preserving paper-based documents.
2. Adlib – Who We Are
• Software company – Burlington, Ontario, Canada
• Leading expert in document-to-PDF transformation
• Improve document intensive business processes
• 10+ years experience
• 5,000+ Customers Worldwide
• 50+ Countries
• 100+ Partners
3. Bringing Value To Many Industries
• Financial Services
• Legal
• Life Sciences
• Manufacturing
• Health Care
• Government
• Other
5. Agenda
Physical and Digital Media
• Physical and Digital Archiving – Advantages and Disadvantages
• Overview of PDF and PDF/A
Approaches to Digitization
• Maximize the retention of knowledge
• Consider Security Implications
Management of Digitized artifacts
• Revisit Retention and Disposition policies
• Maximize the value to Canada
Opportunities for Savings
• Time & Cost
• Increased Flexibility to mitigate future costs
Summary
7. Physical Archiving - Advantages
Assuming time has not deteriorated the
media…
• The physical archive is the original so
there is no variance from the original
• Relatively little / no question about
authenticity, accuracy
• Technology – All you need are eyes.
8. Physical Archiving - Advantages
• Sentimental Value
There will be a desire to maintain certain original
documents for as long as possible...
9. Physical Archiving - Disadvantages
Time
Time heals all wounds…and destroys all
documents
10. Physical Archiving - Disadvantages
Cost
Elaborate and costly physical storage and
preservation
11. Physical Archiving - Disadvantages
Availability
There’s only one. Options to make it available to
citizens are limited and require manual effort
12. Physical Archiving - Disadvantages
Effort to retrieve
Locating relevant documents relies on
appropriate and accurate taxonomy during
onboarding
13. Physical Archiving - Disadvantages
Environmental
Impact
Preserving documents requires chemicals, such as
3M’s Novec 7100 Engineering Fluid
15. Digital Archiving - Advantages
Space
128GB USB Flash Drive ($80 CDN) = 56,140,800 Pages
An entire warehouse of text can fit in a USB
Thumb Drive
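The page count on this slide can be sanity-checked with simple arithmetic. The sketch below assumes the drive's capacity is counted in decimal gigabytes (1 GB = 10^9 bytes) and works backwards to the average page size the slide's figure implies. The function name is illustrative only.

```python
# Assumption: capacity is counted in decimal gigabytes (10**9 bytes each).
DRIVE_BYTES = 128 * 10**9
PAGES_ON_SLIDE = 56_140_800  # page count quoted on the slide


def implied_bytes_per_page(capacity_bytes: int, pages: int) -> float:
    """Average number of bytes available per page for a given capacity."""
    return capacity_bytes / pages


if __name__ == "__main__":
    per_page = implied_bytes_per_page(DRIVE_BYTES, PAGES_ON_SLIDE)
    print(f"Implied size per page: {per_page:.0f} bytes")
```

Under that assumption the figure works out to a little under 2.3 KB per page, which fits plain text rather than scanned images.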
16. Digital Archiving - Advantages
Availability
Digital copies can be shared with an unlimited
number of people with little or no effort.
17. Digital Archiving - Advantages
Effort
Technologies such as Full-Text-Searching make
finding relevant documents easier and less
dependent on taxonomy
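To illustrate why full-text search reduces the dependence on taxonomy, here is a minimal sketch of an inverted index over OCR text. The document IDs and sample text are hypothetical, and a production system would more likely use a dedicated search engine than this toy structure.

```python
import re
from collections import defaultdict
from typing import Dict, Set


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercased word to the set of document IDs containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return IDs of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


if __name__ == "__main__":
    # Hypothetical OCR output keyed by document ID.
    docs = {
        "deed-1902": "Land deed for a farm near Burlington",
        "letter-1911": "Letter about the farm harvest",
    }
    idx = build_index(docs)
    print(search(idx, "farm burlington"))
```

The documents are found by their content alone, with no classification assigned at onboarding.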
18. Digital Archiving - Advantages
Automation
Automatically execute Retention and Disposition
policies, Audit and more without manual
intervention
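As a rough illustration of automated retention and disposition, the sketch below flags records whose retention period has elapsed. The record classes, retention periods, and names such as `RETENTION_YEARS` and `due_for_disposition` are hypothetical and stand in for whatever policy an organization actually defines.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List

# Hypothetical policy: retention period in years per record class.
RETENTION_YEARS: Dict[str, int] = {"invoice": 7, "correspondence": 2}


@dataclass
class Record:
    record_id: str
    record_class: str
    created: date


def due_for_disposition(records: List[Record], today: date) -> List[str]:
    """Return IDs of records whose retention period has elapsed."""
    due: List[str] = []
    for rec in records:
        years = RETENTION_YEARS.get(rec.record_class)
        if years is None:
            continue  # no policy defined for this class: keep the record
        try:
            expiry = rec.created.replace(year=rec.created.year + years)
        except ValueError:
            # Created on Feb 29 and the target year is not a leap year.
            expiry = rec.created.replace(year=rec.created.year + years, day=28)
        if today >= expiry:
            due.append(rec.record_id)
    return due
```

A scheduled job could run a check like this daily and write each decision to an audit log, which is the kind of work that requires manual effort in a physical archive.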
19. Digital Archiving - Advantages
Flexibility
Flexibility to support changing Policy and
Requirements easily
20. Digital Archiving - Advantages
Cost
Digital archives typically cost 90% less to operate
and maintain
21. Digital Archiving - Challenges
• Digital Dark Age
• Wide variety of formats that require special
technology to view
• Reduced Sentimental Value
22. Digital Archiving - Considerations
Digital Dark Age
Ensuring files are accessible tomorrow…
23. Digital Archiving - TIFF
Tagged Image File Format
• Used by FAX Machines (CCITT Group 4)
• Very common image format
• Supports multiple pages
• Significant increase in file size for digitally-born
content
• No Search capability
• Not designed for Long-Term Archiving
24. Digital Archiving – PDF/A
Portable Document Format
(For Archive)
• Adopted by ISO for long-term preservation of
documents
• Based on PDF – the most popular document
format on the Web today
• Highest-Quality representation of document
• Smallest possible file sizes
• Guaranteed to look the same forever
• Universally Viewable – Hardware / Software
independent
25. What is PDF?
Portable Document Format
Originally created by Adobe in the 1990s, became an
open ISO Standard (ISO 32000-1) in 2008
The most popular file format on the web today (FileInfo.com)
26. What else is PDF Used For?
• Contracts
• Agreements
• Sales Proposals
• Product Literature
• Publications
• Reports
• Standard Operating Procedures
• Long-Term Archiving
• Sharing documents and content with others
• So Much More
28. Isn’t PDF Free?
A quick Google search shows dozens of free
applications for creating PDF…
29. Isn’t PDF Free? - Can your doc change?
Original Excel Chart Free PDF Rendition
Fidelity or quality of conversion is often the cost
30. Isn’t PDF Free? - Can your doc change?
Original Word Doc Free PDF Rendition
Content Re-Flow - Font Substitution - Complex Formats
31. Isn’t PDF Free? - Can you comply?
PDF Features that are often required for compliance or
even optimal document conversion are often missing in free
or low-cost solutions
32. Isn’t PDF Free? - Can you merge?
It can be difficult and time-consuming to merge the content
from multiple applications into a single document.
Few if any free PDF solutions enable this.
33. Isn’t PDF Free? - Can you keep up?
Workers spend far too much time dealing with low quality
and manual PDF rendering technologies
34. Isn’t PDF Free?
• Free or Low-Cost software can cost you:
• Hours of lost productivity
• Lost opportunities
• Miscommunication
• Business delays
• Fines
36. PDF/A - What is it?
• A more strict subset of the PDF specification
• Specifically designed for the purpose of long-term
preservation of documents
• Audio, Video, JavaScript and Executables, Encryption,
External references are all restricted
• Designed to be 100% Self Contained – All fonts must
be embedded
37. PDF/A - What is it?
Based on PDF, Initially Released in 2005
(3 years ahead of PDF as an ISO Standard!)
• PDF/A-1 (a/b)
• Based on PDF 1.4 Specification
• PDF/A-2 (a/b/u)
• Based on ISO 32000-1
• JPEG2000, Transparency, Layers, OpenType Fonts,
Embedded PDF/A Files
• PDF/A-3 (a/b/u)
• Arbitrary files can be embedded
38. PDF/A - What is it?
What makes a PDF a PDF/A?
• A Special metadata tag that indicates that the
document presents itself as PDF/A
• Compliance to the PDF/A Standard
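The metadata tag mentioned above is, in practice, a pair of XMP properties (`pdfaid:part` and `pdfaid:conformance`) stored in the file's metadata. The following is a minimal sketch, assuming the XMP packet is stored uncompressed so it can be read from the raw bytes; the function name `claimed_pdfa_level` is hypothetical. It only reports what the file claims about itself.

```python
import re
from typing import Optional

# Match both XMP serializations: element form <pdfaid:part>1</pdfaid:part>
# and attribute form pdfaid:part="1".
_PART = re.compile(rb"pdfaid:part(?:>|=[\"'])\s*(\d)")
_CONF = re.compile(rb"pdfaid:conformance(?:>|=[\"'])\s*([A-Za-z])")


def claimed_pdfa_level(data: bytes) -> Optional[str]:
    """Return the PDF/A level a file claims (e.g. 'PDF/A-2b'), or None."""
    part = _PART.search(data)
    if part is None:
        return None  # no PDF/A identification found
    level = part.group(1).decode("ascii")
    conf = _CONF.search(data)
    if conf is not None:
        level += conf.group(1).decode("ascii").lower()
    return "PDF/A-" + level


if __name__ == "__main__":
    # Hypothetical fragment of an XMP packet, not a complete PDF file.
    sample = (
        b"<rdf:Description>"
        b"<pdfaid:part>2</pdfaid:part>"
        b"<pdfaid:conformance>B</pdfaid:conformance>"
        b"</rdf:Description>"
    )
    print(claimed_pdfa_level(sample))
```

A file that carries the tag is not necessarily compliant, so the second bullet still has to be verified with a PDF/A validator.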
39. PDF/A – Disadvantages
File Size – Embedding fonts in each document
means file sizes are larger when compared to
PDF
(This is still significantly better than alternatives such as TIFF)
40. PDF/A – Summary
• PDF provides many benefits over alternatives
such as TIFF
• Small size
• High-quality
• Searchable
• Highly viewable
• Portable
• PDF/A Builds on this and ensures the long-term
viability of content stored in this format
41. The AIIM Document Life Cycle
Optimize with Searchable Content
• OCR - Searchable Content
• Metadata Retention
• Format Support for: PDF, PDF/A, TIFF
• Enhancements & Watermarks
• Document Assembly
• Personalization
• Security & Approvals
42. PDF/A at Library and Archives Canada
[Diagram: system overview showing Upload, FTP, eMail and Scan modules; Staging, DAM and Web Store services; Repositories; Local/cloud Storage; Structured Data; Templates; Metadata & Social Models; Data Warehouse]
43. PDF/A at Library and Archives Canada
[Slides 43 to 48 repeat the diagram from slide 42, each adding an "Adlib" label at a different point, including the Services layer, the Web Store, and Metadata]
51. Digitization – Processing Large Volumes
• Digitizing entire libraries of content can be more
than daunting but help is available:
Seek out industry experts to ensure a
successful transition of knowledge
52. Digitization – Processing Large Volumes
• Digitizing entire libraries of content can be more
than daunting but help is available:
• In-Sourcing and Out-Sourcing: Build a plan
of action that considers Security
requirements
• Is the content potentially sensitive?
• Is there risk of loss?
• Is there a risk of contamination / degradation of
the original content?
53. Digitization – Processing Large Volumes
• Digitizing entire libraries of content can be more
than daunting but help is available:
• Hardware & Software Investments
• What do you need Today & Tomorrow
• Consider Lease for Short Term
requirements
• Provision for the future
54. Digitization – Processing Large Volumes
• Measure Twice, Cut Once
• Plan ahead and consider the future use of the
content when defining requirements
• Understand the entire lifecycle of the content when
architecting the process
• How long will we keep it?
• How will we share it?
• How will people find it?
• How will we dispose of it?
• Will we maintain the originals after digitization?
• What are the specific requirements for each step in the
process?
55. Digitization – Processing Large Volumes
• Start with Quality
• Pay special attention to the digitization process
• Higher quality at the IMAGING stage pays off
• Files can be reduced as necessary later, but you can
never ADD quality
• Consider pre-processing when scanning documents
of questionable quality
• Ensure highly accurate OCR is applied prior to
onboarding into the system, or as a part of the
onboarding process
56. Digitization – Maintaining Taxonomy
Classification and indexes need to be maintained,
but how?
• Purely Physical
• Index Cards, Catalogs, Within Content
• Modernized Physical
• Library systems & databases
57. Digitization – Maintaining Taxonomy
This is often achieved by making the classification
data available on a cover sheet in front of each
document.
This can be extracted from the Library System /
DB, or pulled directly from an Index Card and
even processed from a Catalog (Even if it’s
physical!)
58. Digitization – Approaches
There are two methods of digitizing a collection:
1. Batch
• Everything is performed in one or multiple batches
and the sequence of batching is pre-determined
2. Scan-On-Demand
• More opportunistic: existing Archives are digitized
as requested
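The difference between the two approaches can be sketched in a few lines. This is only an illustration under simple assumptions: items are identified by strings, and the function names `batch_plan` and `scan_on_demand` are hypothetical.

```python
from typing import Iterable, List, Set


def batch_plan(item_ids: List[str], batch_size: int) -> List[List[str]]:
    """Batch: split the collection into batches in a pre-determined order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [item_ids[i:i + batch_size] for i in range(0, len(item_ids), batch_size)]


def scan_on_demand(requests: Iterable[str], already_digitized: Set[str]) -> List[str]:
    """Scan-on-demand: digitize items in request order, skipping ones already done.

    The already_digitized set is updated in place as items are scanned.
    """
    scanned: List[str] = []
    for item_id in requests:
        if item_id not in already_digitized:
            already_digitized.add(item_id)
            scanned.append(item_id)
    return scanned


if __name__ == "__main__":
    print(batch_plan(["box-1", "box-2", "box-3"], batch_size=2))
    done = {"box-1"}
    print(scan_on_demand(["box-3", "box-1", "box-3"], done))
```

With batching the order is known up front, while with scan-on-demand the order is driven by requests and items nobody asks for stay on paper.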
59. Digitization - Security
Preventing Loss
• Chain of custody
• Limited transportation choices
• Escorted Content
Selective Outsourcing
• Assess the risk
• Employ multiple tiers for Outsourcing
• In-Source for the most critical artifacts
60. Management of Digitized Artifacts
• Revisit Retention and Disposition Policies
• Can we keep digital records longer? Indefinitely?
• Maximizing the value to Canada
• Making content available to Canadians
• Using Search to maximize value and enhance
classification paradigms in use today
62. Cost Savings
• Physical Storage
• Management and Execution of Retention and
Disposition Policies
• Flexibility to support changing Policy and
Requirements easily
65. Summary
• Hire an Expert – Or Become One!
• Do it once and do it right
• Digitize Everything
• On Demand / Disposition
• Physically preserve only sentimental and
historic originals
66. The AIIM Document Life Cycle
Optimize with Searchable Content
• OCR - Searchable Content
• Metadata Retention
• Format Support for: PDF, PDF/A, TIFF
• Enhancements & Watermarks
• Document Assembly
• Personalization
• Security & Approvals
67. Adlib PDF Enterprise
Input:
• MS Office
• MS InfoPath
• MS Project
• Various CAD
• Various PDF
• Images
• OpenOffice
• HTML
• Over 400 File Types
Process:
• Conversion
• Recognition (OCR)
• Publication
• Merge
• TOC
• Bookmarks
• Headers/Footers
• Digital Signatures
Output:
• PDF
• PDF/A
• XPS
• XML
• TIFF/JPG/BMP/PNG
• TXT
• HTML
68. Adlib PDF Architecture
[Architecture diagram. Components shown: Content Stores; Connectors (SharePoint, Folder, Generic); Management Console UI; Connector Framework (Java 1.6/.NET); WCF / SOAP Services Interface; System Managers; System Database; Transformation Engines]
69. Adlib Software…
…The PDF Experts!
Your partner for Quality,
Automated Document
Transformation