PDF/A: A Preservation Format



The slides for Geof Huth's presentation on PDF/A for Session 4, "File Formats: More than alphabet soup?", at the Mid-Atlantic Regional Archives Conference's fall meeting in Bethlehem, Pennsylvania, on 21 October 2011. (But remember what I always say: the slides aren't the presentation.)

Published in: Education, Technology
  • JPEG 2000 compression was introduced to PDF after version 1.4, the version on which the PDF/A-1 standard is based. Transparency was not defined well enough by the time of the PDF/A-1 standard; it is found in drop shadows, cross-fades, and highlighting. Layers allow parts of maps and engineering drawings to be hidden to help the viewer see the data better.
  • PDF/A: A Preservation Format

    1. PDF/A: A Preservation Format. Mid-Atlantic Regional Archives Conference, 21 October 2011. Geof Huth, [email_address]
    2. File Format Confusion
        • From 5,000 to 15,000 extant file formats
        • Most are proprietary
        • The numbers add complexity to preservation
        • Real preservation formats are few in number
        • And we can really count on none of them
    3. Two General Classes of Formats
        • Proprietary
            • Controlled by one company
            • Underlying code is a trade secret
            • If the company goes under, the file format becomes obsolete
        • Open
            • Controlled by a standards body, a consortium, or a wiki-like body
            • Code is free and open to all
            • In the absence of an “owner,” anyone can still use the code to make a reader
        • Neither Guarantees Preservation
            • But open formats give you an opening to preservation
    4. Proprietary Formats
        • Tend to be rich in features
        • Limited readers for each format
        • Limited ability to exchange data
        • Difficult for long-term accessibility
        • Greater associated costs
    5. Advantages of Open Formats
        • More choice in what application to use
        • Better exchange of data
        • Better support of long-term preservation
        • Possible lower costs
        • Ability to create own readers
    6. Format/Software Confusion
        • Software
            • Creates a file in the format
            • Reads the file for you
            • Allows you to interact with the file
        • Format
            • Is the specific technical form in which a certain file exists
            • Can be created by one software product or many
        • Examples
            • Adobe Acrobat (and many others) vs PDF
            • Microsoft Word vs .doc (and .docx, etc.)
    7. Criteria for Preservation Formats (and Files)
        • Ubiquitous
        • Long-lived
        • Documented
        • Metadata-supporting
        • Accurate
        • Open
        • Uncompressed
        • Unencrypted
    8. When to Use a Preservation Format
        • Creation
            • Begin with a format you know will last
            • If you do, choose a format that allows the file to be modified
        • Recordation
            • When information becomes a record, save it in a chosen format
            • This freezes the file and demonstrates it is a record
        • Archiving
            • Convert to persistent formats those records needed long-term
            • The conversion preserves the records and marks them as permanent
        • Early Action Can Save Money and Time
    9. Normalization (action at the point of archiving)
        • Conversion to a format
            • Not expected to change
            • Not expected to disappear
            • Not expected to become unreadable
        • Usually conversion to a different format from original
        • Generally how preservation formats are used
        • Still, may cause data loss or corruption
    10. Options for Preservation of Text
        • American Standard Code for Information Interchange (ASCII)
        • Unicode
        • Portable Document Format / Archive (PDF/A)
        • Extensible Markup Language (XML)
            • Open Document Format (ODF) (ISO/IEC 26300:2006)
            • Office Open XML (OOXML) (ISO/IEC 29500:2008)
    11. What is Portable Document Format?
        • Originally developed by Adobe in 1991
        • Specifications made available for free in 2001
        • Format made an open international standard in 2008
        • Includes text and image features
    12. Advantages of PDF
        • Has accessibility across platforms
        • Saves look and searchability of original
        • Embeds fonts (if desired)
        • Allows copying of text from files
        • Remains fairly stable and universal
        • Is difficult to modify
        • Has enhanced document security
        • Supports authenticity
    13. Disadvantages of PDF
        • Won’t always perfectly represent original
        • Some files are more difficult to convert
        • Some formatting may be lost if saved back to original file format
        • Limited ability to modify
        • A complex format saving image and text
        • Tends to be larger than a word processing document
    14. PDF’s Advantage over Others
        • Image and text in one bundle
        • Intelligent text
        • Accepts importance of format to meaning
        • Ubiquity of format and readers
    15. Conversion Practices
        • Have necessary fonts installed
        • Ensure lossless compression
            • Important for embedded images
        • When converting PDF to PDF/A
            • Eliminate prohibited features
            • Check beforehand or fix during
    16. Flavors of the PDF Standard
        • PDF (vanilla)
        • PDF/A (for archival preservation)
        • PDF/X (for publishing)
        • PDF/E (for engineering drawings)
        • PDF/VT (for variable data and transactional printing)
        • PDF/UA (for accessibility; in development)
        • PDF/H (for healthcare records; a guide, not a standard)
        • GeoPDF (for geospatial records; only based on standards)
    17. Portable Document Format / Archive Standards
        • PDF/A-1
            • ISO Standard 19005-1:2005
            • Based on PDF Reference 1.4 (Acrobat 5)
        • PDF/A-2
            • ISO Standard 19005-2:2011
            • Based on PDF Reference 1.7
            • Published 20 June 2011
        • New versions of PDF/A expected
    18. Uses of PDF/A
        • Standard textual documents
            • Paper documents
            • Word-processing and PDF documents
        • Sequences of related digital images
        • Documents where appearance matters
        • Static documents
    19. Less Appropriate for PDF/A
        • Webpages
        • Databases
        • Spreadsheets
        • Dynamic documents
    20. Creating PDF/As
        • Need a product that can produce one
            • Like Adobe Acrobat 8 Professional
        • Can convert documents individually
            • Opening and converting one at a time
        • Can use batch processing
            • Converting multiple documents at once
            • Supported by Acrobat 8
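
The batch-processing idea on slide 20 can be sketched in a few lines of Python. The slide names Acrobat 8; this sketch swaps in Ghostscript instead, purely as an illustration, and assumes a `gs` executable on the path plus a copy of Ghostscript's `PDFA_def.ps` file that has already been edited to point at a suitable ICC profile. The function names and folder layout are hypothetical, and the exact options may vary between Ghostscript versions, so treat this as a starting point rather than a tested workflow.

```python
import subprocess
from pathlib import Path
from typing import List


def ghostscript_pdfa_command(src: Path, dst: Path, pdfa_def: Path, part: int = 1) -> List[str]:
    """Build (but do not run) a Ghostscript command line that requests PDF/A output."""
    return [
        "gs",                              # assumed executable name; differs on Windows
        f"-dPDFA={part}",                  # which PDF/A part to target
        "-dBATCH",
        "-dNOPAUSE",
        "-sDEVICE=pdfwrite",
        "-sColorConversionStrategy=RGB",   # convert to a single color space
        "-dPDFACompatibilityPolicy=1",     # drop features PDF/A does not allow
        f"-sOutputFile={dst}",
        str(pdfa_def),                     # definition file with the output intent
        str(src),
    ]


def convert_folder(src_dir: Path, dst_dir: Path, pdfa_def: Path) -> List[Path]:
    """Convert every *.pdf in src_dir, one Ghostscript run per file."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(src_dir.glob("*.pdf")):
        dst = dst_dir / src.name
        subprocess.run(ghostscript_pdfa_command(src, dst, pdfa_def), check=True)
        written.append(dst)
    return written
```

Whatever tool does the conversion, each output file should still be checked with a validator afterward; a successful run does not by itself show that the result conforms.
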
    21. General Goals of PDF/A
        • Specifies limited stable set of features
            • To ensure long-term validity
            • Eliminate features that are not “archival”
        • An open preservation standard
        • Format designed to be a preservation standard
    22. Required in PDF/A
        • All fonts embedded
        • Unlimited legal use of embedded fonts
        • Device-independent color
        • Metadata describing the file
            • File must self-identify the PDF/A version
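
The self-identification requirement on slide 22 is carried in the file's XMP metadata through the `pdfaid:part` and `pdfaid:conformance` properties. As a rough illustration, the sketch below looks for those two properties in the raw bytes of a file. It assumes the XMP packet is stored uncompressed and readable as plain text, and the function name is hypothetical. It reports only what a file claims about itself, which is not the same as validating it.

```python
import re
from typing import Optional

# The properties may be written as XML elements (<pdfaid:part>1</pdfaid:part>)
# or as attributes (pdfaid:part="1"); both spellings are accepted here.
_PART = re.compile(rb'pdfaid:part\s*(?:=\s*["\']|>)\s*(\d)')
_CONF = re.compile(rb'pdfaid:conformance\s*(?:=\s*["\']|>)\s*([A-Za-z])')


def claimed_pdfa_flavor(data: bytes) -> Optional[str]:
    """Return a label such as 'PDF/A-1b' if the bytes declare one, else None."""
    part = _PART.search(data)
    if part is None:
        return None
    conf = _CONF.search(data)
    level = conf.group(1).decode("ascii").lower() if conf else ""
    return "PDF/A-" + part.group(1).decode("ascii") + level
```

A typical call would be `claimed_pdfa_flavor(Path("record.pdf").read_bytes())`, where `record.pdf` is a placeholder file name.
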
    23. Excluded from PDF/A-1
        • Audio and video content
        • JavaScript and executable files
        • Encryption
        • LZW and JPEG 2000 image compression
        • Reference to outside content
        • Transparency
        • Embedded files
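
Several of the exclusions on slide 23 leave recognizable names in a PDF's object dictionaries, so a crude pre-check before conversion can be sketched as a byte search. This is a heuristic under stated assumptions: the marker list is an illustrative selection, the labels and function name are hypothetical, a name can appear inside compressed data where this search will not see it, and transparency and external references are not covered at all. A real validator is still needed.

```python
from typing import List

# PDF name objects that usually signal a feature PDF/A-1 excludes.
PROHIBITED_MARKERS = {
    b"/JavaScript": "JavaScript",
    b"/Encrypt": "encryption dictionary",
    b"/LZWDecode": "LZW compression",
    b"/JPXDecode": "JPEG 2000 compression",
    b"/EmbeddedFiles": "embedded files",
    b"/Sound": "audio content",
    b"/Movie": "video content",
}


def flag_prohibited(data: bytes) -> List[str]:
    """Return sorted labels for every marker found in the raw file bytes."""
    return sorted(label for marker, label in PROHIBITED_MARKERS.items() if marker in data)
```

An empty list means only that none of these markers were seen, not that the file is ready for PDF/A-1.
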
    24. Differences in PDF/A-2
        • Allows embedding of OpenType fonts
        • Allows JPEG 2000 image compression
        • Supports transparent objects
        • Supports layers, which can be hidden for viewing
        • Defines use of digital signatures
            • Defines rules via PDF Advanced Electronic Signatures (PAdES)
        • Specifies requirements for custom XMP metadata
        • Allows embedded files, but in only one context
            • In a PDF/A-2 you can embed PDF/A files
            • Allows creation of sets of documents in a single file (e.g. emails)
        • All PDF/A-1s are compliant with PDF/A-2 standard
            • PDF/A-2 is an extension of PDF/A-1
    25. PDF/A-1 Conformance Levels
        • PDF/A-1, Level A (full compliance)
            • Preserves document’s logical structure
            • Preserves text stream in reading order
            • Requires language specification
            • Requires Unicode mapping
        • PDF/A-1, Level B (minimal compliance)
            • Preserves visual appearance
            • Doesn’t require as much descriptive info
            • Less “accessible” format
    26. Flavors of PDF/A
        • PDF/A-1a (a = accessible)
            • RGB Color
            • CMYK Color
        • PDF/A-1b (b = basic)
            • Same color choices
        • PDF/A-2a (extension of A-1a)
        • PDF/A-2b (extension of A-1b)
        • PDF/A-2u (u = Unicode)
            • Must use Unicode
            • Does not require representation of logical structure
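
The flavor labels on slide 26 combine a part number with a conformance level, and not every combination exists: level u appears only with part 2 in the list above. A small sketch of that naming scheme, limited to the five flavors the slide names, might look like this. The `Flavor` type and `parse_flavor` function are hypothetical names chosen for illustration.

```python
import re
from typing import NamedTuple

# Conformance levels listed on the slide for each part of the standard.
LEVELS_BY_PART = {1: {"a", "b"}, 2: {"a", "b", "u"}}


class Flavor(NamedTuple):
    part: int
    level: str


def parse_flavor(label: str) -> Flavor:
    """Split a label such as 'PDF/A-2u' into its part and conformance level."""
    match = re.fullmatch(r"PDF/A-(\d)([abu])", label.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a PDF/A flavor label: {label!r}")
    part, level = int(match.group(1)), match.group(2).lower()
    if level not in LEVELS_BY_PART.get(part, set()):
        raise ValueError(f"level {level!r} is not listed for PDF/A-{part}")
    return Flavor(part, level)
```

A repository that records the declared flavor of each file in its preservation metadata could use a check like this to reject labels that do not correspond to a listed flavor.
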
    27. PDF/A Product Lines
        • Adobe Acrobat (www.adobe.com)
        • Apago (www.apagoinc.com)
        • Callas (www.callassoftware.com)
        • Compart (www.compart.net)
        • PDFlib (www.pdflib.com)
        • PDF Tools AG (www.pdf-tools.com)
    28. PDF/A Validation Tools
        • Adobe Acrobat Preflight Function (www.adobe.com)
        • Callas Software pdfaPilot (www.callassoftware.com)
        • PDF Tools AG's 3-Heights PDF Validator (www.pdf-tools.com)
    29. Formats are Not Everything
        • Preservation Programs Require Work
            • Conversion procedures
            • Quality control
            • Version control
            • Environmental controls
            • Metadata creation and maintenance
                • Metadata about the records and their information
                • Metadata about your preservation actions
            • Data management controls (backups, etc.)
            • Ensuring that chosen normalized formats are still valid
            • Vigilance