PDF/A: A Preservation Format

A presentation by Geof Huth on the PDF/A preservation format presented at a meeting of the Mid-Atlantic Regional Archives Conference in Bethlehem, Pennsylvania, on October 21, 2011. This presentation puts PDF/A in the context of a digital preservation program and explains the uses of its format and some of the details of this international standard.

  • JPEG 2000 compression entered PDF in version 1.5, after the PDF 1.4 baseline on which the PDF/A-1 standard is built.
  • Transparency was not defined well enough by the time of the PDF/A-1 standard.
  • Transparency is found in drop shadows, cross fades, and highlighting.
  • Layers allow parts of maps and engineering drawings to be hidden to help the viewer see the data better.

    1. PDF/A: A Preservation Format
        Mid-Atlantic Regional Archives Conference, 21 October 2011
        Geof Huth, geofhuth@gmail.com
    2. File Format Confusion
        • From 5,000 to 15,000 extant file formats
        • Most are proprietary
        • The numbers add complexity to preservation
        • Real preservation formats are few in number
        • And we can really count on none of them
    3. Two General Classes of Formats
        • Proprietary
            • Controlled by one company
            • Underlying code is a trade secret
            • If the company goes under, the file format becomes obsolete
        • Open
            • Controlled by a standards body, a consortium, wiki-like bodies
            • Code is free and open to all
            • In absence of an “owner,” can still use the code to make a reader
        • Neither Guarantees Preservation
            • But open formats give you an opening to preservation
    4. Proprietary Formats
        • Tend to be rich in features
        • Limited readers for each format
        • Limited ability to exchange data
        • Difficult for long-term accessibility
        • Greater associated costs
    5. Advantages of Open Formats
        • More choice in what application to use
        • Better exchange of data
        • Better support of long-term preservation
        • Possible lower costs
        • Ability to create own readers
    6. Format/Software Confusion
        • Software
            • Creates a file in the format
            • Reads the file for you
            • Allows you to interact with the file
        • Format
            • Is the specific technical form in which a certain file exists
            • Can be created by one software product or many
        • Examples
            • Adobe Acrobat (and many others) vs PDF
            • Microsoft Word vs .doc (and .docx, etc.)
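
Slide 6's distinction between software and format can be made concrete: a format is recognizable from the bytes of the file itself, whatever program wrote it and whatever extension it carries. The sketch below is a minimal illustration, not a full identification tool. The function names `sniff_format` and `sniff_file` and the three-entry signature table are assumptions made for this example, and a ZIP signature only suggests (it does not prove) an Office Open XML or OpenDocument file.

```python
# Minimal sketch: identify a file's format from its leading bytes ("magic
# numbers") rather than from its extension or the software that made it.
# The signature table is a small illustrative subset, not a registry.

SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"PK\x03\x04", "ZIP container (e.g. .docx, .odt)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. legacy .doc)"),
]


def sniff_format(data: bytes) -> str:
    """Return a coarse format label for the given leading bytes."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(16))
```

The same label comes back whether the file was written by Adobe Acrobat or by any other PDF producer, which is the point of the slide: the format belongs to the file, not to the application.
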
    7. Criteria for Preservation Formats (and Files)
        • Ubiquitous
        • Long-lived
        • Documented
        • Metadata-supporting
        • Accurate
        • Open
        • Uncompressed
        • Unencrypted
    8. When to Use a Preservation Format
        • Creation
            • Begin with a format you know will last
            • If so, choose a format that allows modification to a file
        • Recordation
            • When information becomes a record, save it in a chosen format
            • This freezes the file and demonstrates it is a record
        • Archiving
            • Convert to persistent formats those records needed long-term
            • The conversion preserves the records and marks them as permanent
        • Early Action Can Save Money and Time
    9. Normalization (action at the point of archiving)
        • Conversion to a format
            • Not expected to change
            • Not expected to disappear
            • Not expected to become unreadable
        • Usually conversion to a different format from original
        • Generally how preservation formats are used
        • Still, may cause data loss or corruption
    10. Options for Preservation of Text
        • American Standard Code for Information Interchange (ASCII)
        • Unicode
        • Portable Document Format / Archive (PDF/A)
        • Extensible Markup Language (XML)
        • OpenDocument Format (ODF) (ISO/IEC 26300:2006)
        • Office Open XML (OOXML) (ISO/IEC 29500:2008)
    11. What is Portable Document Format?
        • Originally developed by Adobe in 1991
        • Specifications made available for free in 2001
        • Format made an open international standard in 2008
        • Includes text and image features
    12. Advantages of PDF
        • Has accessibility across platforms
        • Saves look and searchability of original
        • Embeds fonts (if desired)
        • Allows copying of text from files
        • Remains fairly stable and universal
        • Is difficult to modify
        • Has enhanced document security
        • Supports authenticity
    13. Disadvantages of PDF
        • Won’t always perfectly represent original
        • Some files are more difficult to convert
        • Some formatting may be lost if saved back to original file format
        • Limited ability to modify
        • A complex format saving image and text
        • Tends to be larger than a word processing document
    14. PDF’s Advantage over Others
        • Image and text in one bundle
        • Intelligent text
        • Accepts importance of format to meaning
        • Ubiquity of format and readers
    15. Conversion Practices
        • Have necessary fonts installed
        • Ensure lossless compression
            • Important for embedded images
        • When converting PDF to PDF/A
            • Eliminate prohibited features
            • Check beforehand or fix during
    16. Flavors of the PDF Standard
        • PDF (vanilla)
        • PDF/A (for archival preservation)
        • PDF/X (for publishing)
        • PDF/E (for engineering drawings)
        • PDF/VT (for variable data and transactional printing)
        • PDF/UA (for accessibility; in development)
        • PDF/H (for healthcare records; a guide, not a standard)
        • GeoPDF (for geospatial records; only based on standards)
    17. Portable Document Format / Archive Standards
        • PDF/A-1
            • ISO Standard 19005-1:2005
            • Based on PDF Reference 1.4 (Acrobat 5)
        • PDF/A-2
            • ISO Standard 19005-2:2011
            • Based on PDF Reference 1.7
            • Published 20 June 2011
        • New versions of PDF/A expected
    18. Uses of PDF/A
        • Standard textual documents
            • Paper documents
            • Word-processing and PDF documents
        • Sequences of related digital images
        • Documents where appearance matters
        • Static documents
    19. Less Appropriate for PDF/A
        • Webpages
        • Databases
        • Spreadsheets
        • Dynamic documents
    20. Creating PDF/As
        • Need a product that can produce one
            • Like Adobe Acrobat 8 Professional
        • Can convert documents individually
            • Opening and converting one at a time
        • Can use batch processing
            • Converting multiple documents at once
            • Supported by Acrobat 8
    21. General Goals of PDF/A
        • Specifies limited stable set of features
            • To ensure long-term validity
        • Eliminate features that are not “archival”
        • An open preservation standard
            • Format designed to be a preservation standard
    22. Required in PDF/A
        • All fonts embedded
        • Unlimited legal use of embedded fonts
        • Device-independent color
        • Metadata describing the file
        • File must self-identify the PDF/A version
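
The last requirement on slide 22 is one that can be inspected directly. A PDF/A file declares its version in its XMP metadata through the `pdfaid:part` and `pdfaid:conformance` properties. The following is a rough sketch, assuming the XMP packet is stored uncompressed and can be found by scanning the raw bytes; the function names `claimed_pdfa_flavor` and `_xmp_value` are hypothetical. It reports only what the file claims about itself, which is not the same as validating it.

```python
import re


def _xmp_value(data: bytes, name: bytes):
    """Find a pdfaid property written as an XML attribute or as an element."""
    n = re.escape(name)
    # Attribute form: pdfaid:part="1"  (\x22 and \x27 are the two quote marks)
    attr = rb"pdfaid:" + n + rb"\s*=\s*[\x22\x27]([^\x22\x27]*)[\x22\x27]"
    # Element form: <pdfaid:part>1</pdfaid:part>
    elem = rb"<pdfaid:" + n + rb">\s*([^<\s]*)\s*</pdfaid:" + n + rb">"
    match = re.search(attr, data) or re.search(elem, data)
    if match is None:
        return None
    return match.group(1).decode("ascii", "replace")


def claimed_pdfa_flavor(data: bytes):
    """Return a label such as 'PDF/A-1b', or None if no claim is found."""
    part = _xmp_value(data, b"part")
    if part is None:
        return None
    conformance = _xmp_value(data, b"conformance") or ""
    return f"PDF/A-{part}{conformance.lower()}"
```

A file that returns `None` here has made no PDF/A claim at all; a file that returns a label still needs to be checked with a validation tool before the claim is trusted.
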
    23. Excluded from PDF/A-1
        • Audio and video content
        • JavaScript and executable files
        • Encryption
        • LZW and JPEG 2000 image compression
        • Reference to outside content
        • Transparency
        • Embedded files
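
Several of the exclusions on slide 23 correspond to names that appear inside a PDF file, so a crude first pass before conversion can simply look for them. This is a heuristic sketch only. The marker table and the function name `flag_excluded_features` are assumptions for illustration; a byte search can miss features hidden in compressed streams, can raise false alarms on text that merely mentions a name, and does not cover transparency or references to outside content. Only a real validator can confirm conformance.

```python
import re

# PDF name objects associated with some features that PDF/A-1 excludes.
# Illustrative subset; transparency and external references are not covered.
EXCLUDED_MARKERS = {
    b"/JavaScript": "JavaScript",
    b"/JS": "JavaScript",
    b"/Launch": "launch of an external program",
    b"/Encrypt": "encryption",
    b"/LZWDecode": "LZW compression",
    b"/JPXDecode": "JPEG 2000 compression",
    b"/EmbeddedFiles": "embedded files",
    b"/Sound": "audio content",
    b"/Movie": "video content",
}


def flag_excluded_features(data: bytes) -> list:
    """Return a sorted list of excluded features whose markers appear in data."""
    found = set()
    for name, label in EXCLUDED_MARKERS.items():
        # The lookahead stops /JS from matching inside a longer name like /JSON.
        if re.search(re.escape(name) + rb"(?![A-Za-z0-9])", data):
            found.add(label)
    return sorted(found)
```

An empty result means only that none of these markers were seen in the raw bytes, so it is a reason to proceed to validation, not a substitute for it.
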
    24. Differences in PDF/A-2
        • Allows embedding of OpenType fonts
        • Allows JPEG 2000 image compression
        • Supports transparent objects
        • Supports layers, which can be hidden for viewing
        • Defines use of digital signatures
            • Defines rules via PDF Advanced Electronic Signatures (PAdES)
        • Specifies requirements for custom XMP metadata
        • Allows embedded files, but in only one context
            • In a PDF/A-2 you can embed PDF/A files
            • Allows creation of sets of documents in a single file (e.g. emails)
        • All PDF/A-1s are compliant with PDF/A-2 standard
            • PDF/A-2 is an extension of PDF/A-1
    25. PDF/A-1 Conformance Levels
        • PDF/A-1, Level A (full compliance)
            • Preserves document’s logical structure
            • Preserves text stream in reading order
            • Requires language specification
            • Requires Unicode mapping
        • PDF/A-1, Level B (minimal compliance)
            • Preserves visual appearance
            • Doesn’t require as much descriptive info
            • Less “accessible” format
    26. Flavors of PDF/A
        • PDF/A-1a (a = accessible)
            • RGB Color
            • CMYK Color
        • PDF/A-1b (b = basic)
            • Same color choices
        • PDF/A-2a (extension of A-1a)
        • PDF/A-2b (extension of A-1b)
        • PDF/A-2u (u = Unicode)
            • Must use Unicode
            • Does not require representation of logical structure
    27. PDF/A Product Lines
        • Adobe Acrobat (www.adobe.com)
        • Apago (www.apagoinc.com)
        • Callas (www.callassoftware.com)
        • Compart (www.compart.net)
        • PDFlib (www.pdflib.com)
        • PDF Tools AG (www.pdf-tools.com)
    28. PDF/A Validation Tools
        • Adobe Acrobat Preflight Function (www.adobe.com)
        • Callas Software pdfaPilot (www.callassoftware.com)
        • PDF Tools AG’s 3-Heights PDF Validator (www.pdf-tools.com)
    29. Formats are Not Everything
        • Preservation Programs Require Work
            • Conversion procedures
            • Quality control
            • Version control
            • Environmental controls
            • Metadata creation and maintenance
                • Metadata about the records and their information
                • Metadata about your preservation actions
            • Data management controls (backups, etc.)
            • Ensuring that chosen normalized formats are still valid
            • Vigilance
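
One item on slide 29, metadata about your preservation actions, lends itself to a small illustration. The sketch below records a single normalization event as a plain dictionary that can be saved as JSON. Every field name, the `record_action` helper, and the use of SHA-256 checksums to tie the record to specific files are assumptions for this example, not something the presentation prescribes.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_of(data: bytes) -> str:
    """Hex digest used here to tie the record to exact file contents."""
    return hashlib.sha256(data).hexdigest()


def record_action(action, source_name, source_bytes, result_name, result_bytes, tool):
    """Build a JSON-serializable record of one preservation action."""
    return {
        "action": action,
        "performed_at": datetime.now(timezone.utc).isoformat(),
        "tool": tool,  # free-text description of the converter that was used
        "source": {"name": source_name, "sha256": sha256_of(source_bytes)},
        "result": {"name": result_name, "sha256": sha256_of(result_bytes)},
    }


if __name__ == "__main__":
    # Hypothetical file names and tool label, for illustration only.
    event = record_action(
        "normalize to PDF/A-1b", "memo.doc", b"original bytes",
        "memo.pdf", b"converted bytes", "hypothetical-converter 1.0",
    )
    print(json.dumps(event, indent=2))
```

Keeping a record like this beside each converted file documents what was done, when, and with which tool, and the checksums give later quality-control passes something to compare against.
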