Star 2013-pdfa-pdfa


A talk on PDF/A that compares document archiving with efforts underway in multimedia archiving. Many thanks to Leonard Rosenthal (@pdfsage) for the materials used to build this.

Published in: Technology

  1. Lessons from document archiving – PDF/A
     Dave McAllister, Director, Open Source and Standards
     © 2012 Adobe Systems Incorporated. All Rights Reserved.
  2. Archiving Requirements: Live
     §  Repeat the current experience at some future time.
     §  This includes the content's active nature, which depends upon being able to provide a suitable “execution environment” in the future.
     §  Interactive and dynamic types
     §  More powerful computers and displays available
     §  New delivery mechanisms and devices
     §  Invent new metaphors that move away from our more static, paper-based ideas
     §  Digital Rights Management
     §  Two-fold challenge
     §  Obsolescence of new (emerging) types
     §  Base technologies may become obsolete
  3. Digital Documents have existed for some time
     §  PDF (1993): a comprehensive format for representing documents and forms
     §  High fidelity, high precision text layout with embeddable fonts
     §  High-end device independent, color managed graphics features
     §  Platform independent definition
     §  Interactive elements & content
     §  Multimedia & 3D
     §  Security & digital signatures
     That’s great for the web, screen viewing, eBooks, etc. But what about people who just want reliable printing and/or archiving?
  4. Digital Document Archive needs
     A document format that:
     §  Conveys critical information
     §  Can be rendered accurately (predictable and consistent)
     §  Offers metadata support
     §  Standard schemas
     §  Custom schemas
     §  Provenance, version, history, audit
     §  Can incorporate marginalia
     §  Notes, comments, mark ups
     §  Can be “signed” (tamper proof)
     §  Provides a definition of retrieval
  5. Enter PDF/A
     §  PDF/A-1 (ISO 19005)
     §  Long term preservation of black and white and color compound documents as electronic data
     §  Combinations of character, raster, vector and other data
     §  Provisions for capturing semantic information
     §  Preservation and retrieval of appropriate metadata
     §  “static paper” ++
     §  Annotations & Marginalia
     §  Metadata
     §  Signatures
  6. PDF/A-1 Details
     §  More restrictive “coding” of PDF details
     §  Ensures less ambiguity when implementing
     §  Based on PDF 1.4 & PDF/X-3
     §  All PDF/X-3 documents can potentially be minimally conforming PDF/A documents without any changes
     §  Reduces ambiguity between different vendors' implementations
     §  Removal of any complex or potentially confusing graphic concepts
     §  No transparency
     §  Limited colorspaces
     §  No Security/Encryption
     §  All data must be self-contained
     §  No external resources
     §  Fonts MUST be embedded(!!)
     §  Limited annotation support
     §  No movies and sounds
     §  No JavaScript
     §  Links are stored but not executed
     §  Metadata based on Adobe XMP
     §  Low level font requirements
     §  Matching font widths
     §  CharSet/CIDSet
     §  CMaps
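Several of the PDF/A-1 restrictions listed on this slide (no encryption, no JavaScript, no movies or sounds) can be illustrated with a rough pre-flight scan. The Python sketch below is only a heuristic, not a validator: it looks for PDF name tokens in the raw bytes, so it misses anything hidden inside compressed streams and it does not check fonts, color spaces, or transparency. The function name and the marker table are hypothetical and are not taken from the talk or from ISO 19005.

```python
import re

# Name tokens that hint at features PDF/A-1 disallows, per the slide above.
# This table is an illustrative assumption, not the ISO 19005-1 rule set.
FORBIDDEN_MARKERS = {
    b"/Encrypt": "security/encryption",
    b"/JavaScript": "JavaScript",
    b"/JS": "JavaScript",
    b"/Movie": "movies",
    b"/Sound": "sounds",
}


def scan_for_forbidden_features(pdf_bytes: bytes) -> list[str]:
    """Return sorted labels of disallowed features whose name tokens appear."""
    found = set()
    for marker, label in FORBIDDEN_MARKERS.items():
        # Require a non-alphanumeric character (or end of data) after the
        # token so that /JS does not match a longer name such as /JSFoo.
        pattern = re.escape(marker) + rb"(?![A-Za-z0-9])"
        if re.search(pattern, pdf_bytes):
            found.add(label)
    return sorted(found)


if __name__ == "__main__":
    sample = b"<< /OpenAction << /S /JavaScript /JS (x) >> >> trailer << /Encrypt 5 0 R >>"
    print(scan_for_forbidden_features(sample))  # ['JavaScript', 'security/encryption']
```

A clean result from a scan like this says nothing about conformance; it only flags files that are obviously outside the “static paper” model.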
  7. Levels of Conformance
     §  Minimal Conformance (PDF/A-1B)
     §  Meet the standard/basic requirements
     §  Full Conformance (PDF/A-1A)
     §  Tagged PDF
     §  Improved searchability via Unicode mappings
     §  Comprehensive metadata recommendations
     §  Font data
     §  Document “pedigree”
     §  Audit trail
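A file declares which conformance level it claims in its XMP metadata, using the `pdfaid:part` and `pdfaid:conformance` properties of the PDF/A identification schema. That detail comes from the PDF/A standard rather than from these slides. The sketch below reads the claim with a regular expression, assuming the XMP packet is stored as plain, unfiltered text in the file. The function names are hypothetical, and the result is only what the file says about itself, not proof that it conforms.

```python
import re
from typing import Optional


def _xmp_value(data: bytes, prop: bytes) -> Optional[str]:
    """Find a pdfaid property written as an XML element or as an attribute."""
    name = b"pdfaid:" + prop
    # Element form: <pdfaid:part>1</pdfaid:part>
    match = re.search(rb"<" + name + rb">\s*([^<\s]+)\s*</" + name + rb">", data)
    if match is None:
        # Attribute form: pdfaid:part="1"
        match = re.search(name + rb"\s*=\s*[\"']([^\"']+)[\"']", data)
    return match.group(1).decode("ascii", "replace") if match else None


def pdfa_claim(pdf_bytes: bytes) -> Optional[str]:
    """Return the self-declared level, e.g. 'PDF/A-1B', or None if absent."""
    part = _xmp_value(pdf_bytes, b"part")
    if part is None:
        return None
    conformance = _xmp_value(pdf_bytes, b"conformance") or ""
    return f"PDF/A-{part}{conformance.upper()}"


if __name__ == "__main__":
    xmp = (b"<rdf:Description xmlns:pdfaid='http://www.aiim.org/pdfa/ns/id/'>"
           b"<pdfaid:part>1</pdfaid:part>"
           b"<pdfaid:conformance>B</pdfaid:conformance></rdf:Description>")
    print(pdfa_claim(xmp))  # PDF/A-1B
```

Checking the claim is the easy half; confirming it (tagging, Unicode mappings, embedded fonts) needs a real validator.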
  8. Not just File Format – Viewer Requirements
     §  Color management
     §  Use of output intent
     §  No use of alternates (except for Spot & DeviceN)
     §  Specific handling of DeviceGray
     §  Font handling
     §  ALWAYS use embedded data
     §  Interactivity
     §  Annotations & form fields are non-interactive
     §  Must use the stored appearance
     §  Provide access to data/contents
     §  Hyperlinks are “questionable”
  9. PDF/A-2
     §  Remain focused on “static paper” metaphor
     §  No interactivity, 3D, multimedia, etc.
     §  Updated to reference ISO 32000-1
     §  Ensure as close to 100% forward compatibility as possible
     §  A PDF/A-1 document SHOULD also be a valid PDF/A-2 document
     §  However, valid technical changes to ensure long term reliability were preferred over compatibility.
     §  Predominantly in the areas of fonts & metadata
     §  Continue to maintain compatibility with other ISO standards
     §  PDF/X-4
     §  PDF/E-1
  10. Some new and important features in A-2 (and A-3)
     §  Improved compression technology/Smaller files
     §  JPEG2000
     §  Compressed XRefs & Streams (aka “Full Compression”)
     §  Transparency
     §  PDF Layers (aka Optional Content)
     §  Whatever you view on screen, must print!
     §  PDF Packages/Collections
     §  May only contain other PDF/A documents (this is the only major change to PDF/A-3: embedded files can be of any format)
     §  Digital signature enhancements
     §  Certified Documents
     §  Improved revocation checking
     §  PAdES (ETSI TS 102778) Compliance required
     §  Improved tagging/accessibility
  11. New work under consideration
     §  Archiving PDFs with embedded 3D
     §  Archiving of digitally signed PDFs
     §  Archiving of source material inside the PDF
     §  Secondary issues
     §  Archiving of “documents of record”
     §  Rich forms, such as those based on XFA or with embedded JavaScript
     §  Desire to archive the business logic with the values
     §  May or may not be digitally signed
     §  Archiving of video and audio embedded into PDF
     §  New features of ISO 32000-2
     §  Portfolios, RichMedia, GIS, etc.
  12. Summation
     §  The basic requirements for multimedia archiving mirror those for digital documents
     §  The scope of formats is wider
     §  The envelope of contents is substantially larger
     §  Documents themselves may be considered a form of media
     §  Complexity will arise as the envelope for documents expands to encompass media types
     §  No media type is entirely separate from any other