
Demystifying PDFs

Published in: Education

  1. 1. Demystifying PDFs<br />Betsy Fanning<br />AIIM<br /> Nashville 2010<br />
  2. 2. Introduction to PDF<br />Overview of PDF Standards<br />Adoption of PDF Standards<br />Agenda<br />
  3. 3. 951,000,000 PDF pages on Google<br />How Many PDF Files Are There?<br />
  4. 4. Introduction to PDF<br />Portable Document Format<br />Digital format for representing documents<br />PDF Files created<br />Natively<br />Converted from other electronic formats<br />Digitized from paper, microform, or other format<br />A specification for electronic files representing documents<br />Digital Documents<br />
  5. 5. Portable Document Format<br />Widely used worldwide<br />Business<br />Government<br />Libraries and archives<br />Information must be kept for long periods of time<br />Must remain usable and accessible across multiple generations of technology<br />PDF<br />
  6. 6. Reliable consistent viewing and printing<br />Mix text, raster images, lineart, color<br />Basic unit is the page<br />Easy navigation, fast access to any page<br />Small file size<br />Dynamic<br />Digital signatures<br />Forms<br />What is PDF?<br />
  7. 7. ISO 32000-1:2008, Document management – Portable Document Format – Part 1: PDF 1.7<br />In 2007, Adobe contacted AIIM to request assistance in taking the PDF Specification to ISO<br />Exact replication of PDF Specification 1.7 including changes and amendments<br />ISO 32000-1:2008 (PDF)<br />
  8. 8. Adds support for geospatial data<br />Supports Flash<br />Added collections (portfolios)<br />Allows for bar codes to be used with form fields<br />Added structure elements for MathML<br />Enhanced accessibility<br />Incorporated ETSI TS 102 778 for digital signatures<br />Future – Reader improvements and possible merging of PDF streams<br />ISO/CD 32000-2 (PDF 2.0)<br />
  9. 9. PDF is powerful and flexible<br />May be too flexible for some applications<br />Restrict subset of PDF<br />Need higher degree of reliability <br />May want standard in hands of neutral non-commercial body – Internationally recognized standards body such as ISO<br />Focus on archive needs of government, corporations, libraries<br />Resolve issues with font embedding replacement<br />Why Standardize a Version of PDF<br />
  10. 10. Joint sponsors of the US PDF/A and PDF/E committees<br />AIIM, Association for Information and Image Management<br />Secretariat to ISO/TC 171 and ISO/TC 171/SC2<br />Secretariat to US Technical Advisory Group (TAG) for ISO/TC 171<br />NPES, The Association for Suppliers of Printing, Publishing, and Converting Technologies<br />Secretariat to ANSI Committee for Graphic Arts Technologies Standards (CGATS) <br />Secretariat to US TAG for ISO/TC 130<br />Joint sponsors of PDF Healthcare committee<br />ASTM International<br />Role of AIIM and Partners<br />
  11. 11. ISO Joint Working Groups (JWG) for PDF Standards<br />ISO/TC 171/SC 2, Document management applications – Application issues<br />ISO/TC 130, Graphic technology<br />ISO/TC 46/SC 11, Information and documentation – Archives/records management<br />ISO/TC 42, Photography<br />ISO TC 184/SC4, Automation systems and integration, Industrial data <br />ETSI, European Telecommunications Standards Institute<br />PDF/A Competence Center<br />Role of ISO<br />
  12. 12. The PDF standard<br />Multi-part ISO International Standard<br />ISO 19005-1:2005, Document management – Electronic document file format for long-term preservation – Part 1: Use of PDF 1.4 (PDF/A-1)<br />Part 2 (19005-2) intended to bring PDF/A into conformance with ISO 32000<br />Part 3 (19005-3) Embedded documents<br />And additional future parts, as necessary<br />
  13. 13. PDF/X, ISO 15930 *<br />Pre-press data exchange<br />PDF/A, ISO 19005 (Parts 1, 2 and 3)<br />Archiving electronic documents<br />PDF/E (Engineering), ISO 24517-1<br />For engineering, architectural, and GIS documents<br />PDF/E (Engineering), ISO/NWP 24517-2<br />Archive engineering, architectural, and GIS documents<br />PDF/UA (Universal Access), ISO/CD 14289-1<br />Intended to address Section 508 concerns<br />PDF Healthcare<br />Exchange of electronic health records (CDA and CCR)<br />PDF, ISO 32000-1 (ISO/CD 32000-2)<br />PDF/VT, ISO 16612 (2 parts) *<br />Variable data exchange<br />PRC, Product Representation Compact (ISO/CD 14739-1)<br />* Not AIIM Responsibility<br />PDF Standards<br />
  14. 14. Graphic technology – Prepress digital data exchange – Use of PDF (PDF/X)<br />Specifies the use of PDF for the dissemination of complete digital data, in a single exchange, that contains all elements for final print reproduction. <br />ISO 15930 (PDF/X)<br />
  15. 15. Specifies how to use PDF to define and exchange all content elements and supporting metadata to produce predictable output for variable or transactional document content<br />ISO 16612 (PDF/VT)<br />
  16. 16. “This International Standard specifies how to use the Portable Document Format (PDF) 1.4 for long-term preservation of electronic documents”<br />Applicable to documents containing character, raster, and vector data<br />The standard does not address:<br />Processes for generating PDF/A files<br />Specific implementation details of rendering PDF/A files<br />Methods for storing PDF/A files<br />Hardware and software dependencies<br />ISO 19005-1:2005<br />
  17. 17. Court documents protect citizens’ rights<br />Access is assured in trial courts for 20 to 40 years for the Judiciary<br />Access is often time sensitive<br />On-site courthouse storage not cost effective<br />Court decisions are permanent records held “until the end of the republic” by the National Archives<br />Document format conveys critical information, which must be rendered accurately<br />Cases – New York Southern, Enron, etc.<br />20 years of filings are in PDF<br />Background for PDF/A: Judiciary Use Case<br />
  18. 18. Records Archive<br />Do you have electronic records that need to be retained for: (check all that apply)<br />Most organizations will be keeping some records for a very long time.<br />N=144, all respondents.<br />
  19. 19. Archive File Types<br />How are the following content types mostly archived in your organization?<br />N=139, all<br />
  20. 20. NARA defines:<br />“…the ability to access an electronic record throughout its lifecycle, regardless of the technology used when it was originally created”<br />Characteristics of Sustainable Formats<br />Published documentation and open disclosure<br />Widespread adoption and use <br />Self-describing formats<br />External Dependency<br />Impact of Patents<br />Technical Protection Mechanism<br />Sustainable Formats<br />
  21. 21. TIFF<br />Well known<br />Difficult to create digitally born documents<br />Indexing documents may be difficult<br />XML<br />Many schema exist<br />Preserves content not the structure<br />Native File Formats<br />Several file formats <br />May render differently depending on the device or platform used<br />PDF<br />Widely adopted<br />Feature rich<br />Reliable and secure<br />File Formats<br />
  22. 22. PDF/A is intended to address three primary issues:<br />Define a file format that preserves the static visual appearance of electronic documents over time<br />Provide a framework for recording metadata about electronic documents<br />Provide a framework for defining the logical structure and semantic properties of electronic documents<br />PDF/A<br />
  23. 23. Guarantees the secure reproduction of documents<br />No technology requirements<br />Ensures a homogeneous archive<br />Born-digital and scanned documents in same archive<br />Valid throughout the world<br />ISO maintained standard<br />Sustainable file format<br />Standards exist, files are self-documenting, adoption<br />Why PDF/A?<br />37% still have separate image and electronic archives<br />
  24. 24. Records Archive<br />Do you store a significant proportion of your records in any of the following formats?<br />PDF/A gaining some ground at 30%.<br />Native formats still very prevalent.<br />N=144, all respondents.<br />
  25. 25. PDF/A<br />What are the main reasons you are not using PDF/A?<br />PDF/A benefits still not understood<br />N=102, Non-PDF/A Users.<br />
  26. 26. “This International Standard specifies how to use the Portable Document Format (PDF) ISO 32000-1 for long-term preservation of electronic documents”<br />Applicable to documents containing character, raster, and vector data<br />The standard does not address:<br />Processes for generating PDF/A files<br />Specific implementation details of rendering PDF/A files<br />Methods for storing PDF/A files<br />Hardware and software dependencies<br />ISO/DIS 19005-2<br />
  27. 27. Additional features in ISO 32000-1 (PDF 1.7)<br />PDF/A-1 based on PDF 1.4<br />JPEG 2000 Image Conversion<br />Added compression process (PDF 1.5)<br />Higher compression rates, better quality<br />Embedding PDF/A within Collection<br />Compile PDF/A collections<br />Transparency<br />Permitted in PDF/A-2<br />Digital Signatures<br />Follow ETSI PAdES Standard<br />PDF Layers (“Optional Content”)<br />Helpful for technical drawings<br />Multilingual content<br />What is in PDF/A-2?<br />
  28. 28. Two Conformance Levels<br />PDF/A-1a and PDF/A-2a<br />Compliance with all requirements of 19005-1<br />Including those regarding structural and semantic tagging<br />PDF/A-1b and PDF/A-2b<br />Compliance with all requirements of 19005-1 minimally necessary to preserve the visual appearance of a PDF/A file<br />PDF/A-2u<br />Compliance with all requirements of 19005-2 except those requirements for logical structure of the document<br />Preserves the visual appearance of the file and ensures any text in the document can be reliably extracted as a series of Unicode code points.<br />PDF/A Conformance<br />
  29. 29. Will not replace or supersede PDF/A-1<br />Few tools will be available initially<br />Look at new features<br />Understand your requirements – then decide<br />PDF/A-1 is and will remain a valid file type<br />Considerations PDF/A-2<br />
  30. 30. Page 30<br />Backfile Conversion to PDF/A<br />How would you characterize your strategy to convert your existing documents to PDF/A?<br />32% driving back-conversion centrally<br />N=40, PDF/A users.<br />
  31. 31. How soon do you plan to converge to PDF/A-2, when it is published?<br />Backfile Conversion to PDF/A-2<br />One third of PDF/A users have not heard of PDF/A-2 <br />Another third will converge to PDF/A-2 in 3 years or less.<br />N=40, PDF/A users.<br />
  32. 32. Would you reject a tool or application that was not tested to a conformance standard?<br />PDF/A-2 Tools<br />80% expect to use conformance certified creation tools.<br />N=40, PDF/A users.<br />
  33. 33. Document management – Electronic document file format for long-term preservation including embedded files – Part 3: Use of ISO 32000-1 (PDF/A-3) <br />Specifies the use of PDF for preserving the static visual representation of page based electronic documents over time in addition to allowing any type of other content to be included as an embedded file or attachment<br />ISO/NWI/CD 19005-3 (PDF/A-3)<br />
  34. 34. Accessibility<br />Does your content need to be accessible (able to be accessed and read by assistive technologies)?<br />There is a recognition of accessibility regulations.<br />N=144, all respondents .<br />
  35. 35. Document management applications – Electronic document file format enhancement for accessibility (PDF/UA) – Use of ISO 32000-1 (PDF/UA-1)<br />Specifies how to use PDF to produce electronic documents which are accessible<br />Does not specify:<br />Processes for converting paper or electronic documents<br />Storage of PDF/UA documents<br />Specific design, user interface, implementation or other details for rendering<br />ISO/CD 14289-1 (PDF/UA)<br />
  36. 36. Document management – Engineering document format using PDF – Part 1: Use of PDF 1.6 (PDF/E-1)<br />Specifies the use of PDF for the creation of documents used in engineering workflows. It does not define:<br />Method of electronic distribution<br />Method of creation or conversion from paper or electronic documents to the PDF/E format<br />Specific technical design, user interface, or implementation<br />Required hardware or methods for validation<br />ISO 24517-1:2008 (PDF/E)<br />
  37. 37. Addresses need for reliable exchange of engineering documentation<br />Secure distribution of intellectual property<br />Reliable exchange and change management (multiple application types and platforms)<br />Reduces costs associated with paper (distribution as well as storage/archive)<br />Covers 3 primary areas:<br />Compact, accurate printing of engineering drawings<br />Support for exchanging/managing annotation and comment data<br />Incorporation of complex data into PDF (3D, object level data, etc.)<br />Part 2 – Update to ISO 32000-1 and archive capabilities<br />ISO 24517-1: 2008 (PDF/E)<br />
  38. 38. Document management – 3D use of Product Representation Compact (PRC) format – Part 1: PRC 10001<br />Describes a file format for 3D content data for the purposes of 3D visualization and exchange.<br />Used for creating, viewing and distributing 3D data in a document exchange workflow<br />ISO/CD 14739-1 (PRC)<br />
  39. 39. What is PDF Healthcare?<br />A “Best Practices Guide” describing attributes of the Portable Document Format (PDF) to facilitate the capture, exchange, preservation and protection of healthcare information<br />Share data easily between healthcare institutions<br />Ease the transition into digital health records for information exchange and sharing<br />Bridge the gap between healthcare providers and consumers<br />
  40. 40. PDF Healthcare Background<br />eHealthcare is a reality in today’s environment<br />PDF advantages in healthcare<br />Long-standing success and adoption of PDF<br />PDF provides a secure and universal container for multiple data types regardless of data source or destination<br />PDF is platform- and system-neutral<br />PDF allows for interoperability and bi-directional information exchange<br />Selected records can be easily and quickly printed from PDF when necessary<br />
  41. 41. Initial PDF Healthcare Offering<br />Best Practices Guide<br />Describes the attributes of the Portable Document Format (PDF) that are relevant to facilitate the capture, exchange, preservation and protection of healthcare information<br />Implementation Guide / Use Cases<br />Supplemental information that will provide examples of interoperability with existing healthcare standards such as ASTM’s Continuity of Care Record (CCR)<br />
  42. 42. Additional PDF Healthcare Offering<br />PDF Healthcare Supporting the Clinical Document Architecture: White Paper<br />Discusses the implementation of PDF Forms in support of the HL7 Clinical Document Architecture (CDA) to simplify, secure, and speed transactions between entities with varying levels of automation<br />Creating PDF Forms for the CDA: Implementation guide<br />Supplemental information that will provide examples of various forms, e.g., the Emergency Information Form for Children with Special Needs, that support a subset of the CDA schema<br />
  43. 43. Proposed Legislation – PDF/A<br />Alabama<br />Alaska<br />California (Repealed 10/19/2010)<br />Connecticut<br />Florida<br />Idaho<br />Kentucky<br />Missouri<br />Nevada<br />New York<br />Ohio <br />Wisconsin<br />
  44. 44. PDF/A Adoption<br />Europe<br />Standard eBilling (Organisation for Promotion of Automated Accounting)<br />Germany, France, Austria, Switzerland, Poland, Norway<br />Brazil<br />China<br />MoReq2<br />U.S. Nuclear Regulatory Commission<br />U.S. District Courts<br />NARA<br />Library of Congress<br />
  45. 45. PDF/A-1 compliance is not enough<br />Comply with NARA’s transfer instructions for records in PDF<br />Provide transfer documentation<br />Must comply with image quality specifications for transfer of permanent records<br />Must use OCR processes that do not alter the original bit-mapped image<br />NARA Guidelines<br />
  46. 46. Opportunities<br />Conversion<br />Paper based<br />Electronic files to PDF subsets<br />Validation<br />Isartor Test Suite<br />Bavaria Report (PDFlib)<br />Adobe Acrobat Preflight<br />Data cleanup<br />Metadata<br />Embedding fonts and images<br />Tagging<br />Consulting and recommending use of PDF/A<br />Conversion of Healthcare records<br />
  47. 47. Betsy Fanning<br />Ph: +1.301.755.2682<br />Skype: betsy.fanning<br />Email: bfanning@aiim.org<br />Twitter: bfanning<br />LinkedIn: www.linkedin.com/in/betsyfanning<br />PDF Standards – www.aiim.org/standards<br />Get involved – Service Companies still needed for AIIM’s National Standards Council (NSC)<br />Questions/Contact<br />
  48. 48. http://www.mach2solutions.net/pdf/pdf.html<br />PDF Demonstration URL<br />
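Slides 7, 12 and 36 tie each standard to a base PDF version (ISO 32000-1 to PDF 1.7, PDF/A-1 to PDF 1.4, PDF/E-1 to PDF 1.6). A PDF file announces its version in a header comment of the form `%PDF-1.7`, so a first look at which version a file declares can be sketched as below. This is an illustrative sketch only: the function name `pdf_header_version` and the 1024-byte search window are assumptions made for this example, not part of the presentation, and the document catalog can override the header version in later PDF versions, so the result is only a starting point.

```python
import re

# Header comment pattern: "%PDF-" followed by major.minor, e.g. "%PDF-1.7".
_HEADER = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_header_version(data: bytes):
    """Return (major, minor) from the PDF header comment, or None if absent.

    Hypothetical helper for illustration. It inspects only the first
    1024 bytes (a leniency chosen for this sketch) and does not read
    any version override stored in the document catalog.
    """
    match = _HEADER.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# A tiny in-memory stand-in for the start of a PDF file.
sample = b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n1 0 obj\n<< >>\nendobj\n"
print(pdf_header_version(sample))  # (1, 7)
```

In practice the bytes would come from `open(path, "rb").read(1024)`; the in-memory sample keeps the sketch self-contained.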

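The conformance levels on slide 28 (PDF/A-1a, PDF/A-1b, PDF/A-2a, PDF/A-2b, PDF/A-2u) are declared inside a file through XMP metadata, using the `pdfaid:part` and `pdfaid:conformance` properties of the PDF/A identification schema. The sketch below reads that declaration from raw bytes. It rests on several assumptions not stated in the slides: the XMP packet is stored uncompressed, the properties appear in either element or attribute form, and the helper name `claimed_pdfa_level` is hypothetical. It reports only what a file claims; checking actual conformance needs a validator such as the tools listed on slide 46.

```python
import re

# Match either <pdfaid:part>2</pdfaid:part> or pdfaid:part="2".
_PART = re.compile(rb"pdfaid:part(?:>|=[\"'])\s*(\d)")
_CONF = re.compile(rb"pdfaid:conformance(?:>|=[\"'])\s*([A-Za-z])")


def claimed_pdfa_level(data: bytes):
    """Return the declared level, e.g. 'PDF/A-2u', or None if no claim is found.

    Hypothetical helper: scans raw bytes for the XMP properties, so it
    assumes the metadata packet is not compressed or encrypted.
    """
    part = _PART.search(data)
    if part is None:
        return None
    conf = _CONF.search(data)
    # XMP stores the level as an uppercase letter; the slides write it lowercase.
    level = conf.group(1).decode("ascii").lower() if conf else ""
    return f"PDF/A-{part.group(1).decode('ascii')}{level}"


# Minimal XMP fragment in element form (illustrative, not a full packet).
xmp = (
    b'<rdf:Description xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">'
    b"<pdfaid:part>2</pdfaid:part>"
    b"<pdfaid:conformance>U</pdfaid:conformance>"
    b"</rdf:Description>"
)
print(claimed_pdfa_level(xmp))  # PDF/A-2u
```

A file that declares nothing returns `None`, which is different from a file that fails validation; the two cases are worth keeping separate in any conversion or cleanup workflow.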