January 2006 Archival Storage Strategies and Technologies Presentation


  1. Archival Storage Strategies & Technologies. AIIM Presentation, January 25, 2006. Porter-Roth Associates
  2. Bud Porter-Roth, Porter-Roth Associates, 415-381-6217
  3. Agenda
     - Introduction
     - The Preservation Problem
     - Recommendations
  4. Introduction
     - Disaster Recovery
     - Basic Need
     - Compliance
  5. Introduction: Paper e-Mail Servers Business Files Systems Electronic Photographs Microfilm Document Repositories Flash Drives PDAs Imaging Repositories Local Drives Video Libraries File Systems Web Servers
  6. The Preservation Problem
     The problem is actually two separate, largely unrelated issues:
     - The hardware and software needed to store and read documents (hardware, OS, applications)
     - The format that the documents are in (Word, PDF, XML)
  7. The Preservation Problem: the “problem” in brief
     - Software formats change and become unsupported
     - Software formats fall out of favor over time and disappear
     - Hardware drives change and become unsupported
     - Storage media change over time and become obsolete:
       - Floppy disks
       - Optical disks (WORM, CD, DVD)
       - Tape (many flavors)
       - Portable storage media like the “Memory Stick” in use today
     - For digital documents, all of the above means there is a strong chance that you will be forced to convert something to something else over time, as a continuing process for the foreseeable future
  8. The Preservation Problem
     - TIFF (Tagged Image File Format), usually with ITU Group 4 compression
     - JPEG (Joint Photographic Experts Group)
     - GIF (Graphics Interchange Format)
     - PNG (Portable Network Graphics)
     - Native file formats (Word, Excel, etc.), also known as “born digital” documents
     - PDF, PDF/A, PDF/X
     - Many other proprietary electronic formats
     - Paper
     - Film
  9. The Preservation Problem
     What is the best option for preserving electronic documents over archival time spans? (Disregarding the hardware storage issues.)
     - TIFF?
       - A “digital picture” of your page
       - Widely adopted standard for document imaging
       - Not human readable without the software
       - No access to underlying text without OCR
     - XML?
       - A format description of the page: a style sheet
       - Good for describing logical structure, but not appearance
       - Many incompatible domain-specific schemas
     - Native format (e.g., MS Word)?
       - Several ubiquitous, but closed, proprietary formats
       - Can you spell WordPerfect?
     - PDF? PDF/A?
     - Microsoft Metro, renamed XPS?
  10. Desirable Properties of a Format
      - Device independence: can be reliably and consistently rendered without regard to the hardware/software platform
      - Self-contained: contains all resources necessary for rendering
      - Self-documenting: contains its own description
      - Transparency: amenable to direct analysis with basic tools
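As a small illustration of what “direct analysis with basic tools” can look like, several of the formats named in this presentation identify themselves in their first few bytes. The sketch below is an illustration only, not part of the original slides: the function names `sniff_format` and `sniff_file` are hypothetical, and the signature table covers only PDF, TIFF, JPEG, GIF, and PNG.

```python
# Leading byte signatures ("magic numbers") for formats named in this presentation.
# The table is deliberately small; it is a sketch, not a complete format registry.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
]


def sniff_format(header: bytes) -> str:
    """Return a format name based on the leading bytes, or 'unknown'."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 16 bytes of a file and identify its format (hypothetical helper)."""
    with open(path, "rb") as f:
        return sniff_format(f.read(16))
```

A signature match says only what a file claims to be; it does not show that the file is complete or renderable.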
  11. Adobe PDF and PDF/A
      - PDF is a ubiquitous open format for electronic documents
        - Proprietary, but with a publicly available specification
        - Companies other than Adobe make PDF products
      - Many statutory, regulatory, and institutional policies mandate the retention of PDF-based documents over multiple generations of technology
      - The feature-rich nature of PDF can complicate preservation efforts
  12. PDF/A
      PDF/A is intended to address three primary issues:
      - Define a file format that preserves the static visual appearance of electronic documents over time
      - Provide a framework for recording metadata about electronic documents
      - Provide a framework for defining the logical structure and semantic properties of electronic documents
  13. PDF/A
      PDF/A constraints include:
      - Audio and video content are forbidden
      - JavaScript and executable file launches are prohibited
      - All fonts must be embedded and must also be legally embeddable for unlimited, universal rendering
      - Color spaces must be specified in a device-independent manner
      - Encryption is disallowed
      - Use of standards-based metadata is mandated
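A few of these constraints can be roughly screened for by looking for well-known PDF name tokens in a file's raw bytes. The sketch below is a heuristic under stated assumptions and is not a PDF/A validator: the function name `flag_pdfa_concerns` and the token table are illustrative choices, it will miss anything stored inside compressed streams, and it can raise false alarms when a token appears in ordinary text. Font embedding, color spaces, and metadata are not checked at all.

```python
# Raw PDF name tokens that hint at features the PDF/A constraints exclude.
# Heuristic only: a plain byte search, not a parse of the PDF object structure.
SUSPECT_TOKENS = {
    b"/Encrypt": "encryption dictionary (encryption is disallowed)",
    b"/JavaScript": "JavaScript action (prohibited)",
    b"/Launch": "launch action (executable file launches are prohibited)",
    b"/Sound": "sound content (audio is forbidden)",
    b"/Movie": "movie content (video is forbidden)",
}


def flag_pdfa_concerns(data: bytes) -> list[str]:
    """Return a reason for every suspect token found in the raw PDF bytes."""
    return [reason for token, reason in SUSPECT_TOKENS.items() if token in data]


if __name__ == "__main__":
    sample = b"%PDF-1.4\n1 0 obj << /S /JavaScript /JS (app.alert(1)) >> endobj\n"
    for reason in flag_pdfa_concerns(sample):
        print("possible PDF/A issue:", reason)
```

An empty result means only that none of these tokens were seen in the clear; conformance still has to be established with a dedicated PDF/A validation tool.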
  14. PDF/A: However…
      - PDF/A alone does not guarantee preservation
      - PDF/A alone does not guarantee exact replication of source material
      - The intent of PDF/A is not to claim that PDF-based solutions are the best way to preserve electronic documents
      - But once you have decided to use a PDF-based approach, PDF/A defines an archival profile of PDF that is more amenable to long-term preservation
  15. PDF/A: …Nevertheless
      - PDF/A may not be the last preservation format you will use or need
      - However, proper application of PDF/A should result in reliable, predictable, and unambiguous access to the full information content of electronic documents
  16. Microsoft XPS
      - XPS is an abbreviation for the XML Paper Specification
      - The XML Paper Specification describes the XPS Document format. A document in XPS Document format (XPS Document) is a paginated representation of electronic paper described in an XML-based format.
      - The XPS Document format is an open, cross-platform document format that allows customers to effortlessly create, share, print, and archive paginated documents.
      - XPS Documents use a file container that conforms to the Open Packaging Conventions. The new file formats in the next version of the Microsoft Office System, codenamed Office “12,” also use the Open Packaging Conventions for organizing data into files, allowing businesses to manage Office “12” documents and XPS Documents in the same manner.
      - The XPS Document format is at once a fixed-layout document interchange format, a native Windows Vista spool file format, and a PDL (Page Description Language, used by printing devices).
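Because an Open Packaging Conventions container is a ZIP archive with a `[Content_Types].xml` part at its root, a package can be inspected with ordinary ZIP tools. The sketch below uses only Python's standard `zipfile` module; the function names `list_opc_parts` and `looks_like_opc` are hypothetical, and the check says nothing about whether the package is a valid XPS Document.

```python
import io
import zipfile

# Every Open Packaging Conventions package carries this part at its root.
CONTENT_TYPES_PART = "[Content_Types].xml"


def list_opc_parts(data: bytes) -> list[str]:
    """Return the names of all parts stored in the ZIP-based package."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        return package.namelist()


def looks_like_opc(data: bytes) -> bool:
    """Rough test: the bytes are a ZIP archive containing [Content_Types].xml."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return False
    return CONTENT_TYPES_PART in list_opc_parts(data)
```

This is one practical sense in which such a container is transparent: its parts can be listed and extracted without the application that produced it.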
  17. Recommendations
      - This is still a wild frontier, with no certain outcome or single standard
        - “The good thing about standards is that there are so many of them….”
      - When in doubt about long-term storage of vital documents, paper or film is still a good answer
      - Beware of new technologies, even ones that are “standards”
      - TIFF, JPEG, PDF, and PDF/A are recommended
      - The weight of in-place document formats means that change will be very slow, and may stop altogether unless a dramatic “out of the blue” technology appears
  18. Conclusion & Questions: Finally! Questions?