Andrew Waugh



    1. 1. Digital Preservation NOW Andrew Waugh Senior Technical Advisor Public Record Office Victoria
    2. 2. Goal of session <ul><li>To present practical steps that you can take to preserve digital information now, without having a digital archive </li></ul>
    3. 3. Outline of session <ul><li>Goal of preservation </li></ul><ul><li>Preserving the bit stream </li></ul><ul><li>Preserving accessibility </li></ul><ul><li>Preserving the context </li></ul><ul><li>Conclusions </li></ul>
    4. 4. The goal of preservation <ul><li>Ensure access to records as long as they are required </li></ul><ul><li>A record is… </li></ul><ul><ul><li>information created, received, and maintained as evidence and information by an organization or person, in pursuance of legal obligations or in the transaction of business (AS ISO 15489.1-2002) </li></ul></ul>
    5. 5. The key to records is evidence <ul><li>What, where, when, how, who </li></ul><ul><li>Evidence to colleagues (business activity) </li></ul><ul><li>Evidence of accountability (investigations) </li></ul><ul><li>Evidence to courts (legal evidence) </li></ul><ul><li>Evidence to researchers (historical evidence) </li></ul>
    6. 6. So what does evidence require? <ul><li>That record was produced as part of normal business process (authentic) </li></ul><ul><li>That record can be found & read (accessible) </li></ul><ul><li>That it can be related to the rest of the records (context) </li></ul><ul><li>That it hasn’t been tampered with (integrity) </li></ul>
    7. 7. Key issues <ul><li>Preserving the bit stream </li></ul><ul><ul><li>If you don’t have the bits, you don’t have anything </li></ul></ul><ul><li>Preserving access to the information </li></ul><ul><ul><li>In the face of fragile applications </li></ul></ul><ul><li>Preserving the context </li></ul><ul><ul><li>The evidence </li></ul></ul>
    8. 8. Preserving the bit stream
    9. 9. Core issue <ul><li>If you don’t have the binary data (files) that makes up the record, then you cannot preserve anything </li></ul><ul><li>Problems you need to protect against </li></ul><ul><ul><li>Media failures (corruption, crashes) </li></ul></ul><ul><ul><li>Technology obsolescence </li></ul></ul><ul><ul><li>Human error </li></ul></ul>
    10. 10. Basically a solved problem <ul><li>A core function of your IT department </li></ul><ul><ul><li>Day to day operation of storage systems </li></ul></ul><ul><ul><li>Back-up/restore and disaster recovery </li></ul></ul><ul><ul><li>Periodic replacement of media and technology </li></ul></ul>
    11. 11. Recommendations <ul><li>Store on at least two pieces of media, ideally two technologies or (less ideally) two brands </li></ul><ul><li>Store in at least two sites </li></ul><ul><li>Information not being accessed must be periodically checked for corruption </li></ul><ul><li>Track individual pieces of media – include brand and batch </li></ul><ul><li>Always use mainstream technology in widespread use </li></ul>
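The recommendations above can be turned into a simple check over a register of copies. The following is a minimal sketch, assuming a hypothetical `Copy` entry per piece of media; the class, its field names, and the wording of the reported gaps are illustrative and do not come from the slides.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of a record on one tracked piece of media (hypothetical register entry)."""
    media_id: str    # label of the individual piece of media
    technology: str  # e.g. "disc", "tape", "optical"
    brand: str
    batch: str
    site: str


def policy_gaps(copies: list[Copy]) -> list[str]:
    """Return the storage recommendations that this set of copies does not meet."""
    gaps = []
    if len({c.media_id for c in copies}) < 2:
        gaps.append("fewer than two pieces of media")
    if len({c.site for c in copies}) < 2:
        gaps.append("fewer than two sites")
    # Two technologies is ideal; two brands is the less ideal fallback.
    if len({c.technology for c in copies}) < 2 and len({c.brand for c in copies}) < 2:
        gaps.append("all copies share one technology and one brand")
    return gaps
```

Running `policy_gaps` over every record in the register gives a list of holdings that need another copy, another site, or a second technology or brand.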
    12. 12. Storage media (disc) <ul><li>Default storage choice should be on-line (disc) storage unless massive storage required </li></ul><ul><ul><li>e.g. 3 Terabytes RAID 5 ~$4000 </li></ul></ul><ul><li>RAID 1 or 6 (or derivatives) to guard against disc failures. RAID 5 is problematic now. </li></ul><ul><li>Expect to replace each disc within 5 years </li></ul><ul><li>External (USB) discs not recommended for long term storage (> 1 year) </li></ul>
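To compare the RAID levels named above when sizing an array, the usable capacity can be worked out from the number of discs. This is a rough sketch using the standard definitions (RAID 1 mirrors every disc, RAID 5 gives up one disc's worth of space to parity, RAID 6 gives up two); the function name `usable_capacity_tb` is illustrative, and real arrays lose a little more space to formatting and hot spares.

```python
def usable_capacity_tb(level: int, discs: int, disc_tb: float) -> float:
    """Approximate usable capacity in terabytes for RAID 1, 5 or 6."""
    minimum_discs = {1: 2, 5: 3, 6: 4}
    if level not in minimum_discs:
        raise ValueError(f"unsupported RAID level: {level}")
    if discs < minimum_discs[level]:
        raise ValueError(f"RAID {level} needs at least {minimum_discs[level]} discs")
    if level == 1:
        # Every disc holds the same data, so capacity is that of one disc.
        return disc_tb
    parity_discs = 1 if level == 5 else 2
    return (discs - parity_discs) * disc_tb
```

For example, four 1 TB discs in RAID 5 and five 1 TB discs in RAID 6 both give about 3 TB of usable space, but the RAID 6 array survives two disc failures rather than one.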
    13. 13. Storage media (tape) <ul><li>Choose tape when you need more storage capacity than is economic with disc </li></ul><ul><ul><li>Be sure to factor in whole of life costs including media replacement and operator costs </li></ul></ul><ul><li>Preferred formats LTO Ultrium, IBM 3592, T10000 </li></ul><ul><li>Tape robots are preferred over manual handling </li></ul><ul><li>Get expert advice on tape solutions as these are no longer common – use only for large organisations </li></ul><ul><li>NEVER EVER choose leading edge technology, always stay within industry standard </li></ul>
    14. 14. Storage media (optical) <ul><li>Prefer CD-R (phthalocyanine dye) </li></ul><ul><li>Can use CD-R (azo dye) or DVD-R, but monitor carefully </li></ul><ul><li>Do not use CD-RW, DVD-RW, or CD-R (cyanine dye) </li></ul><ul><li>Use ‘name brands’, and archival quality if possible </li></ul><ul><li>Refresh in 2 to 5 years </li></ul><ul><li>Unlikely to be generally economic compared with disc or tape due to high operator cost and low capacity </li></ul>
    15. 15. Monitor… <ul><li>Recommend statistical sampling of data to </li></ul><ul><ul><li>check for corruption of copies (checksums) </li></ul></ul><ul><ul><li>detect deterioration of media </li></ul></ul><ul><li>Technology watch to guard against obsolete media </li></ul><ul><ul><li>plan for media refresh every 2 to 10 years </li></ul></ul><ul><li>Track individual pieces of media (if used) </li></ul><ul><ul><li>Ensure that none are lost </li></ul></ul><ul><ul><li>Ensure that all are tested and refreshed </li></ul></ul>
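The checksum sampling recommended above can be sketched with the Python standard library. This is one possible approach, not a prescribed procedure: it assumes the records sit under a single directory, uses SHA-256 as the checksum, and the names `build_manifest` and `verify_sample` are illustrative.

```python
from __future__ import annotations

import hashlib
import random
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a file in 1 MiB chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_sample(root: Path, manifest: dict[str, str], fraction: float,
                  seed: int | None = None) -> list[str]:
    """Re-checksum a random fraction of the manifest; return missing or changed files."""
    names = sorted(manifest)
    if not names:
        return []
    count = min(len(names), max(1, round(len(names) * fraction)))
    sample = random.Random(seed).sample(names, count)
    failures = []
    for name in sample:
        path = root / name
        if not path.is_file() or sha256_of(path) != manifest[name]:
            failures.append(name)
    return sorted(failures)
```

The manifest would be built once when the records are stored and kept with each copy; `verify_sample` can then be run on a schedule, with a `fraction` of 1.0 for a full check.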
    16. 16. Back-up & disaster recovery <ul><li>Ensure that </li></ul><ul><ul><li>Your IT organisation has both a back-up and disaster recovery regime </li></ul></ul><ul><ul><li>It is effective (periodically test restoration) </li></ul></ul>
    17. 17. Preserving accessibility
    18. 18. Software fragility <ul><li>Without software to interpret and display the content, the data is lost </li></ul><ul><ul><li>Software may not run on the current version of the operating system or current computer </li></ul></ul><ul><ul><li>Current software version may not accurately deal with files from older versions </li></ul></ul><ul><ul><li>You may not have the required software </li></ul></ul>
    19. 19. Do nothing option <ul><li>So far has worked because backwards compatibility is better than we thought </li></ul><ul><ul><li>Operating systems continue to support older programs (Windows, Unix/Linux) </li></ul></ul><ul><ul><li>Modern programs seem to have good support for files from older versions </li></ul></ul><ul><ul><li>This may not last forever… </li></ul></ul>
    20. 20. If you are going to do nothing… <ul><li>Perform a risk analysis </li></ul><ul><ul><li>Survey your holdings to identify and quantify file formats </li></ul></ul><ul><ul><ul><li>versions, if possible, ages if not </li></ul></ul></ul><ul><ul><li>Consider risk of loss of access </li></ul></ul><ul><ul><ul><li>Use criteria from normalisation section </li></ul></ul></ul><ul><ul><li>Identify high value holdings </li></ul></ul><ul><li>Monitor software trends (is risk increasing?) </li></ul><ul><li>Identify contingency plans </li></ul><ul><li>Influence users to use lower risk formats </li></ul>
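A first pass at the survey of holdings described above might look like the following. It is a rough sketch that uses the file extension as a stand-in for the format and the modification time as a stand-in for age, both of which are assumptions; a dedicated format identification tool would be more reliable. `FormatStats` and `survey_formats` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class FormatStats:
    """Running totals for one file extension."""
    files: int = 0
    total_bytes: int = 0
    oldest_mtime: float = float("inf")  # seconds since the epoch


def survey_formats(root: Path) -> dict[str, FormatStats]:
    """Count files and bytes per extension and note the oldest file of each."""
    stats = {}
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        extension = path.suffix.lower() or "(none)"
        info = path.stat()
        entry = stats.setdefault(extension, FormatStats())
        entry.files += 1
        entry.total_bytes += info.st_size
        entry.oldest_mtime = min(entry.oldest_mtime, info.st_mtime)
    return stats
```

Sorting the result by file count or by oldest modification time gives a starting list of formats to assess for risk of loss of access.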
    21. 21. Normalisation option <ul><li>Proactively convert formats to a long term preservation format (LTPF) </li></ul><ul><li>This is a format that is likely to be usable for the foreseeable future </li></ul><ul><ul><li>Can find replacement software to render data </li></ul></ul><ul><ul><li>Can find software to migrate from LTPF to new format </li></ul></ul><ul><li>Library of Congress sustainability factors </li></ul>
    22. 22. Characteristics of a good LTPF <ul><li>Supports critical features of your data </li></ul><ul><li>Published file format specification </li></ul><ul><li>Independent implementations </li></ul><ul><li>Wide community adoption </li></ul><ul><li>Simple </li></ul><ul><li>Formal standard </li></ul><ul><li>Public domain </li></ul><ul><li>Low risk conversion </li></ul>
    23. 23. If you normalise <ul><li>Don’t jump out of the frying pan </li></ul><ul><ul><li>Still need to do the analysis presented for ‘do nothing case’ </li></ul></ul><ul><ul><li>Just fewer formats </li></ul></ul><ul><li>Develop test regime to test conversion into nominated format </li></ul><ul><ul><li>Suite of ‘typical’ documents illustrating critical features </li></ul></ul>
    24. 24. LTPF suggestions <ul><li>Documents </li></ul><ul><ul><li>PDF/A, ODF </li></ul></ul><ul><li>Images </li></ul><ul><ul><li>TIFF, JPEG2000, JPEG (if already in JPEG) </li></ul></ul><ul><li>Video </li></ul><ul><ul><li>MPEG2 or MPEG4 </li></ul></ul>
    25. 25. Normalisation challenges <ul><li>Many types of data have no suitable LTPF (e.g. CAD/GIS) </li></ul><ul><li>Long tail of formats (never be able to assign an LTPF for all types of digital object) </li></ul><ul><li>Loss of characteristics in the normalisation </li></ul><ul><li>Increasing complexity of digital objects (i.e. formats embedded within formats) </li></ul>
    26. 26. Digital rights management <ul><li>DRM systems are designed to control (prevent) access to digital objects </li></ul><ul><ul><li>Owner of digital object removes right of access </li></ul></ul><ul><ul><li>May not permit access even though it is required (e.g. investigations) </li></ul></ul><ul><ul><li>DRM system ceases to exist </li></ul></ul><ul><li>DRM systems do not recognise an organisation’s right to use their records </li></ul><ul><li>Trusted Computing and Digital Rights Management Principles and Policies, NZESC </li></ul>
    27. 27. Is it evidence? (Context)
    28. 28. Core Issues <ul><li>If you cannot find it, it does not exist </li></ul><ul><li>If you can find it, and cannot understand the context, it is meaningless </li></ul><ul><ul><li>Users are interested in the story, not a document </li></ul></ul><ul><li>If you cannot show its authenticity, integrity, and context, it may have low evidential weight </li></ul>
    29. 29. It’s all basic records management <ul><li>Create the record as part of the business process (authenticity) </li></ul><ul><ul><li>This includes putting it aside </li></ul></ul><ul><li>Putting the record in its context </li></ul><ul><ul><li>Tell the story – who, what, where and when </li></ul></ul><ul><li>Show that the record has not been subsequently modified </li></ul><ul><ul><li>Audit log </li></ul></ul>
    30. 30. Key requirements <ul><li>Making sure that records are created in their context (business issue) </li></ul><ul><li>Having someplace to put the records and capture their context </li></ul><ul><ul><li>Electronic Document & Records Management System (EDRMS) </li></ul></ul><ul><ul><li>Classification system </li></ul></ul>
    31. 31. If you do not have an EDRMS? <ul><li>Do whatever you can… </li></ul><ul><li>Set up classification system in </li></ul><ul><ul><li>Email system </li></ul></ul><ul><ul><li>Corporate file server </li></ul></ul><ul><li>Good idea even if you plan to get an EDRMS </li></ul><ul><ul><li>It gets everyone used to using a classification system </li></ul></ul>
    32. 32. Why is metadata important? <ul><li>Who, what, where and when is answered by metadata associated with the record </li></ul><ul><ul><li>Captured (ideally) by system when record is created </li></ul></ul><ul><ul><li>Entered by user </li></ul></ul><ul><li>Many different metadata standards </li></ul>
    33. 33. NAA/ANZ metadata standard <ul><li>Proposed basis for an Australian recordkeeping standard </li></ul><ul><li>Australian Government Recordkeeping Metadata Standard version 2.0 </li></ul>
    34. 34. Minimum metadata to be kept <ul><li>Identifier (unique id referring to this object) </li></ul><ul><li>Name (human readable tag) </li></ul><ul><li>Start date (creation date) </li></ul><ul><li>Contextual link (relation with file, series) </li></ul><ul><li>Change history (demonstrating integrity) </li></ul><ul><li>Disposal (when and how to dispose of record) </li></ul><ul><li>Extent (size) </li></ul><ul><li>Agent (organisation or person associated with record) </li></ul>
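The minimum metadata listed above maps naturally onto a small record structure. The sketch below is one way to hold it, assuming hypothetical field names chosen for this example; they are not the element names of the NAA/ANZ standard or of any other metadata standard.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class ChangeEvent:
    """One entry in the change history, used to demonstrate integrity."""
    when: date
    agent: str
    description: str


@dataclass
class RecordMetadata:
    identifier: str       # unique id referring to this object
    name: str             # human readable tag
    start_date: date      # creation date
    contextual_link: str  # relation with file or series
    disposal: str         # when and how to dispose of the record
    extent_bytes: int     # size
    agent: str            # organisation or person associated with the record
    change_history: list[ChangeEvent] = field(default_factory=list)


def missing_elements(meta: RecordMetadata) -> list[str]:
    """Return the names of text elements that have been left empty."""
    required_text = ["identifier", "name", "contextual_link", "disposal", "agent"]
    return [n for n in required_text if not getattr(meta, n).strip()]
```

A check such as `missing_elements` could be run when a record is captured, so that gaps are found while the person who created the record can still fill them in.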
    35. 35. What can you do now - storage <ul><li>Make sure that your organisation can preserve the bits </li></ul><ul><ul><li>Survey holdings of media to discover the extent of your problem </li></ul></ul><ul><ul><li>Move records off unmanaged, obsolete, deteriorating media </li></ul></ul><ul><ul><li>Ensure back-up and disaster recovery systems are in place and working </li></ul></ul><ul><ul><li>Sample records to detect corruption and decay </li></ul></ul><ul><ul><li>Plan to migrate to new technology </li></ul></ul>
    36. 36. What can you do now – access <ul><li>Make sure that your organisation can turn the files into something a human can understand </li></ul><ul><ul><li>Survey holdings of records to understand what formats you have and the importance of the records </li></ul></ul><ul><ul><li>Perform a risk assessment on the formats </li></ul></ul><ul><ul><li>Choose an LTPF and normalise high risk formats </li></ul></ul><ul><ul><li>Encourage use of LTPF for business </li></ul></ul>
    37. 37. What can you do now – context <ul><li>Make sure that digital objects are records </li></ul><ul><ul><li>Organise the objects so that they have a context (classification) </li></ul></ul><ul><ul><li>Move towards an EDRMS or business application that captures the records, preserves their context, and protects their integrity </li></ul></ul>
    38. 38. Questions?