Andrew Waugh
  • Transcript

    • 1. Digital Preservation NOW
        • Andrew Waugh, Senior Technical Advisor, Public Record Office Victoria
    • 2. Goal of session
        • To present practical steps that you can take to preserve digital information now, without having a digital archive
    • 3. Outline of session
        • Goal of preservation
        • Preserving the bit stream
        • Preserving accessibility
        • Preserving the context
        • Conclusions
    • 4. The goal of preservation
        • Ensure access to records as long as they are required
        • A record is…
            • information created, received, and maintained as evidence and information by an organization or person, in pursuance of legal obligations or in the transaction of business (AS ISO 15489.1-2002)
    • 5. The key to records is evidence
        • What, where, when, how, who
        • Evidence to colleagues (business activity)
        • Evidence of accountability (investigations)
        • Evidence to courts (legal evidence)
        • Evidence to researchers (historical evidence)
    • 6. So what does evidence require?
        • That the record was produced as part of a normal business process (authentic)
        • That the record can be found and read (accessible)
        • That it can be related to the rest of the records (context)
        • That it hasn’t been tampered with (integrity)
    • 7. Key issues
        • Preserving the bit stream
            • If you don’t have the bits, you don’t have anything
        • Preserving access to the information
            • In the face of fragile applications
        • Preserving the context
            • The evidence
    • 8. Preserving the bit stream
    • 9. Core issue
        • If you don’t have the binary data (files) that make up the record, then you cannot preserve anything
        • Problems you need to protect against
            • Media failures (corruption, crashes)
            • Technology obsolescence
            • Human error
    • 10. Basically a solved problem
        • A core function of your IT department
            • Day to day operation of storage systems
            • Back-up/restore and disaster recovery
            • Periodic replacement of media and technology
    • 11. Recommendations
        • Store on at least two pieces of media, ideally two technologies or (less ideally) two brands
        • Store in at least two sites
        • Information not being accessed must be periodically checked for corruption
        • Track individual pieces of media – include brand and batch
        • Always use mainstream technology in widespread use
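
The recommendations on slide 11 could be checked against a simple media register. The following is a minimal sketch only: the `MediaItem` fields, the function names, and the 365-day check interval are illustrative assumptions, not part of the presentation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class MediaItem:
    """One physical piece of storage media (field names are illustrative)."""
    media_id: str
    technology: str   # e.g. "disc" or "tape"
    brand: str
    batch: str
    site: str
    last_checked: date


def check_copies(copies: list[MediaItem]) -> list[str]:
    """Return warnings where the copies of a record fall short of slide 11."""
    warnings = []
    if len({c.media_id for c in copies}) < 2:
        warnings.append("stored on fewer than two pieces of media")
    if len({c.site for c in copies}) < 2:
        warnings.append("stored at fewer than two sites")
    technologies = {c.technology for c in copies}
    brands = {c.brand for c in copies}
    if len(technologies) < 2 and len(brands) < 2:
        warnings.append("all copies share one technology and one brand")
    return warnings


def overdue_for_check(copies: list[MediaItem], today: date,
                      max_age_days: int = 365) -> list[str]:
    """List media whose last corruption check is older than max_age_days.

    The default interval is an assumption; the slide only says 'periodically'.
    """
    return [c.media_id for c in copies
            if (today - c.last_checked).days > max_age_days]
```

A register like this is only as good as the discipline of updating it whenever media is checked, moved, or replaced.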
    • 12. Storage media (disc)
        • Default storage choice should be on-line (disc) storage unless massive storage is required
            • e.g. 3 Terabytes RAID 5 ~$4000
        • RAID 1 or 6 (or derivatives) to guard against disc failures. RAID 5 is problematic now.
        • Expect to replace each disc within 5 years
        • External (USB) discs not recommended for long term storage (> 1 year)
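
To compare the RAID levels named on slide 12, the standard capacity arithmetic can be sketched as below. The function names and the example disc counts are assumptions for illustration; the slide does not say how its 3 Terabyte array is built.

```python
def raid_usable_tb(level: int, discs: int, disc_tb: float) -> float:
    """Usable capacity in terabytes for RAID 1, 5, or 6 with equal-sized discs."""
    if level == 1:
        if discs < 2:
            raise ValueError("RAID 1 needs at least 2 discs")
        return disc_tb                  # every disc holds a full copy
    if level == 5:
        if discs < 3:
            raise ValueError("RAID 5 needs at least 3 discs")
        return (discs - 1) * disc_tb    # one disc's worth of parity
    if level == 6:
        if discs < 4:
            raise ValueError("RAID 6 needs at least 4 discs")
        return (discs - 2) * disc_tb    # two discs' worth of parity
    raise ValueError("only RAID 1, 5 and 6 are handled in this sketch")


def failures_tolerated(level: int, discs: int) -> int:
    """Number of simultaneous disc failures the array survives."""
    if level == 1:
        return discs - 1
    if level == 5:
        return 1
    if level == 6:
        return 2
    raise ValueError("only RAID 1, 5 and 6 are handled in this sketch")
```

For example, four 1 TB discs give 3 TB usable in RAID 5 but survive only one failure, while five 1 TB discs in RAID 6 give the same 3 TB and survive two.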
    • 13. Storage media (tape)
        • The choice when you need more storage capacity than is economic with disc
            • Be sure to factor in whole of life costs including media replacement and operator costs
        • Preferred formats: LTO Ultrium, IBM 3592, T10000
        • Tape robots are preferred over manual handling
        • Get expert advice on tape solutions as these are no longer common – use only for large organisations
        • NEVER EVER choose leading edge technology; always stay with the industry standard
    • 14. Storage media (optical)
        • Prefer CD-R (phthalocyanine dye)
        • Can use CD-R (azo dye) or DVD-R, but monitor carefully
        • Do not use CD-RW, DVD-RW, or CD-R (cyanine dye)
        • Use ‘name brands’, and archival quality if possible
        • Refresh in 2 to 5 years
        • Unlikely to be generally economic compared with disc or tape due to high operator cost and low capacity
    • 15. Monitor…
        • Recommend statistical sampling of data to
            • check for corruption of copies (checksums)
            • deterioration of media
        • Technology watch to guard against obsolete media
            • plan for media refresh every 2 to 10 years
        • Track individual pieces of media (if used)
            • Ensure that none are lost
            • Ensure that all are tested and refreshed
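
The checksum sampling recommended on slide 15 might look like the sketch below. It assumes the stored copies sit in a directory tree and that a manifest of SHA-256 checksums was recorded when the files were known to be good. The function names, the choice of SHA-256, and the sample size are illustrative assumptions.

```python
import hashlib
import random
from pathlib import Path
from typing import Optional


def sha256_of(path: Path) -> str:
    """Checksum a file in 1 MiB chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*")) if p.is_file()
    }


def sample_and_verify(root: Path, manifest: dict[str, str],
                      sample_size: int, seed: Optional[int] = None) -> list[str]:
    """Re-check a random sample of files; return those missing or changed."""
    rng = random.Random(seed)
    names = sorted(manifest)
    chosen = rng.sample(names, min(sample_size, len(names)))
    failures = []
    for name in chosen:
        path = root / name
        if not path.is_file() or sha256_of(path) != manifest[name]:
            failures.append(name)
    return sorted(failures)
```

The manifest should be kept apart from the media it describes, so that a single failure cannot damage both the data and the evidence that it was once intact.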
    • 16. Back-up & disaster recovery
        • Ensure that
            • Your IT organisation has both a back-up and disaster recovery regime
            • It is effective (periodically test restoration)
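
One way to test restoration, as slide 16 asks, is to restore a back-up into a scratch directory and compare it byte for byte with a reference copy. This is a minimal sketch under that assumption; the function name and report keys are hypothetical, and the restore step itself depends on your back-up product and is not shown.

```python
import filecmp
from pathlib import Path


def compare_restore(original: Path, restored: Path) -> dict[str, list[str]]:
    """Compare a restored directory tree with the reference tree."""
    orig_files = {str(p.relative_to(original))
                  for p in original.rglob("*") if p.is_file()}
    rest_files = {str(p.relative_to(restored))
                  for p in restored.rglob("*") if p.is_file()}
    # shallow=False forces a comparison of file contents, not just size/mtime
    differing = [name for name in sorted(orig_files & rest_files)
                 if not filecmp.cmp(original / name, restored / name,
                                    shallow=False)]
    return {
        "missing": sorted(orig_files - rest_files),
        "unexpected": sorted(rest_files - orig_files),
        "differing": differing,
    }
```

Files legitimately changed since the back-up was taken will appear as differing, so the comparison is most useful against a reference copy that is known not to have changed.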
    • 17. Preserving accessibility
    • 18. Software fragility
        • Without software to interpret and display the content, the data is lost
            • Software may not run on the current version of the operating system or current computer
            • Current software version may not accurately deal with files from older versions
            • You may not have the required software
    • 19. Do nothing option
        • So far this has worked because backwards compatibility is better than we thought
            • Operating systems continue to support older programs (Windows, Unix/Linux)
            • Modern programs seem to have good support for files from older versions
            • This may not last forever…
    • 20. If you are going to do nothing…
        • Perform a risk analysis
            • Survey your holdings to identify and quantify file formats
                • versions, if possible, ages if not
            • Consider risk of loss of access
                • Use criteria from normalisation section
            • Identify high value holdings
        • Monitor software trends (is risk increasing?)
        • Identify contingency plans
        • Influence users to use lower risk formats
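
A first pass at the holdings survey on slide 20 could be scripted roughly as follows. The sketch assumes the file extension is an acceptable proxy for format and that the modification time stands in for age; both are simplifications. The function name and summary keys are illustrative.

```python
from datetime import datetime, timezone
from pathlib import Path


def survey_formats(root: Path) -> dict[str, dict]:
    """Count files, total bytes, and oldest modification time per extension."""
    summary: dict[str, dict] = {}
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        ext = path.suffix.lower() or "(none)"
        stat = path.stat()
        entry = summary.setdefault(
            ext, {"count": 0, "bytes": 0, "oldest": None})
        entry["count"] += 1
        entry["bytes"] += stat.st_size
        modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
        if entry["oldest"] is None or modified < entry["oldest"]:
            entry["oldest"] = modified
    return summary
```

Extensions can be missing or wrong, and they say nothing about format versions, so a dedicated format identification tool would give a more reliable picture than this rough count.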
    • 21. Normalisation option
        • Proactively convert formats to a long term preservation format (LTPF)
        • This is a format that is likely to be usable for the foreseeable future
            • Can find replacement software to render data
            • Can find software to migrate from LTPF to new format
        • Library of Congress sustainability factors
            • http://www.digitalpreservation.gov/formats/
    • 22. Characteristics of a good LTPF
        • Supports critical features of your data
        • Published file format specification
        • Independent implementations
        • Wide community adoption
        • Simple
        • Formal standard
        • Public domain
        • Low risk conversion
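
The characteristics on slide 22 can be turned into a rough checklist score for comparing candidate formats. This is a sketch only: the identifiers are made up for illustration, and weighting every criterion equally is an assumption that the presentation does not make.

```python
CRITERIA = (
    "supports_critical_features",
    "published_specification",
    "independent_implementations",
    "wide_adoption",
    "simple",
    "formal_standard",
    "public_domain",
    "low_risk_conversion",
)


def ltpf_score(assessment: dict[str, bool]) -> int:
    """Count how many of the slide 22 criteria a candidate format meets."""
    unknown = set(assessment) - set(CRITERIA)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    return sum(1 for criterion in CRITERIA if assessment.get(criterion, False))


def unmet_criteria(assessment: dict[str, bool]) -> list[str]:
    """List the criteria that are not met (unanswered counts as unmet)."""
    return [c for c in CRITERIA if not assessment.get(c, False)]
```

The judgements fed into the checklist still have to come from your own analysis of each format; the score only makes the comparison explicit.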
    • 23. If you normalise
        • Don’t jump out of the frying pan
            • Still need to do the analysis presented for ‘do nothing case’
            • Just fewer formats
        • Develop test regime to test conversion into nominated format
            • Suite of ‘typical’ documents illustrating critical features
    • 24. LTPF suggestions
        • Documents
            • PDF/A, ODF
        • Images
            • TIFF, JPEG2000, JPEG (if already in JPEG)
        • Video
            • MPEG2 or MPEG4
    • 25. Normalisation challenges
        • Many types of data have no suitable LTPF (e.g. CAD/GIS)
        • Long tail of formats (you will never be able to assign an LTPF for all types of digital object)
        • Loss of characteristics in the normalisation
        • Increasing complexity of digital objects (i.e. formats embedded within formats)
    • 26. Digital rights management
        • DRM systems are designed to control (prevent) access to digital objects
            • Owner of digital object removes right of access
            • May not permit access even though it is required (e.g. investigations)
            • DRM system ceases to exist
        • DRM systems do not recognise an organisation’s right to use their records
        • Trusted Computing and Digital Rights Management Principles and Policies, NZESC
            • http://www.e.govt.nz/policy/tc-and-drm/principles-policies-06/tc-drm-0906.pdf
    • 27. Is it evidence? (Context)
    • 28. Core Issues
        • If you cannot find it, it does not exist
        • If you can find it, and cannot understand the context, it is meaningless
            • Users are interested in the story, not a document
        • If you cannot show its authenticity, integrity, and context, it may have low evidential weight
    • 29. It’s all basic records management
        • Create the record as part of the business process (authenticity)
            • This includes putting it aside
        • Putting the record in its context
            • Tell the story – who, what, where and when
        • Show that the record has not been subsequently modified
            • Audit log
    • 30. Key requirements
        • Making sure that records are created in their context (business issue)
        • Having someplace to put the records and capture their context
            • Electronic Document & Records Management System (EDRMS)
            • Classification system
    • 31. If you do not have an EDRMS?
        • Do whatever you can…
        • Set up classification system in
            • Email system
            • Corporate file server
        • Good idea even if you plan to get an EDRMS
            • It gets everyone used to using a classification system
    • 32. Why is metadata important?
        • Who, what, where and when is answered by metadata associated with the record
            • Captured (ideally) by system when record is created
            • Entered by user
        • Many different metadata standards
    • 33. NAA/ANZ metadata standard
        • Proposed basis for an Australian recordkeeping standard
        • Australian Government Recordkeeping Metadata Standard version 2.0
            • http://www.naa.gov.au/Images/AGRkMS_Final%20Edit_16%2007%2008_Revised_tcm2-12630.pdf
    • 34. Minimum metadata to be kept
        • Identifier (unique id referring to this object)
        • Name (human readable tag)
        • Start date (creation date)
        • Contextual link (relation with file, series)
        • Change history (demonstrating integrity)
        • Disposal (when and how to dispose of record)
        • Extent (size)
        • Agent (organisation or person associated with record)
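
The minimum metadata set on slide 34 maps naturally onto a small record structure. The sketch below is illustrative only: the Python field names and types, the `log_change` helper, and the `missing_fields` check are assumptions and are not drawn from any metadata standard.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, timezone
from typing import Optional


@dataclass
class RecordMetadata:
    """Minimum metadata from slide 34; names and types are illustrative."""
    identifier: str        # unique id referring to this object
    name: str              # human readable tag
    start_date: date       # creation date
    contextual_link: str   # relation with file, series
    disposal: str          # when and how to dispose of the record
    extent_bytes: int      # size
    agent: str             # organisation or person associated with the record
    change_history: list[str] = field(default_factory=list)

    def log_change(self, description: str,
                   when: Optional[datetime] = None) -> None:
        """Append a timestamped entry to the change history."""
        stamp = (when or datetime.now(timezone.utc)).isoformat()
        self.change_history.append(f"{stamp} {description}")


def missing_fields(meta: RecordMetadata) -> list[str]:
    """Return the names of text fields that have been left empty."""
    text_fields = ("identifier", "name", "contextual_link", "disposal", "agent")
    return [f for f in text_fields if not getattr(meta, f)]
```

A plain list like this does not by itself demonstrate integrity, because entries could be edited later; it only shows where the change history would be kept.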
    • 35. What can you do now – storage
        • Make sure that your organisation can preserve the bits
            • Survey holdings of media to discover the extent of your problem
            • Move records off unmanaged, obsolete, deteriorating media
            • Ensure back-up and disaster recovery systems are in place and working
            • Sample records to detect corruption and decay
            • Plan to migrate to new technology
    • 36. What can you do now – access
        • Make sure that your organisation can turn the files into something a human can understand
            • Survey holdings of records to understand what formats you have and the importance of the records
            • Perform a risk assessment on the formats
            • Choose an LTPF and normalise high risk formats
            • Encourage use of LTPF for business
    • 37. What can you do now – context
        • Make sure that digital objects are records
            • Organise the objects so that they have a context (classification)
            • Move towards an EDRMS or business application that captures the records, preserves their context, and protects their integrity
    • 38. Questions?
