  1. Digital Preservation NOW
     Andrew Waugh, Senior Technical Advisor, Public Record Office Victoria
  2. Goal of session
     • To present practical steps that you can take to preserve digital information now, without having a digital archive
  3. Outline of session
     • Goal of preservation
     • Preserving the bit stream
     • Preserving accessibility
     • Preserving the context
     • Conclusions
  4. The goal of preservation
     • Ensure access to records as long as they are required
     • A record is…
       – information created, received, and maintained as evidence and information by an organization or person, in pursuance of legal obligations or in the transaction of business (AS ISO 15489.1-2002)
  5. The key to records is evidence
     • What, where, when, how, who
     • Evidence to colleagues (business activity)
     • Evidence of accountability (investigations)
     • Evidence to courts (legal evidence)
     • Evidence to researchers (historical evidence)
  6. So what does evidence require?
     • That record was produced as part of normal business process (authentic)
     • That record can be found & read (accessible)
     • That it can be related to the rest of the records (context)
     • That it hasn’t been tampered with (integrity)
  7. Key issues
     • Preserving the bit stream
       – If you don’t have the bits, you don’t have anything
     • Preserving access to the information
       – In the face of fragile applications
     • Preserving the context
       – The evidence
  8. Preserving the bit stream
  9. Core issue
     • If you don’t have the binary data (files) that makes up the record, then you cannot preserve anything
     • Problems you need to protect against
       – Media failures (corruption, crashes)
       – Technology obsolescence
       – Human error
  10. Basically a solved problem
     • A core function of your IT department
       – Day to day operation of storage systems
       – Back-up/restore and disaster recovery
       – Periodic replacement of media and technology
  11. Recommendations
     • Store on at least two pieces of media, ideally two technologies or (less ideally) two brands
     • Store in at least two sites
     • Information not being accessed must be periodically checked for corruption
     • Track individual pieces of media – include brand and batch
     • Always use mainstream technology in widespread use
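The two-media, two-site rule above can be checked mechanically once individual pieces of media are tracked. The following is a minimal sketch, not part of the original slides: the `Copy` fields and the `policy_gaps` function are hypothetical names chosen for illustration, and a real register would live in whatever system your organisation already uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One stored copy of a record (all field names are illustrative)."""
    record_id: str
    media_id: str     # label of the individual piece of media
    technology: str   # e.g. "disc" or "LTO tape"
    brand: str
    batch: str
    site: str


def policy_gaps(copies):
    """Return {record_id: [problems]} for records that break the two-media, two-site rule."""
    by_record = {}
    for copy in copies:
        by_record.setdefault(copy.record_id, []).append(copy)

    gaps = {}
    for record_id, held in by_record.items():
        problems = []
        if len({c.media_id for c in held}) < 2:
            problems.append("fewer than two pieces of media")
        if len({c.site for c in held}) < 2:
            problems.append("fewer than two sites")
        if problems:
            gaps[record_id] = problems
    return gaps
```

Because brand and batch are recorded for each copy, the same register could also be used to find every copy affected when a particular batch of media turns out to be faulty.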
  12. Storage media (disc)
     • Default storage choice should be on-line (disc) storage unless massive storage required
       – e.g. 3 Terabytes RAID 5 ~$4000
     • RAID 1 or 6 (or derivatives) to guard against disc failures. RAID 5 is problematic now.
     • Expect to replace each disc within 5 years
     • External (USB) discs not recommended for long term storage (> 1 year)
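The slide does not say why RAID 5 is problematic. One way to compare the levels it names is by usable capacity and the number of disc failures each survives, sketched below using the standard definitions of RAID 1, 5 and 6. The function name `raid_summary` is an illustrative assumption, and the sketch ignores controller overhead and hot spares.

```python
def raid_summary(level, discs, disc_tb):
    """Return (usable capacity in TB, number of disc failures survived)."""
    if level == 1:
        # Every disc holds a full mirror copy of the data.
        if discs < 2:
            raise ValueError("RAID 1 needs at least 2 discs")
        return disc_tb, discs - 1
    if level == 5:
        # One disc's worth of capacity is used for parity.
        if discs < 3:
            raise ValueError("RAID 5 needs at least 3 discs")
        return (discs - 1) * disc_tb, 1
    if level == 6:
        # Two discs' worth of capacity is used for parity.
        if discs < 4:
            raise ValueError("RAID 6 needs at least 4 discs")
        return (discs - 2) * disc_tb, 2
    raise ValueError("only RAID levels 1, 5 and 6 are covered here")
```

For the same 3 TB of usable space, RAID 5 on four 1 TB discs survives one failure while RAID 6 on five 1 TB discs survives two, so a second failure during a rebuild loses a RAID 5 array but not a RAID 6 one.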
  13. Storage media (tape)
     • Choose tape when you need more storage capacity than is economic with disc
       – Be sure to factor in whole of life costs including media replacement and operator costs
     • Preferred formats LTO Ultrium, IBM 3592, T10000
     • Tape robots are preferred over manual handling
     • Get expert advice on tape solutions as these are no longer common; use only for large organisations
     • NEVER EVER choose leading edge technology, always stay within industry standard
  14. Storage media (optical)
     • Prefer CD-R (phthalocyanine dye)
     • Can use CD-R (azo dye) or DVD-R, but monitor carefully
     • Do not use CD-RW, DVD-RW, or CD-R (cyanine dye)
     • Use ‘name brands’, and archival quality if possible
     • Refresh in 2 to 5 years
     • Unlikely to be generally economic compared with disc or tape due to high operator cost and low capacity
  15. Monitor…
     • Recommend statistical sampling of data to
       – check for corruption of copies (checksums)
       – detect deterioration of media
     • Technology watch to guard against obsolete media
       – plan for media refresh every 2 to 10 years
     • Track individual pieces of media (if used)
       – Ensure that none are lost
       – Ensure that all are tested and refreshed
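The checksum sampling recommended above can be sketched as follows. This is an illustration under stated assumptions rather than a prescribed tool: the function names are hypothetical, SHA-256 is one reasonable choice of checksum, and the manifest is assumed to have been built while the files were known to be good.

```python
import hashlib
import random
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def check_sample(root, manifest, sample_size, seed=None):
    """Re-hash a random sample of manifest entries; return names that are missing or changed."""
    rng = random.Random(seed)
    names = sorted(manifest)
    chosen = rng.sample(names, min(sample_size, len(names)))
    failures = []
    for name in chosen:
        path = Path(root) / name
        if not path.is_file() or sha256_of(path) != manifest[name]:
            failures.append(name)
    return sorted(failures)
```

The manifest itself needs the same protection as the records (at least two copies in two sites), otherwise a corrupted manifest is indistinguishable from corrupted data.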
  16. Back-up & disaster recovery
     • Ensure that
       – Your IT organisation has both a back-up and disaster recovery regime
       – It is effective (periodically test restoration)
  17. Preserving accessibility
  18. Software fragility
     • Without software to interpret and display the content, the data is lost
       – Software may not run on the current version of the operating system or current computer
       – Current software version may not accurately deal with files from older versions
       – You may not have the required software
  19. Do nothing option
     • So far has worked because backwards compatibility is better than we thought
       – Operating systems continue to support older programs (Windows, Unix/Linux)
       – Modern programs seem to have good support for files from older versions
       – This may not last forever…
  20. If you are going to do nothing…
     • Perform a risk analysis
       – Survey your holdings to identify and quantify file formats
         • versions, if possible, ages if not
       – Consider risk of loss of access
         • Use criteria from normalisation section
       – Identify high value holdings
     • Monitor software trends (is risk increasing?)
     • Identify contingency plans
     • Influence users to use lower risk formats
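A first pass at surveying holdings can be as simple as counting files by extension and noting how old the oldest file of each type is. The sketch below assumes the holdings sit on a file system you can walk; `survey_formats` is a hypothetical name, and a file extension is only a rough proxy for the real format and says nothing about the format version.

```python
from pathlib import Path


def survey_formats(root):
    """Count files and bytes per extension and record the oldest modification time seen."""
    summary = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        ext = path.suffix.lower() or "(none)"
        info = path.stat()
        entry = summary.setdefault(
            ext, {"files": 0, "bytes": 0, "oldest": info.st_mtime}
        )
        entry["files"] += 1
        entry["bytes"] += info.st_size
        entry["oldest"] = min(entry["oldest"], info.st_mtime)
    return summary
```

The age of the oldest file stands in for the format version when the version cannot be determined, in line with the "versions, if possible, ages if not" advice. Identifying formats from file content would be more reliable than extensions if a suitable identification tool is available.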
  21. Normalisation option
     • Proactively convert formats to a long term preservation format (LTPF)
     • This is a format that is likely to be usable for the foreseeable future
       – Can find replacement software to render data
       – Can find software to migrate from LTPF to new format
     • Library of Congress sustainability factors
       – http://www.digitalpreservation.gov/formats/
  22. Characteristics of a good LTPF
     • Supports critical features of your data
     • Published file format specification
     • Independent implementations
     • Wide community adoption
     • Simple
     • Formal standard
     • Public domain
     • Low risk conversion
  23. If you normalise
     • Don’t jump out of the frying pan
       – Still need to do the analysis presented for ‘do nothing case’
       – Just fewer formats
     • Develop test regime to test conversion into nominated format
       – Suite of ‘typical’ documents illustrating critical features
  24. LTPF suggestions
     • Documents
       – PDF/A, ODF
     • Images
       – TIFF, JPEG2000, JPEG (if already in JPEG)
     • Video
       – MPEG2 or MPEG4
  25. Normalisation challenges
     • Many types of data have no suitable LTPF (e.g. CAD/GIS)
     • Long tail of formats (never be able to assign a LTPF for all types of digital object)
     • Loss of characteristics in the normalisation
     • Increasing complexity of digital objects (i.e. formats embedded within formats)
  26. Digital rights management
     • DRM systems are designed to control (prevent) access to digital objects
       – Owner of digital object removes right of access
       – May not permit access even though it is required (e.g. investigations)
       – DRM system ceases to exist
     • DRM systems do not recognise an organisation’s right to use their records
     • Trusted Computing and Digital Rights Management Principles and Policies, NZESC
       – http://www.e.govt.nz/policy/tc-and-drm/principles-policies-06/tc-drm-0906.pdf
  27. Is it evidence? (Context)
  28. Core Issues
     • If you cannot find it, it does not exist
     • If you can find it, and cannot understand the context, it is meaningless
       – Users are interested in the story, not a document
     • If you cannot show its authenticity, integrity, and context, it may have low evidential weight
  29. It’s all basic records management
     • Create the record as part of the business process (authenticity)
       – This includes putting it aside
     • Putting the record in its context
       – Tell the story – who, what, where and when
     • Show that the record has not been subsequently modified
       – Audit log
  30. Key requirements
     • Making sure that records are created in their context (business issue)
     • Having someplace to put the records and capture their context
       – Electronic Document & Records Management System (EDRMS)
       – Classification system
  31. If you do not have an EDRMS?
     • Do whatever you can…
     • Set up classification system in
       – Email system
       – Corporate file server
     • Good idea even if you plan to get an EDRMS
       – It gets everyone used to using a classification system
  32. Why is metadata important?
     • Who, what, where and when is answered by metadata associated with record
       – Captured (ideally) by system when record is created
       – Entered by user
     • Many different metadata standards
  33. NAA/ANZ metadata standard
     • Proposed basis for an Australian recordkeeping standard
     • Australian Government Recordkeeping Metadata Standard version 2.0
       – http://www.naa.gov.au/Images/AGRkMS_Final%20Edit_16%2007%2008_Revised_tcm2-12630.pdf
  34. Minimum metadata to be kept
     • Identifier (unique id referring to this object)
     • Name (human readable tag)
     • Start date (creation date)
     • Contextual link (relation with file, series)
     • Change history (demonstrating integrity)
     • Disposal (when and how to dispose of record)
     • Extent (size)
     • Agent (organisation or person associated with record)
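The minimum element set above maps naturally onto a small record structure. The sketch below is one possible shape, not the schema of any metadata standard: the class, field and function names are hypothetical, and extent is assumed to be measured in bytes.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class RecordMetadata:
    """Minimum metadata for one record (field names are illustrative)."""
    identifier: str        # unique id referring to this object
    name: str              # human readable tag
    start_date: date       # creation date
    contextual_link: str   # relation with file or series
    disposal: str          # when and how to dispose of the record
    extent_bytes: int      # size, assumed here to be in bytes
    agent: str             # organisation or person associated with the record
    change_history: list = field(default_factory=list)

    def log_change(self, when, who, what):
        """Append an entry to the change history used to demonstrate integrity."""
        self.change_history.append(
            {"when": when.isoformat(), "agent": who, "action": what}
        )


def missing_elements(meta):
    """Return the names of minimum elements that have been left empty."""
    required = ["identifier", "name", "start_date", "contextual_link", "disposal", "agent"]
    missing = [element for element in required if not getattr(meta, element)]
    if meta.extent_bytes is None:
        missing.append("extent_bytes")
    return missing
```

Where there is no EDRMS, even a spreadsheet with these columns, kept alongside the classified folders, captures the same minimum set.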
  35. What can you do now – storage
     • Make sure that your organisation can preserve the bits
       – Survey holdings of media to discover the extent of your problem
       – Move records off unmanaged, obsolete, deteriorating media
       – Ensure back-up and disaster recovery systems are in place and working
       – Sample records to detect corruption and decay
       – Plan to migrate to new technology
  36. What can you do now – access
     • Make sure that your organisation can turn the files into something a human can understand
       – Survey holdings of records to understand what formats you have and the importance of the records
       – Perform a risk assessment on the formats
       – Choose an LTPF and normalise high risk formats
       – Encourage use of LTPF for business
  37. What can you do now – context
     • Make sure that digital objects are records
       – Organise the objects so that they have a context (classification)
       – Move towards an EDRMS or business application that captures the records, preserves their context, and protects their integrity
  38. Questions?
