Andrew Waugh
Presentation Transcript

  • Digital Preservation NOW Andrew Waugh Senior Technical Advisor Public Record Office Victoria
  • Goal of session
    • To present practical steps that you can take to preserve digital information now, without having a digital archive
  • Outline of session
    • Goal of preservation
    • Preserving the bit stream
    • Preserving accessibility
    • Preserving the context
    • Conclusions
  • The goal of preservation
    • Ensure access to records as long as they are required
    • A record is…
      • information created, received, and maintained as evidence and information by an organization or person, in pursuance of legal obligations or in the transaction of business (AS ISO 15489.1-2002)
  • The key to records is evidence
    • What, where, when, how, who
    • Evidence to colleagues (business activity)
    • Evidence of accountability (investigations)
    • Evidence to courts (legal evidence)
    • Evidence to researchers (historical evidence)
  • So what does evidence require?
    • That the record was produced as part of a normal business process (authentic)
    • That the record can be found & read (accessible)
    • That it can be related to the rest of the records (context)
    • That it hasn’t been tampered with (integrity)
  • Key issues
    • Preserving the bit stream
      • If you don’t have the bits, you don’t have anything
    • Preserving access to the information
      • In the face of fragile applications
    • Preserving the context
      • The evidence
  • Preserving the bit stream
  • Core issue
    • If you don’t have the binary data (files) that make up the record, you cannot preserve anything
    • Problems you need to protect against
      • Media failures (corruption, crashes)
      • Technology obsolescence
      • Human error
  • Basically a solved problem
    • A core function of your IT department
      • Day to day operation of storage systems
      • Back-up/restore and disaster recovery
      • Periodic replacement of media and technology
  • Recommendations
    • Store on at least two pieces of media, ideally two technologies or (less ideally) two brands
    • Store in at least two sites
    • Information not being accessed must be periodically checked for corruption
    • Track individual pieces of media – include brand and batch
    • Always use mainstream technology in widespread use
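The recommendations above can be checked mechanically if every copy is logged in a media register. A minimal sketch follows, assuming a hypothetical register in which each row names the record held, the individual piece of media, and its technology, brand, batch and site; the field names and the way the rules are encoded are illustrative assumptions, not a prescribed format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List

@dataclass(frozen=True)
class MediaCopy:
    """One copy of a record on one tracked piece of media (hypothetical register row)."""
    record_id: str
    media_id: str     # label of the individual disc, tape or optical disc
    technology: str   # e.g. "disc", "LTO"
    brand: str
    batch: str
    site: str

def check_copies(copies: Iterable[MediaCopy]) -> Dict[str, List[str]]:
    """Return {record_id: [problems]} for records that break the storage rules."""
    by_record: Dict[str, List[MediaCopy]] = defaultdict(list)
    for item in copies:
        by_record[item.record_id].append(item)
    problems: Dict[str, List[str]] = {}
    for record_id, held in by_record.items():
        issues = []
        if len({c.media_id for c in held}) < 2:
            issues.append("fewer than two pieces of media")
        if len({c.site for c in held}) < 2:
            issues.append("fewer than two sites")
        # Ideal: two technologies. Acceptable fallback: two brands.
        if len({c.technology for c in held}) < 2 and len({c.brand for c in held}) < 2:
            issues.append("single technology and single brand")
        if issues:
            problems[record_id] = issues
    return problems
```

A record that appears nowhere in the register cannot be reported by this check, so the register itself needs to be reconciled against the list of holdings.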
  • Storage media (disc)
    • Default storage choice should be on-line (disc) storage unless massive storage is required
      • e.g. 3 Terabytes RAID 5 ~$4000
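    • Use RAID 1 or RAID 6 (or derivatives) to guard against disc failures. RAID 5 is now problematic: it survives only one disc failure, and rebuilding a large disc takes long enough that a second failure during the rebuild is a real risk.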
    • Expect to replace each disc within 5 years
    • External (USB) discs not recommended for long term storage (> 1 year)
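The RAID advice comes down to a trade-off between usable capacity and the number of disc failures an array is guaranteed to survive. A simplified sketch for arrays of equal-sized discs makes the arithmetic explicit; it ignores hot spares, filesystem overhead and vendor-specific variants, and treats RAID 1 as an n-way mirror.

```python
from typing import Tuple

def raid_summary(level: int, n_discs: int, disc_tb: float) -> Tuple[float, int]:
    """Return (usable capacity in TB, disc failures always survived)."""
    minimum = {1: 2, 5: 3, 6: 4}  # smallest sensible array for each level
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if n_discs < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} discs")
    if level == 1:
        # Every disc holds a full mirror copy, so all but one can fail.
        return disc_tb, n_discs - 1
    # RAID 5 gives up one disc's worth of capacity to parity, RAID 6 two.
    parity = 1 if level == 5 else 2
    return (n_discs - parity) * disc_tb, parity
```

For example, with hypothetical 1 TB discs, `raid_summary(5, 4, 1.0)` gives `(3.0, 1)` and `raid_summary(6, 5, 1.0)` gives `(3.0, 2)`: one extra disc buys the ability to survive a second failure while the array is rebuilding.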
  • Storage media (tape)
    • Choose tape when more storage capacity is needed than is economic with disc
      • Be sure to factor in whole of life costs including media replacement and operator costs
    • Preferred formats LTO Ultrium, IBM 3592, T10000
    • Tape robots are preferred over manual handling
    • Get expert advice on tape solutions as these are no longer common – use only for large organisations
    • NEVER EVER choose leading-edge technology; always stay with industry-standard technology
  • Storage media (optical)
    • Prefer CD-R (phthalocyanine dye)
    • Can use CD-R (azo dye) or DVD-R, but monitor carefully
    • Do not use CD-RW, DVD-RW, or CD-R (cyanine dye)
    • Use ‘name brands’, and archival quality if possible
    • Refresh in 2 to 5 years
    • Unlikely to be generally economic compared with disc or tape due to high operator cost and low capacity
  • Monitor…
    • Recommend statistical sampling of data to
      • check for corruption of copies (checksums)
      • detect deterioration of media
    • Technology watch to guard against obsolete media
      • plan for media refresh every 2 to 10 years
    • Track individual pieces of media (if used)
      • Ensure that none are lost
      • Ensure that all are tested and refreshed
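A minimal sketch of checksum sampling, assuming a manifest of SHA-256 digests is recorded when the files are first stored and kept separately from the data. SHA-256, the manifest layout and the function names are illustrative choices, not something the presentation specifies. Each run re-hashes a random sample and reports files that are missing or whose digest no longer matches.

```python
import hashlib
import random
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {p.relative_to(root).as_posix(): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}

def sample_check(manifest, root, sample_size, seed=None):
    """Re-hash a random sample; return (paths checked, {path: problem})."""
    rng = random.Random(seed)
    paths = sorted(manifest)
    chosen = rng.sample(paths, min(sample_size, len(paths)))
    failures = {}
    for rel in chosen:
        full = Path(root) / rel
        if not full.is_file():
            failures[rel] = "missing"
        elif sha256_of(full) != manifest[rel]:
            failures[rel] = "checksum mismatch"
    return chosen, failures
```

Varying the sample between runs spreads the checks across the whole holding over time; any failure would be a cue to restore from the second copy and to examine the media it came from.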
  • Back-up & disaster recovery
    • Ensure that
      • Your IT organisation has both a back-up and disaster recovery regime
      • It is effective (periodically test restoration)
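Periodic restore testing can be scripted in the same spirit. The sketch below assumes a test restore has been made into a separate directory and compares it byte for byte with the original using the standard library's `filecmp.cmp(..., shallow=False)`; the directory layout and the report format are illustrative assumptions.

```python
import filecmp
from pathlib import Path

def verify_restore(original_root, restored_root):
    """Compare a test restore against the original, byte for byte."""
    original_root, restored_root = Path(original_root), Path(restored_root)
    originals = {p.relative_to(original_root)
                 for p in original_root.rglob("*") if p.is_file()}
    restored = {p.relative_to(restored_root)
                for p in restored_root.rglob("*") if p.is_file()}
    report = {"missing": [], "different": [], "unexpected": []}
    for rel in sorted(originals):
        if rel not in restored:
            report["missing"].append(rel.as_posix())
        # shallow=False forces a content comparison, not just size/mtime.
        elif not filecmp.cmp(original_root / rel, restored_root / rel, shallow=False):
            report["different"].append(rel.as_posix())
    report["unexpected"] = sorted(rel.as_posix() for rel in restored - originals)
    return report
```

A restore test only counts as passed when all three lists are empty.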
  • Preserving accessibility
  • Software fragility
    • Without software to interpret and display the content, the data is lost
      • Software may not run on the current version of the operating system or current computer
      • Current software version may not accurately deal with files from older versions
      • You may not have the required software
  • Do nothing option
    • This has worked so far because backwards compatibility has proved better than expected
      • Operating systems continue to support older programs (Windows, Unix/Linux)
      • Modern programs seem to have good support for files from older versions
      • This may not last forever…
  • If you are going to do nothing…
    • Perform a risk analysis
      • Survey your holdings to identify and quantify file formats
        • versions if possible, ages if not
      • Consider risk of loss of access
        • Use criteria from normalisation section
      • Identify high value holdings
    • Monitor software trends (is risk increasing?)
    • Identify contingency plans
    • Influence users to use lower risk formats
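A first pass at surveying holdings can be sketched by reading each file's leading bytes and matching them against known signatures, since file extensions are easily wrong. The handful of signatures below is illustrative only; a production survey would normally rely on a full signature registry such as The National Archives' PRONOM (for example via its DROID tool), which can also distinguish format versions. Note that a ZIP signature alone cannot tell ODF, OOXML and plain ZIP files apart.

```python
from collections import Counter
from pathlib import Path

# A few well-known file signatures ("magic numbers"); not a complete list.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF"),
    (b"MM\x00*", "TIFF"),
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JPEG 2000"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 container (e.g. legacy MS Office)"),
    (b"PK\x03\x04", "ZIP container (e.g. ODF, OOXML)"),
]
HEADER_BYTES = max(len(sig) for sig, _ in SIGNATURES)

def identify(path):
    """Name the format from the file's first bytes, falling back to its extension."""
    with open(path, "rb") as f:
        head = f.read(HEADER_BYTES)
    for sig, name in SIGNATURES:
        if head.startswith(sig):
            return name
    return "unknown (" + (Path(path).suffix.lower() or "no extension") + ")"

def survey(root):
    """Count files under root by identified format."""
    return Counter(identify(p) for p in Path(root).rglob("*") if p.is_file())
```

The resulting counts are a starting point for the risk analysis: large or high-value groups of unknown or obsolete formats are the ones to examine first.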
  • Normalisation option
    • Proactively convert formats to a long term preservation format (LTPF)
    • This is a format that is likely to be usable for the foreseeable future
      • Can find replacement software to render data
      • Can find software to migrate from LTPF to new format
    • Library of Congress sustainability factors
      • http://www.digitalpreservation.gov/formats/
  • Characteristics of a good LTPF
    • Supports critical features of your data
    • Published file format specification
    • Independent implementations
    • Wide community adoption
    • Simple
    • Formal standard
    • Public domain
    • Low risk conversion
  • If you normalise
    • Don’t jump out of the frying pan into the fire
      • Still need to do the analysis presented for the ‘do nothing’ option
      • Just fewer formats
    • Develop test regime to test conversion into nominated format
      • Suite of ‘typical’ documents illustrating critical features
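The conversion test regime can be organised as a small harness that runs the normaliser over the suite of typical documents and applies one check per critical feature. In this sketch, `convert` and the feature checks are hypothetical placeholders to be supplied for the chosen LTPF and conversion tool; real checks (for example, that page counts or embedded images survive) depend on the formats involved.

```python
from typing import Callable, Dict, List

# check(original_bytes, converted_bytes) -> True if the feature survived
FeatureCheck = Callable[[bytes, bytes], bool]

def run_conversion_tests(
    convert: Callable[[bytes], bytes],
    suite: Dict[str, bytes],
    checks: Dict[str, FeatureCheck],
) -> Dict[str, List[str]]:
    """Run a (hypothetical) normaliser over a test suite.

    Returns {document name: [failed features]}; documents that pass every
    check are omitted, so an empty dict means the whole suite passed.
    """
    failures: Dict[str, List[str]] = {}
    for name, original in suite.items():
        try:
            converted = convert(original)
        except Exception as exc:  # a crash is itself a test failure
            failures[name] = [f"conversion error: {exc}"]
            continue
        failed = [feature for feature, check in checks.items()
                  if not check(original, converted)]
        if failed:
            failures[name] = failed
    return failures
```

Re-running the same suite whenever the conversion tool or its version changes gives early warning that a critical feature is no longer being carried across.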
  • LTPF suggestions
    • Documents
      • PDF/A, ODF
    • Images
      • TIFF, JPEG2000, JPEG (if already in JPEG)
    • Video
      • MPEG2 or MPEG4
  • Normalisation challenges
    • Many types of data have no suitable LTPF (e.g. CAD/GIS)
    • Long tail of formats (never be able to assign an LTPF for all types of digital object)
    • Loss of characteristics in the normalisation
    • Increasing complexity of digital objects (i.e. formats embedded within formats)
  • Digital rights management
    • DRM systems are designed to control (prevent) access to digital objects
      • Owner of digital object removes right of access
      • May not permit access even though it is required (e.g. investigations)
      • DRM system ceases to exist
    • DRM systems do not recognise an organisation’s right to use their records
    • Trusted Computing and Digital Rights Management Principles and Policies, NZESC
      • http://www.e.govt.nz/policy/tc-and-drm/principles-policies-06/tc-drm-0906.pdf
  • Is it evidence? (Context)
  • Core Issues
    • If you cannot find it, it does not exist
    • If you can find it, and cannot understand the context, it is meaningless
      • Users are interested in the story, not a document
    • If you cannot show its authenticity, integrity, and context, it may have low evidential weight
  • It’s all basic records management
    • Create the record as part of the business process (authenticity)
      • This includes putting it aside
    • Putting the record in its context
      • Tell the story – who, what, where and when
    • Show that the record has not been subsequently modified
      • Audit log
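One way an audit log can help show that a record has not been modified is to chain its entries with cryptographic hashes, so that altering or deleting an earlier entry breaks the chain at that point. This is a minimal sketch of that technique, not a feature of any particular EDRMS; the entry fields and the use of SHA-256 are assumptions. It only detects tampering if the latest hash is also kept somewhere the log's custodian cannot rewrite, since anyone able to rewrite the whole log could recompute every hash.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # stand-in "previous hash" for the first entry

def _entry_hash(entry):
    """SHA-256 of a canonical JSON form of the entry (excluding its own hash)."""
    body = {k: entry[k] for k in ("timestamp", "record_id", "action", "agent", "prev_hash")}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

def append_event(log, record_id, action, agent, timestamp=None):
    """Append an event whose hash covers the previous entry's hash."""
    entry = {
        "timestamp": timestamp or datetime.now(timezone.utc).isoformat(),
        "record_id": record_id,
        "action": action,
        "agent": agent,
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    entry["hash"] = _entry_hash(entry)
    log.append(entry)
    return entry

def verify_log(log):
    """Return the index of the first broken entry, or None if the chain is intact."""
    prev = GENESIS
    for i, entry in enumerate(log):
        if entry["prev_hash"] != prev or _entry_hash(entry) != entry["hash"]:
            return i
        prev = entry["hash"]
    return None
```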
  • Key requirements
    • Making sure that records are created in their context (business issue)
    • Having somewhere to put the records and capture their context
      • Electronic Document & Records Management System (EDRMS)
      • Classification system
  • If you do not have an EDRMS?
    • Do whatever you can…
    • Set up classification system in
      • Email system
      • Corporate file server
    • Good idea even if you plan to get an EDRMS
      • It gets everyone used to using a classification system
  • Why is metadata important?
    • Who, what, where and when are answered by metadata associated with the record
      • Captured (ideally) by system when record is created
      • Entered by user
    • Many different metadata standards
  • NAA/ANZ metadata standard
    • Proposed basis for an Australian recordkeeping standard
    • Australian Government Recordkeeping Standard version 2.0
      • http://www.naa.gov.au/Images/AGRkMS_Final%20Edit_16%2007%2008_Revised_tcm2-12630.pdf
  • Minimum metadata to be kept
    • Identifier (unique id referring to this object)
    • Name (human readable tag)
    • Start date (creation date)
    • Contextual link (relation with file, series)
    • Change history (demonstrating integrity)
    • Disposal (when and how to dispose of record)
    • Extent (size)
    • Agent (organisation or person associated with record)
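The eight minimum elements can be captured as a simple structure that flags anything still empty. The field names and types below are illustrative assumptions, not the element names used in the Australian Government Recordkeeping Metadata Standard or any other standard; in practice as many as possible would be filled automatically by the system when the record is created.

```python
from dataclasses import dataclass, field, fields
from datetime import date
from typing import List, Optional

@dataclass
class RecordMetadata:
    """The eight minimum metadata elements (illustrative field names)."""
    identifier: str = ""                 # unique id referring to this object
    name: str = ""                       # human-readable tag
    start_date: Optional[date] = None    # creation date
    contextual_link: str = ""            # relation with file, series
    change_history: List[str] = field(default_factory=list)  # demonstrates integrity
    disposal: str = ""                   # when and how to dispose of the record
    extent_bytes: Optional[int] = None   # size
    agent: str = ""                      # organisation or person associated with record

    def missing_elements(self) -> List[str]:
        """Names of elements that are still empty; an empty list means complete."""
        return [f.name for f in fields(self) if getattr(self, f.name) in ("", None, [])]
```

Checking completeness at capture time is cheaper than reconstructing context later, when the people who could supply it may have moved on.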
  • What can you do now - storage
    • Make sure that your organisation can preserve the bits
      • Survey holdings of media to discover the extent of your problem
      • Move records off unmanaged, obsolete, deteriorating media
      • Ensure back-up and disaster recovery systems are in place and working
      • Sample records to detect corruption and decay
      • Plan to migrate to new technology
  • What can you do now – access
    • Make sure that your organisation can turn the files into something a human can understand
      • Survey holdings of records to understand what formats you have and the importance of the records
      • Perform a risk assessment on the formats
      • Choose an LTPF and normalise high risk formats
      • Encourage use of LTPF for business
  • What can you do now – context
    • Make sure that digital objects are records
      • Organise the objects so that they have a context (classification)
      • Move towards an EDRMS or business application that captures the records, preserves their context, and protects their integrity
  • Questions?