Brief Introduction to Digital Preservation


Presentation slides from a lecture given as part of the MSc in Information and Library Management, University of the West of England (UWE), Frenchay Campus, Bristol, March 10, 2010

Published in: Technology, Education

  1. 1. UKOLN is supported by: A brief introduction to digital preservation Michael Day Research and Development Team Leader UKOLN, University of Bath MSc Lecture, UWE, Bristol, 10 March 2010
  2. 2. Presentation outline <ul><li>Digital preservation basics </li></ul><ul><ul><li>Digital preservation challenges </li></ul></ul><ul><ul><li>The OAIS Reference Model </li></ul></ul><ul><ul><li>Digital preservation principles and strategies </li></ul></ul><ul><ul><li>Digital preservation tools: </li></ul></ul><ul><li>Case studies (if time): </li></ul><ul><ul><li>E-mail </li></ul></ul><ul><ul><li>Websites </li></ul></ul><ul><li>Exercise </li></ul>
  3. 3. Digital preservation challenges (1) <ul><li>Technical challenges </li></ul><ul><ul><li>Digital media </li></ul></ul><ul><ul><ul><li>Currently magnetic or optical tape and disks, some devices (e.g., memory sticks) </li></ul></ul></ul><ul><ul><ul><li>Uncertain lifetimes </li></ul></ul></ul><ul><ul><li>Hardware and software dependence </li></ul></ul><ul><ul><ul><li>Most digital objects are dependent on particular configurations of hardware and software </li></ul></ul></ul><ul><ul><ul><li>Relatively short obsolescence cycles </li></ul></ul></ul>
  4. 4. Digital preservation challenges (2) <ul><li>Conceptual challenges: </li></ul><ul><ul><li>Three levels of information required: </li></ul></ul><ul><ul><ul><li>Physical layer – usually a bitstream </li></ul></ul></ul><ul><ul><ul><li>Logical layer – defines how to interpret the bitstream (through software) to generate meaningful information (e.g. ASCII, XML, file formats) </li></ul></ul></ul><ul><ul><ul><li>Conceptual layer – real world objects </li></ul></ul></ul><ul><ul><ul><ul><li>Some are analogues of traditional objects, e.g. meeting minutes, research papers </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Others are not, e.g. Web pages, GIS, 3D models of chemical structures </li></ul></ul></ul></ul><ul><ul><ul><ul><ul><li>Complex and dynamic </li></ul></ul></ul></ul></ul>
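The relationship between the physical and logical layers can be illustrated with a minimal sketch. The byte values and the `interpret` helper are illustrative assumptions, not part of the lecture; the point is only that the same bitstream yields different information depending on the logical rule (here, a character encoding) used to read it.

```python
# Physical layer: a bitstream, here four raw bytes (illustrative values).
bitstream = bytes([0x63, 0x61, 0x66, 0xE9])


def interpret(data: bytes, encoding: str) -> str:
    """Logical layer: apply an encoding rule to turn bytes into characters.

    Bytes that are invalid under the chosen encoding are replaced with
    U+FFFD so the difference between interpretations is visible.
    """
    return data.decode(encoding, errors="replace")


# The same bits, read under two different logical rules.
as_latin1 = interpret(bitstream, "latin-1")  # last byte read as an accented 'e'
as_ascii = interpret(bitstream, "ascii")     # last byte is not valid ASCII
```

If the knowledge of which encoding applies is lost, the bits survive but the conceptual object (the word) may not.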
  5. 5. Digital preservation challenges (3) <ul><ul><li>On which of the three layers should preservation activities focus? </li></ul></ul><ul><ul><ul><li>We need to preserve the ability to reproduce the objects, not just the bits </li></ul></ul></ul><ul><ul><ul><li>In fact, we can change the bits and logical representation and still reproduce an ‘authentic’ conceptual object (e.g. by converting a text file into PDF or TIFF) </li></ul></ul></ul><ul><li>Authenticity and integrity </li></ul><ul><ul><li>How can we trust that an object is what it claims to be? </li></ul></ul><ul><ul><li>Digital information can easily be changed by accident or design </li></ul></ul>
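One common way to detect accidental or deliberate change at the bit level is to record a checksum when an object is received and compare it later. The following is a minimal sketch using Python's standard `hashlib`; the sample content and the function names `fixity` and `is_unchanged` are assumptions made for illustration.

```python
import hashlib


def fixity(data: bytes) -> str:
    """Return a SHA-256 digest of the bitstream as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, recorded_digest: str) -> bool:
    """Compare the current digest with the one recorded earlier."""
    return fixity(data) == recorded_digest


# Digest recorded when the object is first received (illustrative content).
original = b"Minutes of the meeting held on 10 March 2010"
recorded = fixity(original)

# A later check: an altered copy no longer matches the recorded digest.
altered = original + b"!"
```

A matching digest only shows that the bits are unchanged. It says nothing about whether the object is what it claims to be, and, as the slide notes, a legitimately migrated object will have different bits while still representing the same conceptual object.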
  6. 6. Digital preservation basics <ul><li>An ongoing approach to managing digital content based on: </li></ul><ul><ul><li>The identification and adoption of appropriate preservation strategies </li></ul></ul><ul><ul><ul><li>Creation or Ingest stages are normally the best time to ensure that data are fit-for-purpose and “preservable” </li></ul></ul></ul><ul><ul><li>The collection and management of appropriate metadata </li></ul></ul><ul><ul><ul><li>Capture of explicit and implicit knowledge, contexts </li></ul></ul></ul><ul><ul><li>The ongoing monitoring of technical contexts and the application of preservation planning techniques </li></ul></ul><ul><ul><li>Continual monitoring of the organisation (audit) </li></ul></ul>
  7. 7. OAIS Reference Model (1) <ul><li>Reference Model for an Open Archival Information System (OAIS) </li></ul><ul><ul><li>ISO 14721:2003 Space data and information transfer systems -- Open archival information system -- Reference model </li></ul></ul><ul><ul><li>Defines: </li></ul></ul><ul><ul><ul><li>Common vocabulary (definitions of key concepts) </li></ul></ul></ul><ul><ul><ul><li>Information model (information packages, metadata, etc.) </li></ul></ul></ul><ul><ul><ul><li>Functional model (six functional entities) </li></ul></ul></ul><ul><ul><ul><li>Mandatory responsibilities </li></ul></ul></ul>
  8. 8. OAIS Reference Model (2) <ul><li>OAIS Mandatory Responsibilities: </li></ul><ul><ul><li>Negotiating and accepting information </li></ul></ul><ul><ul><li>Obtaining sufficient control of the information to ensure long-term preservation </li></ul></ul><ul><ul><li>Determining the "designated community" </li></ul></ul><ul><ul><li>Ensuring that information is independently understandable, i.e. can be (re)used without the assistance of those who produced it </li></ul></ul><ul><ul><li>Following documented policies and procedures </li></ul></ul><ul><ul><li>Making the preserved information available </li></ul></ul>
  9. 9. OAIS Reference Model (3) [Figure: OAIS Functional Entities (Figure 4-1). Functional entities: Ingest, Archival Storage, Data Management, Administration, Preservation Planning, Access. External roles: Producer, Consumer, Management. Labelled flows: SIP, AIP, DIP, descriptive info., queries, result sets, orders.]
  10. 10. OAIS Reference Model (4) <ul><li>OAIS Information Model: </li></ul><ul><ul><li>Defines the “Information Packages” required </li></ul></ul><ul><ul><ul><li>Ingest (Submission Information Package) </li></ul></ul></ul><ul><ul><ul><li>Storage (Archival Information Package) </li></ul></ul></ul><ul><ul><ul><li>Access (Dissemination Information Package) </li></ul></ul></ul><ul><ul><li>General principle of Information Packages: </li></ul></ul><ul><ul><ul><li>All objects are wrapped in multiple layers of metadata (Representation Information, Descriptive Information, Packaging, etc.) </li></ul></ul></ul>
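The three package types can be pictured as one content object carried through ingest, storage and access, wrapped in metadata at each stage. The sketch below is only an illustration: OAIS is a conceptual model and does not prescribe data structures, so the class, field names and the `ingest` and `disseminate` functions are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class InformationPackage:
    """Hypothetical package: content wrapped in layers of metadata."""
    content: bytes
    representation_info: dict = field(default_factory=dict)  # how to decode the content
    descriptive_info: dict = field(default_factory=dict)     # how to find the content
    kind: str = "SIP"                                         # SIP, AIP or DIP


def ingest(sip: InformationPackage) -> InformationPackage:
    """Turn a Submission Information Package into an Archival one."""
    return InformationPackage(
        content=sip.content,
        representation_info=dict(sip.representation_info),
        descriptive_info=dict(sip.descriptive_info),
        kind="AIP",
    )


def disseminate(aip: InformationPackage) -> InformationPackage:
    """Derive a Dissemination Information Package for a consumer."""
    return InformationPackage(
        content=aip.content,
        representation_info=dict(aip.representation_info),
        descriptive_info=dict(aip.descriptive_info),
        kind="DIP",
    )


sip = InformationPackage(b"report text", {"format": "text/plain"}, {"title": "Report"})
aip = ingest(sip)
dip = disseminate(aip)
```

In a real repository the archival package would normally gain additional metadata at ingest, and a dissemination package might contain a derived access copy rather than the stored content itself.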
  11. 11. OAIS Reference Model (5) <ul><li>Implementation fundamentals: </li></ul><ul><ul><li>OAIS is a reference model (a conceptual framework), NOT a blueprint for system design </li></ul></ul><ul><ul><li>It informs the design of system architectures, the development of systems and components </li></ul></ul><ul><ul><li>It provides common definitions of terms … a common language, a means of making comparison </li></ul></ul><ul><ul><li>But it does NOT ensure consistency or interoperability between implementations </li></ul></ul><ul><ul><li>Conformance only relates to mandatory responsibilities and following the information model </li></ul></ul>
  12. 12. The DCC Lifecycle Model <ul><li>Digital Curation: </li></ul><ul><ul><li>“… The activity of, managing and promoting the use of data from its point of creation, to ensure it is fit for contemporary purpose, and available for discovery and re-use” (Lord & MacDonald, 2003) </li></ul></ul><ul><li>DCC Digital Curation Lifecycle Model: </li></ul><ul><ul><li>Focused on the entire lifecycle of objects (influenced by records management and archives thinking) from creation, through appraisal, ingest, storage, to access and reuse </li></ul></ul><ul><ul><li>Preservation activities at core of model … </li></ul></ul>
  13. 14. Digital preservation principles (1) <ul><li>Most of the technical problems associated with long-term digital preservation can be solved if a life-cycle management approach is adopted </li></ul><ul><ul><li>i.e. a continual programme of active management </li></ul></ul><ul><ul><li>Ideally, combines both managerial and technical processes, e.g., as in the OAIS Reference Model </li></ul></ul><ul><ul><li>Many current preservation systems are attempting to support this approach </li></ul></ul><ul><ul><li>Digital preservation strategies need to be seen in this wider context </li></ul></ul><ul><li>Wherever possible, retain also the original byte-stream </li></ul>
  14. 15. Digital preservation principles (2) <ul><li>Preservation needs to be considered at a very early stage in an object's life-cycle </li></ul><ul><li>There is a need to identify 'significant properties' </li></ul><ul><ul><li>Recognises that preservation is context dependent, even user specific (concept of 'designated community') </li></ul></ul><ul><ul><li>“Performance” model (National Archives of Australia) </li></ul></ul><ul><ul><li>Helps with choosing an acceptable preservation strategy </li></ul></ul><ul><li>Encapsulation </li></ul><ul><ul><li>Surrounding the digital object - at least in theory - with all of the information needed to decode and understand it (including software) </li></ul></ul>
  15. 16. Digital preservation principles (3) <ul><li>Metadata and documentation are vitally important </li></ul><ul><ul><li>Relates to OAIS Information Model concepts like Representation Information and Preservation Description Information </li></ul></ul><ul><ul><li>Functions </li></ul></ul><ul><ul><ul><li>Records meaning </li></ul></ul></ul><ul><ul><ul><li>Records the context </li></ul></ul></ul><ul><ul><ul><li>Enables the development of finding aids </li></ul></ul></ul><ul><ul><li>Specific standards are being developed that support digital preservation activities (e.g., the PREMIS Data Dictionary) </li></ul></ul>
  16. 17. Digital preservation strategies <ul><li>Technology preservation </li></ul><ul><ul><li>Maintaining technology </li></ul></ul><ul><ul><ul><li>Computer museums, digital archaeology </li></ul></ul></ul><ul><li>Emulation </li></ul><ul><ul><li>Running original bit-streams and application software on emulator programs that mimic the behaviour of obsolete hardware and operating systems </li></ul></ul><ul><li>Migration </li></ul><ul><ul><li>Periodic transfer of digital information from one hardware and software configuration to another, or from one generation of computer technology to a subsequent one </li></ul></ul>
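As a very small illustration of migration, the sketch below converts a text bitstream from one character encoding to another. The choice of Latin-1 and UTF-8, the sample text and the function name `migrate_text` are assumptions; real migrations between file formats are considerably more involved.

```python
def migrate_text(original: bytes,
                 source_encoding: str = "latin-1",
                 target_encoding: str = "utf-8") -> bytes:
    """Re-encode a text bitstream; the input bytes are left untouched."""
    text = original.decode(source_encoding)
    return text.encode(target_encoding)


# Illustrative original: the word "cafe" with an acute accent, in Latin-1.
original = "caf\u00e9".encode("latin-1")
migrated = migrate_text(original)

# The bits differ, but both decode to the same characters.
same_text = migrated.decode("utf-8") == original.decode("latin-1")
```

The function returns a new bitstream instead of overwriting the input, so the original can be kept alongside the migrated copy.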
  17. 18. Choosing a strategy (1) <ul><li>Preservation strategies are not in competition </li></ul><ul><ul><li>Different strategies can work together; there may be value in diversification </li></ul></ul><ul><ul><li>Migration strategies mean difficult choices need to be made about target formats </li></ul></ul><ul><li>But the strategy chosen has implications for: </li></ul><ul><ul><li>The technical infrastructure required (and metadata) </li></ul></ul><ul><ul><li>Collection management priorities </li></ul></ul><ul><ul><li>Rights management </li></ul></ul><ul><ul><ul><li>Owning the rights to re-engineer software </li></ul></ul></ul><ul><ul><li>Costs </li></ul></ul>
  18. 19. Choosing a strategy (2) <ul><li>Plato preservation planning tool (EU Planets project) </li></ul><ul><ul><li>A decision support tool that helps users explore the evaluation of potential preservation solutions against specific requirements and for building a plan for preserving a given set of objects </li></ul></ul><ul><ul><li>Integrates file format identification (using DROID); some migration services; XML-based generic format characterisation using XCL (eXtensible Characterisation Languages) </li></ul></ul>
  19. 20. Preservation support on ingest <ul><li>Formats can be identified and validated on ingest or deposit into a repository </li></ul><ul><ul><li>JHOVE (JSTOR/Harvard Object Validation Environment) </li></ul></ul><ul><ul><li>PRONOM, DROID (The National Archives) </li></ul></ul><ul><li>Metadata </li></ul><ul><ul><li>Some tools exist for the automatic capture of metadata </li></ul></ul><ul><li>Standardisation on ingest </li></ul><ul><ul><li>Received wisdom suggests the adoption of open or non-proprietary standards, e.g. databases structured in XML, uncompressed images, 'preservation friendly' standards like PDF/A </li></ul></ul>
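Format identification on ingest often relies on recognising characteristic byte sequences at the start of a file. The sketch below shows the general idea with a handful of well-known signatures. It is not the interface or implementation of JHOVE, DROID or PRONOM, and the `SIGNATURES` table and `identify` function are assumptions made for illustration.

```python
# A tiny, hand-picked table of leading byte signatures (illustrative only;
# a real registry holds far more formats and more detailed signatures).
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
]


def identify(data: bytes) -> str:
    """Return a format name if the data starts with a known signature."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


pdf_guess = identify(b"%PDF-1.4\n% sample")
text_guess = identify(b"plain text with no signature")
```

Identification of this kind only suggests what a file claims to be. Validation, i.e. checking that the file actually conforms to the format specification, is a separate step.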
  20. 21. Repository audit frameworks <ul><li>Repository audit frameworks first developed out of the OAIS Reference Model </li></ul><ul><ul><li>OAIS Mandatory Responsibilities (only six of them): </li></ul></ul><ul><ul><ul><li>The main focus was on technical and organisational aspects, e.g.: </li></ul></ul></ul><ul><ul><ul><ul><li>That repositories ensure that preserved information (content) can be understood (independently understandable) </li></ul></ul></ul></ul><ul><ul><ul><ul><li>That documented policies and procedures are being followed </li></ul></ul></ul></ul><ul><ul><ul><li>No clear concept of OAIS compliance (although this is often claimed by system developers) </li></ul></ul></ul>
  21. 22. TRAC Criteria and Checklist (1) <ul><li>Trustworthy Repositories Audit & Certification (TRAC): Criteria and Checklist </li></ul><ul><ul><li>Background: </li></ul></ul><ul><ul><ul><li>Checklist developed by the RLG-NARA Digital Repository Certification Task Force </li></ul></ul></ul><ul><ul><ul><li>Revised (following pilot audits) by the Center for Research Libraries and OCLC </li></ul></ul></ul><ul><ul><ul><li>Based upon OAIS concepts </li></ul></ul></ul>
  22. 23. TRAC Criteria and Checklist (2) <ul><li>TRAC criteria cover three main aspects: </li></ul><ul><ul><li>Organisational Infrastructure </li></ul></ul><ul><ul><ul><li>Governance and viability, structure and staffing, financial sustainability, contracts, etc. </li></ul></ul></ul><ul><ul><li>Digital Object Management </li></ul></ul><ul><ul><ul><li>Ingest, preservation planning, archival storage, etc. </li></ul></ul></ul><ul><ul><li>Technologies, Technical Infrastructure, & Security </li></ul></ul><ul><ul><ul><li>Systems and infrastructure, etc. </li></ul></ul></ul>
  23. 24. TRAC Checklist example page
  24. 25. DRAMBORA <ul><li>DRAMBORA (Digital Repository Audit Method Based on Risk Assessment) </li></ul><ul><ul><li>Digital Curation Centre / Digital Preservation Europe </li></ul></ul><ul><ul><li>“Presents a methodology for self-assessment, encouraging organisations to establish a comprehensive self-awareness of their objectives, activities and assets before identifying, assessing and managing the risks implicit within their organisation” </li></ul></ul><ul><ul><li>Identifying risks and scoring each one on likelihood and impact </li></ul></ul><ul><ul><li>Covers: organisational context, policies, assets, risks, etc. </li></ul></ul><ul><ul><li>Online tool </li></ul></ul>
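Scoring risks on likelihood and impact can be sketched as a small risk register. The example risks, the 1 to 5 scales and the use of a simple product as the combined score are all assumptions made for illustration and are not taken from the DRAMBORA documentation.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """Hypothetical risk register entry."""
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (severe)

    @property
    def score(self) -> int:
        # Assumed combination rule: likelihood multiplied by impact.
        return self.likelihood * self.impact


risks = [
    Risk("Storage media failure", likelihood=3, impact=5),
    Risk("Loss of key staff", likelihood=4, impact=3),
    Risk("File format obsolescence", likelihood=2, impact=4),
]

# Highest-scoring risks first, so they can be addressed first.
ranked = sorted(risks, key=lambda r: r.score, reverse=True)
```

A ranked list like this is a starting point for deciding which risks to manage first; the judgement about likelihood and impact remains with the organisation.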
  25. 26. Repository audit frameworks <ul><li>A means of "asking the right questions" about your repository and documenting appropriate procedures and risks </li></ul><ul><li>Both TRAC and DRAMBORA are under consideration by (different) ISO technical committees </li></ul><ul><ul><li>External badge of quality (a "certified preservation repository") </li></ul></ul><ul><ul><li>vs. </li></ul></ul><ul><ul><li>Management tool for self assessment </li></ul></ul>
  26. 27. Case study 1: E-mail preservation <ul><li>Electronic Mail </li></ul><ul><ul><li>Now ubiquitous in many business contexts </li></ul></ul><ul><ul><li>A mixture of records and other stuff </li></ul></ul><ul><ul><li>High-risk if not managed properly: </li></ul></ul><ul><ul><ul><li>Loss of accountability, efficiency, public credibility, organisational memory, etc. </li></ul></ul></ul><ul><ul><ul><li>There also may be legal and financial consequences </li></ul></ul></ul><ul><ul><li>An obvious candidate for the records management approach </li></ul></ul>
  27. 28. Some specific challenges of E-mail <ul><li>Inappropriate content </li></ul><ul><ul><li>For example: spam, personal messages, illegal content </li></ul></ul><ul><li>Wide range of attachment types – some will provide preservation challenges of their own </li></ul><ul><li>Unclear responsibilities: </li></ul><ul><ul><li>Users can be reluctant to ‘manage’ incoming mail </li></ul></ul><ul><ul><li>E-mail seen as personal domain, not as organisational property ... this can have consequences … </li></ul></ul>
  28. 30. "All staff will be reminded of the appropriate use of Number 10 resources" – Downing Street spokesperson
  29. 32. “The unfortunate incident that has taken place through the illegal hacking of the private communications of individual scientists …” (Rajendra Pachauri, Chairman of the UN Intergovernmental Panel on Climate Change, statement, 4 Dec 2009) “Since emails are normally intended to be private, people writing them are, shall we say, somewhat freer in expressing themselves than they would in a public statement” (RealClimate Web pages)
  30. 33. Approaches to managing e-mail <ul><li>Developing specific policies for managing email within an organisation </li></ul><ul><ul><li>Produce guidance for creators (and others) </li></ul></ul><ul><ul><li>Identify the chain of custody through lifecycle </li></ul></ul><ul><ul><li>Need to involve all people involved, e.g. creators, managers, records managers, IT staff, etc. </li></ul></ul><ul><li>Developing a preservation approach </li></ul><ul><ul><li>Appraisal - the identification of key e-mail content or records </li></ul></ul><ul><ul><li>Preservation strategies – the adoption of suitable strategies to deal with that content that needs to be retained </li></ul></ul>
  31. 34. E-mail policies (1) <ul><li>Policies need to cover: </li></ul><ul><ul><li>Creation practices </li></ul></ul><ul><ul><li>Using business e-mail accounts for private use & vice versa </li></ul></ul><ul><ul><li>Levels of organisational monitoring </li></ul></ul><ul><ul><li>Legal issues </li></ul></ul><ul><ul><li>Integrated records retention and preservation </li></ul></ul><ul><ul><li>Disposal </li></ul></ul>
  32. 35. E-mail policies (2) <ul><li>From: </li></ul>
  33. 36. E-mail preservation <ul><ul><li>Appraisal </li></ul></ul><ul><ul><ul><li>Determining what content needs to be preserved </li></ul></ul></ul><ul><ul><ul><li>Destruction of transient/unnecessary e-mails </li></ul></ul></ul><ul><ul><li>Saving e-mail records independently of the e-mail client </li></ul></ul><ul><ul><li>Check that content is complete - comprising message body, headers & attachments </li></ul></ul><ul><ul><li>Consider authenticity requirements </li></ul></ul><ul><ul><li>Ingest into an organisational EDRMS or repository </li></ul></ul><ul><ul><li>Make decisions on appropriate preservation strategies for content and attachments </li></ul></ul><ul><ul><ul><li>Selecting a standard format? </li></ul></ul></ul><ul><ul><ul><li>Significant properties? </li></ul></ul></ul>
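The completeness check (message body, headers and attachments) can be sketched with Python's standard `email` package, working on a message saved independently of any mail client. The sample message, the list of required headers and the function name `completeness_report` are assumptions made for illustration.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Headers this sketch treats as required (an assumption, not a standard).
REQUIRED_HEADERS = ("From", "To", "Date", "Subject")


def completeness_report(raw: bytes) -> dict:
    """Check a raw message for headers, a body and named attachments."""
    msg = message_from_bytes(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain", "html"))
    attachments = [part.get_filename() for part in msg.iter_attachments()]
    return {
        "has_headers": all(msg[h] is not None for h in REQUIRED_HEADERS),
        "has_body": body is not None,
        "attachments": attachments,
    }


# Build an illustrative message with one attachment.
sample = EmailMessage()
sample["From"] = "sender@example.org"
sample["To"] = "recipient@example.org"
sample["Date"] = "Wed, 10 Mar 2010 09:00:00 +0000"
sample["Subject"] = "Minutes"
sample.set_content("Please find the minutes attached.")
sample.add_attachment(b"minutes text", maintype="application",
                      subtype="octet-stream", filename="minutes.txt")

report = completeness_report(sample.as_bytes())
```

A report like this could be recorded at ingest into an EDRMS or repository. It does not address authenticity requirements or the preservation of the attachment formats themselves.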
  34. 37. Lost e-mails from the past <ul><li>The world’s very first network email </li></ul><ul><ul><li>Sent by Ray Tomlinson (BBN Technologies), late 1971 </li></ul></ul><ul><ul><li>A test message, probably something like “QWERTYUIOP” (documented, but not preserved – the contents were “entirely forgettable, and I have, therefore, forgotten them”) </li></ul></ul><ul><ul><li>First ‘real’ message explained to colleagues how to send messages over the network (exact text now unknown) </li></ul></ul><ul><ul><li>Probably no significant records management implications, but a key step in the historical development of the Internet was not recorded </li></ul></ul>
  35. 38. Case study 2: Preserving Websites <ul><li>Websites are ubiquitous: </li></ul><ul><ul><li>“The Web has become the platform and interface of choice for virtually every kind of information system” (JISC-PoWR Handbook) </li></ul></ul><ul><ul><li>Typically run by IT staff (e.g., Web managers), main responsibilities relate to keeping systems online, stable and secure, and up-to-date … content is constantly evolving </li></ul></ul><ul><ul><li>Potential role for records managers to identify which parts of institutional Websites need to be incorporated within RM guidelines </li></ul></ul>
  36. 39. Preserving Websites (2) <ul><li>Things to consider: </li></ul><ul><ul><li>The identification / appraisal of Web records </li></ul></ul><ul><ul><li>Change frequency </li></ul></ul><ul><ul><li>Ownership and rights </li></ul></ul><ul><ul><li>Databases and the “deep Web” </li></ul></ul><ul><ul><li>The use of Content Management Systems (CMS) </li></ul></ul><ul><ul><li>Streamed content </li></ul></ul><ul><ul><li>The use of third-party sites </li></ul></ul><ul><ul><li>Personalisation / Web 2.0 / social networking </li></ul></ul>
  37. 40. Preserving Websites (3) <ul><li>Collection approaches: </li></ul><ul><ul><li>Various harvesting tools exist (e.g. Heritrix) </li></ul></ul><ul><ul><li>Domain harvesting, selective capture, periodic capture </li></ul></ul><ul><ul><li>Working with third parties – e.g.: </li></ul></ul><ul><ul><ul><li>European Archive </li></ul></ul></ul><ul><ul><ul><li>Internet Archive </li></ul></ul></ul><ul><li>Some examples of existing initiatives: </li></ul><ul><ul><li>UK Government Web Archive (TNA) </li></ul></ul><ul><ul><li>UK Web Archive (BL, JISC, Wellcome Library, NLW) </li></ul></ul>
  38. 41. Preserving Websites (4) <ul><li>Aspects of Websites that could be preserved: </li></ul><ul><ul><li>Information Content </li></ul></ul><ul><ul><li>Information Appearance </li></ul></ul><ul><ul><li>Information Behaviour </li></ul></ul><ul><ul><li>Information Relationships (e.g. links, embedded or linked metadata) </li></ul></ul><ul><ul><li>Change history </li></ul></ul><ul><ul><li>Use history </li></ul></ul><ul><ul><li>From: Kevin Ashley (ULCC), “The JISC-PoWR Handbook - Explaining Web Preservation,” via SlideShare: </li></ul></ul>
  39. 42. Questions?
  40. 43. Further reading (1) <ul><li>General: </li></ul><ul><ul><li>Abby Smith, "The Research Library in the 21st Century: Collecting, Preserving, and Making Accessible Resources for Scholarship." In: No Brief Candle: Reconceiving Research Libraries for the 21st Century (CLIR, 2008), pp. 13-20. </li></ul></ul><ul><ul><li>Priscilla Caplan, Understanding PREMIS (Library of Congress, 2009): </li></ul></ul><ul><ul><li>Blue Ribbon Task Force on Sustainable Digital Preservation and Access, Sustainable economics for a digital planet (2010): </li></ul></ul><ul><ul><li>Paradigm Project Workbook: </li></ul></ul><ul><ul><li>Plato Preservation Planning tool: </li></ul></ul><ul><ul><li>DRAMBORA: </li></ul></ul>
  41. 44. Further reading (2) <ul><li>Preserving Emails: </li></ul><ul><ul><li>Maureen Pennock, “Curating E-mails,” In: DCC Curation Manual (2006): </li></ul></ul><ul><ul><li>The National Archives, Developing a policy for managing e-mail (2004): </li></ul></ul><ul><ul><li>Collaborative Electronic Records Project, Email records guidance (Smithsonian Institution Archives & Rockefeller Archives Center, 2007): </li></ul></ul>
  42. 45. Further reading (3) <ul><li>Preserving Websites: </li></ul><ul><ul><li>JISC-PoWR Handbook (Nov 2008): </li></ul></ul><ul><ul><li>JISC-PoWR blog: </li></ul></ul><ul><ul><li>The National Archives - Web Continuity project: </li></ul></ul><ul><ul><li>Adrian Brown, Archiving Websites: a practical guide for information management professionals (London: Facet Publishing, 2006) </li></ul></ul><ul><ul><li>Julien Masanès (ed.), Web Archiving (Berlin: Springer-Verlag, 2006) </li></ul></ul>
  43. 46. Acknowledgments <ul><li>UKOLN is funded by the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils, the Museums, Libraries and Archives Council (MLA), as well as by project funding from the JISC, the European Union, and other sources. UKOLN also receives support from the University of Bath, where it is based. </li></ul><ul><li>More information: </li></ul>
  44. 47. Thank You!