Digital Preservation


Presentation slides for an introductory lecture given at the University of the West of England on 15 February 2011


  1. 1. UKOLN is supported by: Digital Preservation Michael Day Research and Development Team Leader UKOLN, University of Bath Information Systems and Services, UWE, Bristol, 15 February 2011
  2. 2. Presentation outline <ul><li>Digital preservation overview </li></ul><ul><ul><li>Some definitions </li></ul></ul><ul><ul><li>Technical challenges </li></ul></ul><ul><ul><li>Organisational challenges </li></ul></ul><ul><li>Approaches to solving the problem </li></ul><ul><ul><li>Preservation Strategies </li></ul></ul><ul><ul><li>Tools for: </li></ul></ul><ul><ul><ul><li>Format characterisation (DROID) </li></ul></ul></ul><ul><ul><ul><li>Preservation Planning (Plato) </li></ul></ul></ul><ul><ul><li>The OAIS model: </li></ul></ul><ul><ul><ul><li>Preservation metadata </li></ul></ul></ul><ul><ul><ul><li>Repository audit frameworks (TRAC, DRAMBORA) </li></ul></ul></ul>
  3. 3. Definitions <ul><li>Digital preservation: </li></ul><ul><ul><li>Is mainly concerned with the sustainability of “content” for a given period of time (not forever) </li></ul></ul><ul><ul><li>Largely about ensuring “continued access” to content </li></ul></ul><ul><ul><li>“The series of managed activities necessary to ensure continued access to digital materials for as long as necessary” - Digital Preservation Coalition (DPC), Digital Preservation Definitions and Concepts list </li></ul></ul><ul><ul><li>A combination of technical, organisational and legal challenges </li></ul></ul>
  4. 4. Digital preservation basics <ul><li>An ongoing (lifecycle) approach to managing digital content based on: </li></ul><ul><ul><li>The identification and adoption of appropriate preservation strategies for content </li></ul></ul><ul><ul><li>The collection and management of appropriate metadata (explicit and implicit knowledge, contexts) </li></ul></ul><ul><ul><li>The ongoing monitoring of technical contexts and the application of preservation planning techniques </li></ul></ul><ul><ul><li>Continual monitoring of the organisation (audit) </li></ul></ul>
  5. 5. A multi-faceted set of challenges <ul><li>Technical </li></ul><ul><ul><li>Strategies needed to deal with ongoing obsolescence and scale </li></ul></ul><ul><li>Organisational </li></ul><ul><ul><li>Access and reuse </li></ul></ul><ul><ul><li>Authenticity and integrity </li></ul></ul><ul><ul><li>Sustainability (costs) </li></ul></ul><ul><ul><li>Legal (see Andrew Charlesworth’s lecture) </li></ul></ul>
  6. 6. Technical challenges (1) <ul><li>Physical </li></ul><ul><ul><li>Bits stored on a physical medium (or in the cloud?) </li></ul></ul><ul><ul><li>Focus 20 years ago was on new media types (e.g. optical storage technologies) as a panacea </li></ul></ul><ul><ul><li>Bit-level preservation is still important – the first layer in a viable preservation strategy </li></ul></ul>
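
Bit-level preservation is commonly checked with fixity information (checksums). The sketch below is one minimal way to do that, not a method taken from the lecture: it records a SHA-256 digest for every file under a directory and later reports files that are missing or have changed. The function names `build_manifest` and `verify_manifest` are hypothetical, chosen only for this illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root, POSIX style) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose digest changed."""
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

A manifest built at ingest can be stored alongside the content and re-verified on a schedule; a non-empty result from `verify_manifest` signals that a copy needs to be restored from elsewhere.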
  7. 7. Obsolete media <ul><li>Image courtesy of Frank Carey </li></ul>Exhibition at NASA White Sands Test Facility, 2009
  8. 8. Technical challenges (2) <ul><li>Hardware and software dependence </li></ul><ul><ul><li>Most digital objects are dependent on particular configurations of hardware and software </li></ul></ul><ul><ul><li>Relatively short obsolescence cycles </li></ul></ul>
  9. 9. Hardware and software dependence Exhibition at NASA White Sands Test Facility, 2009 Image courtesy of Frank Carey
  10. 10. Conceptual challenges (1) <ul><li>What is a digital object? </li></ul><ul><ul><li>Some are analogues of traditional objects, e.g. meeting minutes, research papers </li></ul></ul><ul><ul><li>Others are not, e.g. Web pages, blogs, GIS, 3D models of chemical structures, research data more generally </li></ul></ul><ul><ul><ul><li>Complexity </li></ul></ul></ul><ul><ul><ul><li>Dynamic nature </li></ul></ul></ul><ul><ul><ul><li>Interactivity </li></ul></ul></ul><ul><ul><li>Born digital vs. product of digitisation initiatives </li></ul></ul><ul><ul><li>Logical layer between physical storage of bits and the conceptual objects that need preservation (includes data types, formats, etc.) </li></ul></ul>
  11. 11. Conceptual challenges (2) <ul><li>Need to identify and document the “significant properties” (or characteristics) of content: </li></ul><ul><ul><li>Recognises that preservation is context dependent, even user specific (OAIS concept of 'designated community') </li></ul></ul><ul><ul><li>Helps with choosing an acceptable preservation strategy </li></ul></ul><ul><ul><ul><li>Compare the ‘performance model’ developed by the National Archives of Australia (2002) - “The source of a record is a fixed message that interacts with technology. This message provides the record’s unique meaning, but by itself is meaningless to researchers since it needs to be combined with technology in order to be rendered as its creator intended. The process is the technology required to render meaning from the source” </li></ul></ul></ul><ul><ul><li>Focus on re-use (data curation) </li></ul></ul>
  12. 12. Organisational challenges (1) <ul><li>Sustainability: </li></ul><ul><ul><li>Ultimately the sustainability of content depends upon the long-term sustainability of organisations </li></ul></ul><ul><ul><ul><li>Focus on business models </li></ul></ul></ul><ul><ul><li>Organisational commitment: </li></ul></ul><ul><ul><ul><li>“An institutional repository needs to be a service with continuity behind it … Institutions need to recognise that they are making commitments for the long term” - Clifford Lynch </li></ul></ul></ul><ul><ul><ul><li>Need for policy development </li></ul></ul></ul><ul><ul><li>Incentives for preservation: </li></ul></ul><ul><ul><ul><li>Clarity on roles and responsibilities needed </li></ul></ul></ul><ul><ul><ul><li>Who benefits? Who pays? “Free riding?” </li></ul></ul></ul>
  13. 13. Organisational challenges (2) <ul><li>Economic perspectives: </li></ul><ul><ul><li>Blue Ribbon Task Force on Sustainable Digital Preservation and Access: </li></ul></ul><ul><ul><ul><li>Final report (Feb 2010) “Ensuring that valuable digital assets will be available for future use is not simply a matter of finding sufficient funds. It is about mobilizing resources - human, technical, and financial - across a spectrum of stakeholders diffuse over both space and time. But questions remain about what digital information we should preserve, who is responsible for preserving, and who will pay.” </li></ul></ul></ul><ul><ul><li>JISC-funded LIFE (Life Cycle Information for E-Literature) has developed a predictive costing tool: </li></ul></ul>
  14. 14. Organisational challenges (3) <ul><li>The challenge of scale: </li></ul><ul><ul><li>The Web </li></ul></ul><ul><ul><li>Digitised content: </li></ul></ul><ul><ul><ul><li>Google Books </li></ul></ul></ul><ul><ul><li>The “data deluge” in e-Science: </li></ul></ul><ul><ul><ul><li>New generations of instruments, computer simulations </li></ul></ul></ul><ul><ul><ul><li>Many terabytes generated per day, petabyte scale computing (and growing) </li></ul></ul></ul><ul><ul><ul><li>Cory Doctorow, “Welcome to the petacentre.” Nature, 455, pp 17-21, 4 Sep 2008 </li></ul></ul></ul>
  15. 15. Organisational challenges (4) <ul><li>The need for collaboration: </li></ul><ul><ul><li>Need for 'deep-infrastructure' for preservation recognised as far back as 1996 by the Task Force on Archiving of Digital Information </li></ul></ul><ul><ul><ul><li>Digital preservation involves the “grander problem of organizing ourselves over time and as a society ... [to manoeuvre] effectively in a digital landscape” (p. 7) </li></ul></ul></ul><ul><ul><li>Building on existing networks </li></ul></ul><ul><ul><li>Role for national-level co-ordination: </li></ul></ul><ul><ul><ul><li>Digital Preservation Coalition (DPC), nestor (Germany), National Digital Information Infrastructure and Preservation Program (NDIIPP) </li></ul></ul></ul>
  16. 16. Organisational challenges (5) <ul><li>Learn the lessons from the past: </li></ul><ul><ul><li>Things will go wrong </li></ul></ul><ul><ul><li>Do what you can to enable recovery from disaster </li></ul></ul><ul><ul><li>Digital technologies support replication (keep multiple copies so that there is no single point of failure) </li></ul></ul>
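
Replication only helps recovery if damaged copies can be detected and replaced. As a rough sketch, assuming each replica can be read as bytes and that a strict majority of copies is still intact, the following compares digests across replicas and overwrites the outliers. The function name `repair_replicas` and the site names used in the usage note are hypothetical.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """SHA-256 hex digest of an in-memory copy."""
    return hashlib.sha256(data).hexdigest()


def repair_replicas(replicas: dict[str, bytes]) -> tuple[dict[str, bytes], list[str]]:
    """Replace copies that disagree with the strict majority.

    Returns the repaired replica set and the names of the sites that
    were overwritten. Raises ValueError when no strict majority exists,
    because the correct copy cannot then be chosen automatically.
    """
    if not replicas:
        raise ValueError("no replicas supplied")
    counts = Counter(digest(data) for data in replicas.values())
    winner, votes = counts.most_common(1)[0]
    if votes * 2 <= len(replicas):
        raise ValueError("no strict majority; manual intervention needed")
    good = next(data for data in replicas.values() if digest(data) == winner)
    repaired = [site for site, data in replicas.items() if digest(data) != winner]
    return {site: good for site in replicas}, repaired
```

For example, with copies held at `"bath"`, `"bristol"` and `"tape"`, a single corrupted copy would be restored from the two that agree, while a two-copy disagreement is escalated rather than guessed at.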
  17. 17. Digital preservation strategies (1) <ul><li>Main approaches: </li></ul><ul><ul><li>Technology preservation (e.g., computing museums) </li></ul></ul><ul><ul><li>Digital archaeology (a post hoc approach) </li></ul></ul><ul><ul><li>Emulation (focusing on the environment, often used where look-and-feel is important, e.g. computer games) </li></ul></ul><ul><ul><li>Migration (focusing on the content) </li></ul></ul><ul><ul><ul><li>A mature approach: A set of organised tasks designed to achieve the periodic transfer of digital information from one hardware and software configuration to another, or from one generation of computer technology to a subsequent one - CPA/RLG report (1996) </li></ul></ul></ul>
  18. 18. Digital preservation strategies (2) <ul><li>Preservation strategies are not in competition </li></ul><ul><ul><li>Different strategies can work together; there may be value in diversification </li></ul></ul><ul><ul><li>Migration strategies mean difficult choices need to be made about target formats </li></ul></ul><ul><li>But the strategy chosen has implications for: </li></ul><ul><ul><li>The technical infrastructure required (and metadata) </li></ul></ul><ul><ul><li>Collection management priorities </li></ul></ul><ul><ul><li>Rights management </li></ul></ul><ul><ul><ul><li>Owning the rights to re-engineer software </li></ul></ul></ul><ul><ul><li>Costs </li></ul></ul>
  19. 19. Digital preservation strategies (3) <ul><li>Tools for format characterisation and validation </li></ul><ul><ul><li>DROID - Digital Record Object Identification (based on the PRONOM registry) </li></ul></ul><ul><ul><ul><li>Very important to know what types (formats) of content exist in a particular collection (e.g., institutional repository or Web archive) </li></ul></ul></ul><ul><ul><ul><li>Performs batch identification of file formats </li></ul></ul></ul><ul><ul><li>JHOVE - JSTOR/Harvard Object Validation Environment </li></ul></ul><ul><ul><ul><li>Used for format validation </li></ul></ul></ul>
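
To give a feel for what batch format identification involves, here is a toy sketch that matches the leading bytes of each file against a small table of well-known signatures and then profiles a collection. It is not DROID and does not use the PRONOM registry; the signature table, labels and function names are assumptions made for this illustration only, and real tools use far richer signature definitions.

```python
from typing import Optional

# Illustrative table only: (format label, leading "magic" bytes).
SIGNATURES: list[tuple[str, bytes]] = [
    ("PDF", b"%PDF-"),
    ("PNG", b"\x89PNG\r\n\x1a\n"),
    ("GIF", b"GIF87a"),
    ("GIF", b"GIF89a"),
    ("JPEG", b"\xff\xd8\xff"),
    ("ZIP container", b"PK\x03\x04"),
]


def identify(header: bytes) -> Optional[str]:
    """Return the label of the first signature the header starts with."""
    for label, magic in SIGNATURES:
        if header.startswith(magic):
            return label
    return None


def profile(collection: dict[str, bytes]) -> dict[str, int]:
    """Count formats across a collection of (name -> leading bytes)."""
    counts: dict[str, int] = {}
    for header in collection.values():
        label = identify(header) or "unidentified"
        counts[label] = counts.get(label, 0) + 1
    return counts
```

A profile of this kind answers the question raised on the slide: which formats actually exist in a repository or Web archive, and how many objects could not be identified at all.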
  20. 20. Digital preservation strategies (4) <ul><li>Plato preservation planning tool </li></ul><ul><ul><li>Developed by EU Planets project </li></ul></ul><ul><ul><li>A decision support tool that helps users explore the evaluation of potential preservation solutions against specific requirements and for building a plan for preserving a given set of objects </li></ul></ul><ul><ul><li>Integrates file format identification (using DROID); some migration services; XML-based generic format characterisation using XCL (eXtensible Characterisation Languages) </li></ul></ul><ul><ul><li>More info: </li></ul></ul><ul><ul><li>Integration with repositories tested by JISC KeepIt project: </li></ul></ul>
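
Evaluating candidate preservation solutions against stated requirements can be illustrated with a simple weighted-sum comparison. This is a generic decision-support sketch and does not reproduce how Plato itself works; the criteria, weights, scores and function names used here and in the usage note are hypothetical.

```python
def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-criterion scores (weights need not sum to 1)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * scores[name] for name in weights) / total


def rank_alternatives(
    alternatives: dict[str, dict[str, float]],
    weights: dict[str, float],
) -> list[tuple[str, float]]:
    """Return (alternative, score) pairs, best first."""
    ranked = [
        (name, weighted_score(scores, weights))
        for name, scores in alternatives.items()
    ]
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

With weights such as `{"content_fidelity": 0.5, "format_openness": 0.3, "cost": 0.2}` and scores on a 1 to 5 scale, two options (say, migrating to a new format versus keeping the original for emulation) can be compared explicitly, and the weights themselves document the priorities behind the resulting plan.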
  21. 21. The OAIS Reference Model OAIS Functional Entities (Figure 4-1)
  22. 22. Preservation metadata <ul><li>Metadata and documentation are vitally important </li></ul><ul><ul><li>Relates to OAIS concepts like Representation Information and Preservation Description Information </li></ul></ul><ul><ul><li>Functions: </li></ul></ul><ul><ul><ul><li>Enables resource discovery - supports the development of finding aids </li></ul></ul></ul><ul><ul><ul><li>Records meaning (structure and semantics) </li></ul></ul></ul><ul><ul><ul><li>Records context and provenance (authenticity) </li></ul></ul></ul><ul><ul><li>Standards that support digital preservation activities are under development: </li></ul></ul><ul><ul><ul><li>PREMIS Data Dictionary (for core metadata): </li></ul></ul></ul>
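
The provenance function of preservation metadata can be sketched as an object record that accumulates a log of events. The classes below are loosely inspired by the idea of describing objects and the events that act on them; the class and field names are hypothetical and are not the semantic units defined in the PREMIS Data Dictionary.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class PreservationEvent:
    """One action taken on an object (hypothetical fields)."""
    event_type: str
    outcome: str
    agent: str
    timestamp: str


@dataclass
class ObjectRecord:
    """Minimal descriptive and technical metadata plus an event history."""
    identifier: str
    format_name: str
    sha256: str
    events: list[PreservationEvent] = field(default_factory=list)

    def log(self, event_type: str, outcome: str, agent: str,
            when: Optional[datetime] = None) -> PreservationEvent:
        """Append an event, timestamped in UTC unless a time is supplied."""
        when = when or datetime.now(timezone.utc)
        event = PreservationEvent(event_type, outcome, agent, when.isoformat())
        self.events.append(event)
        return event

    def provenance(self) -> list[str]:
        """Human-readable history, oldest event first."""
        return [
            f"{e.timestamp} {e.event_type} by {e.agent}: {e.outcome}"
            for e in self.events
        ]
```

Recording who did what to an object, and when, is what later allows a repository to argue for the authenticity of the copy it delivers.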
  23. 23. Repository audit frameworks (1) <ul><li>Repository audit frameworks first developed out of the OAIS Reference Model </li></ul><ul><ul><li>OAIS Mandatory Responsibilities (only six of them): </li></ul></ul><ul><ul><ul><li>The main focus was on technical and organisational aspects, e.g.: </li></ul></ul></ul><ul><ul><ul><ul><li>That repositories ensure that preserved information (content) can be understood (independently understandable) </li></ul></ul></ul></ul><ul><ul><ul><ul><li>That documented policies and procedures are being followed </li></ul></ul></ul></ul><ul><ul><ul><li>No clear concept of OAIS “compliance” </li></ul></ul></ul>
  24. 24. Repository audit frameworks (2) <ul><li>Trusted Repositories Audit and Certification (TRAC): Criteria and Checklist: </li></ul><ul><ul><li>Source: </li></ul></ul><ul><ul><li>RLG-NARA Digital Repository Certification Task Force checklist, revised by the Center for Research Libraries (CRL) and OCLC </li></ul></ul><ul><ul><li>Criteria cover three main things: </li></ul></ul><ul><ul><ul><li>Organisational Infrastructure </li></ul></ul></ul><ul><ul><ul><ul><li>Governance and viability, structure and staffing, financial sustainability, contracts, etc. </li></ul></ul></ul></ul><ul><ul><ul><li>Digital Object Management </li></ul></ul></ul><ul><ul><ul><ul><li>Ingest, preservation planning, archival storage, etc. </li></ul></ul></ul></ul><ul><ul><ul><li>Technologies, Technical Infrastructure, & Security </li></ul></ul></ul><ul><ul><ul><ul><li>Systems and infrastructure, etc. </li></ul></ul></ul></ul>
  25. 25. Core repository principles (1) <ul><li>Ten Principles - agreed 2007 by CRL (US), Digital Curation Centre (UK), Nestor (Germany) and Digital Preservation Europe </li></ul><ul><ul><li>The repository commits to continuing maintenance of digital objects for identified community/communities. </li></ul></ul><ul><ul><li>Demonstrates organizational fitness (including financial, staffing structure, and processes) to fulfill its commitment. </li></ul></ul><ul><ul><li>Acquires and maintains requisite contractual and legal rights and fulfills responsibilities. </li></ul></ul><ul><ul><li>Has an effective and efficient policy framework. </li></ul></ul><ul><ul><li>Acquires and ingests digital objects based upon stated criteria that correspond to its commitments and capabilities. </li></ul></ul>
  26. 26. Core repository principles (2) <ul><li>Ten principles (continued) </li></ul><ul><ul><li>Maintains/ensures the integrity, authenticity and usability of digital objects it holds over time. </li></ul></ul><ul><ul><li>Creates and maintains requisite metadata about actions taken on digital objects during preservation as well as about the relevant production, access support, and usage process contexts before preservation. </li></ul></ul><ul><ul><li>Fulfills requisite dissemination requirements. </li></ul></ul><ul><ul><li>Has a strategic program for preservation planning and action. </li></ul></ul><ul><ul><li>Has technical infrastructure adequate to continuing maintenance and security of its digital objects. </li></ul></ul><ul><li>Available: </li></ul>
  27. 27. TRAC Checklist example page
  28. 28. Repository audit frameworks (3) <ul><li>DRAMBORA (Digital Repository Audit Method Based on Risk Assessment) </li></ul><ul><ul><li>Digital Curation Centre / Digital Preservation Europe </li></ul></ul><ul><ul><li>“Presents a methodology for self-assessment, encouraging organisations to establish a comprehensive self-awareness of their objectives, activities and assets before identifying, assessing and managing the risks implicit within their organisation” </li></ul></ul><ul><ul><li>Identifying risks and scoring each one on likelihood and impact </li></ul></ul><ul><ul><li>Covers: organisational context, policies, assets, risks, etc. </li></ul></ul><ul><ul><li>Online tool </li></ul></ul>
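
Scoring risks on likelihood and impact can be sketched as a small risk register. The 1 to 5 scales and the choice of multiplying the two values are assumptions made for this example, not necessarily the scales or formula that the DRAMBORA toolkit prescribes; the class and function names are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    """A single identified risk, scored on assumed 1-5 scales."""
    name: str
    likelihood: int
    impact: int

    def __post_init__(self) -> None:
        for value in (self.likelihood, self.impact):
            if not 1 <= value <= 5:
                raise ValueError("likelihood and impact must be between 1 and 5")

    @property
    def score(self) -> int:
        """Combined severity: likelihood multiplied by impact."""
        return self.likelihood * self.impact


def prioritise(risks: list[Risk]) -> list[Risk]:
    """Return the risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda risk: risk.score, reverse=True)
```

Sorting the register in this way gives a first-pass agenda for risk management: the highest-scoring risks are the ones to mitigate or monitor first.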
  29. 29. Repository audit frameworks (4) <ul><li>A means of “asking the right questions” about repositories and documenting appropriate procedures and risks </li></ul><ul><li>Both TRAC and DRAMBORA are under consideration by ISO technical committees </li></ul><ul><ul><li>External badge of quality (a “certified preservation repository”) </li></ul></ul><ul><ul><li>or </li></ul></ul><ul><ul><li>Management tool for self assessment </li></ul></ul>
  30. 30. Digital preservation basics (reprise) <ul><li>An ongoing (lifecycle) approach to managing digital content based on: </li></ul><ul><ul><li>The identification and adoption of appropriate preservation strategies for content </li></ul></ul><ul><ul><li>The collection and management of appropriate metadata (explicit and implicit knowledge, contexts) </li></ul></ul><ul><ul><li>The ongoing monitoring of technical contexts and the application of preservation planning techniques </li></ul></ul><ul><ul><li>Continual monitoring of the organisation (audit) </li></ul></ul>
  31. 31. The Future ... <ul><li>“It is always a mistake for a historian to try and predict the future. Life, unlike science, is simply too full of surprises” - Richard J. Evans, In defence of history (1997, p. 62) </li></ul>
  32. 32. Web links: <ul><li>PRESERV project: </li></ul><ul><li>KeepIt project: </li></ul><ul><li>Plato Preservation Planning tool: </li></ul><ul><li>DRAMBORA: </li></ul><ul><li>RSP briefing paper on preservation and storage formats: </li></ul><ul><li>WePreserve cartoons at: </li></ul>
  33. 33. <ul><li>Available: </li></ul>
  34. 34. Further reading <ul><li>Blue Ribbon Task Force on Sustainable Digital Preservation and Access, Final Report (NSF, 2010) </li></ul><ul><li>Digital Preservation Coalition, Digital preservation handbook: </li></ul><ul><li>JISC infoNet, Digital repositories infoKit: </li></ul><ul><li>Paradigm Project, Workbook on Digital Private Papers: </li></ul><ul><li>Marieke Guy, JISC Beginner’s Guide to Digital Preservation (UKOLN, 2010) </li></ul><ul><li>Digital Preservation Coalition and Digital Curation Centre, What’s New (monthly current awareness bulletin): </li></ul>
  35. 35. Further reading (research data) <ul><li>National Science Board, Long-lived digital data collections: enabling research and education in the 21st century (NSF, 2005) </li></ul><ul><li>Liz Lyon, Dealing with data: roles, rights, responsibilities and relationships (JISC, 2007) </li></ul><ul><li>Neil Beagrie, Julia Chruszcz, and Brian Lavoie, Keeping research data safe: a cost model and guidance for UK universities (JISC, 2008) </li></ul><ul><li>Neil Beagrie, Brian Lavoie and Matthew Woollard, Keeping research data safe 2 (JISC, 2010) </li></ul>
  36. 36. Questions?
  37. 37. Acknowledgments <ul><li>UKOLN is funded by the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils, the Museums, Libraries and Archives Council (MLA), as well as by project funding from the JISC, the European Union, and other sources. UKOLN also receives support from the University of Bath, where it is based. </li></ul><ul><li>More information: </li></ul>
  38. 38. Thank you!