Digital Preservation


Presentation for U

Published in: Technology, Education
  • Reference: Thibodeau, K. (2002). "Overview of technological approaches to digital preservation and challenges in coming years." In: The state of digital preservation: an international perspective. Washington, D.C.: Council on Library and Information Resources. Available:
  • National Archives of Australia, An Approach to the Preservation of Digital Records (2002):
  • Image from Mary Beard’s blog:

    1. 1. UKOLN is supported by: Digital Preservation Michael Day Digital Curation Centre UKOLN, University of Bath [email_address] Information Systems and Services, UWE, Bristol, 29 February 2012
    2. 2. Presentation outline <ul><li>Digital preservation overview </li></ul><ul><ul><li>Some definitions </li></ul></ul><ul><ul><li>Technical challenges </li></ul></ul><ul><ul><li>Organisational challenges </li></ul></ul><ul><li>Approaches to solving the problem </li></ul><ul><ul><li>Preservation Strategies </li></ul></ul><ul><ul><li>Tools for: </li></ul></ul><ul><ul><ul><li>Format characterisation (DROID) </li></ul></ul></ul><ul><ul><ul><li>Preservation Planning (Plato) </li></ul></ul></ul><ul><ul><li>The OAIS model: </li></ul></ul><ul><ul><ul><li>Preservation metadata </li></ul></ul></ul><ul><ul><ul><li>Repository audit frameworks (TRAC, DRAMBORA) </li></ul></ul></ul><ul><ul><ul><li>Institutional assessment tools: (DAF, CARDIO) </li></ul></ul></ul>
    3. 3. Definitions <ul><li>Digital preservation: </li></ul><ul><ul><li>Is mainly concerned with the sustainability of “content” for a given period of time (probably not forever) </li></ul></ul><ul><ul><li>Largely about ensuring “continued access” to content </li></ul></ul><ul><ul><li>“ The series of managed activities necessary to ensure continued access to digital materials for as long as necessary” - Digital Preservation Coalition (DPC) Digital Preservation Definitions and Concepts list: </li></ul></ul><ul><ul><li>A combination of technical, organisational and legal challenges </li></ul></ul>
    4. 4. Digital preservation basics <ul><li>An ongoing (lifecycle) approach to managing digital content based on: </li></ul><ul><ul><li>The identification and adoption of appropriate preservation strategies for content </li></ul></ul><ul><ul><li>The collection and management of appropriate metadata (explicit and implicit knowledge, contexts) </li></ul></ul><ul><ul><li>The ongoing monitoring of technical contexts and the application of preservation planning techniques </li></ul></ul><ul><ul><li>Continual monitoring of the organisation (audit) </li></ul></ul>
    5. 5. A multi-faceted set of challenges <ul><li>Technical </li></ul><ul><ul><li>Strategies needed to deal with ongoing obsolescence and scale </li></ul></ul><ul><li>Organisational </li></ul><ul><ul><li>Access and reuse </li></ul></ul><ul><ul><li>Authenticity and integrity </li></ul></ul><ul><ul><li>Sustainability (costs) </li></ul></ul><ul><ul><li>Legal </li></ul></ul>
    6. 6. Technical challenges (1) <ul><li>Physical </li></ul><ul><ul><li>Bits stored on a physical medium (or in the cloud?) </li></ul></ul><ul><ul><li>Focus 20 years ago was on new media types (e.g. optical storage technologies) as a panacea </li></ul></ul><ul><ul><li>Bit-level preservation is still important – the first layer in a viable preservation strategy </li></ul></ul>
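Bit-level preservation is usually backed by fixity checking: record a checksum for every file at ingest, then recompute it periodically to detect silent corruption or loss. The sketch below is a minimal illustration using SHA-256 from Python's standard library. The function names (`sha256_of`, `build_manifest`, `verify`) and the manifest layout are assumptions made for this example and are not taken from the presentation.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (by relative POSIX path) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> dict[str, str]:
    """Return {relative_path: "missing" | "changed"} for every failed check."""
    problems: dict[str, str] = {}
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file():
            problems[name] = "missing"
        elif sha256_of(target) != expected:
            problems[name] = "changed"
    return problems
```

In practice the manifest would be stored separately from the content it describes, so that a single storage failure cannot damage both.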
    7. 7. Obsolete media <ul><li>Image courtesy of Frank Carey </li></ul>Exhibition at NASA White Sands Test Facility, 2009
    8. 8. Technical challenges (2) <ul><li>Hardware and software dependence </li></ul><ul><ul><li>Most digital objects are dependent on particular configurations of hardware and software </li></ul></ul><ul><ul><li>Relatively short obsolescence cycles </li></ul></ul>
    9. 9. Hardware and software dependence Exhibition at NASA White Sands Test Facility, 2009 Image courtesy of Frank Carey
    10. 10. Conceptual challenges (1) <ul><li>What is a digital object? </li></ul><ul><ul><li>Some are analogues of traditional objects, e.g. meeting minutes, research papers </li></ul></ul><ul><ul><li>Others are not, e.g. Web pages, blogs, GIS, 3D models of chemical structures, research data more generally </li></ul></ul><ul><ul><ul><li>Complexity </li></ul></ul></ul><ul><ul><ul><li>Dynamic nature </li></ul></ul></ul><ul><ul><ul><li>Interactivity </li></ul></ul></ul><ul><ul><li>Born digital vs. product of digitisation initiatives </li></ul></ul><ul><ul><li>Logical layer between physical storage of bits and the conceptual objects that need preservation (includes data types, formats, etc.) </li></ul></ul>
    11. 11. Conceptual challenges (2) <ul><li>Need to identify and document the “significant properties” (or characteristics) of content: </li></ul><ul><ul><li>Recognises that preservation is context dependent, even user specific (OAIS concept of 'designated community') </li></ul></ul><ul><ul><li>Helps with choosing an acceptable preservation strategy </li></ul></ul><ul><ul><ul><li>Compare the ‘performance model’ developed by the National Archives of Australia (2002) - “The source of a record is a fixed message that interacts with technology. This message provides the record’s unique meaning, but by itself is meaningless to researchers since it needs to be combined with technology in order to be rendered as its creator intended. The process is the technology required to render meaning from the source” </li></ul></ul></ul><ul><ul><li>Focus on re-use (e.g., data curation) </li></ul></ul>
    12. 12. Organisational challenges (1) <ul><li>Sustainability: </li></ul><ul><ul><li>Ultimately the sustainability of content depends upon the long-term sustainability of organisations </li></ul></ul><ul><ul><ul><li>Focus on business models </li></ul></ul></ul><ul><ul><ul><li>Embedding preservation into the core task of organisations </li></ul></ul></ul><ul><ul><li>Organisational commitment: </li></ul></ul><ul><ul><ul><li>“ An institutional repository needs to be a service with continuity behind it … Institutions need to recognise that they are making commitments for the long term” Clifford Lynch </li></ul></ul></ul><ul><ul><ul><li>Need for policy development </li></ul></ul></ul><ul><ul><li>Incentives for preservation: </li></ul></ul><ul><ul><ul><li>Clarity on roles and responsibilities needed </li></ul></ul></ul><ul><ul><ul><li>Who benefits? Who pays? “Free riding?” </li></ul></ul></ul>
    13. 13. Organisational challenges (2) <ul><li>Economic perspectives: </li></ul><ul><ul><li>Blue Ribbon Task Force on Sustainable Digital Preservation and Access: </li></ul></ul><ul><ul><ul><li>Final report (Feb 2010) “Ensuring that valuable digital assets will be available for future use is not simply a matter of finding sufficient funds. It is about mobilizing resources - human, technical, and financial - across a spectrum of stakeholders diffuse over both space and time. But questions remain about what digital information we should preserve, who is responsible for preserving, and who will pay.” </li></ul></ul></ul><ul><ul><li>JISC-funded LIFE (Life Cycle Information for E-Literature) has developed a predictive costing tool: </li></ul></ul>
    14. 14. Organisational challenges (3) <ul><li>The challenge of scale: </li></ul><ul><ul><li>The Web </li></ul></ul><ul><ul><li>Digitised “textual” content: </li></ul></ul><ul><ul><ul><li>Google Books </li></ul></ul></ul><ul><ul><li>The “data deluge” in e-Science: </li></ul></ul><ul><ul><ul><li>New generations of instruments, computer simulations </li></ul></ul></ul><ul><ul><ul><li>Many terabytes generated per day, petabyte scale computing (and growing) </li></ul></ul></ul><ul><ul><ul><li>Cory Doctorow, “Welcome to the petacentre.” Nature , 455, pp 17-21, 4 Sep 2008 </li></ul></ul></ul>
    15. 15. Organisational challenges (4) <ul><li>The need for collaboration: </li></ul><ul><ul><li>Need for 'deep-infrastructure' for preservation recognised as far back as 1996 by the Task Force on Archiving of Digital Information </li></ul></ul><ul><ul><ul><li>Digital preservation involves the "grander problem of organizing ourselves over time and as a society ... [to manoeuvre] effectively in a digital landscape" (p. 7) </li></ul></ul></ul><ul><ul><li>Building on existing networks </li></ul></ul><ul><ul><li>Role for national-level co-ordination: </li></ul></ul><ul><ul><ul><li>Digital Preservation Coalition (DPC), nestor (Germany), National Digital Information Infrastructure and Preservation Program (NDIIPP) </li></ul></ul></ul>
    16. 16. Organisational challenges (5) <ul><li>Learn the lessons from the past: </li></ul><ul><ul><li>Things will go wrong </li></ul></ul><ul><ul><li>Do what you can to enable recovery from disaster </li></ul></ul><ul><ul><li>Digital technologies support replication (keep multiple copies so that there is no single point of failure) </li></ul></ul>
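Replication only helps recovery if the copies are compared with one another. The sketch below shows one simple way to do that: hash each replica, accept the content held by a strict majority, and report which sites need repair. The function name `repair_from_replicas` and the site labels are hypothetical, and majority voting is an assumption chosen for illustration, not a method described in the slides.

```python
import hashlib
from collections import Counter


def repair_from_replicas(replicas: dict[str, bytes]) -> tuple[bytes, list[str]]:
    """Return the majority content and the sorted list of sites that disagree.

    Raises ValueError if there are no replicas or no strict majority.
    """
    if not replicas:
        raise ValueError("no replicas supplied")
    digests = {
        site: hashlib.sha256(data).hexdigest() for site, data in replicas.items()
    }
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(replicas):
        raise ValueError("no strict majority; manual investigation needed")
    good = next(data for site, data in replicas.items() if digests[site] == winner)
    damaged = sorted(site for site, digest in digests.items() if digest != winner)
    return good, damaged
```

With three copies, one damaged replica can be detected and overwritten from the other two. With only two copies a mismatch can be detected but not resolved automatically, which is why the function refuses to guess.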
    17. 17. Digital preservation strategies (1) <ul><li>Main approaches: </li></ul><ul><ul><li>Technology preservation (e.g., computing museums) </li></ul></ul><ul><ul><li>Digital archaeology (a post hoc approach) </li></ul></ul><ul><ul><li>Emulation (focusing on the environment, often used where look-and-feel is important, e.g. computer games) </li></ul></ul><ul><ul><li>Migration (focusing on the content) </li></ul></ul><ul><ul><ul><li>A mature approach: A set of organised tasks designed to achieve the periodic transfer of digital information from one hardware and software configuration to another, or from one generation of computer technology to a subsequent one - CPA/RLG report (1996) </li></ul></ul></ul>
    18. 18. Digital preservation strategies (2) <ul><li>Preservation strategies are not in competition </li></ul><ul><ul><li>Different strategies can work together; there may be value in diversification </li></ul></ul><ul><ul><li>Migration strategies mean difficult choices need to be made about target formats </li></ul></ul><ul><li>But the strategy chosen has implications for: </li></ul><ul><ul><li>The technical infrastructure required (and metadata) </li></ul></ul><ul><ul><li>Collection management priorities </li></ul></ul><ul><ul><li>Rights management </li></ul></ul><ul><ul><ul><li>Owning the rights to re-engineer software </li></ul></ul></ul><ul><ul><li>Costs </li></ul></ul>
    19. 19. Digital preservation strategies (3) <ul><li>Tools for format characterisation and validation </li></ul><ul><ul><li>DROID - Digital Record Object Identification (based on the PRONOM registry) </li></ul></ul><ul><ul><ul><li>Very important to know what types (formats) of content exist in a particular collection (e.g., institutional repository or Web archive) </li></ul></ul></ul><ul><ul><ul><li>Performs batch identification of file formats </li></ul></ul></ul><ul><ul><li>JHOVE - JSTOR/Harvard Object Validation Environment </li></ul></ul><ul><ul><ul><li>Used for format validation </li></ul></ul></ul>
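To give a feel for what batch format identification involves, the sketch below matches the leading bytes of each file against a tiny table of well-known "magic numbers" and tallies the results for a directory tree. This is a deliberately simplified illustration and not how DROID itself is implemented or invoked: the `SIGNATURES` table, the labels, and the function names are assumptions for this example, and a real registry such as PRONOM holds far richer signatures.

```python
from collections import Counter
from pathlib import Path

# Illustrative signature table (leading bytes, label); an assumption for this sketch.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify(path: Path) -> str:
    """Label a file by its leading bytes, ignoring its extension."""
    with path.open("rb") as handle:
        header = handle.read(16)
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def profile(root: Path) -> Counter:
    """Count how many files of each identified format sit under root."""
    return Counter(identify(p) for p in root.rglob("*") if p.is_file())
```

Identifying by content rather than by file extension matters in repositories, where extensions are often missing or wrong. The "unknown" bucket is the part of a collection that needs closer attention.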
    20. 20. Digital preservation strategies (4) <ul><li>Plato preservation planning tool </li></ul><ul><ul><li>Developed by EU Planets project </li></ul></ul><ul><ul><li>A decision support tool that helps users explore the evaluation of potential preservation solutions against specific requirements and for building a plan for preserving a given set of objects </li></ul></ul><ul><ul><li>Integrates file format identification (using DROID); some migration services; XML-based generic format characterisation using XCL (eXtensible Characterisation Languages) </li></ul></ul><ul><ul><li>More info: </li></ul></ul><ul><ul><li>Integration with repositories tested by JISC KeepIt project: </li></ul></ul>
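Evaluating candidate preservation solutions against specific requirements can be pictured as a weighted scoring exercise. The sketch below ranks alternatives by a normalised weighted sum of per-criterion scores. It is a simplified stand-in for this kind of decision support and does not reproduce Plato's actual workflow or data model; the criteria, the alternative names, and the 0 to 5 scale used in the usage note are all hypothetical.

```python
def rank_alternatives(
    weights: dict[str, float],
    scores: dict[str, dict[str, float]],
) -> list[tuple[str, float]]:
    """Rank alternatives by weighted average score, best first.

    weights: importance of each criterion (any positive scale).
    scores:  alternative name -> {criterion: score}; every alternative
             must supply a score for every weighted criterion.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    ranked = []
    for name, per_criterion in scores.items():
        utility = sum(weights[c] * per_criterion[c] for c in weights) / total
        ranked.append((name, utility))
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

For example, with weights `{"fidelity": 0.5, "cost": 0.3, "tool_support": 0.2}`, an alternative scored `{"fidelity": 4, "cost": 3, "tool_support": 5}` gets a utility of 3.9. Writing the weights down explicitly is the useful part: it documents why one strategy was preferred over another.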
    21. 21. The OAIS Reference Model OAIS Functional Entities (Figure 4-1)
    22. 22. Preservation metadata <ul><li>Metadata and documentation are vitally important </li></ul><ul><ul><li>Relates to OAIS concepts like Representation Information and Preservation Description Information </li></ul></ul><ul><ul><li>Functions: </li></ul></ul><ul><ul><ul><li>Enables resource discovery - supports the development of finding aids </li></ul></ul></ul><ul><ul><ul><li>Records meaning (structure and semantics) </li></ul></ul></ul><ul><ul><ul><li>Records context and provenance (authenticity) </li></ul></ul></ul><ul><ul><li>Standards that support digital preservation activities are under development: </li></ul></ul><ul><ul><ul><li>PREMIS Data Dictionary (for core metadata): </li></ul></ul></ul>
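Recording provenance can be as simple as appending one structured record for every action taken on an object. The sketch below is loosely inspired by the idea of logging preservation events; the class `PreservationEvent` and its field names are hypothetical and do not follow the PREMIS schema or any other standard.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class PreservationEvent:
    object_id: str
    event_type: str    # e.g. "ingest", "fixity check", "migration"
    occurred_at: str   # ISO 8601 timestamp
    agent: str         # person or software that performed the action
    outcome: str       # e.g. "success", "failure"
    sha256_after: str  # digest of the content after the event


def record_event(
    object_id: str,
    event_type: str,
    agent: str,
    outcome: str,
    content: bytes,
    when: Optional[datetime] = None,
) -> PreservationEvent:
    """Build an event record; defaults to the current UTC time."""
    when = when or datetime.now(timezone.utc)
    return PreservationEvent(
        object_id=object_id,
        event_type=event_type,
        occurred_at=when.isoformat(),
        agent=agent,
        outcome=outcome,
        sha256_after=hashlib.sha256(content).hexdigest(),
    )


def to_json_line(event: PreservationEvent) -> str:
    """Serialise an event as one JSON line for an append-only log."""
    return json.dumps(asdict(event), sort_keys=True)
```

A chain of such records lets a later curator see what was done to an object, by whom, and what its content hash was afterwards, which supports claims about authenticity.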
    23. 23. Repository audit frameworks (1) <ul><li>Repository audit frameworks first developed out of the OAIS Reference Model </li></ul><ul><ul><li>OAIS Mandatory Responsibilities (only six of them): </li></ul></ul><ul><ul><ul><li>The main focus was on technical and organisational aspects, e.g.: </li></ul></ul></ul><ul><ul><ul><ul><li>That repositories ensure that preserved information (content) can be understood (independently understandable) </li></ul></ul></ul></ul><ul><ul><ul><ul><li>That documented policies and procedures are being followed </li></ul></ul></ul></ul><ul><ul><ul><li>No clear concept of OAIS “compliance” </li></ul></ul></ul>
    24. 24. Repository audit frameworks (2) <ul><li>Trusted Repositories Audit and Certification (TRAC): Criteria and Checklist: </li></ul><ul><ul><li>Source: </li></ul></ul><ul><ul><li>Criteria cover three main things: </li></ul></ul><ul><ul><ul><li>Organisational Infrastructure </li></ul></ul></ul><ul><ul><ul><ul><li>Governance and viability, structure and staffing, financial sustainability, contracts, etc. </li></ul></ul></ul></ul><ul><ul><ul><li>Digital Object Management </li></ul></ul></ul><ul><ul><ul><ul><li>Ingest, preservation planning, archival storage, etc. </li></ul></ul></ul></ul><ul><ul><ul><li>Technologies, Technical Infrastructure, & Security </li></ul></ul></ul><ul><ul><ul><ul><li>Systems and infrastructure, etc. </li></ul></ul></ul></ul>
    25. 25. Core repository principles (1) <ul><li>Ten Principles - agreed 2007 by CRL (US), Digital Curation Centre (UK), Nestor (Germany) and Digital Preservation Europe </li></ul><ul><ul><li>The repository commits to continuing maintenance of digital objects for identified community/communities. </li></ul></ul><ul><ul><li>Demonstrates organizational fitness (including financial, staffing structure, and processes) to fulfill its commitment. </li></ul></ul><ul><ul><li>Acquires and maintains requisite contractual and legal rights and fulfills responsibilities. </li></ul></ul><ul><ul><li>Has an effective and efficient policy framework. </li></ul></ul><ul><ul><li>Acquires and ingests digital objects based upon stated criteria that correspond to its commitments and capabilities. </li></ul></ul>
    26. 26. Core repository principles (2) <ul><li>Ten principles (continued) </li></ul><ul><ul><li>Maintains/ensures the integrity, authenticity and usability of digital objects it holds over time. </li></ul></ul><ul><ul><li>Creates and maintains requisite metadata about actions taken on digital objects during preservation as well as about the relevant production, access support, and usage process contexts before preservation. </li></ul></ul><ul><ul><li>Fulfills requisite dissemination requirements. </li></ul></ul><ul><ul><li>Has a strategic program for preservation planning and action. </li></ul></ul><ul><ul><li>Has technical infrastructure adequate to continuing maintenance and security of its digital objects. </li></ul></ul><ul><li>Available: </li></ul>
    27. 27. TRAC Checklist example page
    28. 28. Repository audit frameworks (3) <ul><li>DRAMBORA (Digital Repository Audit Method Based on Risk Assessment) </li></ul><ul><ul><li>Developed by the Digital Curation Centre and Digital Preservation Europe </li></ul></ul><ul><ul><li>“Presents a methodology for self-assessment, encouraging organisations to establish a comprehensive self-awareness of their objectives, activities and assets before identifying, assessing and managing the risks implicit within their organisation” </li></ul></ul><ul><ul><li>Identifying risks and scoring each one on likelihood and impact </li></ul></ul><ul><ul><li>Covers: organisational context, policies, assets, risks, etc. </li></ul></ul><ul><ul><li>Online tool: </li></ul></ul>
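Scoring each risk on likelihood and impact lends itself to a small risk register. The sketch below multiplies the two scores to get a severity and sorts the register so that the most severe risks come first. The 1 to 5 scales, the multiplication rule, and the example risk names are assumptions made for illustration; they are not taken from the DRAMBORA documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    name: str
    likelihood: int  # 1 (rare) to 5 (almost certain); hypothetical scale
    impact: int      # 1 (negligible) to 5 (catastrophic); hypothetical scale

    def __post_init__(self) -> None:
        for label, value in (("likelihood", self.likelihood), ("impact", self.impact)):
            if not 1 <= value <= 5:
                raise ValueError(f"{label} must be between 1 and 5, got {value}")

    @property
    def severity(self) -> int:
        """Combined score: likelihood multiplied by impact."""
        return self.likelihood * self.impact


def prioritise(risks: list[Risk]) -> list[Risk]:
    """Order risks by descending severity, breaking ties by name."""
    return sorted(risks, key=lambda r: (-r.severity, r.name))
```

A register like this is a management aid for self-assessment rather than a certification result: the ordering shows where mitigation effort should go first.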
    29. 29. Repository audit frameworks (5) <ul><li>A means of "asking the right questions" about repositories (and the wider organisation) and documenting appropriate procedures and risks </li></ul><ul><li>More than one role: </li></ul><ul><ul><li>External badge of quality (a "certified preservation repository") </li></ul></ul><ul><ul><li>Management tool for self assessment </li></ul></ul>
    30. 30. DCC institutional assessment tools <ul><li>Data Asset Framework: </li></ul><ul><ul><li>Analysing institutional requirements and holdings </li></ul></ul><ul><ul><li>Find out what data exists, where it is stored, formats, metadata, etc. </li></ul></ul><ul><li>CARDIO (Collaborative Assessment of Research Data Infrastructure): </li></ul><ul><ul><li>Evaluating data management requirements, activity, and capacity </li></ul></ul><ul><ul><li>Building consensus between data creators, information managers and service providers </li></ul></ul><ul><ul><li>Identifying practical goals for improvement in data management provision and support </li></ul></ul><ul><ul><li>Identifying operational inefficiencies and potential opportunities for cost saving </li></ul></ul><ul><ul><li>Making a case to senior managers for investment in data management support </li></ul></ul>
    31. 31. Digital preservation basics (reprise) <ul><li>An ongoing (lifecycle) approach to managing digital content based on: </li></ul><ul><ul><li>The identification and adoption of appropriate preservation strategies for content </li></ul></ul><ul><ul><li>The collection and management of appropriate metadata (explicit and implicit knowledge, contexts) </li></ul></ul><ul><ul><li>The ongoing monitoring of technical contexts and the application of preservation planning techniques </li></ul></ul><ul><ul><li>Continual monitoring of the organisation (audit) </li></ul></ul>
    32. 32. “ It is always a mistake for a historian to try and predict the future. Life, unlike science, is simply too full of surprises” - Richard J. Evans, In defence of history (1997, p. 62)
    33. 33. Web links: <ul><ul><li>PRESERV project: </li></ul></ul><ul><ul><li>KeepIt project: </li></ul></ul><ul><ul><li>Plato Preservation Planning tool: </li></ul></ul><ul><ul><li>RSP briefing paper on preservation and storage formats: </li></ul></ul><ul><ul><li>WePreserve cartoons at: </li></ul></ul>
    34. 34. <ul><li>Available: </li></ul>
    35. 35. Further reading <ul><ul><li>Blue Ribbon Task Force on Sustainable Digital Preservation and Access, Final Report (NSF, 2010) </li></ul></ul><ul><ul><li>Digital Preservation Coalition, Digital preservation handbook: </li></ul></ul><ul><ul><li>JISC infoNet, Digital repositories infoKit: </li></ul></ul><ul><ul><li>Paradigm Project, Workbook on Digital Private Papers: </li></ul></ul><ul><ul><li>Marieke Guy, JISC Beginner’s Guide to Digital Preservation (UKOLN, 2010) </li></ul></ul><ul><ul><li>Digital Preservation Coalition and Digital Curation Centre, What’s New (monthly current awareness bulletin): </li></ul></ul>
    36. 36. Further reading (research data) <ul><ul><li>National Science Board, Long-lived digital data collections: enabling research and education in the 21st century (NSF, 2005) http// </li></ul></ul><ul><ul><li>Liz Lyon, Dealing with data: roles, rights, responsibilities and relationships (JISC, 2007) </li></ul></ul><ul><ul><li>Neil Beagrie, Julia Chruszcz, and Brian Lavoie, Keeping research data safe: a cost model and guidance for UK universities (JISC, 2008) </li></ul></ul><ul><ul><li>Neil Beagrie, Brian Lavoie and Matthew Woollard, Keeping research data safe 2 (JISC, 2010) </li></ul></ul>
    37. 37. Questions?
    38. 38. Acknowledgments <ul><li>The Digital Curation Centre (DCC) is a world-leading centre of expertise in digital information curation with a focus on building capacity, capability and skills for research data management across the UK's higher education research community. The DCC is funded by JISC. </li></ul><ul><li>More information is available from: </li></ul><ul><li>UKOLN receives support from JISC and the University of Bath, where it is based. </li></ul><ul><li>More information is available from: </li></ul>
    39. 39. Thank you!