Transforming repositories: from repository managers to institutional data managers



The last decade has seen support for digital preservation transformed. There is now a multitude of organisations, training courses, and software development tools to help guide managers of digital data towards preservation decisions and solutions. But how well do these approaches understand the needs and requirements of users? This presentation was given at ECA 2010, a conference for digital archiving professionals. But not everyone can be a digital archiving specialist. At a time of exploding volumes of digital content, especially on the Web, many non-specialists need help in preserving digital content. The presentation looks at the applicability and practicality of all this support for one class of user, digital repositories, and in particular institutional repositories (IRs) and their managers. We report on a course on digital preservation tools, designed with repository managers as part of the JISC KeepIt project. Positive feedback from the evaluations of this course shows that the emergence of the tools used in it is a great story for digital preservation.

  • This is a large conference, with over 650 delegates we were told. Many will consider themselves to be digital archiving professionals. But not everyone can be a digital archiving specialist. At a time of exploding volumes of digital content, especially on the Web, many non-specialists need help in preserving digital content. This presentation is about how we can transfer the knowledge, skills and experience of the professional digital archivist to the non-specialist.
  • Two types of archiving are prominently represented in the literature: large specialist archives and personal digital archives, represented by the Swiss National Archives and the BL’s Digital Lives project, respectively. I’m concerned with a group somewhere in between. Let’s call them repository managers. In other words, they are responsible for managing other people’s content. Typically these repositories are used to represent institutions, often large institutions such as universities.
  • For those who are unfamiliar with this type of institutional repository, or IR, they started to appear in 2000, and growth began to accelerate from 2004. Depending on which of these two registry services you prefer, there are now 1000-1500 IRs. Potentially there might be up to 10000 worldwide.
  • Here is a typical profile of one institutional repository from the Registry of Open Access Repositories (ROAR), which measures the content volumes and deposit rates of IRs. It also includes format profiles for many of these repositories. As can be seen from this example, content in IRs is characterised by a range of formats – many versions of PDF, with some HTML. In other words, mostly textual documents. These are likely to be copies of published research papers. There are also some images.
  • That’s what an IR looks like today. But what about in 5 years, say? Universities produce many types of content: science data, supporting the research papers, and teaching materials. Not all research is focussed on science. What about other areas, such as arts? It seems likely that the IR will become the dissemination means of choice for many, if not all, of the digital outputs of universities. Where is the diversified IR? We can’t find one yet, but we can find the components. In the JISC KeepIt project we are working with four different types of repository shown here – our exemplar preservation repositories – to synthesise what we believe could be the IR of the future and to investigate the preservation implications of managing this range of content types. When we look ahead in terms of preservation, it’s likely we are thinking simply of preserving today’s content. Beyond that we need to think about what sort of content will be shaping the repository in the future.
  • The managers of these exemplar repositories are our non-specialists when it comes to digital archiving. They are responsible for all aspects of managing their repository, of which only one is archiving. We can provide support and services, but first it is necessary they are able to make preservation-related decisions and provide direction for their repositories. We wanted to focus on the repository managers more than on the technology, so we worked with them to design a training course. There are many good training courses on digital preservation, such as DPTP, which we heard about in the preceding presentation. These tend to be aimed at aspiring archiving professionals. The KeepIt course had some similarities, but also some significant differences with these courses, to reflect the difference in audience. Here is the first slide from the opening module of the course.
  • Here is how the course looked in full, from the corresponding opening slides of all modules: 5 modules, in total 6 days, spread over more than 2 months. We wanted the KeepIt course to focus on tools. The format was short presentations with lots of hands-on practical work, a mix modelled on courses such as 101 Lite by the Digital Curation Centre (DCC).
  • We wanted to attract the best tutors for the course. Those would be the developers of these tools where possible. To get the best tutors we couldn’t restrict participation to the repository managers of our 4 exemplar preservation repositories, so we offered an open invitation to others, limited to 15-20 people for practical training purposes. Although we had interest from around the world, the timescale of the course meant it would only be feasible for UK participants. We achieved the required level of take-up for the course and had passed the first test. Before I describe the tools we covered in the course, which has now completed, I want to give you some evidence of the feedback received from participants on the course.
  • The biggest test was whether the number of participants would hold up. The course was free, so they had plenty of scope to vote with their feet between modules. As you can see from this slide, the numbers held firm throughout.
  • It was clear we must be doing something right. Our course evaluations confirmed this. Participants liked the course structure, mixing presentation with practical.
  • We set clear objectives for the course, outlined in the original notice. We achieved those objectives and those of the participants.
  • The first module looked at tools to provide the organisational context for repositories. These two tools – the Data Asset Framework (DAF) and Assessing Institutional Digital Assets (AIDA) – are essentially complementary, one looking inwardly at what types of content an institution may be producing, the other at how well an institution may support policy development. Remember, we are concerned with factors that may reshape the repository looking ahead.
  • Here is an indication from our participants on how well these tools serve their respective purposes. It must be noted that evaluations of tools shown through this presentation cannot usefully be compared across modules. The evaluation form was completed by participants at the conclusion of the final module (5), so the other modules and tools will see slightly smaller results. Separately, we will have to weigh whether module 1 shown here, for example, was disadvantaged by this approach.
  • How much does digital preservation cost? These two tools can help provide some quantitative perspective. Keeping Research Data Safe (KRDS) is a survey-based tool, while LIFE3 provides a spreadsheet-based format to calculate preservation costs using inputs for a range of relevant, weighted factors. So these tools use real examples and real data, and it turns out to cost a lot more than our repository managers had anticipated. It’s possible that some costs are over-estimated on current repository volumes, but costs will rise with growing content.
  • These tools achieved another strong set of evaluation results, with the same proviso as previously that evaluations of tools shown through this presentation cannot be usefully compared across modules because the evaluation form was only completed by participants at the conclusion of the final course module (5).
  • So we’ve looked at organisations and costs, but what about digital preservation? Isn’t it supposed to be more technical than this? By module 3 we were looking at describing digital content for preservation by identifying significant characteristics. We also used the familiar tool for preservation metadata, PREMIS, and briefly considered provenance through the emerging standard of the Open Provenance Model (OPM).
  • Module 3 was intended as a primer for our approach to preservation workflow. This workflow may be familiar because it follows the model from the PLANETS project, and has been covered elsewhere at ECA 2010. We have learned how to identify the characteristics of our content, and we have tools to determine formats and to migrate to other formats when needed. But how do we link the two, to know what to migrate and when?
  • One approach that has also been described at ECA 2010 is the Plato preservation planning tool. In Plato you need to be able to identify the important characteristics of your digital objects; hence our coverage in module 3. In module 4 of our course we practised using Plato, but there was a twist. Our content is in a repository. So we uploaded a preservation plan created with Plato to our repository, in this case based on EPrints repository software, and our repository enacted the plan for our (small amount of) test content (some scanned page images), producing the results shown top-left in this slide. In other words, our repository has become the interface to managing our preservation workflow.
  • As we have learned elsewhere, the key to successful tools and products is the interface, and it helps to build on familiar and successful interfaces.
  • I have been wary of dwelling too much on the evaluation results for particular tools used in the course, but it is noteworthy that the feedback for Plato was almost unanimously excellent.
  • Our final module is perhaps where many professional archivists start: trust. It is appropriate for non-specialists to work towards, and earn, trust first. A preservation repository has to be able to demonstrate that it can be trusted, and it has to be able to assess the trustworthiness of the services it uses.
  • We did not have a speaker with first-hand developer experience of TRAC, so our focus in this module was on the DRAMBORA tool. Participants liked its support for self-auditing, in contrast to the third-party auditing provided by TRAC.
  • We promised to mix presentation and practical, and we did, with over 50% of course time devoted to practical work by the repository managers.
  • The essence of the course is nicely captured by our short Twitter archive. In particular this selection of comments provides a link to all the course slides on Slideshare.
  • We no longer need be scared of digital preservation. The positive feedback from the evaluations is an indication of real progress, even taking into account the provisos that we need to go further in simplifying tools for non-specialists and in reducing the number of different interfaces. The emergence of the tools used in this course is a great story for digital preservation.
  • Questions: Given the costs of running IRs and the costs of preservation, why should institutions commit to these costs when they could use subject repositories elsewhere? It’s true there are large subject repositories in physics (Cornell arXiv), economics (indexed by RePEc) and biomedicine (BioMed Central) providing open access to published work in these areas, but across all academic disciplines there is a difficulty in approaching full, 100% open access. Research funders and institutions are recognising the importance of free and open access to this work and, to bridge this gap, are creating policies to require deposit. The only way they can do this is to base the policy on institutional requirements and commitments; they cannot demand this of other service providers. By requiring deposit and acquiring specified digital content in the institutional repository, the institution takes on a responsibility to maintain and preserve access to that content. Off-mic question: Is the KeepIt course running again? Not within the project. Others, such as the DCC, cover some of the tools.

    1. Transforming repositories: from repository managers to institutional data managers. ECA 2010, 8th European Conference on Digital Archiving, Geneva, 28 - 30 April 2010. Steve Hitchcock, David Tarrant and Leslie Carr, School of Electronics & Computer Science. Steve Hitchcock <sh94r AT>
    2. Not everyone can be a digital archiving specialist
    3. Growth of Open Access Digital Repositories
         • Generated from Registry of Open Access Repositories (ROAR)
         • Generated from Directory of Open Access Repositories (OpenDOAR)
         • Both charts generated 23 April 2010
    4. ROAR repository format profile
         • To access a format profile: find chosen repository in ROAR, open [Record Details]
         • Format profiles not available for all repositories in ROAR
         • ROAR disclaimer: Full-text formats is based on automatic file-format identification and is prone to errors
         • This profile for Australian Research Online repository, from Registry of Open Access Repositories (ROAR)
    5. Digital repositories diversifying: institution-wide outputs
         • Science, Teaching, Research, Arts
         • KeepIt exemplar preservation repositories
    6. Digital Preservation Tools for Repository Managers
         • A practical course in five parts presented by the KeepIt project in association with
         • Module 1, Organisational issues, audit, selection and appraisal. School of ECS, University of Southampton, 19 January 2010
         • Twitter hashtag #dprc (digital preservation repository course)
    7. Course modules
         • Module 1, Organisational issues, audit, selection and appraisal. School of ECS, University of Southampton, 19 January 2010
         • Module 2, Institutional and lifecycle preservation costs. School of ECS, University of Southampton, 5 February 2010
         • Module 3, Primer on preservation workflow, formats and characterisation. Westminster-Kingsway College, London, 2 March 2010
         • Module 4, Putting storage, format management and preservation planning in the repository. University of Southampton, 18-19 March 2010
         • Module 5, Trust, of the repository, of the tools and services it chooses. University of Northampton, 30 March 2010
    8. Course structure
         • Module 1. Organisational issues: scoping, selection, assessment, institutional parameters (19 January 2010)
         • Module 2. Costs: lifecycle costs for managing digital objects, based on the LIFE approach, and institutional costs (5 February)
         • Module 3. Description: describing content for preservation: provenance, significant properties and preservation metadata (2 March)
         • Module 4. Preservation workflow: tools available in EPrints for format management, risk assessment and storage, and linked to the Plato planning tool from Planets (17-18 March)
         • Module 5. Trust (by others) of the repository’s approach to preservation; trust (by the repository) of the tools and services it chooses (30th March)
    9. KeepIt course participant numbers
         • jisckeepit: KeepIt course 3: thanks as well to all participants: 16 for course 1 (from 11 institutions), 15 (from 11) yesterday. Great commitment (03 Mar 2010)
         • jisckeepit: KeepIt course: "did you really think it would only be you left by the last module". Yes, but I was wrong. Course 1, 16; course 5, 16 (31 Mar 2010)
         • Source: Twitter @jisckeepit
    10. Evaluation: course structure. “Structure and development through the course was excellent. Presentations and practicals gave good introductions to all the tools. Applicability sometimes focussed too much on IRs and research data”
    11. Course evaluation summary. “Many of these tools are, of necessity, complex in scope and time consuming. The challenge is to understand which one to use in which situation and to what depth to engage with it.”
    12. Tools module 1
         • The Data Asset Framework (DAF), Sarah Jones, University of Glasgow, and Harry Gibbs, University of Southampton
         • The AIDA toolkit: Assessing Institutional Digital Assets, Ed Pinsent, University of London Computer Centre
    13. Evaluation module 1
    14. Tools module 2
         • Keeping Research Data Safe (KRDS), Costs, Policy, and Benefits in Long-term Digital Preservation, Neil Beagrie, Charles Beagrie Ltd consultancy
         • LIFE3: Predicting Long Term Preservation Costs, Brian Hole, The British Library
    15. Evaluation module 2. “Impressed with LIFE3 tool. I hope it is further developed. I like the way it works and can provide, comparatively quickly, some indication of likely costs. Useful and practical”
    16. Tools module 3
         • Significant characteristics, Stephen Grace and Gareth Knight, Kings College London
         • PREMIS, Open Provenance Model
    17. Preservation workflow
         • Check: format identification, versioning; file validation; virus check; bit checking and checksum calculation. Tools, e.g. DROID, JHOVE, FITS
         • Analyse: preservation planning; characterisation (significant properties and technical characteristics, provenance, format, risk factors); risk analysis. Tools: Plato (Planets), PRONOM (TNA), P2 risk registry (KeepIt), INFORM (U Illinois); KB
         • Action: migration; emulation; storage selection
    18. Tools module 4
         • EPrints preservation apps, including the storage controller, Dave Tarrant and Adam Field, University of Southampton
         • Plato, preservation planning tool from the Planets project, Andreas Rauber and Hannes Kulovits, TU Wien
    19. Steve Jobs launches Apple iPad. Picture by curiouslee. “75 million people already own iPod Touches and iPhones. That's all people who already know how to use the iPad.”
    20. Evaluation module 4. “This part of the course made me appreciate how big the area of preservation is and also the amount of research undertaken in this area.”
    21. Tools module 5
         • DRAMBORA, Digital Repository Audit Method Based On Risk Assessment, Martin Donnelly, Digital Curation Centre, University of Edinburgh
         • TRAC, Trusted Repository Audit and Certification: criteria and checklist
    22. Evaluation: DRAMBORA. “We will definitely be investigating DRAMBORA further”
    23. KeepIt course time
    24. KeepIt course summary in tweets
         • jisckeepit: KeepIt course 1, result 1: senior directors at Northampton U. support use of DAF (Mon, 08 Feb 2010)
         • digitalfay: uploaded my first file to the cloud using #eprints next stop: comprehensive bitstream preservation policies for repository content (Thu, 18 Mar 2010)
         • digitalfay: very impressed with end-to-end logical preservation process #eprints3.2 (risk audit) to #planetsway (planning) & back again (action) (Thu, 18 Mar 2010)
         • jisckeepit: KeepIt course 4: practical-make a preservation plan in Plato, upload it to EPrints and it enacts the plan on your collection. Magic! (Mon, 22 Mar 2010)
         • jisckeepit: KeepIt course: There's now a substantial group of repository managers out there ready and able to apply appropriate preservation tools (Wed, 31 Mar 2010)
         • clairemparry: @jisckeepit absolutely - thanks to all the tutors & organisers for a fantastic course which made all the long train journeys worth it (Tue, 06 Apr 2010)
         • jisckeepit: KeepIt course 5: revision, evaluation and concluding thoughts - the last hurrah. Complete course slides now at (Thu, 08 Apr 2010)
         • Selected tweets from Twapperkeeper for #dprc
    25. Lessons from the KeepIt course
         • The digital preservation community has produced an array of tools covering most requirements.
         • Repository managers have responded positively to practice with these tools.
         • Repository managers need to act to shape their repositories for the next phase of development: expansion; diversification or focus.
         • These tools support this process, as well as the technical management of digital preservation.
         • Still need to reduce complexity and make tools simpler for non-specialists.
         • One approach is to integrate tools into familiar interfaces, such as repositories.
         • This is a great story for digital preservation
    26. Credits
         • KeepIt team at the University of Southampton: Les Carr, PI; Steve Hitchcock, project manager; David Tarrant, developer
         • KeepIt preservation exemplar repositories are led by: Simon Coles (eCrystals, University of Southampton); Stephanie Meece (University of the Arts London); Debra Morris (EdShare, University of Southampton); Miggie Pickton (Nectar, University of Northampton)
         • Thanks to all KeepIt course tutors and tutees
         • KeepIt is funded by JISC (to Sept. 2010) as part of its Information Environment Programme 2009-11