Implementing Archivematica, research data network

Research data network, September 2016, Jenny Mitcham, University of York.

  1. Implementing Archivematica for research data preservation at York and Hull
     Jenny Mitcham (Digital Archivist), University of York
     Jisc RDN event, 6 September 2016
  2. What I’m going to cover
     This presentation is in four parts:
     • Part one: background to our project
     • Part two: implementing Archivematica
     • Part three: the challenges of preserving research data
     • Part four: future plans
  3. Part one: The Filling the Digital Preservation Gap project
  4. Filling the Digital Preservation Gap: project aim
     “…to investigate Archivematica and explore how it might be used to provide digital preservation functionality within a wider infrastructure for Research Data Management.”
  5. Project structure
     • Phase 1 – explore: testing, research and thinking; produce a report (3 months)
     • Phase 2 – develop: make Archivematica better for RDM and plan implementation; produce a report (4 months)
     • Phase 3 – implement: set up proofs of concept at York and Hull and investigate the file format problem further (6 months)
  6. The team
     University of Hull:
     • Chris Awre – Head of Information Services, Library and Learning Innovation
     • Richard Green – Independent Consultant
     • Simon Wilson – University Archivist
     University of York:
     • Julie Allinson – Manager, Digital York
     • Jen Mitcham – Digital Archivist
     Artefactual Systems
     Funded by Jisc (Research Data Spring)
  7. Part two: Implementing Archivematica
  8. What are we trying to achieve?
     Demonstrate that it is possible to:
     • pull metadata from PURE / pull content from Box
     • capture further data to help us manage the dataset
     • automatically initiate ingest by Archivematica
     • set up Archivematica to package the data automatically for longer-term preservation
     • provide a dissemination copy of the data for our Hydra repository
     ...essentially what we said in our implementation plans.
  9. In addition…
     • Keep an eye on the broader picture: how can preservation processes for research data be used for other materials, e.g. archives?
     • Consider different use cases for how research data is organised on deposit:
        – single file, multiple files, hierarchical files, etc.
        – with or without associated metadata
     • Share experiences across two institutions with different environments
  10. How did we approach it?
      We wanted to work in a way that:
      • was useful to others
      • was open and accessible
      • had the bigger picture in mind
      So we are:
      • sharing code on GitHub
      • working in Google Docs
      • engaging the Hydra and Archivematica communities
      • blogging and talking at events like this one
  11. What does it look like? York
  12. What does it look like? Hull
  13. What were the challenges?
      • Mostly time!
         – recruiting a suitably skilled developer at short notice
         – relying on Artefactual Systems, who have their own priorities and timescales
         – working with the local IT department, which has different priorities
      • Outstanding tasks from phase 2 that needed further development
      • Integration and APIs (e.g. with PURE and Box)
  14. What worked well?
      • Reusing existing code rather than reinventing the wheel:
         – the puree gem from Lancaster University, which pulls metadata out of PURE and saved us a huge amount of work
         – the automation tools from Artefactual Systems, a lightweight way of automating transfers within Archivematica (we funded a webinar about these in phase 2 of our project)
      • Flexibility and in-house capacity to do the work
  15. Part three: The challenges of preserving files that we can’t identify
  16. A quick look at file formats
      Research data file formats are:
      • numerous
      • sometimes a bit obscure
      • sometimes very big
      • ever-changing
      • often very new
      This means they can be hard to preserve. The first hurdle is that we can’t identify them, and if we can’t identify them, how can we carry out preservation activities?
  17. Research data applications in use at York
  18. The NDSA Levels of Digital Preservation: Level 2 requires you to know what you’ve got, and Levels 3 and 4 build on this.
  19. Can we identify our research data?
      We ran Droid* over the research data deposited with us in the past year. Out of 3752 individual files:
      • only 37% (1382) of the files were identified (with varying degrees of accuracy)
      • there were 34 different identified file formats in the sample
      * Droid is a free tool from The National Archives that can be used to identify file formats automatically.
  20. Identified research data files
      Files identified by Droid, listed by file type.
      Note that the native file formats of the software shown in the previous graph of research data applications are not represented.
  21. Unidentified research data files
      • Files not identified by Droid, listed by file extension
      • 107 different file extensions were not identified:
         – a huge number of files have no extension at all (help!)
         – how do we solve the .dat file problem?
  22. What is the project doing to solve the file identification problem?
      • We have sponsored the development of 8 new file format signature records in PRONOM for different types of research data
      • We have created our own research data file signatures for inclusion in PRONOM (and blogged about it to encourage others to do the same)
      • We have been talking to TNA (The National Archives) about how to engage the community more
  23. Part four: Future plans
  24. Future plans
      • We have a week left to finish our active project work (eeeek!)
      • ...and look out for our phase 3 report in mid-October (and other dissemination outputs)
      • We need to work out how to move from ‘proof of concept’ to production:
         – York will be establishing how to move seamlessly from this project into the Jisc Shared Service
         – Hull will be using the work to inform a City of Culture digital archive
  25. Where to find out more
  26. Do talk to me if you are interested in finding out more about this project.
      Useful links:
      • Project website:
      • Digital archiving blog:
      • Archivematica:
      • Phase 1 report
      • Phase 2 report
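
Slides 19 and 21 summarise the Droid run in three ways: the share of files identified, the number of distinct formats found, and the unidentified files grouped by extension. The Python sketch below shows one way such a tally could be produced. It assumes the Droid results have already been reduced to (file path, PRONOM identifier or None) pairs; `FileResult`, `extension_of` and `summarise` are hypothetical names introduced for illustration, and the code does not read Droid's own export format.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Iterable, NamedTuple, Optional


class FileResult(NamedTuple):
    """Hypothetical record: one file and the PRONOM ID (PUID) assigned to it,
    or None if the identification tool could not identify it."""
    path: str
    puid: Optional[str]


def extension_of(path: str) -> str:
    """Return the lower-cased final extension without the dot,
    or '(none)' for files that have no extension."""
    suffix = PurePosixPath(path).suffix
    return suffix[1:].lower() if suffix else "(none)"


def summarise(results: Iterable[FileResult]) -> dict:
    """Tally identification results in the style of slides 19 and 21."""
    results = list(results)
    identified = [r for r in results if r.puid]
    unidentified = [r for r in results if not r.puid]
    total = len(results)
    return {
        "total": total,
        "identified": len(identified),
        # Whole-number percentage, as quoted on the slide.
        "percent_identified": round(100 * len(identified) / total) if total else 0,
        # Number of distinct PUIDs among the identified files.
        "distinct_formats": len({r.puid for r in identified}),
        # Unidentified files grouped by extension; extensionless files share one bucket.
        "unidentified_by_extension": Counter(extension_of(r.path) for r in unidentified),
    }


# Illustrative input only (placeholder PUIDs, not the York data set).
sample = [
    FileResult("readme.txt", "puid-A"),
    FileResult("results.csv", "puid-B"),
    FileResult("notes.txt", "puid-A"),
    FileResult("run1.dat", None),
    FileResult("run2.DAT", None),
    FileResult("raw/scan_0001", None),
]
print(summarise(sample))
```

Applied to the slide 19 figures, 1382 identified files out of 3752 gives round(100 * 1382 / 3752) = 37%. Putting every extensionless file into a single "(none)" bucket keeps that group visible as one count, which is the problem flagged on slide 21, and `most_common()` on the resulting `Counter` lists the most frequent unidentified extensions first.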
