Preservation of Research Data: Dataverse / Archivematica Integration by Allan Bell and Leanne Trimble

Scholars Portal, a program of the Ontario Council of University Libraries (OCUL), provides the technical infrastructure to store, preserve, and provide access to shared digital library collections in Ontario, including a local instance of Dataverse hosted since 2011. As part of a national project known as Portage (a project of the Canadian Association of Research Libraries), Scholars Portal is partnering with Artefactual Systems, Dataverse, the University of British Columbia, the University of Alberta, and others to integrate Dataverse with the preservation software Archivematica. Once completed, this project will facilitate the long-term preservation of research data according to the Open Archival Information System (OAIS) Reference Model.

  1. Preservation of Research Data: Dataverse / Archivematica Integration
     Allan Bell | Associate University Librarian, The University of British Columbia
     Leanne Trimble | Data & Geospatial Librarian, OCUL Scholars Portal
  2. The UBC Context
  3. University of British Columbia Digital Preservation Strategy
     ● Digital Preservation Program
       ○ cIRcle, DSpace-based repository
       ○ Digitized collections in CONTENTdm
       ○ New and legacy born-digital archival material
       ○ Websites (Archive-It)
       ○ Soon: Abacus Dataverse, research data
  4. University of British Columbia Digital Preservation Strategy
     ● Use Archivematica as a tool to apply OAIS-compliant preservation processes
     ● Integrate Archivematica with existing systems used to manage digital objects
     ● Build internal technical and staff capacity
  5. OAIS Reference Model
  6. Archivematica
     ● “a free and open source digital preservation system that is designed to maintain standards-based, long term access to collections of digital objects”
     ● Micro-services provide an integrated suite of software tools in compliance with the ISO OAIS model
  7. Digital Preservation Program: cIRcle (DSpace)
     • Archivematica receives submissions from DSpace
     • An Archivematica-to-DSpace workflow is also in place
  8. Digital Preservation Program: CONTENTdm
     • Master files are uploaded to Archivematica
     • Archivematica produces access versions and pushes them to CONTENTdm
  9. Digital Preservation Program: RBSC/UA born-digital acquisition workflow
  10. Digital Preservation Program: TRAC Self-Audit
      • Trustworthy Repositories Audit and Certification (evolved into ISO 16363)
      • Widely accepted criteria for assessing the trustworthiness of digital repositories
      • The TRAC checklist is an auditing tool to assess the reliability, commitment and readiness of institutions to assume long-term preservation responsibilities
  11. What is TRAC?
      • The TRAC metrics assess three areas:
        a. Organizational Infrastructure: the repository's administrative, staffing, financial, and legal functions
        b. Digital Object Management: the handling of digital objects from ingest to access
        c. Technology, Technical Infrastructure and Security: the technology used to handle ingested objects
      • These criteria represent best practices and current thinking about the organizational and technological needs of trustworthy digital repositories.
  12. TRAC-Compliant Repositories
      The Center for Research Libraries has audited and certified five repositories:
      • Chronopolis
      • CLOCKSS
      • HathiTrust
      • Portico
      • Scholars Portal
  13. Digital Preservation Program: Conclusions
      • Greater comfort with, and understanding of, the challenges of archiving digitized and born-digital material
      • Establishing a comprehensive digital preservation program is complex!
      • Tools are important, but policies and procedures are also needed for certification (if desired)
  14. Abacus Dataverse: Research Data Management
      ● UBC-hosted instance serving four research universities in British Columbia since 2014
        ○ Abacus DSpace launched in 2009
      ● 1,700 studies (more than 28,000 files)
      ● Actively used by researchers
      ● Each school has full control of, and added discoverability for, its data
        ○ Licensed data, but also growing institutional research data collections
        ○ Each institution has its own subnetwork with:
          ■ OAI export to Summon (common library discovery layer)
          ■ Separate Dataverses for institutional research data
  15. The Ontario Context
  16. OCUL & Scholars Portal
      Who?
      • 21 university libraries in Ontario
      What?
      • Collective purchasing
      • Shared digital infrastructure
      • Collaborative planning and assessment
      How? Scholars Portal
      • OCUL’s shared technology infrastructure, housing shared collections
      More information:
  17. OCUL/SP & Research Data Management: Dataverse (OCUL-hosted instance)
      – Hosted for OCUL since 2011
      – 330 studies (about 4,000 files)
      – Actively used by researchers from 7-8 institutions
      – Many in social science disciplines, but some in the sciences (agriculture, polar research, geophysics, nursing…)
  18. OCUL/SP & Research Data Management
      • Services are evolving at each institution
      • Still trying to get a handle on:
        – RDM support services required by researchers
        – RDM infrastructure requirements
        – RDM costs
        – The role of regional consortia in RDM services
  19. OCUL/SP & Digital Preservation
      • Trustworthy Digital Repository (TDR) certified for electronic journal content (since 2013)
      • Currently working on the Ontario Library Research Cloud (OLRC) project (expected completion 2015)
      • Data preservation: strong interest
  20. National initiatives in Canada
  21. ‘Portage’
      A project led by the Canadian Association of Research Libraries (CARL), aimed at building a library-based research data management network. Two components:
      • A network of expertise for research data management
      • A national preservation and discovery network for research data
  22. National preservation network
  23. Dataverse / Archivematica Integration
  24. Dataverse/Archivematica Integration: architecture
      Dataverse
      • Data
      • Metadata (DDI & other)
      Integration Middleware
      • Harvest content via the Dataverse API (Dataverse has no SWORD client capability at the moment)
      • Package and submit to Archivematica using SWORD
      Archivematica
      • Accept data and metadata
      • Perform preservation functions
      • Create Archival Information Packages (AIPs)
      Archival storage?
      • Local Data Repository (e.g. at SP or UBC)
      • Preservation Infrastructure (Portage)
  25. Project Participants
      • Artefactual: Evelyn McLellan, Justin Simpson
      • Dataverse: Phil Durbin, Eleni Castro (& others)
      • Scholars Portal: Leanne Trimble, Alan Darnell
      • UBC: Allan Bell, Eugene Barsky
      • University of Alberta: Geoff Harder, Chuck Humphrey, Larry Laliberte, Peter Binkley
      • Simon Fraser University: Alex Garnett
  26. Functional Requirements
      ● Develop “middleware” that can transfer studies from Dataverse to Archivematica:
        - Detect newly published studies & “major” new versions
        - Harvest released studies from Dataverse
        - Utilize the SWORD protocol
        - Submit to Archivematica
        - One Dataverse study = 1 SIP = 1 AIP
  27. Functional Requirements (2)
      ● Investigate Archivematica pipeline decisions for data formats coming from Dataverse:
        - File format normalization?
        - Connecting versions of the same dataset to one another?
        - Handling DDI (and other) metadata records?
  28. Possible features for future stages
      • Dataverse as a SWORD client
      • A mechanism within Dataverse for researchers to specify which datasets they want to target for preservation
      • Returning information from Archivematica back to Dataverse (indication of preservation status within Dataverse)
  29. Next Steps
      • University of Toronto procurement process underway to contract the development work to Artefactual
      • Develop the middleware (2015)
      • Recruit researchers to contribute data to ingest (concurrent with development work)
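
Slide 6 describes Archivematica as a set of micro-services. The sketch below is a minimal, hypothetical illustration of that idea: each micro-service is a small function that does one job on a package, and a pipeline runs them in order. The `Package` class, the two checksum services and `run_pipeline` are assumptions made for illustration, not Archivematica's own code; the event names are borrowed from the PREMIS event type vocabulary.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Package:
    """Hypothetical in-memory transfer: file name -> bytes, plus recorded digests and events."""
    files: Dict[str, bytes]
    checksums: Dict[str, str] = field(default_factory=dict)
    events: List[str] = field(default_factory=list)


# In this sketch a micro-service is simply a function from package to package.
MicroService = Callable[[Package], Package]


def assign_checksums(pkg: Package) -> Package:
    # Record a SHA-256 digest per file as the fixity baseline.
    for name, data in pkg.files.items():
        pkg.checksums[name] = hashlib.sha256(data).hexdigest()
    pkg.events.append("message digest calculation")
    return pkg


def verify_checksums(pkg: Package) -> Package:
    # Recompute each digest and stop the pipeline if any file has changed.
    for name, data in pkg.files.items():
        if hashlib.sha256(data).hexdigest() != pkg.checksums.get(name):
            raise ValueError(f"fixity check failed for {name}")
    pkg.events.append("fixity check")
    return pkg


def run_pipeline(pkg: Package, services: List[MicroService]) -> Package:
    # Each micro-service does one job; the pipeline just runs them in order.
    for service in services:
        pkg = service(pkg)
    return pkg


if __name__ == "__main__":
    result = run_pipeline(Package({"study.tab": b"id\tvalue\n1\t2\n"}),
                          [assign_checksums, verify_checksums])
    print(result.events)  # ['message digest calculation', 'fixity check']
```

Real Archivematica pipelines chain many more steps (format identification, normalization, METS/PREMIS generation) and record decisions at configurable points; keeping each step small is what makes that chaining possible.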
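
Slide 14 mentions OAI export from each institution's subnetwork to Summon. OAI-PMH harvesting works by issuing `ListRecords` requests and following each `resumptionToken` until a page comes back without one. A minimal sketch, assuming a placeholder base URL and set name (neither is taken from the Abacus configuration), that builds request URLs and extracts record identifiers from one response page:

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # In OAI-PMH 2.0 a resumptionToken is exclusive: it is sent with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec  # hypothetical set, e.g. one institution's records
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return the record identifiers in one response page and the next resumptionToken."""
    root = ET.fromstring(xml_text)
    ids = [h.text for h in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty or missing resumptionToken means this was the last page.
    next_token = token.text if token is not None and token.text else None
    return ids, next_token
```

For example, `list_records_url("https://dataverse.example.org/oai", set_spec="ubc")` yields `https://dataverse.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&set=ubc`; a harvester would fetch that URL, parse the page, and repeat with the returned token.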
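
Slide 24 has the middleware harvest a study through the Dataverse API and then package and submit it to Archivematica using SWORD. A minimal sketch of the packaging and submission half, assuming a SWORD 2.0 binary deposit of a SimpleZip package. The collection URI is a placeholder, authentication is omitted, and whether Archivematica's endpoint accepts this exact packaging is an assumption, not something the slides state. The request is only built here, not sent.

```python
import hashlib
import io
import urllib.request
import zipfile
from typing import Dict

# Packaging identifier defined in the SWORD 2.0 profile for a plain ZIP of files.
SIMPLE_ZIP = "http://purl.org/net/sword/package/SimpleZip"


def zip_study(files: Dict[str, bytes]) -> bytes:
    """Package one study's harvested files (data + metadata) into an in-memory ZIP."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in sorted(files.items()):
            zf.writestr(name, data)
    return buf.getvalue()


def build_sword_deposit(collection_uri: str, package: bytes,
                        filename: str) -> urllib.request.Request:
    """Build (but do not send) a SWORD 2.0 binary deposit request for one package."""
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": f"attachment; filename={filename}",
        "Content-MD5": hashlib.md5(package).hexdigest(),  # lets the server verify the upload
        "Packaging": SIMPLE_ZIP,
        "In-Progress": "false",  # the deposit is complete in a single request
    }
    return urllib.request.Request(collection_uri, data=package,
                                  headers=headers, method="POST")
```

Keeping one ZIP per study mirrors the "one Dataverse study = 1 SIP" rule on slide 26; sending the request (for example with `urllib.request.urlopen`) would additionally need whatever credentials the receiving endpoint requires.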
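
Slide 26 asks the middleware to detect newly published studies and “major” new versions, with each study becoming one SIP and one AIP. A minimal sketch of that selection step, assuming Dataverse 4-style `major.minor` version numbers, a policy that skips minor versions, and a hypothetical stored state recording the last major version preserved per dataset. The class and function names are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PublishedVersion:
    persistent_id: str  # dataset DOI or Handle
    major: int
    minor: int


def select_for_preservation(published: List[PublishedVersion],
                            last_preserved_major: Dict[str, int]) -> List[PublishedVersion]:
    """Pick the published versions that should each become one SIP (and so one AIP)."""
    selected = []
    for v in published:
        if v.minor != 0:
            continue  # assumed policy: minor versions are not sent for preservation
        if v.major > last_preserved_major.get(v.persistent_id, 0):
            selected.append(v)  # a new study, or a newer major version of a known one
    # Deterministic order: by dataset, oldest major version first.
    return sorted(selected, key=lambda v: (v.persistent_id, v.major))


def record_preserved(state: Dict[str, int], versions: List[PublishedVersion]) -> None:
    """After successful submission, remember the highest major version sent per dataset."""
    for v in versions:
        state[v.persistent_id] = max(state.get(v.persistent_id, 0), v.major)
```

Updating the state only after a successful submission means a failed run simply re-selects the same versions next time, rather than silently skipping them.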
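
Slide 27 leaves open how versions of the same dataset should be connected once each version has become its own AIP. One possible approach, sketched under stated assumptions: key every AIP on the dataset's persistent identifier and chain the AIPs by major version number. `AipRecord` and `link_versions` are hypothetical, and where such links would actually be recorded (for example in each AIP's own metadata) is not decided by the slides.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class AipRecord(NamedTuple):
    aip_uuid: str        # identifier of the stored AIP
    persistent_id: str   # dataset DOI or Handle shared by all its versions
    major: int           # Dataverse major version number held in this AIP


def link_versions(records: List[AipRecord]) -> Dict[str, Optional[str]]:
    """Map each AIP UUID to the UUID of the AIP holding the previous version (None if first)."""
    by_dataset: Dict[str, List[AipRecord]] = defaultdict(list)
    for rec in records:
        by_dataset[rec.persistent_id].append(rec)
    links: Dict[str, Optional[str]] = {}
    for versions in by_dataset.values():
        previous: Optional[str] = None
        # Walk each dataset's AIPs from oldest to newest major version.
        for rec in sorted(versions, key=lambda r: r.major):
            links[rec.aip_uuid] = previous
            previous = rec.aip_uuid
    return links
```

Because the chain is derived from the persistent identifier rather than from storage order, AIPs can be created at different times (or re-ingested) and still be linked consistently.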