CESSDA Persistent Identifiers


Introduction to the bridge between the DataverseNL data repository for ongoing research and the EASY trusted digital repository, presented at the workshop "PID Information Types for the Social Sciences".

Published in: Science
  1. CESSDA Persistent Identifiers
     Workshop "PID Information Types for the Social Sciences", May 29, 2017, The Hague
     Vyacheslav Tykhonov, Senior Information Scientist (DANS)
     DANS is an institute of KNAW and NWO
  2. DANS data repositories with Persistent Identifiers
     Within the context of DANS’ mission, every (digital) object archived via DANS must have a PID so that it can be (re)located and cited. DANS uses PIDs for both (digital) objects and people.
     DataverseNL for ongoing research projects
     • every dataset has its own handle (for Dutch universities)
     • revising a dataset does not change the handle; a new version changes only the citation
     EASY for permanent data archiving (DOIs)
     • every archived dataset has a DOI
     • every version of a dataset archived from DataverseNL gets a new DOI
  3. • DANS has developed a plugin to archive datasets deposited in Dataverse temporary storage to Trusted Digital Repositories (TDRs)
     • Before putting datasets in the long-term archive, users should create an account in the TDR and obtain the permissions needed to archive their data
     • The archival plugin is open source software and can easily be extended to support any TDR:
  4. The “Archive” button is available to local Dataverse administrators to push datasets to the EASY archive for long-term preservation
  5. The administrator can choose where to archive the dataset: Archivematica, Islandora, FEDORA or DANS EASY (EASY is the default option)
  6. The archiving process runs in the background, extracts data and metadata from the dataset, and creates an archived (BagIt) package containing all files and checksums
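The packaging step described on this slide can be illustrated with a minimal sketch that uses only the Python standard library. It is not the plugin's actual code: the function names `make_bag` and `sha256_of` are hypothetical, and only the basic BagIt layout (a `bagit.txt` declaration, a `data/` payload directory, and a `manifest-sha256.txt` listing one checksum per file) is shown. Metadata extraction and transfer to the repository are left out.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(source_dir: Path, bag_dir: Path) -> Path:
    """Copy source_dir into bag_dir/data and write the BagIt tag files.

    Hypothetical helper for illustration; bag_dir must not already
    contain a data/ directory.
    """
    data_dir = bag_dir / "data"
    shutil.copytree(source_dir, data_dir)

    # Bag declaration required by the BagIt layout.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # Payload manifest: one "<checksum>  <relative path>" line per file.
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        relative = path.relative_to(bag_dir).as_posix()
        lines.append(f"{sha256_of(path)}  {relative}\n")
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag_dir
```

A receiving repository can recompute the checksums listed in `manifest-sha256.txt` to confirm that every file arrived intact.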
  7. Once archiving has finished, the “Archive” button disappears from the page. The dataset citation is extended with a DOI pointing to the archived version of the dataset in EASY
  8. The archived version of the dataset is available on the EASY landing page and can be cited in research papers
  9. The archived dataset automatically gets a DOI and a URN pointing to the archived revision (version) of the dataset
  10. All files in the dataset get permission levels corresponding to the versions of the files stored in Dataverse
  11. Dataverse as Archival Service
      • We’re working on extending Dataverse with DOIs generated for every version of a dataset so that it can work as permanent storage
      • Citations can contain duplicate metadata, but the dataset content (data files) should be different
      • The archival part can be hosted by the same Dataverse, depending on the plugin settings
  12. CESSDA PID plugin
      • Universal plugin to get DOIs and handles in the same Dataverse instance
      • The prefix of every organisation will be generated based on the configuration and authentication settings of the plugin
      • Switches Dataverse between support of ongoing research and archiving (in separate sub-dataverses)
  13. Challenges
      • We need a PID “Proxy” Service that collects information about all DOIs generated for different versions of datasets with handles
      • Depending on the location and status of the dataset, every citation should contain a handle (Netherlands), a URN:NBN (Europe) and a DOI (worldwide)
      • Statistics about all citations of datasets in research papers should be aggregated and provided as part of the “Proxy” Service to build our own “PageRank” index
      • Big Data and Linked Open Data archiving with Persistent Identifiers
      • Higher level of granularity for separate files, subsets, fragments, time services to make citation more accurate
      • Maintenance of tombstone pages
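One way to picture the PID “Proxy” Service is as a registry that maps a stable handle to the DOIs (and optional URN:NBNs) minted for each archived version. The sketch below is an assumption about how such a service could be modelled, not a description of an existing DANS component: the class names `VersionRecord` and `PidProxy`, their methods, and every identifier in the usage example are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VersionRecord:
    """Identifiers minted for one archived version (hypothetical model)."""
    version: int
    doi: str
    urn_nbn: Optional[str] = None
    tombstoned: bool = False  # True once the version has been withdrawn


@dataclass
class PidProxy:
    """Maps a dataset's stable handle to its versioned identifiers."""
    records: dict = field(default_factory=dict)

    def register(self, handle: str, record: VersionRecord) -> None:
        versions = self.records.setdefault(handle, [])
        if any(v.version == record.version for v in versions):
            raise ValueError(f"version {record.version} already registered")
        versions.append(record)
        versions.sort(key=lambda v: v.version)

    def resolve(self, handle: str, version: Optional[int] = None) -> VersionRecord:
        """Return the requested version, or the latest one if none is given."""
        versions = self.records[handle]
        if version is None:
            return versions[-1]
        for record in versions:
            if record.version == version:
                return record
        raise KeyError(f"{handle} has no version {version}")

    def citation_identifiers(self, handle: str, version: Optional[int] = None) -> dict:
        """Collect every identifier a citation could carry for one version."""
        record = self.resolve(handle, version)
        identifiers = {"handle": handle, "doi": record.doi}
        if record.urn_nbn:
            identifiers["urn_nbn"] = record.urn_nbn
        return identifiers


# Usage with placeholder identifiers (not real PIDs).
proxy = PidProxy()
proxy.register("hdl:00000/EXAMPLE", VersionRecord(1, "doi:10.5072/example-v1"))
proxy.register(
    "hdl:00000/EXAMPLE",
    VersionRecord(2, "doi:10.5072/example-v2", urn_nbn="urn:nbn:nl:example-v2"),
)
latest = proxy.citation_identifiers("hdl:00000/EXAMPLE")
```

Keeping the handle as the lookup key mirrors the DataverseNL behaviour described earlier, where the handle stays fixed while each archived version receives its own DOI. The `tombstoned` flag is only a marker here; serving tombstone pages and aggregating citation statistics would need further design.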
  14. Big Data repository with Persistent Identifiers
      The approach is suitable for product development companies (industry) and for organisations and institutions (CESSDA) looking for sustainable (Big) Data archiving services.
      A Big Data object in Dataverse consists of:
      • metadata with authorship and citation information
      • data usage licence
      • persistent DOI or handle
      • information on how to obtain a key (API token) to start using the API endpoint(s)
      • link to the API endpoint delivering the data
      • representation of the API (interactive documentation, Swagger)
      • data provenance
      • controlled vocabularies to meet domain-specific community standards (optional)
      A public demonstration is available on the Dataverse demo website.
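The components listed on this slide can be written down as a simple record type, which makes it easy to check whether a deposit is complete. This is a sketch only: the class `BigDataObject`, its field names, and the helper `missing_components` are hypothetical and do not correspond to the Dataverse data model or API.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class BigDataObject:
    """Hypothetical descriptor mirroring the components listed on the slide."""
    metadata: dict                 # authorship and citation information
    licence: str                   # data usage licence
    pid: str                       # persistent DOI or handle
    api_key_instructions: str      # how to obtain an API token
    api_endpoint: str              # link to the endpoint delivering the data
    api_documentation: str         # e.g. link to interactive (Swagger) docs
    provenance: str                # data provenance statement
    controlled_vocabularies: Optional[list] = None  # optional


def missing_components(obj: BigDataObject) -> list:
    """Return the names of required components that are empty."""
    optional = {"controlled_vocabularies"}
    return [
        f.name
        for f in fields(obj)
        if f.name not in optional and not getattr(obj, f.name)
    ]
```

A deposit workflow could refuse to mint a persistent identifier until `missing_components` returns an empty list, so that every cited object carries a licence, provenance and a working pointer to its data.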
  15. Linked Data hubs as archived object
      Source: PID object
  16. Questions?