At ING we needed a way to move data science models from exploration into production. I will give this talk from my experience as a senior Ops engineer on the exploration and production Hadoop environments. For this we use OpenShift to run Docker containers that connect to the big data Hadoop environment.
During this talk I will explain why we need this and how it is done at ING, and how to set up a Docker container running a data science model using Hive, Python, and Spark. I'll explain how to use Dockerfiles to build Docker images, how to add all the needed components inside a Docker image, and how to run different versions of software in different containers.
At the end I will also give a demo of how it runs and how it is automated: a Git webhook triggers Jenkins, which starts the Docker service that connects to the big data Hadoop environment.
This is going to be a great technical talk for engineers and data scientists.
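A minimal sketch of the kind of Dockerfile such a setup might use. This is purely illustrative: the base image, package versions, and paths are assumptions, not ING's actual configuration.

```dockerfile
# Hypothetical sketch: an image for a PySpark model that talks to Hive.
FROM python:3.9-slim

# The Spark client libraries need a Java runtime
RUN apt-get update && apt-get install -y --no-install-recommends default-jre-headless \
    && rm -rf /var/lib/apt/lists/*

# Pin versions per image so different models can run different software versions
RUN pip install --no-cache-dir pyspark==3.3.1 "pyhive[hive]"

# Cluster connection details (hive-site.xml, core-site.xml) are copied in
COPY conf/ /opt/hadoop-conf/
ENV HADOOP_CONF_DIR=/opt/hadoop-conf

COPY model/ /app/
WORKDIR /app
CMD ["python", "run_model.py"]
```

Because the Hadoop client configuration and the software versions live inside the image, the same container can be promoted unchanged from the exploration environment to production.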
Speaker
Lennard Cornelis, Ops Engineer, ING
Data Virtualization Reference Architectures: Correctly Architecting your Solu... - Denodo
Correctly Architecting your Solutions for Analytical & Operational Uses reviews the two main types of use cases that can be solved with the Denodo Platform. Both high concurrency scenarios and big reporting use cases are discussed in this presentation in a comparative way, explaining the different approaches that you must take to be successful in any situation.
This presentation is part of the Fast Data Strategy Conference, and you can watch the video here goo.gl/wdZgpo.
Extending DSpace 7: DSpace-CRIS and DSpace-GLAM for empowered repositories an... - 4Science
DSpace-CRIS is an extended version of DSpace that offers a powerful and flexible data model to describe not only publications but all research entities and their relationships. DSpace-CRIS 7 will feature a new Angular UI and REST API in addition to functionality for compliance with OpenAire, integrating publications from external sources, bidirectional ORCID integration, and synchronizing with other systems. DSpace-CRIS also extends data modeling capabilities and provides tools for data quality, metadata management, and extensibility.
Continuous Optimization for Distributed BigData Analysis - Kai Sasaki
This document discusses challenges with distributed data analysis and Treasure Data's approach to addressing them. Some key points:
- Distributed data analysis faces challenges around network bandwidth, throughput, data consistency, and reliability.
- Treasure Data uses a columnar storage format based on MessagePack to more efficiently save bandwidth and storage space.
- They implement time index pushdown to enable reading only relevant data within a time range, reducing network usage.
- Automatic optimization of partitioning layout and repartitioning aims to balance partition file size, time ranges, and keys to maximize performance and throughput while minimizing memory pressure.
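The time-index pushdown idea in the bullets above can be sketched in a few lines. This is an illustrative sketch, not Treasure Data's implementation; the partition paths and timestamps are made up.

```python
# Each partition file records the time range it covers, so a query over a
# time range only has to read partitions whose range overlaps it.
from dataclasses import dataclass

@dataclass
class Partition:
    path: str
    min_time: int  # earliest record timestamp in this partition (epoch seconds)
    max_time: int  # latest record timestamp

def partitions_to_read(partitions, query_start, query_end):
    """Return only partitions whose time range overlaps [query_start, query_end)."""
    return [p for p in partitions
            if p.max_time >= query_start and p.min_time < query_end]

partitions = [
    Partition("day=2020-01-01/part-0", 1577836800, 1577923199),
    Partition("day=2020-01-02/part-0", 1577923200, 1578009599),
    Partition("day=2020-01-03/part-0", 1578009600, 1578095999),
]

# A query over Jan 2 only needs to touch one of the three files.
selected = partitions_to_read(partitions, 1577923200, 1578009600)
```

The skipped partitions are never fetched over the network, which is where the bandwidth saving comes from; the repartitioning mentioned above keeps these time ranges and file sizes balanced so the pruning stays effective.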
Denodo Data Virtualization Platform: Scalability (session 3 from Architect to... - Denodo
Enterprise-wide deployments require an architecture that scales horizontally and can work in a geographically distributed environment. The Denodo Platform can scale for a single instance used for departmental projects all the way to enterprise-wide distributed clusters. This webinar will explain how the Denodo Platform can scale to handle the most demanding requirements and will provide examples of some actual deployment configurations.
More information and free registration for this webinar: http://goo.gl/ma3U5h
To learn more, click this link: http://go.denodo.com/a2a
Join the conversation at #Architect2Architect
Agenda:
Deployment Configurations
HA and Clustering
Geographically Distributed Configurations
Development Configurations
Globus: A Data Management Platform for Collaborative Research (CHPC 2019 - So... - Globus
Globus is a non-profit data management platform developed by the University of Chicago to increase the efficiency of data-driven research. It allows researchers to easily transfer large datasets, securely share data with collaborators across different storage systems, and automate the movement of data from instruments and compute facilities. Globus sees growing adoption with over 120 subscribers and has transferred over 768 petabytes of data.
Data platform architecture principles - ieee infrastructure 2020 - Julien Le Dem
This document discusses principles for building a healthy data platform, including:
1. Establishing explicit contracts between teams to define dependencies and service level agreements.
2. Abstracting the data platform into services for ingesting, storing, and processing data in motion and at rest.
3. Enabling observability of data pipelines through metadata collection and integration with tools like Marquez to provide lineage, availability, and change management visibility.
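The observability point above can be illustrated with a toy lineage store. This is a hedged sketch of the idea, not the Marquez API; the job and dataset names are invented.

```python
# Each job run reports which datasets it read and wrote; lineage is the
# graph those reports imply, which can then be walked for impact analysis.

class LineageStore:
    def __init__(self):
        # dataset -> (producing job, datasets it was derived from)
        self.writes = {}

    def record_run(self, job, inputs, outputs):
        for ds in outputs:
            self.writes[ds] = (job, list(inputs))

    def upstream(self, dataset):
        """All datasets this one transitively depends on."""
        deps, stack = set(), [dataset]
        while stack:
            entry = self.writes.get(stack.pop())
            if entry:
                for parent in entry[1]:
                    if parent not in deps:
                        deps.add(parent)
                        stack.append(parent)
        return deps

store = LineageStore()
store.record_run("ingest_orders", inputs=[], outputs=["raw.orders"])
store.record_run("clean_orders", inputs=["raw.orders"], outputs=["staging.orders"])
store.record_run("daily_report", inputs=["staging.orders"], outputs=["reports.daily"])
```

With this metadata collected, a question like "which upstream datasets does this report depend on?" becomes a graph walk instead of tribal knowledge, which is what makes change management and availability tracking possible.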
Data Virtualization in the Cloud: Accelerating Data Virtualization Adoption - Denodo
This presentation introduces our new product: Denodo Platform for AWS. You will see the current data virtualization landscape, the new cloud deployment options that are being introduced with the Denodo Platform 6.0 and some examples of when it will be useful to deploy Denodo in the cloud.
This presentation is part of the Fast Data Strategy Conference, and you can watch the video here goo.gl/PcvHmj.
The document summarizes a workshop between NASA, software developers, science communities, and data centers to discuss HDF and HDF-EOS tools. Key topics included interactions between these groups, technical details of EOSDIS and HDF-EOS, available and needed tools, resources for developers, and next steps to continue engagement through websites and future meetings.
American Water shares how bringing IoT to fleet management can provide value to the customer. In the utilities industry, fleet management plays a major part in the business. The front line is one of the largest parts of the business whether it is the field employees working on mains, or those working on the customers' property. American Water strives to provide the best customer experience and part of that includes improving the effectiveness of our fleet.
Currently, there is no insight or active feedback on the effectiveness of the routes or driving behaviors. As a PoC, American Water leveraged NiFi to track metrics against a simulated truck, showing the initial values in capturing this type of data.
Technologies: NiFi, Druid, Hive
Improve your SQL workload with observability - OVHcloud
Most of OVH's information systems run on relational databases (PostgreSQL, MySQL, MariaDB). In terms of volume, that is 400 databases holding more than 20 TB of data, spread across 60 clusters in two geographic regions and powering 3,000 applications.
How do we get visibility into the whole fleet? Better still, how do we let everyone follow the activity of their own database? That is the challenge we set for ourselves; one year later, we can share our experience.
What if observability were not just a buzzword, but had a real impact on production?
Running Dataverse repository in the European Open Science Cloud (EOSC) - vty
The document discusses Dataverse, an open source data repository software. It summarizes that Dataverse was developed by Harvard University, has a large community and development team, and is used by many countries as a data repository infrastructure. It then describes the SSHOC Dataverse project which aims to create a multilingual, standardized, and reusable open data infrastructure across several European countries. Finally, it notes that Dataverse is a reliable cloud service that enables FAIR data sharing and can be easily deployed by research organizations.
This document summarizes the work done to enhance the Geospatial Data Abstraction Library (GDAL) to better support NASA Earth Observing System (EOS) data products. It describes three phases of work: 1) a proof-of-concept ArcGIS plugin for product-specific HDF drivers, 2) generalized HDF drivers and an XML format, and 3) collaboration with GDAL developers utilizing HDF drivers and a Virtual Format (VRT) specification. The third phase highlights include enhanced generic functions, coordination with GDAL developers, testing across GIS clients, outreach to other data centers, and building tutorials. Future work areas are also outlined.
High Performance Data Lake with Apache Hudi and Alluxio at T3Go - Alluxio, Inc.
Data Orchestration Summit 2020 organized by Alluxio
https://www.alluxio.io/data-orchestration-summit-2020/
High Performance Data Lake with Apache Hudi and Alluxio at T3Go
Trevor Zhang & Vino Yang (T3Go)
About Alluxio: alluxio.io
Engage with the open source community on slack: alluxio.io/slack
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory... - DataWorks Summit
Advanced Big Data processing frameworks have been proposed to harness the fast data transmission capability of Remote Direct Memory Access (RDMA) over high-speed networks such as InfiniBand, RoCEv1, RoCEv2, iWARP, and OmniPath. However, with the introduction of Non-Volatile Memory (NVM) and NVM Express (NVMe) based SSDs, these designs, along with the default Big Data processing models, need to be re-assessed to discover the possibilities of further enhanced performance. In this talk, we will present NRCIO, a high-performance communication runtime for non-volatile memory over modern network interconnects that can be leveraged by existing Big Data processing middleware. We will show the performance of non-volatile memory-aware RDMA communication protocols using our proposed runtime and demonstrate its benefits by incorporating it into a high-performance in-memory key-value store, Apache Hadoop, Tez, Spark, and TensorFlow. Evaluation results illustrate that NRCIO can achieve up to 3.65x performance improvement for representative Big Data processing workloads in modern data centers.
The document describes the HDF Product Designer software tool. It was created to facilitate the design of interoperable scientific data products in HDF5 format. The tool allows intuitive editing of HDF5 objects and supports conventions like CF and ACDD. It also provides validation services to test file compliance. The goal is to help scientists design data products that follow standards and are easy for others to use.
Slides shown in Hedvig booth at VMworld 2016. Highlight scale-out, software-defined storage - both hyperscale and hyperconverged - for large VMware vSphere environments.
Ultralight data movement for IoT with SDC Edge. Guglielmo Iozzia - Optum - Data Driven Innovation
This document provides an overview and demonstration of Streamsets Data Collector (SDC) and SDC Edge for ingesting data from IoT devices and the edge. It discusses the challenges of ingesting data from distributed edge locations. It then describes the key features of SDC for designing flexible data flows with minimal coding. It also introduces SDC Edge, a lightweight agent for running SDC pipelines on edge devices. The presentation includes demonstrations of using SDC with Kafka and using SDC Edge to ingest and analyze data from Android devices and send it to Elasticsearch. It concludes with discussing additional topics and providing useful links.
Efficient and effective: can we combine both to realize high-value, open, sca... - Research Data Alliance
The document discusses the INDIGO-DataCloud project, which aims to develop an open source cloud platform for computing and data management tailored for science. It seeks to address gaps in interoperability, scalability, and data handling across public and private clouds. The project defined requirements from various scientific communities and developed components implementing its architecture to provide solutions for distributed computing and data resources.
John Readey presented on HDF5 in the cloud using HDFCloud. HDF5 can provide a cost-effective cloud infrastructure by paying for what is used rather than what may be needed. HDFCloud uses an HDF5 server to enable accessing HDF5 data through a REST API, allowing users to access large datasets without downloading entire files. It maps HDF5 objects to cloud object storage for scalable performance and uses Docker containers for elastic scaling.
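The REST-style access pattern described above can be sketched as follows. The endpoint layout, host name, and query parameters here are assumptions for illustration, not the documented API of any particular HDF5 server.

```python
# A client asks the server for one slice (hyperslab) of a dataset instead of
# downloading the whole HDF5 file; the server reads only the needed chunks
# from object storage.
from urllib.parse import urlencode

def slice_request_url(base_url, dataset_uuid, domain, start, stop):
    """Build the GET URL for one 1-D slice of a dataset (hypothetical layout)."""
    query = urlencode({"domain": domain, "select": f"[{start}:{stop}]"})
    return f"{base_url.rstrip('/')}/datasets/{dataset_uuid}/value?{query}"

url = slice_request_url(
    "https://hdf.example.com", "d-8ba9", "/shared/climate.h5", 0, 1000
)
```

The point of the design is visible in the URL itself: the selection travels with the request, so a client analyzing elements 0..999 of a multi-terabyte dataset never pulls the rest of the file over the network.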
The document summarizes updates on Hierarchical Data Formats (HDF) software releases and tools. It discusses the latest releases of HDF5 1.8.19 and 1.10.1, compatibility issues when moving to newer versions, updates on tools like HDF-Java and HDFView 3.0, supported compilers and systems, and a new compression library for interoperability. It invites readers to provide feedback on their needs.
Big Data Quickstart Series 3: Perform Data Integration - Alibaba Cloud
This document summarizes Derek Meng's presentation on data integration using Alibaba Cloud's MaxCompute big data platform. It discusses the general process of data integration including data acquisition, transformation, and governance. It provides an overview of MaxCompute basics, including its architecture, basic concepts such as projects and tables, and how to use MaxCompute's data channel and SQL. The document concludes with a brief introduction to DataWorks for data integration and a demo.
The document discusses the use of Semantic MediaWiki (SMW) by the IT department of the Lower Austrian provincial government for network documentation, open government data, and other projects. SMW is used to dynamically generate and semantically query documentation about the government's network infrastructure, publish open data on the intranet and internet, and document batch job and server information. The old static documentation methods are replaced with full-text search, reusable content, and generated reports using SMW.
The Rise of Big Data Governance: Insight on this Emerging Trend from Active O... - DataWorks Summit
Today's most forward-thinking enterprises have all been forced to face similar data challenges: reliance on real-time data to better serve their customers and, subsequently, the requirement to comply with regulations that protect that data, one example being the General Data Protection Regulation (GDPR).
The solution to this emerging challenge is a tricky one – for companies like ING, this data governance challenge has been met with metadata, a consistent view across a large heterogeneous ecosystem and collaboration with an active open source community.
In this joint presentation, John Mertic, director of program management for ODPi, and Ferd Scheepers, Global Chief Information Architect of ING, will address the benefits of a vendor-neutral approach to data governance and the need for an open metadata standard, along with insight into how companies such as ING, IBM, Hortonworks, and more are delivering solutions to this challenge as an open source initiative.
Speakers
John Mertic, Director of Program Management for ODPi, R Consortium, and Open Mainframe Project, The Linux Foundation
Maryna Strelchuk, Information Architect, ING
This document discusses the 5 year evolution of Dataverse, an open source data repository platform. It began as a tool for collaborative data curation and sharing within research teams. Over time, features were added like dataset version control, APIs, and integration with other systems. The document outlines challenges around maintenance and sustainability. It also covers efforts to improve Dataverse's interoperability, such as integrating metadata standards and controlled vocabularies, and making datasets FAIR compliant. The goal is to establish Dataverse as a core component of the European Open Science Cloud by improving areas like software quality, integration with tools, and standardization.
DataverseEU: Building Multilingual infrastructure for the Social Sciences in... - vty
This document discusses the DataverseEU project, which aims to build a multilingual infrastructure for social science data in Europe using the Dataverse platform. Key points:
- The project is led by DANS and funded by CESSDA to promote sharing of social science research data across Europe.
- Technical development includes a Docker module to deploy Dataverse in the cloud, multilingual interfaces in several European languages, and a plugin to integrate various persistent identifier services.
- The Docker module allows hosting unlimited Dataverses on different ports and building multilingual interfaces. It decomposes Dataverse into separate database, search, and application containers.
- The da|ra PID plugin will allow service providers to switch between identifier
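The container decomposition described above might look roughly like the following docker-compose sketch. Service names, images, and ports are illustrative assumptions, not the project's actual Docker module.

```yaml
# Hypothetical sketch: Dataverse split into database, search, and application
# containers, each independently deployable and scalable.
version: "3"
services:
  database:
    image: postgres:13
    environment:
      POSTGRES_DB: dataverse
  search:
    image: solr:8
  application:
    image: dataverse/app   # hypothetical image name
    ports:
      - "8080:8080"        # host port can vary per hosted Dataverse instance
    depends_on:
      - database
      - search
```

Separating the concerns this way is what allows several Dataverse instances, each on its own port and with its own language configuration, to share one host.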
Data platform architecture principles - ieee infrastructure 2020Julien Le Dem
This document discusses principles for building a healthy data platform, including:
1. Establishing explicit contracts between teams to define dependencies and service level agreements.
2. Abstracting the data platform into services for ingesting, storing, and processing data in motion and at rest.
3. Enabling observability of data pipelines through metadata collection and integration with tools like Marquez to provide lineage, availability, and change management visibility.
Data Virtualization in the Cloud: Accelerating Data Virtualization AdoptionDenodo
This presentation introduces our new product: Denodo Platform for AWS. You will see the current data virtualization landscape, the new cloud deployment options that are being introduced with the Denodo Platform 6.0 and some examples of when it will be useful to deploy Denodo in the cloud.
This presentation is part of the Fast Data Strategy Conference, and you can watch the video here goo.gl/PcvHmj.
The document summarizes a workshop between NASA, software developers, science communities, and data centers to discuss HDF and HDF-EOS tools. Key topics included interactions between these groups, technical details of EOSDIS and HDF-EOS, available and needed tools, resources for developers, and next steps to continue engagement through websites and future meetings.
American Water shares how bringing IoT to fleet management can provide value to the customer. In the utilities industry, fleet management plays a major part in the business. The front line is one of the largest parts of the business whether it is the field employees working on mains, or those working on the customers' property. American Water strives to provide the best customer experience and part of that includes improving the effectiveness of our fleet.
Currently, there is no insight or active feedback on the effectiveness of the routes or driving behaviors. As a PoC, American Water leveraged NiFi to track metrics against a simulated truck, showing the initial values in capturing this type of data.
Technologies: NiFi, Druid, Hive
Improve your SQL workload with observabilityOVHcloud
La majeure partie du SI d'OVH repose sur des bases de données relationnelles (PostgreSQL, MySQL, MariaDB). En termes de volumétrie cela représente 400 bases pesants plus de 20To de données réparties sur 60 clusters dans deux zones géographiques le tout propulsant 3000 applications.
Comment tout voir dans notre parc ? Mieux encore, comment faire pour que tout le monde puisse suivre l'activité de sa base de données ? C'est le challenge que nous nous sommes fixés, un an après nous pouvons partager notre expérience.
Et si l'observability n'était pas juste un buzzword, mais avait un réel impact sur la production ?
Running Dataverse repository in the European Open Science Cloud (EOSC)vty
The document discusses Dataverse, an open source data repository software. It summarizes that Dataverse was developed by Harvard University, has a large community and development team, and is used by many countries as a data repository infrastructure. It then describes the SSHOC Dataverse project which aims to create a multilingual, standardized, and reusable open data infrastructure across several European countries. Finally, it notes that Dataverse is a reliable cloud service that enables FAIR data sharing and can be easily deployed by research organizations.
This document summarizes the work done to enhance the Geospatial Data Abstraction Library (GDAL) to better support NASA Earth Observing System (EOS) data products. It describes three phases of work: 1) a proof-of-concept ArcGIS plugin for product-specific HDF drivers, 2) generalized HDF drivers and an XML format, and 3) collaboration with GDAL developers utilizing HDF drivers and a Virtual Format (VRT) specification. The third phase highlights include enhanced generic functions, coordination with GDAL developers, testing across GIS clients, outreach to other data centers, and building tutorials. Future work areas are also outlined.
High Performance Data Lake with Apache Hudi and Alluxio at T3GoAlluxio, Inc.
Data Orchestration Summit 2020 organized by Alluxio
https://www.alluxio.io/data-orchestration-summit-2020/
High Performance Data Lake with Apache Hudi and Alluxio at T3Go
Trevor Zhang & Vino Yang (T3Go)
About Alluxio: alluxio.io
Engage with the open source community on slack: alluxio.io/slack
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
Advanced Big Data Processing frameworks have been proposed to harness the fast data transmission capability of Remote Direct Memory Access (RDMA) over high-speed networks such as InfiniBand, RoCEv1, RoCEv2, iWARP, and OmniPath. However, with the introduction of the Non-Volatile Memory (NVM) and NVM express (NVMe) based SSD, these designs along with the default Big Data processing models need to be re-assessed to discover the possibilities of further enhanced performance. In this talk, we will present, NRCIO, a high-performance communication runtime for non-volatile memory over modern network interconnects that can be leveraged by existing Big Data processing middleware. We will show the performance of non-volatile memory-aware RDMA communication protocols using our proposed runtime and demonstrate its benefits by incorporating it into a high-performance in-memory key-value store, Apache Hadoop, Tez, Spark, and TensorFlow. Evaluation results illustrate that NRCIO can achieve up to 3.65x performance improvement for representative Big Data processing workloads on modern data centers.
The document describes the HDF Product Designer software tool. It was created to facilitate the design of interoperable scientific data products in HDF5 format. The tool allows intuitive editing of HDF5 objects and supports conventions like CF and ACDD. It also provides validation services to test file compliance. The goal is to help scientists design data products that follow standards and are easy for others to use.
Slides shown in Hedvig booth at VMworld 2016. Highlight scale-out, software-defined storage - both hyperscale and hyperconverged - for large VMware vSphere environments.
Ultralight data movement for IoT with SDC Edge. Guglielmo Iozzia - OptumData Driven Innovation
This document provides an overview and demonstration of Streamsets Data Collector (SDC) and SDC Edge for ingesting data from IoT devices and the edge. It discusses the challenges of ingesting data from distributed edge locations. It then describes the key features of SDC for designing flexible data flows with minimal coding. It also introduces SDC Edge, a lightweight agent for running SDC pipelines on edge devices. The presentation includes demonstrations of using SDC with Kafka and using SDC Edge to ingest and analyze data from Android devices and send it to Elasticsearch. It concludes with discussing additional topics and providing useful links.
Efficient and effective: can we combine both to realize high-value, open, sca...Research Data Alliance
The document discusses the INDIGO-DataCloud project, which aims to develop an open source cloud platform for computing and data management tailored for science. It seeks to address gaps in interoperability, scalability, and data handling across public and private clouds. The project defined requirements from various scientific communities and developed components implementing its architecture to provide solutions for distributed computing and data resources.
John Readey presented on HDF5 in the cloud using HDFCloud. HDF5 can provide a cost-effective cloud infrastructure by paying for what is used rather than what may be needed. HDFCloud uses an HDF5 server to enable accessing HDF5 data through a REST API, allowing users to access large datasets without downloading entire files. It maps HDF5 objects to cloud object storage for scalable performance and uses Docker containers for elastic scaling.
The document summarizes updates on Hierarchical Data Formats (HDF) software releases and tools. It discusses the latest releases of HDF5 1.8.19 and 1.10.1, compatibility issues when moving to newer versions, updates on tools like HDF-Java and HDFView 3.0, supported compilers and systems, and a new compression library for interoperability. It invites readers to provide feedback on their needs.
Big Data Quickstart Series 3: Perform Data IntegrationAlibaba Cloud
This document summarizes Derek Meng's presentation on data integration using Alibaba Cloud's MaxCompute big data platform. It discusses the general process of data integration including data acquisition, transformation, and governance. It provides an overview of MaxCompute basics, including its architecture, basic concepts such as projects and tables, and how to use MaxCompute's data channel and SQL. The document concludes with a brief introduction to DataWorks for data integration and a demo.
The document discusses the use of Semantic MediaWiki (SMW) by the IT department of the Lower Austrian provincial government for network documentation, open government data, and other projects. SMW is used to dynamically generate and semantically query documentation about the government's network infrastructure, publish open data on the intranet and internet, and document batch job and server information. The old static documentation methods are replaced with full-text search, reusable content, and generated reports using SMW.
The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...DataWorks Summit
Each of today’s most forward-thinking enterprises have been forced to face similar data challenges: the reliance on real-time data to better serve their customers and, subsequently, the requirement of complying with regulations to protect that data – one example being the General Data Protection Regulation (GDPR).
The solution to this emerging challenge is a tricky one – for companies like ING, this data governance challenge has been met with metadata, a consistent view across a large heterogeneous ecosystem and collaboration with an active open source community.
This joint presentation, John Mertic – director of program management for ODPi – and Ferd Scheepers – Global Chief Information Architect of ING – will address the benefits of a vendor-neutral approach to data governance, the need for an open metadata standard, along with insight around how companies ING, IBM, Hortonworks and more are delivering solutions to this challenge as an open source initiative.
Speakers
John Mertic, Director of Program Management for ODPi, R Consortium, and Open Mainframe Project, The Linux Foundation
Maryna Strelchuk, Information Architect, ING
This document discusses the 5 year evolution of Dataverse, an open source data repository platform. It began as a tool for collaborative data curation and sharing within research teams. Over time, features were added like dataset version control, APIs, and integration with other systems. The document outlines challenges around maintenance and sustainability. It also covers efforts to improve Dataverse's interoperability, such as integrating metadata standards and controlled vocabularies, and making datasets FAIR compliant. The goal is to establish Dataverse as a core component of the European Open Science Cloud by improving areas like software quality, integration with tools, and standardization.
DataverseEU: Building Multilingual infrastructure for the Social Sciences in...vty
This document discusses the DataverseEU project, which aims to build a multilingual infrastructure for social science data in Europe using the Dataverse platform. Key points:
- The project is led by DANS and funded by CESSDA to promote sharing of social science research data across Europe.
- Technical development includes a Docker module to deploy Dataverse in the cloud, multilingual interfaces in several European languages, and a plugin to integrate various persistent identifier services.
- The Docker module allows hosting unlimited Dataverses on different ports and building multilingual interfaces. It decomposes Dataverse into separate database, search, and application containers.
- The da|ra PID plugin will allow service providers to switch between identifier services.
Nanda Vijaydev, BlueData - Deploying H2O in Large Scale Distributed Environme...Sri Ambati
This session was recorded in San Francisco on February 5th, 2019 and can be viewed here: https://youtu.be/CgoxjmdyMiU
This session will discuss how to get up and running quickly with containerized H2O environments (H2O Flow, Sparkling Water, and Driverless AI) at scale, in a multi-tenant architecture with a shared pool of resources using CPUs and/or GPUs. See how you can spin up (and tear down) your H2O environments on demand, with just a few mouse clicks. Find out how to enable quota management of GPU resources for greater efficiency, and easily connect your compute to your datasets for large-scale distributed machine learning. Learn how to operationalize your machine learning pipelines and deliver faster time-to-value for your AI initiative — while ensuring enterprise-grade security and high performance.
Bio: Nanda Vijaydev is senior director of solutions at BlueData (now HPE) - where she leverages technologies like Hadoop, Spark, and TensorFlow to build solutions for enterprise analytics and machine learning use cases. Nanda has 10 years of experience in data management and data science. Previously, she worked on data science and big data projects in multiple industries, including healthcare and media; was a principal solutions architect at Silicon Valley Data Science; and served as director of solutions engineering at Karmasphere. Nanda has an in-depth understanding of the data analytics and data management space, particularly in the areas of data integration, ETL, warehousing, reporting, and machine learning.
Tutorial Workgroup - Model versioning and collaborationPascalDesmarets1
Hackolade Studio has native integration with Git repositories to provide state-of-the-art collaboration, versioning, branching, conflict resolution, peer review workflows, change tracking and traceability. Most importantly, it allows teams to co-locate data models and schemas with application code, and to integrate further with DevOps CI/CD pipelines as part of our vision for Metadata-as-Code.
Co-located application code and data models provide the single source-of-truth for business and technical stakeholders.
The document discusses modernizing a data warehouse using the Microsoft Analytics Platform System (APS). APS is described as a turnkey appliance that allows organizations to integrate relational and non-relational data in a single system for enterprise-ready querying and business intelligence. It provides a scalable solution for growing data volumes and types that removes limitations of traditional data warehousing approaches.
QuerySurge Slide Deck for Big Data Testing WebinarRTTS
This is a slide deck from QuerySurge's Big Data Testing webinar.
Learn why Testing is pivotal to the success of your Big Data Strategy .
Learn more at www.querysurge.com
The growing variety of new data sources is pushing organizations to look for streamlined ways to manage complexities and get the most out of their data-related investments. The companies that do this correctly are realizing the power of big data for business expansion and growth.
Learn why testing your enterprise's data is pivotal for success with big data, Hadoop and NoSQL. Learn how to increase your testing speed, boost your testing coverage (up to 100%), and improve the level of quality within your data warehouse - all with one ETL testing tool.
This information is geared towards:
- Big Data & Data Warehouse Architects,
- ETL Developers
- ETL Testers, Big Data Testers
- Data Analysts
- Operations teams
- Business Intelligence (BI) Architects
- Data Management Officers & Directors
You will learn how to:
- Improve your Data Quality
- Accelerate your data testing cycles
- Reduce your costs & risks
- Provide a huge ROI (as high as 1,300%)
Vmware Serengeti - Based on Infochimps IronfanJim Kaskade
This document discusses virtualizing Hadoop for the enterprise. It begins with discussing trends driving changes in enterprise IT like cloud, mobile apps, and big data. It then discusses how Hadoop can address big, fast, and flexible data needs. The rest of the document discusses how virtualizing Hadoop through solutions like Project Serengeti can provide enterprises with elasticity, high availability, and operational simplicity for their Hadoop implementations. It also discusses how virtualization allows enterprises to integrate Hadoop with other workloads and data platforms.
How AD has been re-engineered to extend to the cloudLDAPCon
1. Windows Server Active Directory (AD) has evolved over three main identity models as organizations' needs have changed with technology. Azure Active Directory (AAD) represents the third generation identity ecosystem model.
2. AAD is a cloud-based identity and access management service that is not the same as on-premises AD. It provides identity management as a service and can synchronize with on-premises directories.
3. Key capabilities of AAD include providing a single identity for multiple applications, managing access to cloud apps, monitoring access to enterprise apps, and providing personalized access to applications for users.
Dataverse can be deployed using Docker containers to improve maintainability and portability. The document discusses how Docker can isolate applications and their dependencies into portable containers. It provides an example of deploying Dataverse as a set of microservices within Docker containers. Instructions are included on building Docker images, running containers, and managing the containers and images through commands and tools like Docker Desktop, Docker Hub, and Docker Compose.
FIWARE provides an open standard for managing context and digital twin data to enable the development of smart solutions across multiple sectors. The FIWARE context broker uses NGSI APIs to integrate data from different sources and build a digital twin representation of the real world. Smart data models define common data models for different domains to increase interoperability and reduce development costs when building smart applications. The smart data models initiative is led by several organizations and aims to create a community for defining and maintaining open data models using an agile process.
The Fastest Way to Redis on Pivotal Cloud FoundryVMware Tanzu
What do developers choose when they need a fast performing datastore with a flexible data model? Hands-down, they choose Redis.
But, waiting for a Redis instance to be set up is not a favorite activity for many developers. This is why on-demand services for Redis have become popular. Developers can start building their applications with Redis right away. There is no fiddling around with installing, configuring, and operating the service.
Redis for Pivotal Cloud Foundry offers dedicated and pre-provisioned service plans for Cloud Foundry developers that work in any cloud. These plans are tailored for typical patterns such as application caching and providing an in-memory datastore. These cover the most common requirements for developers creating net new applications or who are replatforming existing Redis applications.
We'd like to invite you to a webinar discussing different ways to use Redis in cloud-native applications. We'll cover:
- Use cases and requirements for developers
- Alternative ways to access and manage Redis in the cloud
- Features and roadmap of Redis for Pivotal Cloud Foundry
- Quick demo
Presenters: Greg Chase, Director of Products, Pivotal and Craig Olrich, Platform Architect, Pivotal
SOLID Programming with Portable Class LibrariesVagif Abilov
Developers often don't pay attention to code portability until they need to target multiple platforms. However, a large amount of non-portable code often hints at violations of clean code principles, so it is worth investigating which parts of the source code base are platform-specific and for what reasons.
In this session we will give an overview of portable class libraries, show how to extract PCL components from a real-world application and go through typical challenges that are faced when writing portable code. We will present the original tool that analyzes assemblies for portability compliance and can be used as a guard to prevent mixing business logic with infrastructure-specific functionality. Finally we will demonstrate how PCLs help targeting platforms such as Windows Store, Android and iOS.
EOSC-hub brings together multiple service providers to create the Hub: a single contact point for European researchers and innovators to discover, access, use and reuse a broad spectrum of resources for advanced data-driven research.
This presentation introduces the services on offer to scientists of all disciplines
This document discusses different options for deploying a Hadoop cluster, including using an appliance like Oracle's Big Data Appliance, deploying on cloud infrastructure through Amazon EMR, or building your own "do-it-yourself" cluster. It provides details on the hardware, software, and costs associated with each option. The conclusion compares the pros and cons of each approach, noting that appliances provide high performance and integration but may be less flexible, while cloud deployments offer scalability and pay-per-use but require consideration of data privacy. Building your own cluster gives more control but requires more work to set up and manage.
A Successful Journey to the Cloud with Data VirtualizationDenodo
Watch full webinar here: https://bit.ly/3mPLIlo
A shift to the cloud is a common element of any current data strategy. However, a successful transition to the cloud is not easy and can take years. It comes with security challenges, changes in downstream and upstream applications, and new ways to operate and deploy software. An abstraction layer that decouples data access from storage and processing can be a key element to enable a smooth journey to the cloud.
Attend this webinar to learn more about:
- How to use Data Virtualization to gradually change data systems without impacting business operations
- How Denodo integrates with the larger cloud ecosystems to enable security
- How simple it is to create and manage a Denodo cloud deployment
Webinar: DataStax Enterprise 5.0 What’s New and How It’ll Make Your Life EasierDataStax
Want help building applications with real-time value at epic scale? How about solving your database performance and availability issues? Then, you want to hear more about DataStax Enterprise 5.0. Join this webinar to learn what’s new in DSE 5.0 ‒ the largest software release to date at DataStax. DSE 5.0 introduces multi-model support including Graph and JSON data models along with a ton of new and enhanced enterprise database capabilities.
View webinar recording here: https://youtu.be/3pfm4ntASJ0
This document discusses Data as a Service (DaaS) in cloud computing. It defines DaaS and explains that it allows users to access data stored in the cloud from any location. The document outlines the components, architecture, pricing models, benefits and drawbacks of DaaS. It provides examples of companies that offer DaaS like Google, Windows Azure, and Amazon.
Similar to Persistent identifiers in DataverseEU project (20)
Decentralised identifiers and knowledge graphs vty
Building an Operating System for Open Science: data integration challenges, Dataverse data repository and knowledge graphs. Lecture by Slava Tykhonov, DANS-KNAW, for the Journées Scientifiques de Rochebrune 2023 (JSR'23).
Decentralised identifiers for CLARIAH infrastructure vty
Slides of the presentation for CLARIAH community on the ideas how to make controlled vocabularies sustainable and FAIR (Findable, Accessible, Interoperable, Reusable) with the help of Decentralized Identifiers (DIDs).
Dataverse repository for research data in the COVID-19 Museumvty
The Covid-19 Museum has an ambition to create a platform to deposit, consult, aggregate and study heterogeneous data about the pandemics using features of a distributed web service. To achieve this purpose, Dataverse has been selected as a reliable FAIR data repository with built-in search engine and functionality that allows adding computing resources to explore archived resources both on data and metadata. Presentation by
Slava Tykhonov, DANS-KNAW (The Royal Netherlands Academy of Arts and Sciences). Université Paris Cité, 19 April 2022.
Building collaborative Machine Learning platform for Dataverse network. Lecture by Slava Tykhonov (DANS-KNAW, the Netherlands), DANS seminar series, 29.03.2022
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...vty
Presentation at ISKO Knowledge Organisation Research Observatory. RESEARCH REPOSITORIES AND DATAVERSE: NEGOTIATING METADATA, VOCABULARIES AND DOMAIN NEEDS
The presentation for the W3C Semantic Web in Health Care and Life Sciences community group by Slava Tykhonov, DANS-KNAW, the Royal Netherlands Academy of Arts and Sciences (October 2020). The recording is available https://www.youtube.com/watch?v=G9oiyNM_RHc
CLARIN CMDI use case and flexible metadata schemes vty
Presentation for CLARIAH IG Linked Open Data on the latest developments for Dataverse FAIR data repository. Building SEMAF workflow with external controlled vocabularies support and Semantic API. Using the theory of inventive problem solving TRIZ for the further innovation in Linked Data.
Flexible metadata schemes for research data repositories - CLARIN Conference'21vty
The development of the Common Framework in Dataverse and the CMDI use case. Building AI/ML based workflow for the prediction and linking concepts from external controlled vocabularies to the CMDI metadata values.
Controlled vocabularies and ontologies in Dataverse data repositoryvty
This document discusses supporting external controlled vocabularies in Dataverse. It proposes implementing a JavaScript interface to allow linking metadata fields to terms from external vocabularies accessed via SKOSMOS APIs. Several challenges are identified, such as applying support to any field, backward compatibility, and ensuring vocabularies come from authoritative sources. Caching concepts and linking dataset files directly to terms are also proposed to improve interoperability.
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...vty
This document summarizes a presentation about automating CI/CD testing, installation, and deployment of Dataverse in the European Open Science Cloud. It discusses using Docker and Kubernetes for deployment, a community-driven QA plan using pyDataverse for test automation, and providing quality assurance as a service. The presentation also covers topics like the CESSDA maturity model, integrating Dataverse on Google Cloud, and using serverless computing for some Dataverse applications and services.
Building COVID-19 Museum as Open Science Projectvty
This document discusses building a COVID-19 Museum as an open science project. It describes the speaker's background working on various data management projects. It discusses moving towards open science and sharing data according to FAIR principles. It outlines the Time Machine project for digitizing historical documents and its approach to data management. The rest of the document discusses using the Dataverse platform to build repositories, linking metadata to ontologies, using tools like Weblate for translations, and exploring the use of artificial intelligence and machine learning to enhance metadata and facilitate human-in-the-loop review processes.
External controlled vocabularies support in Dataversevty
This presentation discusses adding support for external controlled vocabularies to the Dataverse data repository platform. It describes how ontologies like SKOS can be used to represent vocabularies and allow linking metadata fields in Dataverse to terms. The presentation proposes developing a Semantic Gateway plugin for Dataverse that would allow browsing and linking to external vocabularies hosted in the SKOSMOS framework via its API. This could improve metadata by allowing standardized, linked terms and help make data more FAIR.
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataversevty
This presentation is about external CVs support in Dataverse, Open Source data repository. Data Archiving and Networked Services (DANS-KNAW) decided to use Dataverse as a basic technology to build Data Stations and provide FAIR data services for various Dutch research communities.
Ontologies, controlled vocabularies and Dataversevty
Presentation on Semantic Web technologies for Dataverse Metadata Working Group running by Institute for Quantitative Social Science (IQSS) of Harvard University.
1. dans.knaw.nl
DANS is an institute of KNAW and NWO
PIDs in CESSDA DataverseEU
Vyacheslav Tykhonov
Senior Information Scientist (DANS),
DataverseEU lead developer
CESSDA PID workshop,
20.03.2018
2. DataverseEU development model
• We are not going to create a new fork of Dataverse; our contributions should go to the master branch hosted by IQSS at Harvard
• Delivered as Docker images and deployed in Google Cloud as the CESSDA Dataverse repository
• Any service provider can host a separate Dataverse instance in its own cloud if required
• Metadata from other CESSDA repositories will be harvested by the central DataverseEU repository
• Easy to add new languages if more partners join during or after the project
3. DataverseEU tasks overview
• Development of a multilingual web interface (German, Slovenian, Swedish, Hungarian, Italian, French, Spanish)
• Support of a localized metadata model corresponding to CESSDA CMM
• Design and development of a PID plugin mechanism to allow service providers to choose the PID service of their preference
• Development of APIs for CESSDA CVs and Topic Classification based on CESSDA CV Manager services
6. PID structure in Dataverse
Every PID contains:
• Prefix: a unique authority (the ID of an institution or organization)
• Separator
• A sequence of characters or numbers identifying the dataset
Examples:
<PID> ::= <Naming Authority> "/" <Handle Local Name>
doi:10.4232/1.0001 (DOI)
hdl:10411/KL0X8C (handle)
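The prefix/separator/suffix structure above can be sketched as a small parser. This is a hypothetical helper for illustration, not part of Dataverse itself:

```python
def parse_pid(pid: str):
    """Split a PID such as 'doi:10.4232/1.0001' or 'hdl:10411/KL0X8C'
    into (protocol, naming authority, local name)."""
    protocol, _, rest = pid.partition(":")
    if protocol not in ("doi", "hdl"):
        raise ValueError(f"unknown PID protocol: {protocol!r}")
    # The '/' is the separator between the authority prefix and the dataset ID
    authority, sep, local_name = rest.partition("/")
    if not sep:
        raise ValueError(f"missing '/' separator in PID: {pid!r}")
    return protocol, authority, local_name

print(parse_pid("doi:10.4232/1.0001"))  # ('doi', '10.4232', '1.0001')
print(parse_pid("hdl:10411/KL0X8C"))    # ('hdl', '10411', 'KL0X8C')
```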
7. Dataverse PID Plugin requirements
• We need a flexible way to switch between PID service providers (da|ra, DataCite, handle)
• Registering DOIs with da|ra will give data providers greater visibility and recognition, as data references will be integrated in the da|ra search index
• Different data archives can get separate prefixes within the same Dataverse instance and increase their visibility and recognition
• The PID Plugin can be used in combination with an external storage configuration (based on Swift) to host data locally in national infrastructures
8. Current implementation of PID service
• Out-of-the-box support for DOIs (DataCite) and handles (handle.net)
• A single Dataverse instance can be bound either to DOI or to handle, not both
• It is not possible to use separate prefixes for different organisations (DataverseNL is hdl:10411 for all partners)
• Switching between DOIs and handles can be done by executing API requests:
curl -X PUT -d hdl "http://localhost:8080/api/admin/settings/:Protocol"
curl -X PUT -d 10411 "http://localhost:8080/api/admin/settings/:Authority"
curl -X PUT -d doi "http://localhost:8080/api/admin/settings/:Protocol"
curl -X PUT -d 10.5072/FK2 "http://localhost:8080/api/admin/settings/:Authority"
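The same protocol/authority switch can be scripted instead of typed by hand. A minimal Python sketch of those PUT calls, assuming the same local admin endpoint as the curl examples (the request sending is left commented out so the snippet is safe to run without a Dataverse instance):

```python
import urllib.request

# Local Dataverse admin API, as in the curl examples above
BASE = "http://localhost:8080/api/admin/settings"

def setting_request(name: str, value: str) -> urllib.request.Request:
    """Build the PUT request corresponding to one curl call above."""
    return urllib.request.Request(
        f"{BASE}/{name}", data=value.encode(), method="PUT"
    )

def switch_protocol(protocol: str, authority: str):
    """Return the two requests needed to change the PID protocol and authority."""
    return [setting_request(":Protocol", protocol),
            setting_request(":Authority", authority)]

# Switch a test instance to handles under authority 10411:
for req in switch_protocol("hdl", "10411"):
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req)  # uncomment to actually apply the setting
```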
9. The Dataset Lifecycle
(Diagram: create, update, publish, and destroy transitions in the dataset lifecycle. Published versions can be de-accessioned at any time; unpublished versions (drafts) can also be deleted.)
Credits: Felix Bensmann (GESIS). Supporting New PID Providers in Dataverse
10. PID assigning strategies
• A different PID for every new version of a dataset (da|ra)
• The same PID for the dataset, shared by all versions (DOI, handle)
The central idea of the PID plugin: every service provider can choose the PID assignment strategy that best fits their needs.
Warning: different communities need different strategies!
11. PID strategies based on community needs
• Sharing data via the Archive: dataset files are not changing, PIDs are different
• Research data deposit (Ph.D. students): obligation to make the data of a thesis or study publicly available; work in progress, the PID stays the same
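The two assignment strategies above can be sketched in a few lines. All names here are hypothetical, and 10.5072 is used only as a placeholder test prefix:

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    base_id: str
    version: int = 1
    pids: list = field(default_factory=list)

def assign_pid(ds: Dataset, strategy: str) -> str:
    """Assign a PID on publish. 'per-version' mints a new PID for each
    version (da|ra style); 'shared' reuses one PID for all versions
    (the DOI/handle behaviour described above)."""
    if strategy == "per-version":
        pid = f"doi:10.5072/{ds.base_id}.{ds.version}"
        ds.pids.append(pid)
    elif strategy == "shared":
        pid = f"doi:10.5072/{ds.base_id}"
        if pid not in ds.pids:
            ds.pids.append(pid)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return pid

archive = Dataset("KL0X8C")
assign_pid(archive, "per-version")   # doi:10.5072/KL0X8C.1
archive.version += 1
assign_pid(archive, "per-version")   # doi:10.5072/KL0X8C.2
print(archive.pids)                  # two distinct PIDs, one per version
```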
12. The same PID for all versions of a dataset:
an example of the difference between versions
At the same time there is no support for version-level granularity; for example:
https://dataverse.nl/dataset.xhtml?persistentId=hdl:10411/KL0X8C&version=2.0
is not https://dataverse.nl/dataset.xhtml?persistentId=hdl:10411/KL0X8C.2
13. PID Plugin features
• Developed by GESIS and designed as an extra module that can be added to a running Dataverse application
• Functionality provided by the PID Plugin is triggered by events:
• Creation (onCreate)
• Update (onUpdate)
• Publication (onPublish)
• Deaccession
• Destruction
• Lookup (lookupDoi, getProviderName)
• All settings are controlled via the Dataverse API (suffix)
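The event-driven design can be sketched as a provider interface with one handler per lifecycle event. This is an illustrative Python sketch modeled on the hook names listed above; the actual plugin is a Java module inside Dataverse, and the class and method names here are assumptions:

```python
class PidProvider:
    """Sketch of a PID plugin: one handler per dataset lifecycle event."""
    def get_provider_name(self) -> str: ...
    def lookup_doi(self, pid: str): ...
    def on_create(self, dataset): ...
    def on_update(self, dataset): ...
    def on_publish(self, dataset): ...
    def on_deaccession(self, dataset): ...
    def on_destroy(self, dataset): ...

class DataCiteProvider(PidProvider):
    """A provider only overrides the events it cares about."""
    def get_provider_name(self) -> str:
        return "DataCite"

    def on_publish(self, dataset) -> str:
        # Register the PID with the external service on publication
        return f"registered {dataset['pid']} with {self.get_provider_name()}"

provider = DataCiteProvider()
print(provider.on_publish({"pid": "doi:10.5072/FK2/ABCDE"}))
```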
14. PID Plugin registration with da|ra
• Own XML schema
• Mandatory and optional fields
• Every dataset update will create a new PID
• The metadata schema of service providers should be synchronized with the da|ra schema
<xml>
<metadata>
<ID>ABCDE</ID>
<version>v1.0</version>
<doi>
auth.ority/DV/ABCDE
</doi>
<title>title</title>
<url>https://…</url>
</metadata>
</xml>
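A registration document like the one above can be generated programmatically. The sketch below reproduces the simplified structure shown on the slide; the real da|ra schema has many more mandatory and optional fields, and the URL is a placeholder since the slide elides it:

```python
import xml.etree.ElementTree as ET

def dara_metadata(dataset_id: str, version: str, doi: str,
                  title: str, url: str) -> str:
    """Build the simplified da|ra registration XML shown on the slide."""
    root = ET.Element("xml")
    md = ET.SubElement(root, "metadata")
    for tag, text in [("ID", dataset_id), ("version", version),
                      ("doi", doi), ("title", title), ("url", url)]:
        ET.SubElement(md, tag).text = text
    return ET.tostring(root, encoding="unicode")

doc = dara_metadata("ABCDE", "v1.0", "auth.ority/DV/ABCDE",
                    "title", "https://example.org/dataset")  # placeholder URL
print(doc)
```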