"Open data repository for scientific data sharing with the southern countries" was the subject of the talk given by Jean-Christophe Desconnets, head of the IRD's Infrastructure and Digital Data Mission (MIDN), in Gaborone (Botswana) on 8 November 2018, during International Data Week. It presents the IRD's data repository project, due to open in 2019. The project is co-managed by MIDN and the IT and IST Services.
4. Science data life cycle
[Figure: the science data life cycle, contrasting the data cycle during the project with the data cycle in a data repository]
Current practices in the research community: research project design (start of project) → data acquisition → data analysis → scientific paper(s) (end of project). Beyond the project bounds: memory lapse, data destruction.
With a data management plan: research project design (start of project) → data management plan → data acquisition → data analysis → scientific paper(s) (end of project).
In the data repository: deposit → storage → identification, description, discovery → sharing → reuse (new papers, new citations), adding value to the research work.
5. IRD data repository objectives
First piece of the "data management for open science" ecosystem
Short-term objective (2019-2020)
Provide a service (platform + support + curation) that lets researchers control the dissemination and the preservation of their data
Mid-term objective (2020-2025)
Ensure the discovery of data archived in other repositories, data centres or research infrastructures (directory function)
6. Specific objectives: institutional issues
Internally
Respond to the national plan for open science
Improve knowledge and management of datasets
Take a first "concrete" step towards an open science policy at the IRD
For our Southern partners
Define and control data governance rather than relying on private repositories
Improve the accessibility of our data to Southern partners
Support open science initiatives in the South (replication of the data repository, capacity building)
At European and international level
Meet the requirements of European programmes
Integrate into the European Open Science Cloud (EOSC) infrastructures
7. Data repository scope
[Figure: statistical distribution of research data (Ferguson et al., 2014)]
Observatories, data centres, online databases
Targeted data: unstructured or undigitised data stored on PCs, and historical data not linked to internally or externally accessible databases
8. Targeted data and needs
Data come from various scientific domains with different characteristics:
Genomics
Exploited marine ecosystems
Marine and agro-biodiversity
Health
Environmental sciences
Researchers have various needs and expectations:
Data rescue, data preservation
Reproducibility of experiments
DOI allocation, data papers
Data sharing beyond the data producers
9. • Limited to data discovery and access facilities
• Make FAIR data available
• Support metadata harvesting via the OAI-PMH protocol
Principles for design
[Diagram: metadata design principles]
Use: a core metadata model (Dublin Core) and standard identification systems (DOI, ORCID…)
Control: metadata values (domain categories)
Extend: domain-specific metadata standards (spatial location, species taxonomies)
Harvest
Outcomes: extended discovery facilities, interoperability
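The harvesting principle can be illustrated with a minimal Python sketch that uses only the standard library: it builds an OAI-PMH `ListRecords` request for the unqualified Dublin Core format (`oai_dc`) and extracts the Dublin Core fields from a response. The endpoint URL and the sample record, including its DOI, are made-up placeholders, and a real harvester would also have to follow `resumptionToken` paging and handle OAI-PMH error responses.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request for the given endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_dc_records(xml_text: str) -> List[Dict[str, List[str]]]:
    """Return the Dublin Core fields of every record in a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        dc = record.find(".//oai_dc:dc", NS)
        if dc is None:  # deleted records have a header but no metadata
            continue
        fields: Dict[str, List[str]] = {}
        for element in dc:
            # element.tag looks like "{http://purl.org/dc/elements/1.1/}title"
            name = element.tag.split("}", 1)[1]
            fields.setdefault(name, []).append((element.text or "").strip())
        records.append(fields)
    return records


# Hypothetical response, trimmed to a single record
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>doi:10.1234/EXAMPLE</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example fish survey</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:identifier>doi:10.1234/EXAMPLE</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://data.example.org/oai"))
print(parse_dc_records(SAMPLE))
```

Each field is read as a list because Dublin Core elements are repeatable: a dataset often has several creators or subjects.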
10. • Digital Object Identifier (DOI) allocation for each dataset
• Flexibility to describe a dataset (enriching the core metadata model with metadata standards from a specific domain)
• Possibility to put data management in the hands of researchers: each data folder can be managed by a different administrator
• A publication workflow, which allows obtaining a secure temporary link to unpublished data for the reviewers of a related article
• Data versioning
• Metrics (downloads, views, guest book)
And key user requirements
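The data versioning requirement can be sketched in Python. The sketch assumes a major.minor numbering in which changing the data files produces a new major version and editing only the metadata a new minor version; Dataverse displays version numbers of this form, but the exact rule and the `Version`, `Dataset` and `next_version` names are illustrative assumptions, not the Dataverse API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Version:
    major: int
    minor: int

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}"


def next_version(current: Optional[Version], files_changed: bool) -> Version:
    """Number assigned when a draft is published (assumed rule)."""
    if current is None:
        return Version(1, 0)  # first publication
    if files_changed:
        return Version(current.major + 1, 0)  # data files added, replaced or removed
    return Version(current.major, current.minor + 1)  # metadata-only edit


@dataclass
class Dataset:
    """Hypothetical dataset record that keeps every published version."""
    title: str
    versions: List[Version] = field(default_factory=list)

    def publish(self, files_changed: bool) -> Version:
        latest = self.versions[-1] if self.versions else None
        new = next_version(latest, files_changed)
        self.versions.append(new)  # earlier versions are kept, not overwritten
        return new


ds = Dataset("Example fish survey")
ds.publish(files_changed=True)   # 1.0
ds.publish(files_changed=False)  # 1.1: corrected description
ds.publish(files_changed=True)   # 2.0: new data file
print([str(v) for v in ds.versions])  # ['1.0', '1.1', '2.0']
```

Keeping the full list of versions, rather than only the latest one, is what lets a paper keep pointing at the exact data it used.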
11. • Open-source software, created in 2006 by Harvard University
• Set up a local Dataverse instance and join the Dataverse network (CIRAD, INRA, Sciences Po...)
• Integrate an "ecosystem" of interoperable data repositories
Software for the data repository: Dataverse
https://dataverse.org/
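Because every Dataverse instance exposes the same Search API, datasets across an "ecosystem" of installations can be discovered programmatically. A minimal Python sketch that only builds the request URL: the server address and the `ird` folder alias are hypothetical, and the parameters used (`q`, `type`, `subtree`, `per_page`, `start`) should be checked against the Search API guide of the deployed Dataverse version.

```python
from typing import Optional
from urllib.parse import urlencode


def dataverse_search_url(server: str, query: str, subtree: Optional[str] = None,
                         per_page: int = 10, start: int = 0) -> str:
    """Build a Dataverse Search API request restricted to datasets."""
    params = {"q": query, "type": "dataset", "per_page": per_page, "start": start}
    if subtree is not None:
        # limit the search to one repository folder (a "dataverse") by its alias
        params["subtree"] = subtree
    return f"{server.rstrip('/')}/api/search?{urlencode(params)}"


# Hypothetical server and folder alias
url = dataverse_search_url("https://data.example.org/", "fisheries", subtree="ird")
print(url)
```

Paging through results is then a matter of increasing `start` by `per_page` between requests.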
14. Typical use of the data repository
• Creation of a repository folder and training of a referent person (research units and projects can create a customizable repository folder, a "Dataverse", ...)
Workflow: description and data deposit by the researcher → validation of the deposit by a general administrator → publication of the dataset → FAIR data
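The deposit, validation and publication steps behave like a small state machine. This is a hypothetical Python sketch: the state names, the `submit`/`return`/`publish` actions and the two roles are assumptions chosen to mirror the workflow above (Dataverse has a comparable submit-for-review step, but this is not its API).

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    IN_REVIEW = "in review"
    PUBLISHED = "published"


# (current state, action) -> (next state, role allowed to perform the action)
TRANSITIONS = {
    (State.DRAFT, "submit"): (State.IN_REVIEW, "researcher"),
    (State.IN_REVIEW, "return"): (State.DRAFT, "administrator"),
    (State.IN_REVIEW, "publish"): (State.PUBLISHED, "administrator"),
}


def transition(state: State, action: str, role: str) -> State:
    """Move a dataset through the deposit workflow, enforcing who may act."""
    try:
        target, allowed_role = TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action} a dataset that is {state.value}") from None
    if role != allowed_role:
        raise PermissionError(f"a {role} cannot {action} this dataset")
    return target


state = transition(State.DRAFT, "submit", "researcher")  # deposit sent for validation
state = transition(state, "publish", "administrator")    # validated and published
print(state)  # State.PUBLISHED
```

Keeping the transitions in one table makes the validation rule explicit: only the administrator can move a deposit out of review.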
Researchers deposit a dataset in their dedicated repository folder, in accordance with the data management plan, using standardised formats and metadata to describe their data. Access can be open, closed, under embargo, or limited to metadata only.