The document provides an overview of the Open Research Data Pilot, the data management plan, and OpenAIRE tools and services to support implementation of FAIR data management plans. It discusses the aims of the Open Research Data Pilot, which Horizon 2020 projects are required to participate in it, and the types of data that must be deposited. It also covers topics like creating a data management plan, selecting a repository, making data FAIR, and OpenAIRE support resources like briefing papers, webinars, and the Zenodo repository.
LIBER Webinar: Turning FAIR Data Into RealityLIBER Europe
These slides relate to a LIBER Webinar given on 23 April 2018. Turning FAIR Data Into Reality — Progress and Plans from the European Commission FAIR Data Expert Group.
In this webinar, Simon Hodson, Executive Director of CODATA and Chair of the FAIR Data Expert Group, and Sarah Jones, Associate Director at the Digital Curation Centre and Rapporteur, reported on the Group’s progress.
20190527_Brecht Wyns & Christophe Bahim _ FAIR data maturity modelOpenAIRE
Presented by Brecht Wyns & Christophe Bahim (RDA)
during the OpenAIRE workshop "Research policy monitoring in the era of Open Science and Big Data" taking place in Ghent, Belgium on May 27th and 28th 2019
Day 1: Monitoring and Infrastructure for Open Science
https://www.openaire.eu/research-policy-monitoring-in-the-era-of-open-science-and-big-data-the-what-indicators-and-the-how-infrastructures
Presented by Helena Cousijn (FREYA)
during the OpenAIRE workshop "Research policy monitoring in the era of Open Science and Big Data" taking place in Ghent, Belgium on May 27th and 28th 2019
Day 1: Monitoring and Infrastructure for Open Science
https://www.openaire.eu/research-policy-monitoring-in-the-era-of-open-science-and-big-data-the-what-indicators-and-the-how-infrastructures
Connecting the dots - e-Infra services for open scienceOpenAIRE
Moving from open access towards services for open science, we present OpenAIRE, OpenMinTeD and OpenUP, three EU projects that build services to facilitate and accelerate open science.
The first workshop of the series "Services to support FAIR data" took place in Prague during the EOSC-hub week (on April 12, 2019).
Speaker: Kostas Repanas (EC DG RTD)
Webinar: Data management and the Open Research Data Pilot in Horizon 2020OpenAccessBelgium
This webinar provides information about strategies for successful Research Data Management, resources to help manage data effectively, choosing where to store and deposit data, the EC H2020 Open Data Pilot and the basics of data management.
At the end of the session participants will be able to:
- Understand the basic principles and importance of RDM
- Set clear goals regarding data curation, preservation and sharing
- Comply with the requirements of the Research Data Pilot
- Draft a Data Management Plan
- Identify RDM resources and tools
What I wish I’d known at the start! Lessons learned the hard way when setting up RDM services;
Stephen Grace, London South Bank University, Sarah Jones, DCC; Research Data Network
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...Carole Goble
https://datascience.nih.gov/news/march-data-sharing-and-reuse-seminar 11 March 2022
Starting in 2023, the US National Institutes of Health (NIH) will require institutes and researchers receiving funding to include a Data Management Plan (DMP) in their grant applications, including making their data publicly available. Similar mandates are already in place in Europe; for example, a DMP is mandatory in Horizon Europe projects involving data.
Policy is one thing - practice is quite another. How do we provide the necessary information, guidance and advice for our bioscientists, researchers, data stewards and project managers? There are numerous repositories and standards. Which is best? What are the challenges at each step of the data lifecycle? How should different types of data be handled? What tools are available? Research Data Management advice is often too general to be useful, and specific information is fragmented and hard to find.
ELIXIR, the pan-national European Research Infrastructure for Life Science data, aims to enable research projects to operate “FAIR data first”. ELIXIR supports researchers across their whole RDM lifecycle, navigating the complexity of a data ecosystem that bridges from local cyberinfrastructures to pan-national archives and across bio-domains.
The ELIXIR RDMkit (https://rdmkit.elixir-europe.org) is a toolkit built by the biosciences community, for the biosciences community, to provide the RDM information it needs. It is a framework for advice and best practice for RDM and acts as a hub of RDM information, with links to tool registries, training materials, standards, and databases, and to services that offer deeper knowledge for DMP planning and FAIR-ification practices.
Launched in March 2021, over 120 contributors have provided nearly 100 pages of content and links to more than 300 tools. Content covers the data lifecycle and specialized domains in biology, national considerations and examples of “tool assemblies” developed to support RDM. It has been accessed from over 123 countries, and the top of the access list is … the United States.
The RDMkit is already a recommended resource of the European Commission. The platform, editorial, and contributor methods helped build a specialized sister toolkit for infectious diseases as part of the recently launched BY-COVID project. The toolkit’s platform is the simplest we could manage - built on plain GitHub - and the whole development and contribution approach tailored to be as lightweight and sustainable as possible.
In this talk, Carole and Frederik will present the RDMkit; aims and context, content, community management, how folks can contribute, and our future plans and potential prospects for trans-Atlantic cooperation.
Data policy must be partnered with data practice. Our researchers need to be the best informed in order to meet these new data management and data sharing mandates.
This presentation uses a long-term case study to explore the socio-scientific aspects influencing what data products are created and made available for use. We examine two major satellite remote-sensing product collections from the National Snow and Ice Data Center—one on sea ice extent and another on Greenland ice sheet melt. We examine how the products and their curation have evolved over time in response to environmental events and increasing scientific and public demand over several decades. The products have evolved in conjunction with the needs of a changing and expanding designated user community. These changes in the user community were driven by increased interest in the Arctic partly because of the rapid change in the Arctic as characterized in these data, but also because of the increasing awareness (and controversy) around climate change and its impact.
We find that a data product development cycle supported by a data product team with multiple perspectives is key to mobilizing scientific knowledge to multiple stakeholders. Furthermore, the expertise and approaches to making data open and truly useful must continually adapt to new perceptions, needs, and events. Effective data access is an ongoing process, not a one-time event.
References
Baker, K. S., Duerr, R. E., and Parsons, M. A. (2016) Scientific knowledge mobilization: Co-evolution of data products and designated communities. International Journal of Digital Curation 10 (2): 110-135. https://doi.org/10.2218/ijdc.v10i2.346
EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 14, 2016...EUDAT
| www.eudat.eu | 2nd Session: July 14, 2016.
In this webinar, Sarah Jones (DCC) and Marjan Grootveld (DANS) talked through the aspects that Horizon 2020 requires from a DMP. They discussed examples from real DMPs and also touched upon the Software Management Plan, which for some projects can be a sensible addition.
EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 7, 2016|...EUDAT
| www.eudat.eu | 1st Session: July 7, 2016.
In this webinar, Sarah Jones (DCC) and Marjan Grootveld (DANS) talked through the aspects that Horizon 2020 requires from a DMP. They discussed examples from real DMPs and also touched upon the Software Management Plan, which for some projects can be a sensible addition.
A presentation given on the Horizon 2020 open data pilot as part of a series of OpenAIRE webinars for Open Access week 2014 - http://www.fosteropenscience.eu/event/openaire-webinars-during-oa-week-2014
The Horizon 2020 Open Data Pilot - OpenAIRE webinar (Oct. 21 2014) by Sarah J...OpenAIRE
Sarah Jones (HATII, Digital Curation Center) will provide more information on the Open Research Data Pilot in H2020: who should participate and how to comply (in collaboration with FOSTER)
Date: Tuesday, October 21 2014
Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...BigData_Europe
Slides of the keynote at the 3rd Big Data Europe SC6 Workshop co-located at SEMANTiCS2018 in Amsterdam (NL) on: The European Research Data Landscape: Opportunities for CESSDA by Peter Doorn, Director DANS, Chair, Science Europe W.G. on Research Data. Chair, CESSDA ERIC General Assembly
Presentation by Joy Davidson, Digital Curation Centre (UK) at the FOSTER event: Data Management Plan and Social Impact of Research. Universitat Jaume I, 27 May 2016
Presentation by Mireia Alcalá, Information Resources Officer at CSUC, given at the online workshop "Research Data Management & Open Science" organised by IDIBELL on 2 November 2020.
Making Data FAIR (Findable, Accessible, Interoperable, Reusable)Tom Plasterer
What to do About FAIR…
In the experience of most pharma professionals, FAIR remains fairly abstract, bordering on inconclusive. This session will outline specific case studies – real problems with real data – and address opportunities and real concerns.
• Why making data Findable, Accessible, Interoperable and Reusable is important.
Talk presented at the Data Driven Drug Development (D4) conference on March 20th, 2019.
Webinar: Data management and the Open Research Data Pilot in Horizon 2020 OpenAIRE
This webinar provides information about strategies for successful Research Data Management, resources to help manage data effectively, choosing where to store and deposit data, the EC H2020 Open Data Pilot and the basics of data management.
At the end of the session participants will be able to:
- Understand the basic principles and importance of RDM
- Set clear goals regarding data curation, preservation and sharing
- Comply with the requirements of the Research Data Pilot
- Draft a Data Management Plan
- Identify RDM resources and tools
Research Data Management: An Introductory Webinar from OpenAIRE and EUDATTony Ross-Hellauer
OpenAIRE and EUDAT co-present this webinar which aims to introduce researchers and others to the concept of research data management (RDM). As well as presenting the benefits of taking an active approach to research data management – including increased speed and ease of access, efficiency (fund once, reuse many times), and improved quality and transparency of research – the webinar will advise on strategies for successful RDM, resources to help manage data effectively, choosing where to store and deposit data, the EC H2020 Open Data Pilot and the basics of data management, stewardship and archiving.
Webinar recording available: http://www.instantpresenter.com/eifl/EB57D6888147
Research Data Management: An Introductory Webinar from OpenAIRE and EUDATOpenAIRE
OpenAIRE and EUDAT co-present this webinar which aims to introduce researchers and others to the concept of research data management (RDM). As well as presenting the benefits of taking an active approach to research data management – including increased speed and ease of access, efficiency (fund once, reuse many times), and improved quality and transparency of research – the webinar will advise on strategies for successful RDM, resources to help manage data effectively, choosing where to store and deposit data, the EC H2020 Open Data Pilot and the basics of data management, stewardship and archiving.
Webinar recording available: http://www.instantpresenter.com/eifl/EB57D6888147
Overview of the Research on Open Educational Resources for Development (ROER4D) Open Data initiative, highlighting data management principles, the five pillars of the ROER4D data publication approach and the project de-identification approach.
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
1. Credits: OpenAIRE team
Sarah Jones, Digital Curation Centre (DCC), UK
Marjan Grootveld, DANS (NL)
Natalia Manola, ATR (GR)
FAIR Data Management: best practices and open issues
Paola Gargiulo, OpenAIRE NOAD/Cineca
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
2. Agenda
• The Open Research Data Pilot
• The data management plan
• OpenAIRE tools and services for the Data Pilot
• EUDAT data services
3. Open Research Data Pilot (2015-2016): aims
To make the research data generated by selected Horizon 2020 projects accessible with as few restrictions as possible, while at the same time protecting sensitive data from inappropriate use.
EC: information already paid for by the public should not be paid for again.
Open data is data that is free to access and reuse.
4. To whom does the Data Pilot apply?
Current situation 2015-2016:
• Researchers funded by Horizon 2020 within 9 specified call areas.
• Opt out and opt in are possible and are being used.
• Call areas: https://www.openaire.eu/opendatapilot
As of 2017:
• European Cloud Initiative to give Europe a global lead in the data-driven economy.
• Open data will become the default option. The pilot will be extended to cover all call areas. Opting out remains possible.
• Press release: http://europa.eu/rapid/press-release_IP-16-1408_en.htm
Daniel Spichtinger (EC) at OpenCon 14-11-15: 3,699 Horizon 2020 signed grant agreements; 149/431 projects in core areas opted out; 409/3,268 projects in other areas opted in.
5. Which research has to participate in the pilot? (2015-2016)
• Future and Emerging Technologies
• Research infrastructures (new: coverage of the whole area)
• Leadership in enabling and industrial technologies – Information and Communication Technologies
• Nanotechnologies, Advanced Materials, Advanced Manufacturing and Processing, and Biotechnology: ‘nanosafety’ and ‘modelling’ topics (new)
• Societal Challenge: Food security, sustainable agriculture and forestry, marine and maritime and inland water research and the bioeconomy – selected topics as specified in the work programme (new)
• Societal Challenge: Climate Action, Environment, Resource Efficiency and Raw Materials – except raw materials
• Societal Challenge: Europe in a changing world – inclusive, innovative and reflective societies
• Science with and for Society
7. Two types of data:
• Data, including metadata, needed to validate the results in scientific publications
• Other data, including metadata, as specified in the Data Management Plan, like raw data
8. The following slides come from the EC’s open access team and provide an overview of the key points. Content from Jean-Francois Dechamp and colleagues.
Mail: RTD-open-access@ec.europa.eu
Web: http://ec.europa.eu/research/openscience/index.cfm
Twitter: @OpenAccessEC
RDA National Event in Italy, 14-15 November 2016
9. Publications
Openly accessible and minable. APCs are eligible costs.
Research data
Openly accessible research data can typically be accessed, mined, exploited, reproduced and disseminated free of charge for the user.
13. Three top reasons to opt out
Whether a (proposed) project participates in the ORD Pilot or chooses to opt out does not affect the evaluation of that project. Proposals will not be penalised for opting out.
14. Reasons for opting out:
• participation is incompatible with the Horizon 2020 obligation to protect results that can reasonably be expected to be commercially or industrially exploited;
• participation is incompatible with the need for confidentiality in connection with security issues;
• participation is incompatible with rules on protecting personal data;
• the project will not generate / collect any research data; or
• there are other legitimate reasons not to take part in the Pilot.
Note that partial opt out is possible – and preferable to full opt out!
17. FAIR data
• Findable – assign persistent IDs, provide rich metadata, register in a searchable resource...
• Accessible – retrievable by their ID using a standard protocol; metadata remain accessible even if data aren’t...
• Interoperable – use formal, broadly applicable languages, standard vocabularies, qualified references...
• Reusable – rich, accurate metadata, clear licences, provenance, use of community standards...
18. Findable
• Use metadata and specify standards for metadata creation (if any). If there are no standards in your discipline, describe what type of metadata will be created and how
• Search keywords
• Persistent and unique identifiers such as DOIs or other handles
• File and folder naming conventions
• Versioning of the datasets and clear version numbers
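The naming and versioning bullets above can be made concrete in a few lines. This is a minimal sketch; the pattern itself (project, label, date, version) is illustrative and not prescribed by the slides.

```python
import re
from datetime import date

def dataset_filename(project: str, label: str, version: str,
                     created: date, ext: str = "csv") -> str:
    """Build a consistent, sortable dataset file name.

    Pattern (an assumption for illustration):
    <project>_<label>_<YYYY-MM-DD>_v<major.minor>.<ext>
    """
    if not re.fullmatch(r"\d+\.\d+", version):
        raise ValueError("version must look like '1.0'")

    def slug(s: str) -> str:
        # Normalise free-text parts: lowercase, spaces -> hyphens
        return re.sub(r"[^a-z0-9-]", "", s.lower().replace(" ", "-"))

    return f"{slug(project)}_{slug(label)}_{created.isoformat()}_v{version}.{ext}"

print(dataset_filename("OpenAIRE", "survey responses", "1.2", date(2016, 11, 14)))
# openaire_survey-responses_2016-11-14_v1.2.csv
```

Because the date and version are embedded in the name, files sort chronologically and every released version stays distinguishable.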
19. Metadata and documentation
• Metadata and documentation are needed to find and understand research data
• Think about what others would need in order to find, evaluate, understand, and reuse your data
• Get others to check the metadata to improve quality
• Use standards to enable interoperability
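As a sketch of what "use standards" means in practice, here is a minimal descriptive record using the mandatory DataCite property names; the DOI and values are invented for illustration, and a real record would carry more detail.

```python
# A minimal, DataCite-flavoured descriptive record for a dataset.
# Field names follow DataCite's mandatory properties; values are invented.
record = {
    "identifier": "10.5281/zenodo.0000000",   # hypothetical DOI
    "creators": [{"name": "Jones, Sarah"}],
    "title": "Survey responses, cleaned",
    "publisher": "Zenodo",
    "publicationYear": "2016",
    "resourceType": "Dataset",
}

# A simple completeness check, the kind a colleague (or a repository)
# might run when reviewing your metadata.
REQUIRED = {"identifier", "creators", "title",
            "publisher", "publicationYear", "resourceType"}

missing = REQUIRED - record.keys()
print("complete" if not missing else f"missing: {sorted(missing)}")
```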
20. Where to find metadata standards
Metadata Standards Directory: a broad, disciplinary listing of standards and tools, maintained by an RDA group. http://rd-alliance.github.io/metadata-directory
Biosharing: a portal of data standards, databases, and policies for the life, environmental and biomedical sciences. https://biosharing.org
21. Accessible
• Explain which data can’t be shared openly, if any
• Specify how access will be provided in case of restrictions, e.g. through a data committee, a licence, or arranged with the repository
• Will methods or software tools needed to access the data (if any) be included or documented?
• Deposit the data and associated metadata, documentation and code, preferably in certified repositories which support Open Access
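"Retrievable by their ID using a standard protocol" (slide 17) is very literal for DOIs: the doi.org resolver speaks plain HTTPS, and machine-readable metadata can be requested via HTTP content negotiation. The sketch below only composes the request rather than sending it; the DataCite JSON media type is a real content-negotiation option, but treat the details as an assumption to verify against your repository.

```python
def doi_request(doi: str, metadata: bool = False):
    """Compose the URL and headers to dereference a DOI over HTTPS.

    With metadata=True, ask the resolver for a machine-readable
    DataCite JSON record instead of the landing page.
    """
    url = f"https://doi.org/{doi}"
    headers = {"Accept": "application/vnd.datacite.datacite+json"} if metadata else {}
    return url, headers

url, headers = doi_request("10.2218/ijdc.v10i2.346", metadata=True)
print(url)  # https://doi.org/10.2218/ijdc.v10i2.346
```

Any HTTP client can then issue the request; the point is that access goes through one stable identifier, not a repository-specific link that may rot.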
22. Where to find a repository?
More information: https://www.openaire.eu/opendatapilot-repository
What to deposit?
a. the data needed to validate the results presented in scientific publications, including the metadata;
b. any other data, including the metadata, as specified in the DMP;
c. plus, for a-b, the documentation and the tools that are needed to validate the results, e.g. specialised software or software code, algorithms and analysis protocols (when possible, these instruments themselves).
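For depositing, Zenodo (the catch-all repository OpenAIRE recommends) exposes a REST API. The sketch below only composes a create-deposition request; actually sending it needs a personal access token, so the token and the metadata values here are placeholders, and the exact field set should be checked against the Zenodo API documentation.

```python
def new_deposition_request(token: str, metadata: dict):
    """Compose (url, query params, JSON body) for creating a Zenodo deposition."""
    url = "https://zenodo.org/api/deposit/depositions"
    params = {"access_token": token}
    body = {"metadata": metadata}
    return url, params, body

url, params, body = new_deposition_request(
    "PLACEHOLDER-TOKEN",  # a real personal access token goes here
    {
        "title": "Survey responses, cleaned",
        "upload_type": "dataset",
        "description": "Data underlying publication X",
        "creators": [{"name": "Jones, Sarah"}],
    },
)
print(url)
```

With a real token, the request would be sent as an HTTP POST with the JSON body, and the response would include the deposition id used for subsequent file uploads and publishing.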
24. Interoperable
• Interoperability applies to data and metadata, to data exchange formats and protocols
• Specify what data and metadata vocabularies, standards or methodologies you will follow to facilitate interoperability
• Use a standard vocabulary to allow inter-disciplinary interoperability, or provide a mapping from your vocabulary to more commonly used ontologies
• Aim for compliance with globally accepted practices
RDM Seminar @ ISERD, Tel Aviv - Oct 1, 2016
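The "mapping from your vocabulary to more commonly used ontologies" bullet can be as simple as a lookup table. In this sketch the local field names are invented; the Dublin Core term IRIs are real.

```python
# Map project-local field names to Dublin Core terms.
LOCAL_TO_DCTERMS = {
    "made_by":   "http://purl.org/dc/terms/creator",
    "name":      "http://purl.org/dc/terms/title",
    "posted_on": "http://purl.org/dc/terms/issued",
}

def to_standard(record: dict) -> dict:
    """Re-key a local record with standard term IRIs, dropping unmapped keys."""
    return {LOCAL_TO_DCTERMS[k]: v for k, v in record.items()
            if k in LOCAL_TO_DCTERMS}

mapped = to_standard({"made_by": "S. Jones", "name": "Survey data", "internal_id": 7})
print(len(mapped))  # 2
```

Publishing such a mapping alongside the data lets other groups interpret your fields without adopting your internal naming.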
25. • Clarify licences early on
• License the data to permit the widest reuse possible
• Specify a data embargo, if needed
• If data re-use by third parties is restricted, explain why
• How long will the data remain reusable?
• Describe data quality assurance processes
Reusable
26. www.dcc.ac.uk/resources/how-guides/license-research-data
License research data openly
DCC guide outlines the pros and cons of
each approach and gives practical advice
on how to implement your licence
CREATIVE COMMONS LIMITATIONS
NC Non-Commercial
What counts as commercial?
ND No Derivatives
Severely restricts use
These clauses are not open licenses
Horizon 2020 Open Access guidelines point to CC0 or CC BY
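The point that NC and ND clauses are not open licenses can be captured mechanically. A small illustrative sketch using SPDX-style license identifiers (the identifier strings are standard SPDX; the helper itself is an assumption made for this example):

```python
def is_open_license(spdx_id: str) -> bool:
    """Return True if a Creative Commons SPDX identifier permits
    unrestricted reuse, i.e. carries no NC or ND clause."""
    clauses = spdx_id.upper().split("-")
    return not ({"NC", "ND"} & set(clauses))

# CC0 and CC BY are the licenses the H2020 OA guidelines point to:
assert is_open_license("CC0-1.0")
assert is_open_license("CC-BY-4.0")
# NC/ND variants restrict reuse and are not open licenses:
assert not is_open_license("CC-BY-NC-4.0")
assert not is_open_license("CC-BY-ND-4.0")
```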
28. What is a data management plan?
A plan written at the start of a project to define:
• how the data will be created
• how it will be documented
• who will access it
• where it will be stored
• who will back it up
• whether (and how) it will be shared & preserved
DMPs are often submitted as part of grant applications, but are useful
whenever researchers are creating data
The DMP is a living document.
You are not required to provide
detailed answers to all the
questions in the first version of
the DMP (due M6)
Explain any selection criteria in the DMP
29. When to submit the DMP
• Note that the Commission does NOT require applicants to submit a
DMP at the proposal stage.
• A DMP is therefore NOT part of the evaluation.
• DMPs are a deliverable
• Note that the Commission requires updates. A DMP is a living or
“active” document.
30. What aspects of RDM should be in a DMP?
What data will be created (format, types, volume...)
Standards and methodologies to be used (incl. metadata)
How ethics and Intellectual Property will be addressed
Plans for data sharing and access
Strategy for long-term preservation Create
Document
Use
Store
Share
Preserve
A DMP is a plan to share!
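A DMP can also be captured in machine-actionable form. The sketch below follows the general shape of the RDA DMP Common Standard (a JSON serialisation with a top-level `dmp` object); the structure is abridged and all field values are invented:

```python
import json

# Abridged, machine-actionable DMP sketch (values invented):
madmp = {
    "dmp": {
        "title": "DMP for project EXAMPLE",
        "created": "2019-06-01",
        "dataset": [
            {
                "title": "Survey responses",
                "type": "dataset",
                "issued": "2020-01-15",
                "distribution": [
                    {"access_url": "https://zenodo.org/record/0000000",
                     "license": [{"license_ref":
                                  "https://creativecommons.org/licenses/by/4.0/"}]}
                ],
            }
        ],
    }
}

serialized = json.dumps(madmp, indent=2)
```

Because the plan is structured data rather than free text, updates at later stages of the project (new datasets, changed access conditions) become diffs rather than rewrites.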
31. What should be deposited?
• The data needed to validate results in scientific publications (minimally!).
• The associated metadata: the dataset’s creator, title, year of publication, repository,
identifier etc.
• Follow a metadata standard in your line of work, or a generic standard, e.g. Dublin Core or DataCite, and be FAIR.
• The repository will assign a persistent ID to the dataset: important for discovering and citing the
data.
• Documentation like code books, lab journals, informed consent forms – domain-
dependent, and important for understanding the data and combining them with other
data sources.
• Software, hardware, tools, syntax queries, machine configurations – domain-
dependent, and important for using the data. (Alternative: information about the
software etc.)
Basically, everything that is needed to replicate a study should be available for others.
Research Data Alliance (RDA): http://rd-alliance.github.io/metadata-directory/standards/
FAIR Guiding Principles for scientific data management: http://www.nature.com/articles/sdata201618
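The minimal metadata listed above (creator, title, year of publication, repository, identifier) maps directly onto the mandatory properties of the DataCite schema. A sketch of such a record as a Python dict; the field names follow the general shape of DataCite metadata and the values are invented:

```python
# DataCite-style record for an invented dataset:
record = {
    "identifier": {"identifier": "10.5281/zenodo.0000000",
                   "identifierType": "DOI"},
    "creators": [{"creatorName": "Jansen, A."}],
    "titles": [{"title": "Sea-surface temperatures 2015"}],
    "publisher": "Zenodo",
    "publicationYear": "2016",
    "resourceType": {"resourceTypeGeneral": "Dataset"},
}

def has_mandatory_fields(rec: dict) -> bool:
    """Check that the core descriptive properties are all present."""
    required = {"identifier", "creators", "titles",
                "publisher", "publicationYear"}
    return required <= rec.keys()
```

A repository will usually generate most of this for you at deposit time; the check above only illustrates what "the associated metadata" minimally means.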
32. Archive the data openly,
unless…
• Confidentiality and security issues can be good reasons not to publish or share all data. Note in the DMP the reasons for not giving access, and deposit that part of the data under a Restricted Access regime.
• When regenerating data would be cheaper than archiving, don’t
archive. Spend time on selecting what data you’ll need and
want to retain. Motivate your criteria in the DMP.
See http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
For selection criteria see https://www.openaire.eu/opendatapilot
Grant Agreement, Art. 29.3, Open Access to research data:
33. A DMP is about ‘keeping’ data
• Storing data ≠ archiving data
• Archived data ≠ findable data
• Findable ≠ accessible
• Accessible ≠ understandable
• Understandable ≠ usable
• a USB stick is not safe
• Figshare is not a Trustworthy Digital Repository
• a persistent identifier is essential but no guarantee for usability
• Data in a proprietary format are not sustainable
34. How much does it cost? Who pays?
• What are the costs for making data FAIR in your project?
• Resources needed for long term preservation
• Check the UK Data Service Costing model
• The High Level Expert Group on the European Open Science Cloud
recommends that “well budgeted data stewardship plans should be
made mandatory and we expect that on average about 5% of
research expenditure should be spent on properly managing and
stewarding data”
• Who pays? How?
UKDS model: http://www.data-archive.ac.uk/create-manage/planning-for-sharing/costing
HLEG report: http://ec.europa.eu/research/openscience/pdf/realising_the_european_open_science_cloud_2016.pdf#view=fit&pagemode=none
38. Human Network e-infrastructure
NOADS: National Open Access Desks
Monitor and foster the adoption of Open
Access policies at the local level
Support researchers in the implementation of the Open Data Pilot
FP7 post grant APCs Pilot
e-infrastructure for monitoring impact of OA
mandates and research projects
OpenAIRE guidelines for metadata exchange
Zenodo Repository for the deposition of research
products
THE POINT OF REFERENCE FOR OPEN ACCESS IN EUROPE
50 Partners: EU countries, data centers, universities, libraries, repositories
Open Access infrastructure
for research in Europe
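The OpenAIRE guidelines for metadata exchange build on the OAI-PMH protocol, which lets the infrastructure harvest records from compliant repositories. A sketch of how a harvester might construct such a request (the endpoint URL is a placeholder, and the request is only built here, not sent):

```python
from urllib.parse import urlencode

def build_listrecords_url(base_url: str, metadata_prefix: str = "oai_dc",
                          set_spec: str = "") -> str:
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"

# Placeholder endpoint; substitute a real repository's OAI-PMH base URL:
url = build_listrecords_url("https://example.org/oai")
# The resulting XML could then be fetched with urllib.request.urlopen
# and parsed with xml.etree.ElementTree.
```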
39. Integrated Scientific Information System
17.3 million unique publications
760+ validated data providers
370K publications linked to projects from 6 funders
28K datasets linked to publications
3.5K links to software repositories
33K organizations
Entities in the graph: Organizations, Projects, Authors, Datasets, Publications, Data Providers, Software, Facilities, Methods, Research Communities
OpenAIRE-Connect
From January 2017
40. OpenAIRE support
materials
Briefing papers, factsheets,
webinars, workshops, FAQs
Information on
• Open Research Data Pilot
• Creating a data
management plan
• Selecting a data repository
• Personal data
Developing guidance to add
to DMPonline
https://www.openaire.eu/opendatapilot
https://www.openaire.eu/support
41. Information at the OpenAIRE website
• Open Research Data Pilot
https://www.openaire.eu/opendatapilot
• What is the pilot? Which H2020 strands are required to participate? What
practical steps should the researcher take?
• Create a Data Management Plan
https://www.openaire.eu/opendatapilot-dmp
• Information about how to create a Data Management Plan. First steps; When to
write and revise your Data Management Plan
• Select a Data Repository
https://www.openaire.eu/opendatapilot-repository
• Information about how to select a repository
• Frequently Asked Questions about the Open Research Data
Pilot
https://www.openaire.eu/support/faq
44. Briefing Paper: RDM
OpenAIRE Research Data Management Briefing Paper
• https://www.openaire.eu/briefpaper-rdm-infonoads
• This extensive briefing paper gives an overview of
Research Data Management with practical sections
about data management planning, and archiving the
research data for reuse.
45. OpenAIRE services
• Researchers
• Zenodo for all types of publications, data and software
• Claiming – linking research results
• Amnesia, an anonymization tool for all
• Data providers – Interoperability Guidelines, validation,…
• Project coordinators – reporting
• Funders and institutions – monitoring
• Research communities – gathering, monitoring all research
DASHBOARDS
46. Zenodo
Multi-disciplinary repository used for the long-tail of research
data
• An OpenAIRE-CERN joint effort
• Multidisciplinary repository accepting
– Multiple data types
– Publications
– Software – link to Github
• Assigns a Digital Object Identifier (DOI); up to 50GB per dataset
• Links funding, publications, data & software
www.zenodo.org
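Zenodo also exposes a REST API, so depositing can be scripted as well as done through the web form. The sketch below prepares (but does not send) a request creating a new deposition; the metadata values are invented and `TOKEN` stands in for a personal access token (see developers.zenodo.org for the authoritative API description):

```python
import json
import urllib.request

# Base URL of Zenodo's deposit API:
API = "https://zenodo.org/api/deposit/depositions"

def new_deposition_request(token: str, metadata: dict) -> urllib.request.Request:
    """Prepare (but do not send) a request creating a new deposition."""
    body = json.dumps({"metadata": metadata}).encode()
    return urllib.request.Request(
        f"{API}?access_token={token}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Metadata values are invented for the example:
req = new_deposition_request("TOKEN", {
    "title": "Survey responses",
    "upload_type": "dataset",
    "description": "Anonymised survey responses.",
    "creators": [{"name": "Jansen, A."}],
})
# urllib.request.urlopen(req) would perform the actual call.
```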
47. What is DMPonline?
• A web-based tool to help researchers write Data
Management and Sharing Plans
• Includes requirements and guidance from funders,
universities and other groups
• Developed by the Digital Curation Centre
48. How to write a DMP
• Template available from https://dmponline.dcc.ac.uk/
• And from a few national DMPonline sites, e.g. in Spain and Belgium
See https://www.openaire.eu/opendatapilot-dmp - Spain: http://pgd.consorciomadrono.es/ - Belgium: pilot and therefore limited to authorised persons
50. DMPonline
A web-based tool to help researchers write DMPs
Includes a template for Horizon 2020
Guidance from EUDAT and OpenAIRE being added
https://dmponline.dcc.ac.uk
52. Deliver the DMP
EC: “Since DMPs are expected to mature during the project, more
developed versions of the plan can be included as additional
deliverables at later stages. (…) New versions of the DMP should be
created whenever important changes to the project occur due to
inclusion of new data sets, changes in consortium policies or external
factors.”
53. Where to find a repository?
More information: https://www.openaire.eu/opendatapilot-repository
Zenodo: http://www.zenodo.org/
54. How to select a repository? (1/2)
Main criteria for choosing a data repository:
• Certification as a ‘Trustworthy Digital Repository’, with an explicit
ambition to keep the data available in the long term.
• Network of trustworthy digital repositories for long-term preservation of the data
after the research is finished.
• Three common certification standards for TDRs:
Data Seal of Approval: http://datasealofapproval.org/en/
nestor seal for DIN 31644: http://www.langzeitarchivierung.de/Subsites/nestor/EN/nestor-Siegel/siegel_node.html
ISO 16363: http://www.iso16363.org/
55. Main criteria for choosing a data repository:
• Certification as a ‘Trustworthy Digital Repository’, with an explicit ambition
to keep the data available in the long term.
• Matches your particular data needs and is FAIR compliant: e.g. certain file
formats; mixture of Open and Restricted Access. So contact the repository
of your choice when writing the first version of your DMP, or earlier.
• Provides guidance on metadata and on how to cite the data that has been
deposited.
• Gives your submitted dataset a persistent and globally unique identifier:
for sustainable citations – both for data and publications – and to link back
to particular researchers and grants.
How to select a repository? (2/2)
https://www.openaire.eu/opendatapilot-repository
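Persistent identifiers only support sustainable citation if they are written consistently. A small sketch that normalises the common ways a DOI is written into the resolvable https://doi.org/ form (the DOI used in the example is invented):

```python
def normalize_doi(doi: str) -> str:
    """Normalise a DOI string to its resolvable https://doi.org/ form."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/",
                   "https://dx.doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return f"https://doi.org/{doi}"

# All of these spellings resolve to the same citation form (DOI invented):
assert normalize_doi("doi:10.5281/zenodo.0000000") == \
       "https://doi.org/10.5281/zenodo.0000000"
assert normalize_doi("https://dx.doi.org/10.5281/zenodo.0000000") == \
       "https://doi.org/10.5281/zenodo.0000000"
```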
57. EUDAT project
https://eudat.eu/
EUDAT offers common data services
to both research communities and
individuals through a network of 35
European organisations.
58. EUDAT offers data
services
EUDAT services are designed, built and implemented based on user
community requirements.
Example communities: Physical Sciences & Engineering; Materials & Analytical Facilities; MAPPER; Biomedical & Medical Sciences
60. e.g. B2DROP – a solution for researchers and scientists to:
• store and exchange data with colleagues and team members, including research data not finalized for publishing
• share data with fine-grained access controls
• synchronize multiple versions of data across different devices
Features:
20GB storage per user
Living objects, so no PIDs
Versioning and offline use
Desktop synchronisation
B2DROP is hosted at the Jülich Supercomputing Centre
Daily backups of all files in B2DROP are taken and kept on tape
b2drop.eudat.eu
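Desktop synchronisation services of this kind typically also expose a WebDAV interface for scripted access. The sketch below prepares (but does not send) a WebDAV upload; the endpoint path and credentials are assumptions made for illustration, so consult the B2DROP documentation for the actual endpoint:

```python
import base64
import urllib.request

# Assumed WebDAV endpoint; verify against the B2DROP documentation:
WEBDAV = "https://b2drop.eudat.eu/remote.php/webdav"

def put_file_request(user, password, remote_name, payload: bytes):
    """Prepare (but do not send) a WebDAV PUT uploading `payload`."""
    auth = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{WEBDAV}/{remote_name}",
        data=payload,
        headers={"Authorization": f"Basic {auth}"},
        method="PUT",
    )

# Invented credentials and file name:
req = put_file_request("user", "secret", "results.csv", b"a,b\n1,2\n")
# urllib.request.urlopen(req) would perform the upload.
```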
61. e.g. B2STAGE - Facilitating communities to:
• move large amounts of data between data stores and high-performance compute resources
• re-ingest computational results back into EUDAT
• deposit large data sets into EUDAT resources for long-term preservation
Features:
high-speed transfer
reliable and light-weight
manages permanent PIDs
eudat.eu/b2stage
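B2STAGE's high-speed transfers run over GridFTP. As an illustration only, such a transfer is often driven with the standard `globus-url-copy` client; the sketch builds a command line without executing it (both endpoint URLs are placeholders, and `-p` requests parallel streams):

```python
def gridftp_copy_cmd(src: str, dst: str, parallel: int = 4) -> list:
    """Build a globus-url-copy command line for a parallel transfer."""
    return ["globus-url-copy", "-p", str(parallel), src, dst]

# Placeholder endpoints; real URLs come from your EUDAT and HPC sites:
cmd = gridftp_copy_cmd("gsiftp://datastore.example.org/data/run1.nc",
                       "gsiftp://hpc.example.org/scratch/run1.nc")
# subprocess.run(cmd) would start the transfer on a machine with the
# Globus toolkit installed.
```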
Basis: 3,699 Horizon 2020 signed grant agreements
Calls in core areas: opt-out 35% (149/431 proposals)
Other areas: voluntary opt-in 13% (409/3,268 proposals)
In multi-beneficiary projects it is also possible for specific beneficiaries to keep their data closed if relevant provisions are made in the consortium agreement and are in line with the reasons for opting out
Please take time to read Background information and the guidance in the Annex, because the questions in the template are not all clear on their own.
What metadata will be created? In case metadata standards do not exist in your discipline, please outline what type of metadata will be created and how.
Are the data produced and/or used in the project discoverable with metadata, identifiable and locatable by means of a standard identification mechanism (e.g. persistent and unique identifiers such as Digital Object Identifiers)?
What naming conventions do you follow?
Will search keywords be provided that optimize possibilities for re-use?
Do you provide clear version numbers?
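Clear version numbers in file names are easy to enforce mechanically. A sketch of a convention checker; the `name_vN.ext` pattern is just one possible convention chosen for this example, not an EC requirement:

```python
import re

# One possible convention: lowercase name, underscores, _vN version suffix.
PATTERN = re.compile(r"^[a-z0-9_]+_v\d+\.[a-z0-9]+$")

def follows_convention(filename: str) -> bool:
    """Check a file name against the name_vN.ext convention."""
    return bool(PATTERN.match(filename))

assert follows_convention("survey_responses_v2.csv")
assert not follows_convention("Final version (2).csv")
```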
Metadata is needed to locate and understand the data. When you are deciding what information to capture, think about what others would need in order to find, evaluate, understand, and reuse your data; the EC template also mentions keywords. Also get others to check your metadata to improve the quality and make sure it’s understandable to others. Standards should be used where possible.
To make sure their data can be understood by themselves, their community and others, researchers should create metadata and documentation.
Metadata is basic descriptive information to help identify and understand the structure of the data e.g. title, author...
Documentation provides the wider context. It’s useful to share the methodology / workflow, software and any information needed to understand the data e.g. explanation of abbreviations or acronyms
For Accessibility the Guidance contains more questions:
Which data produced and/or used in the project will be made openly available as the default? If certain datasets cannot be shared (or need to be shared under restrictions), explain why, clearly separating legal and contractual reasons from voluntary restrictions.
Note that in multi-beneficiary projects it is also possible for specific beneficiaries to keep their data closed if relevant provisions are made in the consortium agreement and are in line with the reasons for opting out.
How will the data be made accessible (e.g. by deposition in a repository)? What methods or software tools are needed to access the data? Is documentation about the software needed to access the data included? Is it possible to include the relevant software (e.g. in open source code)?
Where will the data and associated metadata, documentation and code be deposited? Preference should be given to certified repositories which support open access where possible.
Have you explored appropriate arrangements with the identified repository? If there are restrictions on use, how will access be provided? Is there a need for a data access committee? Are there well described conditions for access (i.e. a machine readable license)? How will the identity of the person accessing the data be ascertained?
Are the data produced in the project interoperable, that is allowing data exchange and re-use between researchers, institutions, organisations, countries, etc. (i.e. adhering to standards for formats, as much as possible compliant with available (open) software applications, and in particular facilitating re-combinations with different datasets from different origins)?
What data and metadata vocabularies, standards or methodologies will you follow to make your data interoperable?
Will you be using standard vocabularies for all data types present in your data set, to allow inter-disciplinary interoperability?
In case it is unavoidable that you use uncommon or generate project specific ontologies or vocabularies, will you provide mappings to more commonly used ontologies?
How will the data be licensed to permit the widest re-use possible?
When will the data be made available for re-use? If an embargo is sought to give time to publish or seek patents, specify why and how long this will apply, bearing in mind that research data should be made available as soon as possible.
Are the data produced and/or used in the project useable by third parties, in particular after the end of the project? If the re-use of some data is restricted, explain why.
How long is it intended that the data remains re-usable? Are data quality assurance processes described?
Guidance from the DCC can also help researchers to understand data licensing. This guide outlines the pros and cons of each approach e.g. the limitations of some CC options
The OA guidelines under Horizon 2020 point to CC-0 or CC-BY as a straightforward and effective way to make it possible for others to mine, exploit and reproduce the data. See p11 at: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf
Contrary to some other funders, the EC does not require a DMP at the proposal stage.
First of all, what is a Data Management Plan (dmp)?
Essentially, most funders just want evidence at the grant stage that data has been considered – how much will be generated? Usually just a 1-2 page summary covering the expected data to be produced through research along with an idea of what might be shared and how and when it will be shared.
It is important to stress that funders aren’t expecting something carved in stone at this stage. Projects often change quite radically from what is submitted at the proposal stage and this is ok. Researchers just need to be able to provide evidence that they have thought about the data they might be generating and how it will be managed and shared.
In terms of preservation, it is important to remember that not ALL data has to be retained. Selecting what data needs to be kept is something that only the researcher can do. Essentially, he/she will need to retain any data that underpins published findings to allow for validation of results. Additional data that is not required for validation purposes but is deemed to have longer-term value might also be worth keeping.
DMPs are often submitted as part of grant applications, but are useful whenever you’re creating data. Some HEIs are introducing policies that require DMPs for all research undertaken by staff – whether externally funded or not.
[final bullet] Acting on requests from the community, DMPonline will add an ‘export to Zenodo’ feature alongside the other export options. You might want to use this to increase your project’s transparency, share good practices, or maybe because you write your DMP as a (kind of) data paper, which is interesting in its own right. At the moment there are a few H2020 DMPs in Zenodo and figshare.
Web-based tool to help researchers write Data Management and Sharing Plans according to different funder / institutional requirements
There are various templates in DMP Online based on different funder requirements and institutional customisations.
We’re currently enhancing it with practical examples, boilerplate text and tailored support. TEDDINET may wish to develop discipline specific guidance within the tool for future related projects.
You may get the feeling that there is so much to do and to know. It is important to realise that you don’t have to build or buy all data services. Instead, institutes and academic communities should support researchers to find & use what is there already. That holds for the repositories that I mentioned, but also for the services that our sister project EUDAT offers.
Note that these are all ”technical” services. The notion “RDM” has different meanings in EUDAT and OpenAIRE.
B2STAGE was conceived to deal with modern day research challenges. As hardware and research software improve, the scope for research is broadening. Communities now pursue large-scale simulations, for example developing models for climate simulation encompassing the whole of the Earth, as opposed to isolated regions. Scientists simulate not only organs in the human body, but also their interactions. Similarly, earthquake data are now collected and processed for areas as large as entire continents. The common requirement of such research challenges is that they generate and process increasing volumes of data, with typical workflows requiring data to be processed in a distributed fashion, so as to cope with the pace of data generation. For this to be possible, data need to be transferred efficiently to the high-performance or high-throughput computing resources, and this is where B2STAGE comes in.
The service also takes care of depositing the computation output from the HPC facilities to EUDAT. B2STAGE can also be used to deposit the community data into the EUDAT facilities. B2STAGE uses the established gridFTP protocol to ensure high-speed transfer between the sites. Data transfer is reliable and requires very little user interaction. B2STAGE also assigns PIDs to computational output that the user elects to inject back into the EUDAT datacentres.
If you are interested in learning more about EUDAT services, you can contact CINECA.