Are the data produced in the project interoperable, that is allowing data exchange and re-use between researchers, institutions, organisations, countries, etc. (i.e. adhering to standards for formats, as much as possible compliant with available (open) software applications, and in particular facilitating re-combinations with different datasets from different origins)? What data and metadata vocabularies, standards or methodologies will you follow to make your data interoperable? Will you be using standard vocabularies for all data types present in your data set, to allow inter-disciplinary interoperability? In case it is unavoidable that you use uncommon or generate project specific ontologies or vocabularies, will you provide mappings to more commonly used ontologies?
How will the data be licensed to permit the widest re-use possible? When will the data be made available for re-use? If an embargo is sought to give time to publish or seek patents, specify why and how long this will apply, bearing in mind that research data should be made available as soon as possible. Are the data produced and/or used in the project useable by third parties, in particular after the end of the project? If the re-use of some data is restricted, explain why. How long is it intended that the data remains re-usable? Are data quality assurance processes described?
Remember also to give your open data and software a proper licence.
The OA guidelines under Horizon 2020 point to CC-0 or CC-BY as a straightforward and effective way to make it possible for others to mine, exploit and reproduce the data. See p. 11 at: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf
Ethical and legal issues can also be discussed in the context of the ethics review. If relevant, include references to ethics deliverables and ethics chapter in the Description of the Action.
Let’s move on to the considerations to make when managing and sharing data
OpenAIRE webinar on Open Research Data in H2020 (OAW2016)
DANS is an institute of KNAW and NWO
Open Research Data in H2020
OpenAIRE webinar, 26 October 2016
Who we are
Open Access Infrastructure for Research in Europe
DANS: Data Archiving and Networked
Institute of Dutch
(KNAW & NWO)
dates back to
to digital research
short- and mid-
EASY: certified long-term
System for self-deposit
information in the
Research data in context
• Brief recap from recent OpenAIRE-EUDAT webinars
• The updated Guidelines for FAIR Data Management:
• F, A, I, R
• Costs, data security, ethical aspects, other RDM procedures
• Links to EC and OpenAIRE information
Introductory RDM webinar, Tony Ross-Hellauer & Sarah Jones, 26 May:
• Reasons to manage data
• How to manage and share data (+ how to respond to concerns about
• EUDAT & OpenAIRE services
Q&A document: https://b2drop.eudat.eu/s/0H6qRgwdwkAVFvD#pdfviewer
“How to write a DMP”, Sarah Jones & Marjan Grootveld, 7/14 July:
• What is a Data Management Plan and why to write it?
• Example DMPs in different domains, with lots of links!
• Lessons and guidance (e.g. storing =/= archiving; how to find a
repository; file-naming conventions)
All recordings and slides are on https://eudat.eu/events/webinars
https://www.eudat.eu Research Data Services, Expertise & Technology
Recap: why manage data?
(Not for the research funder, but for life we make data management plans)
Make your research easier
Stop yourself drowning in irrelevant stuff
Save data for later
Avoid accusations of fraud or bad science
Write a data paper, connect your nanopublications
Share your data for re-use & get them validated in real life
Get credit for it
NON PECUNIAE INVESTIGATIONIS CURATORE
SED VITAE FACIMUS PROGRAMMAS DATORUM PROCURATIONIS
Horizon 2020: Open Research Data Pilot
The use of a Data Management Plan (DMP) is
required for projects participating in the Open
Research Data Pilot, detailing what data the
project will generate, whether and how they will
be exploited or made accessible for verification
and re-use, and how they will be curated and
Guidelines on FAIR DM v.3
Structure of the Guidelines:
1. Background: extension of the pilot
2. DMP general definition
3. Proposal, submission and evaluation
4. RDM plans during the project life cycle
6. Annex 1: the DMP template
1. Data summary
2. FAIR data
3. Allocation of resources
4. Data security
5. Ethical aspects
6. Other issues
7. Summary table “Fair DM at a glance”
• You should develop a DMP for your project.
• There is a single DMP template from start to finish.
• The DMP template is inspired by the FAIR principles:
research data should be findable, accessible, interoperable
and re-usable (without suggesting any specific technology,
standard, or implementation solution).
Also explicit in the new guidelines:
• From 1-1-2017 the pilot will cover all thematic areas of
• Costs related to open access to research data are eligible
for reimbursement during the duration of the project under
the conditions defined in the Grant Agreement.
Good things that remain
Whether a (proposed) project participates in the
ORD pilot or chooses to opt out does not affect
the evaluation of that project: proposals will not
be penalised for opting out.
Participating in the ORD pilot does
not necessarily mean opening up all
your research data: as open as
possible, as closed as necessary.
The DMP is a living document.
You are not required to
provide detailed answers to all
the questions in the first
version of the DMP (due M6).
Deposit in a research data repository:
a. the data needed to validate the results presented
in scientific publications, including the metadata;
b. any other data, including the metadata, as
specified in the DMP;
c. plus for a-b the documentation and the tools
that are needed to validate the results, e.g.
specialised software or software code, algorithms
and analysis protocols (when possible, these
A web-based tool to help researchers write DMPs
Guidance from EUDAT and OpenAIRE being added
funder to get
§2 Making data FAIR
Findable
– Assign persistent IDs, provide rich metadata, register in a
searchable resource, ...
Accessible
– Retrievable by their ID using a standard protocol, metadata remain
accessible even if data aren’t...
Interoperable
– Use formal, broadly applicable languages, use standard
vocabularies, qualified references...
Re-usable
– Rich, accurate metadata, clear licences, provenance, use of
www.force11.org/group/fairgroup/fairprinciples and http://www.nature.com/articles/sdata201618
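As an illustration of "retrievable by their ID using a standard protocol", here is a minimal sketch that turns a DOI into a link that resolves over HTTPS through doi.org. The function name is hypothetical; the example DOI is that of the FAIR principles article linked above.

```python
from urllib.parse import quote

def doi_to_url(doi: str) -> str:
    """Turn a bare DOI, a 'doi:' form or an older resolver link into an https://doi.org/ link."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "https://dx.doi.org/",
                   "http://dx.doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Every DOI starts with the directory indicator "10."
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the prefix/suffix slash.
    return "https://doi.org/" + quote(doi, safe="/")

print(doi_to_url("doi:10.1038/sdata.2016.18"))
# https://doi.org/10.1038/sdata.2016.18
```

Storing the identifier itself and building the link on demand keeps the reference valid even if the publisher's own web address changes.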
EC in the Guidelines: “This template is not intended as a strict
technical implementation of the FAIR principles, it is rather inspired
by FAIR as a general concept.”
Some F questions
2.1 Making data findable, including provisions for metadata
• Use metadata and specify standards for metadata creation
(if any). If there are no standards in your discipline
describe what type of metadata will be created and how.
• Search keywords
• Persistent and unique identifiers such as DOI
• File and folder naming conventions: see OpenAIRE-EUDAT
• Versioning of the datasets and clear version numbers
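The naming and versioning points above can be sketched as a small helper. The pattern used here (project, description, date, version) is only one possible convention and an assumption of this example, not one prescribed by the Guidelines; the function and argument names are hypothetical.

```python
import re
from datetime import date

def dataset_filename(project: str, description: str, created: date,
                     version: tuple[int, int], extension: str) -> str:
    """Build a name such as 'myproject_soil-samples_20161026_v1-0.csv'."""
    def slug(text: str) -> str:
        # Lowercase, and replace every run of other characters with one '-'.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    major, minor = version
    return (f"{slug(project)}_{slug(description)}_"
            f"{created:%Y%m%d}_v{major}-{minor}.{extension.lstrip('.')}")

print(dataset_filename("MyProject", "Soil samples", date(2016, 10, 26), (1, 0), ".csv"))
# myproject_soil-samples_20161026_v1-0.csv
```

Writing the date as YYYYMMDD makes files sort chronologically, and an explicit version number in the name avoids labels such as "final" or "new".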
Metadata and documentation
• Metadata and documentation are needed to find and
understand research data.
• Think about what others would need in order to find,
evaluate, understand, and reuse your data.
• Get others to check the metadata to improve quality.
• Use standards to enable interoperability.
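A minimal sketch of what checking a metadata record could look like. The field names follow Dublin Core elements; which of them count as required is an assumption of this example, and all values are placeholders.

```python
import json

# Hypothetical minimum set for this sketch; the names are Dublin Core elements.
REQUIRED_FIELDS = ("title", "creator", "date", "identifier", "rights")

def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in the record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing

record = {
    "title": "Example survey dataset",   # placeholder values throughout
    "creator": "Example, A.",
    "date": "2016-10-26",
    "identifier": "",                     # no persistent identifier assigned yet
    "rights": "CC-BY-4.0",
    "subject": ["research data management"],
}

print(json.dumps(record, indent=2))
print("Missing:", missing_fields(record))
# Missing: ['identifier']
```

A check like this does not replace a colleague reading the metadata, but it catches empty fields before deposit.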
Some A questions
2.2 Making data openly accessible:
• Explain which data can’t be shared openly, if any
• Specify how access will be provided in case of restrictions,
e.g. through a data committee, a license, or arranged with
• Will methods or software tools needed to access the data
(if any) be included or documented?
• Deposit the data and associated metadata, documentation
and code preferably in certified repositories which support
Data Seal of Approval
ICSU World Data System
Where to find a repository?
More information: https://www.openaire.eu/opendatapilot-repository
Zenodo: http://www.zenodo.org Re3data.org: http://www.re3data.org
File format considerations
There is no clear-cut definition of a “sustainable file format”.
Each archive has its own expertise, related to its designated
Level 1 | Level 2 or 3 | Preferred | Accepted
audio: .wav | .ra, .mp3, .wma | .wav, .flac | .aiff, .mp3, .aac
chemistry: NMR, ChemDoodle, … .pdb, .xyz
delimited flat file w/DDL | .mdb, .dbf, .acdb | .sql, .siard, .csv | .mdb, .dbf, .hdf5 …
.mp1, .mp2, .mp4, .mpg2, .mpg4, .avi,
Before clocks were invented, people
kept time using different instruments to
observe the Sun’s zenith at noon.
Towns and cities set clocks based on
sunsets and sunrises. Time calculation
became a serious problem for people
travelling by train, sometimes hundreds
of miles in a day. UTC is the World's
Some I questions
2.3 Making data interoperable
• Specify what data and metadata vocabularies, standards or
methodologies you will follow to facilitate interoperability.
• Standard vocabulary to allow inter-disciplinary
interoperability or a mapping from your vocabulary to more
commonly used ontologies?
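A mapping from a project vocabulary to a commonly used one can be as simple as a lookup table. In this sketch the local field names are hypothetical, and the targets are properties from the DCMI Metadata Terms; a real project would publish such a table alongside the data.

```python
# Hypothetical project-specific field names mapped to DCMI Metadata Terms.
LOCAL_TO_DCTERMS = {
    "sample_name": "dcterms:title",
    "researcher": "dcterms:creator",
    "measured_on": "dcterms:created",
    "site": "dcterms:spatial",
}

def map_record(local_record: dict) -> tuple[dict, list[str]]:
    """Translate known keys and report the keys that still lack a mapping."""
    mapped, unmapped = {}, []
    for key, value in local_record.items():
        target = LOCAL_TO_DCTERMS.get(key)
        if target is None:
            unmapped.append(key)
        else:
            mapped[target] = value
    return mapped, unmapped

mapped, unmapped = map_record({"sample_name": "Core 7", "site": "Delft", "ph": 6.8})
print(mapped)     # {'dcterms:title': 'Core 7', 'dcterms:spatial': 'Delft'}
print(unmapped)   # ['ph']
```

The list of unmapped keys shows where a domain-specific vocabulary is still needed.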
Some R questions
2.4 Increase data re-use (through clarifying licences)
• License the data to permit the widest reuse possible
• Specify a data embargo, if this is needed
• How long will the data remain reusable?
• Describe data quality assurance processes
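When an embargo is specified, stating its end date explicitly avoids ambiguity. A minimal sketch of the date arithmetic, with hypothetical function names and example dates:

```python
import calendar
from datetime import date

def embargo_end(deposit: date, months: int) -> date:
    """Date on which an embargo of `months` months after deposit ends."""
    index = deposit.month - 1 + months
    year, month = deposit.year + index // 12, index % 12 + 1
    # Clamp the day for shorter months (31 August + 6 months -> 28 February).
    day = min(deposit.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def is_available(deposit: date, months: int, today: date) -> bool:
    """True once the embargo period has passed."""
    return today >= embargo_end(deposit, months)

print(embargo_end(date(2016, 10, 26), 12))                    # 2017-10-26
print(is_available(date(2016, 10, 26), 6, date(2017, 1, 1)))  # False
```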
Re-use over time
Licensing research data and software
The EUDAT licensing wizard helps you pick a licence for data & software
You should also license Open Access data, or waive rights.
Horizon 2020 Open Access
guidelines point to:
Keep everything? Forever?
When regenerating data is cheaper than archiving, don’t archive.
Select what data you’ll need and want to retain.
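A minimal sketch of the cost comparison behind the first rule above. The function name is hypothetical and all figures are made-up placeholders. Cost is only one selection criterion: observations that cannot be repeated cannot be regenerated at any price.

```python
def cheaper_to_regenerate(regeneration_cost: float,
                          yearly_storage_cost: float,
                          retention_years: int,
                          curation_cost: float = 0.0) -> bool:
    """True when re-creating the data costs less than keeping it archived."""
    archiving_cost = curation_cost + yearly_storage_cost * retention_years
    return regeneration_cost < archiving_cost

# Placeholder figures: output that costs 300 to recompute, versus
# 50 per year of storage for 10 years plus 200 of one-off curation.
print(cheaper_to_regenerate(300, 50, 10, curation_cost=200))   # True (300 < 700)
```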
10 years is often stated in data policies and academic codes, but
data can be valuable for ages, in climatology, sociology, health
sciences, astronomy, linguistics, … Look beyond minimal retention
periods where relevant.
“The lifetime of software is generally not as long as that of data”
(Daniel Katz et al. http://bit.ly/2eScCKp)
RDNL Selection criteria: http://www.researchdata.nl/en/services/data-
DCC How-to guide: http://www.dcc.ac.uk/resources/how-guides/appraise-select-data
§3 Allocation of resources
• What are the costs for making data FAIR in your project?
• Resources for long term preservation
Check the UK Data Service Costing model.
Rule of thumb: 5% of the project budget is spent on RDM.
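As a worked example of this rule of thumb, with a made-up project budget and a hypothetical function name:

```python
def rdm_budget(project_budget: float, percent: float = 5) -> float:
    """Amount to reserve for research data management."""
    return project_budget * percent / 100

# Placeholder budget of 2 million (any currency).
print(rdm_budget(2_000_000))   # 100000.0
```

The percentage is a starting point for the costing exercise, not a substitute for it.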
The High Level Expert Group on the European Open Science Cloud
recommends that “well budgeted data stewardship plans should be
made mandatory and we expect that on average about 5% of
research expenditure should be spent on properly managing and
UKDS model http://www.data-archive.ac.uk/create-manage/planning-for-sharing/costing
df#view=fit&pagemode=none p. 19
§4 Data security
• Provisions for data recovery, secure storage, transfer of
• Safely stored in certified repositories for long term
preservation and curation?
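One common way to check that files survive storage, recovery and transfer unchanged is to record checksums and compare them later. The sketch below uses Python's standard hashlib; the function names and the manifest layout are assumptions of this example, not part of the Guidelines.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 checksum of one file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to folder) to its checksum."""
    return {p.relative_to(folder).as_posix(): sha256_of(p)
            for p in sorted(folder.rglob("*")) if p.is_file()}

def changed_files(folder: Path, manifest: dict[str, str]) -> list[str]:
    """Files that were added, removed or altered since the manifest was made."""
    current = build_manifest(folder)
    return sorted(name for name in manifest.keys() | current.keys()
                  if manifest.get(name) != current.get(name))
```

A typical use would be to call `build_manifest` before a transfer or backup, store the result with the data, and call `changed_files` on the copy afterwards; an empty list means nothing changed.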
§5 Ethical aspects
• Any ethical or legal issues that can impact data sharing?
• Informed consent for data sharing and long term
preservation included in questionnaires dealing with
§6 Other issues
Which other national/funder/sectorial/departmental
procedures for data management do you use (if any)?
Image “Fishbone” CC BY-NC-ND 2.0 by https://www.flickr.com/photos/mrjnl/
• Think about the desired end result and plan for this.
• Involve all work packages and partners to get a coherent
• “Sharing” means “outside the consortium”.
• Approach the DMP in whatever way best fits your project:
• EC template is intended as a service, not an obligation. Read the
background information and the guidance, and use it as a checklist.
• More than one dataset? Describe generically what is
possible and dataset-specific what is necessary.
• Focus effort on datasets you’ll create rather than reuse.
The EC Open Research Data pilot
Key sources of information
• Guidelines on Open Access to Scientific Publications and Research Data in Horizon
• Guidelines on Data Management in Horizon 2020
• Annotated model grant agreement, clause 29.3
• New infographic summarising key policy points
• Open Access and Data Management
OpenAIRE support materials
• Briefing papers,
• Information on:
• Open Research Data Pilot
• Creating a data
• Selecting a data repository
• Personal data
Thanks to Sarah Jones (DCC), OpenAIRE and EUDAT for slides.