OpenAIRE and EUDAT co-present this webinar, which introduces researchers and others to the concept of research data management (RDM). The webinar presents the benefits of taking an active approach to RDM, including increased speed and ease of access, efficiency (fund once, reuse many times), and improved quality and transparency of research. It also advises on strategies for successful RDM, resources to help manage data effectively, choosing where to store and deposit data, the EC H2020 Open Data Pilot, and the basics of data management, stewardship, and archiving.
Webinar recording available: http://www.instantpresenter.com/eifl/EB57D6888147
EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 7, 2016 | EUDAT
www.eudat.eu | 1st Session: July 7, 2016.
In this webinar, Sarah Jones (DCC) and Marjan Grootveld (DANS) talked through the aspects that Horizon 2020 requires from a DMP. They discussed examples from real DMPs and also touched upon the Software Management Plan, which for some projects can be a sensible addition.
The presentation gives an overview of what metadata is and why it is important. It also addresses the benefits that metadata can bring and offers advice and tips on how to produce good quality metadata and, to close, how EUDAT uses metadata in the B2FIND service.
November 2016
Data Lakehouse Symposium | Day 1 | Part 1 | Databricks
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
Software has nowadays become the lifeline of modern organizations. Libraries also need software if they want to create a parallel digital library with features that may not be found in a traditional library.
This presentation was given at the FOSE 2012 Records Management (RIM) conference held the week of April 2 and identifies several of the important standards to consider when wanting to preserve digital content. With the advent of 'Big Data', relying on standards will help to ensure our data/information is properly preserved.
Presentation slides from a lecture given at the University of the West of England (UWE), Frenchay Campus, Bristol, as part of the MSc in Library and Library Management, March 24, 2009.
Best Practices for Building and Deploying Data Pipelines in Apache Spark | Databricks
Many data pipelines share common characteristics and are often built in similar but bespoke ways, even within a single organisation. In this talk, we will outline the key considerations which need to be applied when building data pipelines, such as performance, idempotency, reproducibility, and tackling the small file problem. We’ll work towards describing a common Data Engineering toolkit which separates these concerns from business logic code, allowing non-Data-Engineers (e.g. Business Analysts and Data Scientists) to define data pipelines without worrying about the nitty-gritty production considerations.
We’ll then introduce an implementation of such a toolkit in the form of Waimak, our open-source library for Apache Spark (https://github.com/CoxAutomotiveDataSolutions/waimak), which has massively shortened our route from prototype to production. Finally, we’ll define new approaches and best practices about what we believe is the most overlooked aspect of Data Engineering: deploying data pipelines.
Data Marketplace and the Role of Data Virtualization | Denodo
Watch full webinar here: https://bit.ly/3IS9sQS
A data marketplace is like an online shopping interface specializing in data. Ideally, it should work just like an online store, with minimal latency and maximum responsiveness. However, this does not mean that all of the data in the data marketplace needs to be stored in the same central repository.
In this session, Shadab Hussain, Americas Sales Head, Data Analytics at Wipro, a partner company with Denodo and a co-sponsor of DataFest 2021, talks about the role of data virtualization in enabling full-featured data marketplaces. Such data marketplaces provide real-time, curated access to data, even when the data is stored across many different sources throughout the organization.
You will learn:
- The main features of a data marketplace
- Why organizations need data marketplaces
- Why data marketplaces sometimes fail
- How data virtualization enables the most effective data marketplaces
- How one of Europe’s premier public healthcare organizations leveraged a data marketplace to improve data consumption and ease of access
How a Semantic Layer Makes Data Mesh Work at Scale | DATAVERSITY
Data Mesh is a trending approach to building a decentralized data architecture by leveraging a domain-oriented, self-service design. However, the pure definition of Data Mesh lacks a center of excellence or central data team and doesn’t address the need for a common approach for sharing data products across teams. The semantic layer is emerging as a key component to supporting a Hub and Spoke style of organizing data teams by introducing data model sharing, collaboration, and distributed ownership controls.
This session will explain how data teams can define common models and definitions with a semantic layer to decentralize analytics product creation using a Hub and Spoke architecture.
Attend this session to learn about:
- The role of a Data Mesh in the modern cloud architecture.
- How a semantic layer can serve as the binding agent to support decentralization.
- How to drive self service with consistency and control.
Libraries and their Role in Open Access: Challenges and Opportunities | François Renaville
The open access movement is gaining momentum, with an increasing number of institutions and funders adopting open access mandates for their funded research. Consequently, an increasing amount of material becomes freely available, either from institutional repositories or from traditional or newly established journals. Libraries can play a dual role in supporting this movement: Firstly, they can provide services supporting the deposit of research output in their institutional repositories, including support for making it widely discoverable via indexes such as Google Scholar and library discovery systems. Secondly, libraries can make open access materials discoverable by their patrons through such indexes, thus expanding their collection to include materials that they would not necessarily license.
This session will describe the experience of the University Libraries of Liège in Belgium and Harvard. University of Liège chose a top-down approach and made it compulsory for researchers to deposit their output in the institutional repository—ORBi. To support this mandate, the library offers services that help researchers deposit and disseminate their publications. Both libraries—Liège and Harvard—enable their students and faculty to discover open access content beyond their library’s acquired collection via their library discovery system.
The session will also address challenges that arise from indexing open access publications and how index providers and libraries can deal with such publications, especially with articles that are deposited in different institutional repositories or published in so-called hybrid journals that contain a mix of open access and subscription articles.
Finally, we will discuss with the audience how they see libraries’ role evolving in this area, what challenges they are currently facing, and the solutions and opportunities they have found.
Geek Sync | Data Architecture and Data Governance: A Powerful Data Management... | IDERA Software
You can watch the replay for this Geek Sync webcast, Data Architecture and Data Governance: A Powerful Data Management Duo, on the IDERA Resource Center, http://ow.ly/95yL50A4rZg.
Batman and Robin. Han Solo and Chewbacca. Mario and Luigi. Just like these famous pairings, so it is for data architecture and data governance — they’re aligned to support each other in a variety of ways. Like data governance, data architecture as a practice can be leveraged to identify and enforce standards within the systems landscape to support business objectives. And data architecture certainly benefits from sound business oversight and stakeholder influences inherent in a successful data governance program.
Join Kelle O’Neal to learn about the critical aspects of aligning data architecture and data governance, with a specific focus on:
-Why aligning data architecture and data governance is important
-The key intersections of people, processes and technology between data architecture and data governance
-How data architecture and data governance work together to enforce standards
-The capabilities that data governance can apply to data architecture without interfering
-How your project and development methodologies can help drive alignment
EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 14, 2016... | EUDAT
www.eudat.eu | 2nd Session: July 14, 2016.
In this webinar, Sarah Jones (DCC) and Marjan Grootveld (DANS) talked through the aspects that Horizon 2020 requires from a DMP. They discussed examples from real DMPs and also touched upon the Software Management Plan, which for some projects can be a sensible addition.
Presentation by Joy Davidson, Digital Curation Centre (UK), at the FOSTER event: Data Management Plan and Social Impact of Research. Universitat Jaume I, 27 May 2016.
Stuart Macdonald steps through the process of creating a robust data management plan for researchers. Presented at the European Association for Health Information and Libraries (EAHIL) 2015 workshop, Edinburgh, 11 June 2015.
Presentation given at the Consorcio Madrono conference on Data Management Plans in Horizon 2020 http://www.consorciomadrono.es/info/web/blogs/formacion/217.php
EUDAT Research Data Management | www.eudat.eu | EUDAT
www.eudat.eu | The presentation gives an introduction to Research Data Management, explaining why it is important to manage and share data.
November 2016
A talk outlining the virtues and processes of Research Data Management for PhD students in the geosciences. Given by Stuart Macdonald at the Introduction to RDM Workshop, School of Geosciences, University of Edinburgh, on 2 November 2015
Presentation from a University of York Library workshop on research data management. The workshop provides an introduction to research data management, covering best practice for the successful organisation, storage, documentation, archiving, and sharing of research data.
Similar to Research Data Management: An Introductory Webinar from OpenAIRE and EUDAT
OpenAIRE Content Providers Community Call, November 4th, 2020
This call focused on future PROVIDE developments, the functionalities wishlist, and the PROVIDE service in EOSC.
It was also an opportunity to share the most recent updates and novelties in the OpenAIRE Content Provider Dashboard and to get feedback from the community.
Recordings: https://youtu.be/wY4fOS767Us
Follow the Community activities at https://www.openaire.eu/provide-community-calls
OpenAIRE in the European Open Science Cloud (EOSC) | OpenAIRE
Openness is the success factor for EOSC. OpenAIRE has been working to deliver open access scholarly communication in Europe for the past 10 years, and we now present how our work fits into the core EOSC developments.
OpenAIRE Content Providers Community Call, October 7th, 2020
This call focused on the OpenAIRE Broker Service, explaining how the service delivers enrichment events to content provider managers.
It was also an opportunity to share the most recent updates and novelties in the OpenAIRE Content Provider Dashboard and to get feedback from the community.
Recording: https://youtu.be/3sF4B58EGcs
Follow the Community activities at https://www.openaire.eu/provide-community-calls
OpenAIRE Content Providers Community Call, July 1st, 2020
This call focused on data repositories, namely the OpenAIRE Research Graph and data repositories, the OpenAIRE Content Acquisition Policy, and the Guidelines for Data Archive Managers.
It was also an opportunity to share the most recent updates and novelties in the OpenAIRE Content Provider Dashboard and to get feedback from the community.
Follow the Community activities at https://www.openaire.eu/provide-community-calls
OpenAIRE Content Providers Community Call, May 6th, 2020
This call focused on the presentation of the new user interface of the PROVIDE Dashboard and on four use cases of the PROVIDE service.
It was also an opportunity to share the most recent updates and novelties in the OpenAIRE Content Provider Dashboard and to get feedback from the community.
Recording available here: https://youtu.be/J4m_ryRxtnY
20200504_OpenAIRE Legal Policy Webinar: GDPR and Sharing Data | OpenAIRE
Presentation by Jacques Flores Dourojeanni (Research Data Management Consultant Utrecht University Library), as delivered during the OpenAIRE Legal Policy Webinar series on May 4th 2020.
More information and recordings: https://www.openaire.eu/item/openaire-legal-policy-webinars
20200504_Research Data & the GDPR: How Open is Open? | OpenAIRE
Presentation by Prodromos Tsiavos (Senior Legal Advisor - ARC/ Director - Onassis Group) as delivered during the OpenAIRE Legal Policy Webinar series on May 4th 2020.
More information and recordings: https://www.openaire.eu/item/openaire-legal-policy-webinars
20200504_Data, Data Ownership and Open Science | OpenAIRE
Presentation by Thomas Margoni (Senior Lecturer in Intellectual Property and Internet Law, Co-director, CREATe, University of Glasgow) as delivered during the OpenAIRE Legal Policy Webinar series on May 4th 2020.
More information and recordings: https://www.openaire.eu/item/openaire-legal-policy-webinars
20200429_Research Data & the GDPR: How Open is Open? (updated version) | OpenAIRE
Presentation by Prodromos Tsiavos (Senior Legal Advisor - ARC/ Director - Onassis Group) as delivered during the OpenAIRE Legal Policy Webinar series on April 29th 2020.
More information and recordings: https://www.openaire.eu/item/openaire-legal-policy-webinars
20200429_Data, Data Ownership and Open Science | OpenAIRE
Presentation by Thomas Margoni (Senior Lecturer in Intellectual Property and Internet Law, Co-director, CREATe, University of Glasgow) as delivered during the OpenAIRE Legal Policy Webinar series on April 29th 2020.
More information and recordings: https://www.openaire.eu/item/openaire-legal-policy-webinars
20200429_OpenAIRE Legal Policy Webinar: GDPR and Sharing Data | OpenAIRE
Presentation by Jacques Flores Dourojeanni (Research Data Management Consultant Utrecht University Library), as delivered during the OpenAIRE Legal Policy Webinar series on April 29th 2020.
More information and recordings: https://www.openaire.eu/item/openaire-legal-policy-webinars
COVID-19: Activities, tools, best practice and contact points in Greece | OpenAIRE
Presentation from the webinar organized by the Greek OpenAIRE and RDA Nodes (Athena RC) and Elixir-GR to inform participants of EU and national efforts, in collaboration with the following research organizations: Flemming, CERTH, HEAL-Link, Demokritos, Univ. of Athens (Medical School).
Presentation of the 2nd Content Providers Community Call, targeting the following topics: 1) OpenAIRE Content Provider Dashboard updates; 2) main topic: DSpace-CRIS for OpenAIRE: implementation of the CRIS guidelines and beyond; 3) community questions & comments.
Presentation of the 2nd Content Providers Community Call, targeting the following topics: 1) OpenAIRE Content provider dashboard updates;
2) OpenAIRE aggregation and enrichment processes: specifications and good practices;
3) Community questions & comments.
Essentials of Automations: The Art of Triggers and Actions in FME | Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ... | James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, along with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work and a knack for helping others understand how things work. He has around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... | BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Pushing the limits of ePRTC: 100ns holdover for 100 days | Adtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 | Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
Generative AI Deep Dive: Advancing from Proof of Concept to Production | Aggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor... | Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf | Peter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
State of ICS and IoT Cyber Threat Landscape Report 2024 preview | Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio, using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Communications Mining Series - Zero to Hero - Session 1
Research Data Management: An Introductory Webinar from OpenAIRE and EUDAT
1. Research Data Management
- an introductory webinar
Tony Ross-Hellauer, OpenAIRE
Sarah Jones, EUDAT
This work is licensed under the Creative Commons CC-BY 4.0 licence
2. Who we are
Open Access Infrastructure for Research in Europe
www.openaire.eu
Research Data Services, Expertise & Technology
https://www.eudat.eu
3. Overview
• Why manage data?
• RDM in Horizon 2020 (+ recent changes)
• How to manage and share research data?
• EUDAT and OpenAIRE services
5. Data explosion
• More and more data is being created
• The issue is not creating data, but being able to navigate and use it
• Data management is critical to make sure data are well-organised, understandable and reusable
6. Digital data are fragile and susceptible to loss for a wide variety of reasons
• Natural disaster
• Facilities infrastructure failure
• Storage failure
• Server hardware/software failure
• Application software failure
• Format obsolescence
• Legal encumbrance
• Human error
• Malicious attack
• Loss of staffing competencies
• Loss of institutional commitment
• Loss of financial stability
• Changes in user expectations
Data loss
Image CC BY-NC-SA 2.0 by Dave Hill https://www.flickr.com/photos/dmh650/4031607067
8. Why manage data?
• Make your research easier
• Stop yourself drowning in irrelevant stuff
• Save data for later
• Avoid accusations of fraud or bad science
• Share your data for re-use
• Get credit for it
• Meet funder/institution requirements
Because well-managed data opens up opportunities for re-use and sharing, and makes for better science!
9. RDM IN HORIZON 2020
Image “Open Data” CC BY 2.0 by http://www.descrier.co.uk
10. EC Open Research Data Pilot, Jan 2015 –
• A limited, voluntary pilot (initially 8 programme areas) with opt-out and safeguards
• Participating projects must:
– keep a data management plan, to be updated at regular intervals
– deposit in an open access repository:
1. the data, including associated metadata, needed to validate the results presented in scientific publications, as soon as possible;
2. other data, including associated metadata, as specified and within the deadlines laid down in the data management plan
11. EC Open Research Data Pilot: opt-out reasons
https://open-data.europa.eu/data/dataset/open-research-data-the-uptake-of-the-pilot-in-the-first-calls-of-horizon-2020
14. Research data lifecycle
Creating data → Processing data → Analysing data → Preserving data → Giving access to data → Re-using data
CREATING DATA: designing research, DMPs, planning consent, locating existing data, data collection and management, capturing and creating metadata
PROCESSING DATA: entering, transcribing, checking, validating and cleaning data, anonymising data, describing data, managing and storing data
ANALYSING DATA: interpreting and deriving data, producing outputs, authoring publications, preparing for sharing
PRESERVING DATA: data storage, back-up & archiving, migrating to best format & medium, creating metadata and documentation
ACCESS TO DATA: distributing data, sharing data, controlling access, establishing copyright, promoting data
RE-USING DATA: follow-up research, new research, undertaking research reviews, scrutinising findings, teaching & learning
Ref: UK Data Archive: http://www.data-archive.ac.uk/create-manage/life-cycle
15. FAIR data
• Findable – assign persistent IDs, provide rich metadata, register in a searchable resource...
• Accessible – retrievable by their ID using a standard protocol; metadata remain accessible even if the data aren't...
• Interoperable – use formal, broadly applicable languages, standard vocabularies, qualified references...
• Reusable – rich, accurate metadata, clear licences, provenance, use of community standards...
www.force11.org/group/fairgroup/fairprinciples
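To make the FAIR principles concrete, a metadata record can be sketched as a simple structure in which each field serves one of the four principles. The field names and values below are illustrative assumptions (loosely DataCite-like), not a formal schema from the webinar:

```python
# Illustrative metadata record mapped to the FAIR principles.
# All field names, the DOI and the catalogue name are hypothetical examples.
record = {
    "identifier": "doi:10.1234/example-dataset",   # Findable: persistent ID
    "title": "Example survey dataset",             # Findable: rich metadata
    "registered_in": "B2FIND",                     # Findable: searchable resource
    "access_protocol": "https",                    # Accessible: standard protocol
    "format": "text/csv",                          # Interoperable: open, formal format
    "vocabulary": "Dublin Core",                   # Interoperable: standard vocabulary
    "license": "CC-BY-4.0",                        # Reusable: clear licence
    "provenance": "Collected 2016 by project X",   # Reusable: provenance statement
}
```

Even when the data themselves are restricted, a record like this can stay openly accessible, which is the point of "metadata remain accessible even if the data aren't".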
16. A DMP is a brief plan to define:
• how the data will be created
• how it will be documented
• who will access it
• where it will be stored
• who will back it up
• whether (and how) it will be shared & preserved
DMPs are often submitted as part of grant applications, but
are useful whenever researchers are creating data.
Data Management Plans
17. DMPonline
A web-based tool to help researchers write DMPs
Includes a template for Horizon 2020
Guidance from EUDAT and OpenAIRE is being added
https://dmponline.dcc.ac.uk
18. Metadata & documentation
• Metadata and documentation are needed to locate and understand research data
• Think about what others would need in order to find, evaluate, understand, and reuse your data
• Get others to check the metadata to improve its quality
• Use standards to enable interoperability
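Part of the "get others to check the metadata" advice can be automated with a simple completeness check before deposit. The required-field list below is an illustrative choice, not a standard:

```python
# Minimal metadata completeness check.
# The REQUIRED list is an illustrative assumption, not a formal standard.
REQUIRED = ["title", "creator", "description", "identifier", "license"]

def missing_fields(metadata):
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED if not metadata.get(field)]
```

For example, `missing_fields({"title": "Survey data", "creator": "S. Jones"})` returns `["description", "identifier", "license"]`, flagging what still needs to be written before the record is shared. A human check is still needed for whether the descriptions are understandable to others.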
20. Where to store data?
• Your own drive (PC, server, flash drive, etc.)
– And if you lose it? Or it breaks?
• Somebody else’s drive / departmental drive
• “Cloud” drive
– Do they care as much about your data as you do?
• Large scale infrastructure services like EUDAT
21. How to backup?
• 3... 2... 1... backup!
– at least 3 copies of a file
– on at least 2 different media
– with at least 1 offsite
• Use managed services where possible, e.g. university filestores or infrastructure services like EUDAT, rather than local or external hard drives
• Ask IT teams for advice
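If you do have to manage copies yourself, the 3-2-1 rule can be sketched as a small script: copy a file to several destinations and verify each copy against a checksum. The function name and the use of SHA-256 are illustrative assumptions; pointing two destinations at different media and one at an offsite mount gives 3 copies, 2 media, 1 offsite:

```python
import hashlib
import shutil
from pathlib import Path

def backup_321(source, destinations):
    """Copy `source` into each destination directory and verify each copy.

    Illustrative sketch: with two destinations on different local media plus
    one offsite path (e.g. a mounted network share), this yields at least
    3 copies on 2 media with 1 offsite.
    """
    src = Path(source)
    digest = hashlib.sha256(src.read_bytes()).hexdigest()
    copies = []
    for dest_dir in destinations:
        dest = Path(dest_dir) / src.name
        shutil.copy2(src, dest)  # copy2 also preserves timestamps
        # Verify the copy is bit-identical to the original
        if hashlib.sha256(dest.read_bytes()).hexdigest() != digest:
            raise IOError(f"Backup verification failed for {dest}")
        copies.append(dest)
    return copies
```

A managed service does all of this (and scheduling) for you, which is why it remains the first recommendation.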
22. Backup and preservation – not the same thing!
• Backups
– periodic snapshots of data, taken in case the current version is destroyed or lost
– copies of files stored for the short to medium term
– often performed on a frequent schedule
• Archiving
– preserves data for historical reference, or in case of disaster
– archives are usually the final version, stored for the long term and generally not overwritten
– often performed at the end of a project or at major milestones
24. A mistake in a spreadsheet led to dramatically different results from those published. These results were cited by the International Monetary Fund and the UK Treasury to justify austerity programmes. Had the data been shared, this could have been picked up earlier.
The importance of sharing data
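The kind of error described above is easy to reproduce: an average computed over a range that silently omits some rows. The figures below are made up for illustration, not the actual data behind the published paper:

```python
# Hypothetical GDP growth figures (illustrative only, not real data).
growth = [2.2, -0.1, 1.5, 3.0, 0.8, 2.9, 1.1, -0.4, 2.5, 1.9]

full_mean = sum(growth) / len(growth)   # correct: all rows included
partial_mean = sum(growth[:7]) / 7      # mistake: last rows silently dropped,
                                        # as with a mis-specified spreadsheet range
```

The two means differ, yet nothing in the spreadsheet-style calculation signals that rows were excluded. Only someone with access to the underlying data can spot that the range is wrong, which is exactly the argument for sharing it.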
25. Concerns about data sharing
• inappropriate use due to misunderstanding of research purpose or parameters
• security and confidentiality of sensitive data
• lack of acknowledgement / credit
• loss of advantage when competing for research funding
26. Concerns about data sharing
For each of these concerns, the solution is the same: metadata.
27. Concerns about data sharing
Concern: inappropriate use due to misunderstanding of research purpose or parameters
Solution: provide a rich Abstract, Purpose, Constraints and Supplemental Information where needed
Concern: security and confidentiality of sensitive data
Solution: the metadata does NOT contain the data; Use Constraints specify who may access the data and how
Concern: lack of acknowledgement / credit
Solution: specify a required data citation within the Use Constraints
Concern: loss of data insight and competitive advantage when vying for research funding
Solution: create a second, public version with a generalised Data Processing Description
28. Make data shareable
• Create robust metadata that has been checked
• Include reference information in metadata, e.g. unique IDs & properly formatted data citations
• Publish your metadata so it's discoverable. Use portals, clearing houses, online resources…
• Package up the data and associated metadata to deposit in repositories
• License the data clearly
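"Package up the data and associated metadata" can be sketched as a zip bundle that carries a machine-readable description alongside the files. The `metadata.json` filename and the function name are illustrative conventions, not requirements of any particular repository:

```python
import json
import zipfile
from pathlib import Path

def package_for_deposit(data_files, metadata, out_path):
    """Bundle data files and a metadata description into one zip for deposit.

    Illustrative sketch: the `metadata.json` name is an assumed convention,
    not mandated by any specific repository.
    """
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("metadata.json", json.dumps(metadata, indent=2))
        for path in data_files:
            zf.write(path, arcname=Path(path).name)  # flat layout inside the zip
    return Path(out_path)
```

Keeping the metadata inside the package means the data stay self-describing even if they are copied out of the repository later.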
29. Licensing research data
www.dcc.ac.uk/resources/how-guides/license-research-data
This DCC guide outlines the pros and cons of each approach and gives practical advice on how to implement your licence.
Creative Commons limitations:
• NC (Non-Commercial): what counts as commercial?
• ND (No Derivatives): severely restricts use
These clauses are not open licences.
The Horizon 2020 Open Access guidelines point to CC0 or CC BY.
30. EUDAT licensing tool
Answer questions to determine which licence(s) are
appropriate to use
http://ufal.github.io/public-license-selector
31. What to preserve & share
It’s not possible to keep everything. Select based on:
– What has to be kept e.g. data underlying publications
– What can’t be recreated e.g. environmental recordings
– What is potentially useful to others
– What has scientific, cultural or historical value
– What legally must be destroyed
How to select and appraise research data:
www.dcc.ac.uk/resources/how-guides/appraise-select-research-data
32. EUDAT & OPENAIRE SERVICES
Image CC-BY-NC ‘Data centre’ by Bob Mical www.flickr.com/photos/small_realm/15995555571
33. EUDAT services
EUDAT offers a pan-European solution, providing a generic set of services to ensure a minimum level of interoperability.
Building common data services in close collaboration with 25+ communities.
34. EUDAT B2 service suite
Covering both access and deposit, from informal data sharing to long-term archiving, and addressing identification, discoverability and computability of both long-tail and big data, EUDAT's services will address the full lifecycle of research data.
38. OpenAIRE training and support materials
• Briefing papers, factsheets, webinars, workshops, FAQs
• Information on:
– the Open Research Data Pilot
– creating a data management plan
– selecting a data repository
https://www.openaire.eu/opendatapilot
https://www.openaire.eu/support
39. www.eudat.eu www.openaire.eu
Thanks – any questions?
Contact us:
Tony Ross-Hellauer, OpenAIRE: ross-hellauer@sub.uni-goettingen.de
Sarah Jones, EUDAT: Sarah.Jones@glasgow.ac.uk
Acknowledgements:
Thanks to EUDAT colleagues Mark van de Sanden and Christine Staiger for slides.
Content has also been repurposed from the DataONE educational modules 'Data Management' and 'Data Sharing', retrieved from https://www.dataone.org/education-modules
Editor's Notes
There are four main topics that we will discuss:
Why manage data - The changing data landscape, looking at what issues this brings.
Brief overview of evolution of EC’s RDM policies
Secondly, we discuss considerations to make when managing and sharing data
Finally we’ll touch on EUDAT and OpenAIRE services to show how support is provided throughout the lifecycle
So let’s begin by looking at the changing data landscape.
There’s been a data explosion.
1. 90% of all the data in the world has been generated over the last 2 years.
2. Scientific data output is currently increasing at an annual rate of 30%.
As the amount of data being created now is growing exponentially, the biggest challenge is being able to navigate and use it. This is why data management is critical.
Digital data are fragile. There are lots of ways in which data can be lost. Hardware and software can fail, formats can become obsolete, you can lose the knowledge and skills needed to understand the data, and you can lose the investment needed to keep the data accessible. Despite significant investment, data is not being managed effectively.
The current estimated total global spend on research and development is $1.5 trillion, which could be at risk.
Much of the data generated is lost – in one study, the odds of sourcing datasets declined by 17% each year.
The same study found 80% of datasets over 20 years old not available.
Many experimentally established "facts" don't seem to hold up to repeated investigation. Several studies have shown alarming numbers of published papers that don’t stand up to scrutiny.
Over half of psychology studies fail reproducibility test (61/100) – Nosek et al, Science, 2015
The causes of irreproducibility are not well understood, but it is clear that where the original data are available, accountability is increased: findings can be reviewed when questions arise.
There are lots of reasons to manage research data. Ultimately though, it’s to make your research easier. If data are properly documented and organised, you can stop yourself drowning in irrelevant stuff and find the data when you need it – for example to validate findings. By managing your data you can also more easily share it with others to get more credit and impact. You may also be required to explain how you will manage your data by your funder or university.
Well-managed data opens up opportunities for re-use, integration and new science
Let’s move on to the considerations to make when managing and sharing data
Introduced at the start of 2015, covering just seven work programme areas, the Horizon 2020 Open Research Data Pilot has been a big success. In the first six months of the pilot, about a third of projects (65.4%, 431 signed grant agreements) that were part of the pilot chose to opt out. The most common reasons for opting out were: (1) concerns over intellectual property (37%), (2) the project did not expect to generate any data (18%), and privacy/data protection concerns (18%). Of those projects that were not originally part of the pilot, 11.9% (3268 projects) nonetheless have voluntarily opted in.
This research data lifecycle is taken from the UK Data Archive. It shows you the different processes and activities you’ll go through.
Creating data: This is when you’ll design the research, write Data Management Plans, negotiate consent agreements, find any existing data you want to reuse, collect/capture your data and create any associated metadata
Processing data: When processing your data, you’ll be entering, transcribing, checking, validating and cleaning it, you may also need to anonymise your data, you should describe it and make sure it’s properly managed and stored.
Analysing data: when you analyse your data you’ll be interpreting it and creating derived data and outputs, you’ll probably also author publications and prepare the data for deposit and sharing.
Preserving data: data repositories play a key role in preserving data: they will make sure it’s properly stored and archived, they will migrate the formats and storage medium and create associated metadata and documentation to explain any changes made
Access to data: it may be that you share your data via a repository or handle access requests yourself. Either way, you need to establish copyright, decide who can have access and promote the data.
Re-using data: data can be re-used in follow-up studies, new research, research reviews, to evidence findings or for teaching and learning. Try to keep an open mind about the different ways in which your data could be re-used and make it as open as possible.
A Data Management Plan is often written early on in the research process to determine what data will be created and how it will be managed. Sometimes you are asked for a DMP as part of a grant application, but they are useful to write regardless, as this helps to develop consistent procedures from the outset.
Metadata is needed to locate and understand the data. When you are deciding what information to capture, think about what others would need in order to find, evaluate, understand, and reuse your data. Also get others to check your metadata to improve the quality and make sure it’s understandable to others. Standards should be used where possible.
To make sure their data can be understood by themselves, their community and others, researchers should create metadata and documentation.
Metadata is basic descriptive information to help identify and understand the structure of the data e.g. title, author...
Documentation provides the wider context. It’s useful to share the methodology / workflow, software and any information needed to understand the data e.g. explanation of abbreviations or acronyms
There are lots of standards that can be used. The DCC started a catalogue of disciplinary metadata standards which is now being taken forward as an international initiative via an RDA working group
There are lots of places you can store your data. You’re best to use managed services where possible as they’re more resilient. If you store data on standalone computers, memory sticks or in the cloud, be mindful of the risk of loss or security breaches.
If you’re responsible for backing up your own data, you want to ensure there are multiple copies, on different media with at least 1 offsite. Where possible though, you should use managed services so the backup is done automatically for you.
Remember that backup and preservation are not the same thing (though the terms are often used interchangeably).
Backups are performed regularly to take periodic snapshots of the data for the short to medium term, whereas archiving is preserving the final version of the data for the long-term.
You should make sure your data are backed-up during the active phase of research and that any data needed for the long-term are archived.
It is also important to share your data where possible, particularly to evidence your findings.
This article reflects on an inadvertent error in an economics paper by Reinhart and Rogoff. Missing some rows out of an average gave drastically different results – what was published suggested that countries with 90% debt ratios see their economies shrink by 0.1%. Instead, it should have found that they grow by 2.2% – less than those with lower debt ratios, but not a spiralling collapse. This mistake wasn't picked up on initially as the data hadn't been shared. The mistake fed into government policy as the findings were used as justification for austerity measures in the UK and various other countries in the EU.
Naturally, researchers may worry that the data will be taken out of context, misinterpreted or used inappropriately. They may also be concerned about maintaining the confidentiality and security of sensitive data. Business concerns may arise as well - will data users give proper credit and acknowledgement to the scientist? Will the scientist lose a competitive advantage by sharing this valuable resource?
There are lots of reasons why researchers may be reluctant to share data, so what is the solution?
Each of these issues can, in great part, be addressed by providing rich data documentation known as ‘metadata’.
By providing metadata, the research scientist establishes the purpose, methods, sources and parameters of the data. As such, data users are given the information necessary to appropriately apply, protect and cite the data. If the metadata contains information about proprietary data processing or analysis techniques, the competitive advantage can be maintained by creating a second, more generalized, metadata record for public distribution.
To make your data shareable, you should create robust metadata and seek a second opinion on it to ensure it's understandable to others. Also include reference information so others can find your data and give you credit. The metadata should be published online and packaged up with your data to deposit in repositories.
Guidance from the DCC can also help researchers to understand data licensing. This guide outlines the pros and cons of each approach e.g. the limitations of some CC options
The OA guidelines under Horizon 2020 point to CC-0 or CC-BY as a straightforward and effective way to make it possible for others to mine, exploit and reproduce the data. See p11 at: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf
It might not be possible to preserve and share all your data, so you may need to make a selection. Some factors to consider include what has to be kept, for example for legal reasons or to evidence findings, and what is potentially useful to others or can't be recreated. You may also be under obligation to destroy certain data due to consent agreements or commercial non-disclosure restrictions.
The Digital Curation Centre has guidance on how to select what data to keep.
Let’s close by looking briefly at the EUDAT service suite and how it helps with data management and sharing
EUDAT offers a pan-European solution, providing a generic set of data services. These are being built in close collaboration with user communities.
The services assist researchers to store, manage and process the data through-out the active phase of research, and also help to archive data and make it discoverable to others.
The B2DROP service helps you to synchronise and exchange research data, like Dropbox; B2STAGE helps you get data to computation when processing and analysing data; B2SAFE helps you to replicate the data safely; B2SHARE is a repository to archive the data and share it with others; and B2FIND is a cataloguing service that allows you and others to find relevant data.
Catch-all repository
Multiple data types
Publications
Long tail of research data
Citable data (DOI)
Links to funding, pubs, data, software
Should happen automatically thanks to our data-literature interlinking services
But where it doesn’t, you