Data is growing in importance. It will matter more to your research careers than it does to your supervisors’ generation. Learn good data habits now! You’ll need them later.
Some formats are better for data sharing and long-term preservation than others. It’s preferable to use formats that are uncompressed (e.g. large, high-quality files like .wav), non-proprietary (i.e. open), and based on documented, well-understood standards. This aids preservation and interoperability. Some data centres have preferred formats for deposit, so it’s worth encouraging researchers to check these.
To make sure their data can be understood by themselves, their community and others, researchers should create metadata and documentation. Metadata is basic descriptive information that helps identify and understand the structure of the data, e.g. title, author... Documentation provides the wider context. It’s useful to share the methodology / workflow, software and any information needed to understand the data, e.g. an explanation of abbreviations or acronyms. There are lots of standards that can be used. The DCC started a catalogue of disciplinary metadata standards, which is now being taken forward as an international initiative via an RDA working group.
The EC guidelines suggest selecting a suitable repository. The Databib and Re3data lists can be useful for this. They allow you to search and browse by subject. Re3data also allows you to restrict the search by certificates, open access repositories and persistent identifiers.
Guidance from the DCC can also help researchers to understand data licensing. This guide outlines the pros and cons of each approach, e.g. the limitations of some CC options. Under Horizon 2020 it’s recommended that researchers use CC-0 or CC-BY to make data as open as possible.
I recommend this ICPSR resource. It explains the importance of different questions as a pointer to how to answer. Examples are given. This is the most frequent request we get at DCC: examples help researchers think of what to write for their context.
The DCC has produced a How-to guide on writing DMPs and developed a tool to help.
Research Data Management
DCC, University of Glasgow
• University of the West of England, 9th
• Quiz of funders’ requirements
• Introduction to RDM
• Data management planning
• Demo of DMPonline
“the active management and
appraisal of data over the
lifecycle of scholarly and
Data management is part of
good research practice
What is research data management?
Why manage your research data?
• To make your research easier!
• To stop yourself drowning in irrelevant stuff
• In case you need the data later
• To avoid accusations of fraud or bad science
• To share your data for others to use and learn from
• To get credit for producing it
• Because somebody else said to do so
RCUK Common Principles on Data Policy
“Publicly funded research data are a public good,
produced in the public interest, which should be
made openly available with as few restrictions as
possible in a timely and responsible manner that
does not harm intellectual property.”
Benefits of data sharing (1)
“It was unbelievable. It’s not science
the way most of us have practiced
in our careers. But we all realised
that we would never get biomarkers
unless all of us parked our egos and
intellectual property noses outside
the door and agreed that all of our
data would be public immediately.”
Dr John Trojanowski, University of Pennsylvania
•... scientific breakthroughs
Benefits of data sharing (2)
“There is evidence that studies that make their
data available do indeed receive more citations
than similar studies that do not.”
Piwowar H. and Vision T.J. (2013) “Data reuse and the open data
citation advantage” https://peerj.com/preprints/1.pdf
9% - 30% increase
•... more citations
If you plan to share your data....
• Have you got consent for sharing?
• Do any licences you’ve signed permit sharing?
• Is your data in suitable formats?
Decisions made early on affect what you can do later
Some formats are better for long-term preservation
It’s preferable to opt for formats that are:
• Open, documented
• Standard representation (ASCII, Unicode)
Data centres may have preferred formats for deposit e.g.
Type             Recommended                                           Non-preferred
Tabular data     CSV, TSV, SPSS portable                               Excel
Text             Plain text, HTML, RTF; PDF/A only if layout matters
Media            Container: MP4, Ogg; Codec: Theora, Dirac, FLAC
Images           TIFF, JPEG2000, PNG                                   GIF, JPG
Structured data  XML, RDF                                              RDBMS
Further examples: http://www.data-archive.ac.uk/create-manage/format/formats-table
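As a rough illustration of checking files against a table like this one, here is a minimal Python sketch. The extension lists are assumptions loosely mapped from the format names above (for example, `.por` for SPSS portable); they are not an official list, so check your data centre's own guidance.

```python
from pathlib import Path

# Illustrative assumption: file extensions loosely mapped from the table above.
# This is not an official list from any data centre.
RECOMMENDED = {
    ".csv", ".tsv", ".por",          # tabular data
    ".txt", ".html", ".rtf",         # text
    ".mp4", ".ogg", ".flac",         # media
    ".tif", ".tiff", ".jp2", ".png", # images
    ".xml", ".rdf",                  # structured data
}
NON_PREFERRED = {".xls", ".xlsx", ".gif", ".jpg", ".jpeg"}


def classify_format(filename: str) -> str:
    """Return 'recommended', 'non-preferred' or 'unknown' for a file name."""
    ext = Path(filename).suffix.lower()
    if ext in RECOMMENDED:
        return "recommended"
    if ext in NON_PREFERRED:
        return "non-preferred"
    return "unknown"


if __name__ == "__main__":
    for name in ["survey.csv", "results.xlsx", "scan.TIFF", "notes.docx"]:
        print(f"{name}: {classify_format(name)}")
```

A file flagged as non-preferred is not necessarily a problem during the project; the point is to plan a conversion before deposit.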
What would someone unfamiliar with your
data need in order to find, evaluate,
understand, and reuse them?
Consider the differences between someone inside
your research group, someone outside your
group but in your field, and someone outside
Documentation and standards
Metadata: basic info e.g. title, author, dates, access rights...
Documentation: context, workflows, methods, code, data dictionary...
Use standards wherever possible for interoperability
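One lightweight way to keep basic metadata with a data file is a small "sidecar" record stored next to it. The sketch below is only an illustration: the field names follow the examples on this slide (title, author, dates, access rights), while the function names and the `.metadata.json` suffix are hypothetical choices, not a disciplinary standard. Where a standard exists for your field, prefer it.

```python
import json
from pathlib import Path

# Assumption: the four basic fields named on the slide are treated as required.
REQUIRED_FIELDS = ("title", "author", "dates", "access_rights")


def build_metadata(title, author, dates, access_rights, **extra):
    """Build a basic metadata record; extra keyword fields are kept as given."""
    record = {
        "title": title,
        "author": author,
        "dates": dates,
        "access_rights": access_rights,
    }
    record.update(extra)
    return record


def missing_fields(record):
    """List required fields that are absent or blank."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def write_sidecar(data_file, record):
    """Write the record as JSON next to the data file (hypothetical naming)."""
    sidecar = Path(f"{data_file}.metadata.json")
    sidecar.write_text(json.dumps(record, indent=2, ensure_ascii=False),
                       encoding="utf-8")
    return sidecar


if __name__ == "__main__":
    record = build_metadata(
        title="Interview transcripts",
        author="A. Researcher",
        dates="2014-01 to 2014-06",
        access_rights="",  # left blank on purpose to show the check
        data_dictionary="codebook.txt",
    )
    print(json.dumps(record, indent=2))
    print("Missing:", missing_fields(record))
```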
Tools for managing data
Where to store your data?
• Your own drive (PC, server, flash drive, etc.)
– And if you lose it? Or it breaks?
• Somebody else’s drive
• Departmental drive
• “Cloud” drive
– Do they care as much about your data as you do?
How to back up?
• 3… 2… 1… backup!
– at least 3 copies of a file
– on at least 2 different media
– with at least 1 offsite
• Use managed services where possible e.g. University
filestores rather than local or external hard drives
• Ask central or local IT team for advice
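The 3-2-1 rule above can be written down as a simple check. This is a minimal sketch only: the `Copy` record and the example locations are hypothetical, and it checks what you tell it about your copies rather than inspecting any real storage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of a file (hypothetical record for illustration)."""
    location: str  # e.g. "laptop"
    medium: str    # e.g. "internal disk", "university filestore"
    offsite: bool  # True if held away from where you normally work


def check_321(copies):
    """Return a list of ways the copies fall short of the 3-2-1 rule."""
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies; need at least 3")
    media = {c.medium for c in copies}
    if len(media) < 2:
        problems.append(f"only {len(media)} kind(s) of media; need at least 2")
    if not any(c.offsite for c in copies):
        problems.append("no offsite copy")
    return problems


if __name__ == "__main__":
    copies = [
        Copy("laptop", "internal disk", offsite=False),
        Copy("desk drawer", "external hard drive", offsite=False),
    ]
    for problem in check_321(copies) or ["3-2-1 rule satisfied"]:
        print(problem)
```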
Archiving: data repositories
• OpenAIRE-CERN joint effort
• Multiple data types
– Long tail of research data
• Citable data (DOI)
• Links to funding, pubs, data & software
Creative Commons limitations
• NC Non-Commercial
– What counts as commercial?
• SA Share Alike
– Reduces interoperability
• ND No Derivatives
– Severely restricts use
License your data for reuse
Outlines pros and cons of each
approach and gives practical advice on
how to implement your licence
• Makes it easier for readers to locate
the data and validate findings
• Data citations ensure that data
contributors receive proper credit
• Can link to reuse to show impact
• Less danger of rival researchers
‘stealing’ results from those who
publish their data openly
Managing and sharing data:
a best practice guide
• How to write a DMP
• Formatting your data
• Data sharing
• Ethics and consent
Putting the pieces together...
Photo by Dread Pirate Jeff
What is a data management plan?
A brief plan written at the start of your project to define:
• how your data will be created
• how it will be documented
• who will access it
• where it will be stored
• who will back it up
• whether (and how) it will be shared & preserved
DMPs are often submitted as part of grant applications,
but are useful whenever you’re creating data.
Why YOU need a Data
What if this was your laptop?
Which UK funders require a DMP?
DCC Checklist for a DMP
• 13 questions on what’s asked across the board
• Prompts / pointers to help researchers get started
• Guidance on how to answer
Common themes in DMPs
1. Description of data to be collected / created
(i.e. content, type, format, volume...)
2. Standards / methodologies for data collection & management
3. Ethics and Intellectual Property
(highlight any restrictions on data sharing e.g. embargoes, confidentiality)
4. Plans for data sharing and access
(i.e. how, when, to whom)
5. Strategy for long-term preservation
A useful framework to get you started
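The five themes can double as a checklist for a draft plan. The sketch below is a hypothetical helper, not part of DMPonline or any funder template: it lists which of the themes a draft has not yet addressed, assuming the draft is kept as a dictionary keyed by theme.

```python
# The five common themes from the slide above.
DMP_THEMES = [
    "Description of data to be collected / created",
    "Standards / methodologies for data collection & management",
    "Ethics and Intellectual Property",
    "Plans for data sharing and access",
    "Strategy for long-term preservation",
]


def unanswered_themes(draft):
    """Return the themes with no text (or only whitespace) in the draft."""
    return [theme for theme in DMP_THEMES if not draft.get(theme, "").strip()]


if __name__ == "__main__":
    # Hypothetical draft with two themes filled in so far.
    draft = {
        "Description of data to be collected / created":
            "Interview audio and transcripts, roughly 20 GB.",
        "Plans for data sharing and access":
            "Deposit anonymised transcripts in a repository at project end.",
    }
    for theme in unanswered_themes(draft):
        print("Still to write:", theme)
```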
Think about why the
questions are being
asked – why is it
useful to consider
Look at examples to
help you understand
what to write
Tips for writing DMPs
• Seek advice - consult and collaborate
• Consider good practice for your field
• Base plans on available skills & support
• Make sure implementation is feasible
• Technical plan submitted to AHRC by Bristol Uni
• Rural Economy & Land Use (RELU) programme examples
• UCSD example DMPs (20+ scientific plans for NSF)
• My DMP – a satire (what not to write!)
More at: https://dmponline.dcc.ac.uk/help#DMPhelp
Help from the DCC
A web-based tool to help researchers
write data management plans
Thanks – any questions?
DCC guidance, tools and case studies:
Follow us on Twitter:
@digitalcuration and #ukdcc
Credit to Dorothea Salo, Ryan Schryver and colleagues for content from the “Escaping Datageddon”
presentation for slides 4, 11 & 14, available at: http://www.slideshare.net/cavlec/escaping-datageddon
And to the Research360 project at the University of Bath for content from the “Managing your research
data” presentation for slide 10, available at: http://opus.bath.ac.uk/32296