This document summarizes a presentation on the benefits of research data management. It discusses how data management can benefit researchers through increased citations and compliance with funder requirements. It also benefits society by enabling data sharing, reuse and discovery. However, many researchers do not practice good data management due to a lack of skills, resources or incentives. The presentation provides information on data management best practices and their importance for research excellence.
ipres2008: the Digital Preservation Training Programme
University of Northumbria Research
1. Research data management:
Benefits for the researcher,
Benefits for Society
Kevin Ashley
Digital Curation Centre
www.dcc.ac.uk
@kevingashley
Kevin.ashley@ed.ac.uk
Reusable with attribution: CC-BY
The DCC is supported by Jisc
2. A summary
• Some benefits:
– Citation & impact
– Compliance with funders & regulation
– Improving your research
• What stops us?
2015-06-22 Kevin Ashley – Warsaw data workshop - CC-BY 2
3. An alternative summary
Being Selfish
What’s possible now
… and still benefiting others
Being Just Good Enough
Thanks to:
Neil Chue Hong (@npch), Software Sustainability Institute
ORCID: 0000-0002-8876-7606
David Flanders (@dfflanders), Dr Steven Manos (DrStevenManos), University of Melbourne.
All my colleagues at the DCC
Cameron Neylon (@CameronNeylon)
4. What is Research Data Management?
“the active management and appraisal of data over the lifecycle of scholarly and scientific interest”
Data management is part of good research practice
Plan
Create
Document
Use
Publish
Share
Slide by Sarah Jones, DCC
5. Should all data be open?
• NO
• Many reasons – most to do with human subjects
• But data existence should always be open
• Allows discovery & negotiation on use
• Avoids pointless replication
7. But if you could publish just data…
• You could gain benefit even from the experiments that fail – as long as you got good data
• ‘Data papers’ are one way to achieve this
8. Findable, citable data has value
• Important to link publications to data (and vice versa)
• Increases citations – of data & publication
• Increases reuse (hence value)
• But effects exist even without publication, if data is:
– Archived
– Citable
– Discoverable
• All benefit – researcher; institution; publisher
9. Citability
• Making data available increases citations
• Everyone – academic, funder, institution – loves citations
• Want evidence?
– Alter, Pienta, Lyle – 240%, social sciences *
– Piwowar, Vision – 9% (microarray data)†
– Henneken, Accomazzi – 20% (astronomy) #
† Piwowar H, Vision TJ. (2013) Data reuse & the open data citation advantage. PeerJ PrePrints 1:e1v1. http://dx.doi.org/10.7287/peerj.preprints.1v1
* Pienta A, Alter G, Lyle J. (2010) The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. http://hdl.handle.net/2027.42/78307
# Henneken E, Accomazzi A. (2011) Linking to Data - Effect on Citation Rates in Astronomy. http://arxiv.org/abs/1111.3618
10. Funders are making demands
12. Regulatory requirements
• Data protection, freedom of information, research ethics – all apply to data
• If your data is badly managed:
– Compliance is hard
• Know what you deleted (and why) as well as what you have
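One way to "know what you deleted (and why)" is to keep a simple disposal log alongside the data. The following is only a minimal sketch under stated assumptions: the `DisposalRecord` class, its field names and the example entry are all hypothetical and made up for illustration; they are not a DCC format or a regulatory requirement.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields
from datetime import date


@dataclass
class DisposalRecord:
    """One row in a hypothetical disposal log (all field names are illustrative)."""
    dataset_id: str
    deleted_on: str      # ISO 8601 date
    reason: str
    authorised_by: str


def write_log(records, stream):
    """Write disposal records as CSV so the log stays readable without special software."""
    writer = csv.DictWriter(stream, fieldnames=[f.name for f in fields(DisposalRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))


# Made-up example entry, for illustration only.
records = [
    DisposalRecord(
        dataset_id="interviews-raw-audio",
        deleted_on=date(2015, 6, 1).isoformat(),
        reason="consent covered transcripts only",
        authorised_by="project lead",
    ),
]

buffer = io.StringIO()
write_log(records, buffer)
log_text = buffer.getvalue()
print(log_text)
```

A plain CSV file is used here because it can still be opened long after the project ends; any format your institution already supports would serve the same purpose.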
13. Because it’s good practice
“Data management is essential to excellence in research”
Professor Charlotte Clarke, Associate Dean for Research, School of Health, Community and Education Studies
“Apart from the benefits for research, good data management is vital for many reasons: accountability, security, appropriate data-sharing, re-use protocols and preservation, for example”
Prof Julie McLeod, School of Computing, Engineering & Information Sciences
www.northumbria.ac.uk/browse/ne/uninews/datamanagement?view=Standard&news=archive
14. Finally…
• Well-managed data makes your research easier, now and in future
• Well-managed data is easier to share, more likely to be re-used
• Sharing data is good for you
• It’s good for all of us
• It isn’t as hard as you think – we’re here to show you how!
18. Acquire research data skills
19. Data reuse - messages
Often your data tells stories that your publications do not
Not all data comes from other researchers
One person’s noise is another person’s signal
Discipline-bounded data discovery doesn’t give us all we need or want
Editor's Notes
The DCC defines data management as the active management and appraisal of data over the lifecycle of scholarly interest.
RDM is ‘active’ – digital materials can’t just be left and opened again in 10-20 years’ time. Lots of things change (hardware, software, operating systems…) so you need to proactively manage data.
You also need to select what can and should be kept. Not everything can be retained legally and only some data are valuable to share.
Research data management covers all activities in the lifecycle, from initial planning (writing DMPs), through creating data and documentation and processing / analysing data, to publishing results and sharing with others.
Medicine does, however, provide some clear reasons why we can’t just stick all research data on the internet for anyone to trawl through. When human subjects are involved there are real concerns about confidentiality. Yet what alltrials.net and other initiatives make clear is that the *existence* of the data should never be hidden. That allows it to be discovered and for negotiations to take place about its use. It avoids costly replication, which can delay scientific discovery and involve human suffering when the replication takes the form of a clinical trial.
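The idea that a dataset's existence can be open while the data itself stays restricted can be sketched as a catalogue of metadata-only records. This is an illustration under stated assumptions, not how alltrials.net or any real registry works: the `DatasetRecord` class, its fields, the `discover` helper and the example entry are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical catalogue entry; field names are illustrative, not a standard."""
    title: str
    keywords: tuple
    contact: str
    access: str              # "open" or "restricted"
    access_conditions: str   # how to request use when access is restricted


def discover(catalogue, keyword):
    """Find records by keyword, regardless of whether the data itself is open."""
    wanted = keyword.lower()
    return [r for r in catalogue if wanted in (k.lower() for k in r.keywords)]


# Made-up entry: the record is public even though the data is not.
catalogue = [
    DatasetRecord(
        title="Example clinical trial dataset (made-up entry)",
        keywords=("clinical trial", "cardiology"),
        contact="data-requests@example.org",
        access="restricted",
        access_conditions="Contact the data custodian to negotiate use.",
    ),
]

hits = discover(catalogue, "Clinical Trial")
for record in hits:
    print(record.title, "|", record.access, "|", record.contact)
```

The search never looks at the `access` field, so restricted data remains discoverable, and the contact details give a starting point for negotiating its use.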
Did I mention that making data available increases citations? This is a win all round. If you don’t believe me, here are three studies from three different areas that all show robust, positive correlations. The effect size varies with discipline, but we have enough evidence now that anyone who says that their area is different needs to come up with evidence to show why.
There are many such stories of unexpected data reuse; these are a few examples. The last, exemplified in the Old Weather project, sees the original data being reused for at least the third time, and in doing so helps both climatologists and family historians through a single piece of transcription work. An impressive result.