Inverting the data pyramid: maximising the value of data reuse (IMCW2014/ICKM2014 keynote)

Inverting the Pyramid:
Maximising the value of
research data to society
Kevin Ashley
Digital Curation Centre
www.dcc.ac.uk
@kevingashley
Kevin.ashley@ed.ac.uk
Reusable with attribution: CC-BY
The DCC is supported by Jisc

My home – the DCC
• Mission – to increase capability and capacity for research data services in UK institutions
• Not just a UK problem – an international one
• Training, shared services, guidance, policy, standards, futures
2014-11-25 Kevin Ashley –IMCW/ICKM-2014, Antalya - CC-BY 2

DCC networks and partnerships
Original Slide: Martin Donnelly, DCC

About me
• 35 years ago – a mathematician in medical research
• Acquired a skill for rescuing old data:
– Lost code books
– Lost programs
– Bad or obsolete media or systems
• It was fun – but it should not have been necessary

Generic science data lifecycle
PLAN → COLLECT → INTEGRATE/TRANSFORM → PUBLISH → DISCOVER → ARCHIVE/DISCARD
Adapted from: "Harnessing the Power of Digital Data: Taking the Next Step." Scientific Data Management (SDM) for Government Agencies: Report from the Workshop to Improve SDM.

E-Science curation report - 2003

Hervé L’Hours’ analysis
• Data lifecycles are linear, cyclical or spiral
(sometimes all three)
• See more at http://www.dcc.ac.uk/events/research-data-management-forum-rdmf/rdmf11 - workflows & research data management
• Linear cycles are project-based or repository-based

Traditional knowledge management view of data
Image © John Curran @ designedforlearning.co.uk
Image from forwardmotion.eu

But in research…
"DIKW-diagram" by RobOnKnowledge - Own work. Licensed under Creative Commons Attribution-Share Alike 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:DIKW-diagram.png#mediaviewer/File:DIKW-diagram.png

I ♥ your data!
I don’t ♥ what you said about it.

LIDAR & RADAR images of ice cloud – H. Russchenberg

The Old Weather project
Data for research, not from research

Data reuse stories
• The palaeontologist who saved years of work with archaeological data
• The 19th-century ships’ logs that help us model climate change
• The ‘noise’ from research radar that mapped dust from Eyjafjallajökull

Data reuse - messages
• Often your data tells stories that your publications do not
• Not all data comes from other researchers
• Discipline-bounded data discovery doesn’t give us all we need or want
• One person’s noise is another person’s signal

Understanding Biodiversity
• We don’t understand what drives it
• What helps or hinders speciation
• No one project or data source is enough
• Biology, geology, climate science, chemistry…
• Big and small problems
• Reanalysis & gap analysis

Research on Biodiversity…
• Requires many different data sources
• Not all will be published
• Not all publications are for similar research reasons, so…
• Citing the publication is irrelevant
• Some are research data; others are government or reference data

Why care?
• Data is expensive – an investment
• Reuse:
– More research
– Teaching & Learning
– Planning
• Impact – with or without publication
• Accountability
• Legal & regulatory requirements

Why does this matter?
• Research quality
– How close can we get to the truth?
• Research speed
– How quickly can we get to the truth?
• Research finance
– How much does the truth cost?
• Improving one or more of these is of interest to all actors:
– Researchers as data creators
– Researchers as data reusers
– Research institutions
– Funders – hence government and society

Creative data reuse
• http://vimeo.com/38402965

Integrity – not without data
• Cyril Burt
– Twin studies on intelligence.
– Questioned 1976; now discredited
• Duke case
– Data hiding leads to wasted treatments, clinical
trials, probable death & huge lawsuits
• Dutch cases
– Stapel – 55 publications – “fictitious data”
– Poldermans – fabricated data or negligence?
“The case for open data: the Duke Clinical Trials” – blog post, Kevin Ashley, http://www.dcc.ac.uk/news/case-open-data-duke-clinical-trials
“Lies, Damned Lies and Research Data: Can Data Sharing Prevent Data Fraud?” – Doorn, Dillo, van Horik, IJDC 8(1); doi:10.2218/ijdc.v8i1.256

Without data reuse:
• We can waste billions
• People suffer & die

Data reuse from Hubble

Data reuse is already happening – and researchers can change

Where can it happen?
Global, international
Nationally
By subject
Institution
Research group

Research data centres are good value!
• See Jisc reports on ADS, BADC, UKDA:
• Returns on investment between 400% and 1200%
• Unfortunately – many research domains have no relevant data centres
http://www.jisc.ac.uk/whatwedo/programmes/di_directions/strategicdirections/badc.aspx

“Provision for data management, for curation and long-term preservation, and for the sharing and re-use of data, varies wildly between subject areas.”
“The data management needs of many researchers are little considered or catered for.”
If greater provision is to be made, a shortfall in infrastructure (both technical and human) must be overcome.
Policy makers are aware that in many areas of enquiry, researchers’ access to well-managed, open and reusable data opens up significant opportunities.
All from JISC MRD 2 call, 2010

The library as custodian
• Increasing role for library to provide access to institutional assets
• See Lorcan Dempsey’s thoughts on the inside-out library vs outside-in library
– http://www.slideshare.net/lisld/the-inside-out-library
• Build on library strengths – preservation, access, curation, selection

G8UK - Endorses OA
Open Data Charter
Policy Paper, 18 June 2013

Funder requirements
http://www.epsrc.ac.uk/about/standards/researchdata/Pages/policyframework.aspx
• UK – RCUK
• Canada
• USA – NSF, NEH, etc
• Denmark
• USA – non-government funders (Sloan, Gates, …)
• Europe

RCUK policy - The 1-minute version
• Research data are a public good – make openly available in timely & responsible way
• Have policies & plans. Data with long-term value should be preserved & usable
• Metadata for discovery & reuse. Link publications & data
• Sometimes law, ethics get in the way. We understand.
• Limited embargos OK. Recognition is important – always cite data sources
• OK to use public money to do this. Do it efficiently.

EPSRC policy points
• Awareness of regulatory environment
• Data access statement
• Policies and processes
• Data storage
• Structured metadata descriptions
• DOIs for data
• Securely preserved for a minimum of 10 years from last use
Compliance expected by 2015
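
The "10 years from last use" point means the retention clock restarts each time the data are used. The Python sketch below illustrates one reading of that rule; it is not EPSRC's own wording or tooling. The function name, the idea of logging access dates, and the handling of 29 February are all assumptions made for illustration.

```python
from datetime import date

RETENTION_YEARS = 10  # minimum retention period quoted on the slide


def earliest_disposal_date(deposited: date, access_dates: list[date]) -> date:
    """Hypothetical helper: first date on which disposal could be considered.

    Assumes "last use" is the most recent of the deposit date and any
    recorded access dates.
    """
    last_use = max([deposited, *access_dates])
    try:
        return last_use.replace(year=last_use.year + RETENTION_YEARS)
    except ValueError:
        # 29 February has no counterpart in a non-leap year;
        # this sketch rolls forward to 1 March (an assumption).
        return date(last_use.year + RETENTION_YEARS, 3, 1)


if __name__ == "__main__":
    deposited = date(2014, 11, 25)
    # Each later use pushes the earliest disposal date forward.
    print(earliest_disposal_date(deposited, []))
    print(earliest_disposal_date(deposited, [date(2019, 6, 30)]))
```

Under this reading, a dataset that keeps being reused is never eligible for disposal, which is consistent with the idea that data with long-term value should be preserved.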

DCC Policy Summary
http://www.dcc.ac.uk/resources/policy-and-legal

Helping make data reuse possible – experience from the DCC

Some lessons – a summary
• Data reuse is rarely as simple as people think it is
• It is already happening
• It is good for research, for researchers, for funders, for universities
• Without senior management attention and researcher involvement, your initiative will fail
• Research data management services cannot involve the library alone
• Researchers need to know your services exist
• Training for young researchers in good data practice is valuable

DCC ‘institutional engagement’
Assess needs → Make the case → Develop support and services → …and support policy implementation
DCC support team:
• RDM policy development
• Customised Data Management Plans
• DAF & CARDIO assessments
• Guidance and training
• Workflow assessment
• Advocacy with senior management
• Institutional data catalogues
• Pilot RDM tools
Original Slide: Graham Pryor, DCC

Some institutional roles
• Leadership – coordinate action
• Audit – who has what, where does it go?
• Advice on access – data, wherever it is
• Preservation – permanence
• Citability
• Data/publication linking
• Promoting data in teaching
• Selection
• Education – early career researchers

Who (in the UK) is leading RDM work?
RESEARCHERS
INSTITUTIONAL SERVICES
• Library
• IT
• Research Office

Some example services
• Storage – persistent, shareable
• Permanent, citable identifiers
• Database as a service (e.g. Oxford ORDS)
• Embed tools in Excel – DataUp, others
• Workflow management – Taverna
• Training for early career researchers

Make data creation easier

Make data citable
• Making data available increases citations
• Everyone – academic, funder, institution – loves citations
• Want evidence?
– Alter, Pienta, Lyle – 240%, social sciences *
– Piwowar, Vision – 9% (microarray data) †
– Henneken, Accomazzi – 20% (astronomy) #
* Amy Pienta, George Alter, Jared Lyle (2010) The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. http://hdl.handle.net/2027.42/78307
† Piwowar H, Vision TJ (2013) Data reuse & the open data citation advantage. PeerJ PrePrints 1:e1v1. http://dx.doi.org/10.7287/peerj.preprints.1v1
# Edwin Henneken, Alberto Accomazzi (2011) Linking to Data - Effect on Citation Rates in Astronomy. http://arxiv.org/abs/1111.3618

Make data discoverable
• Data must be discoverable to be reused
• Alone, or in conjunction with publication
• Services include:
– Institutional catalogues
– National data registries
– Repository registries – Databib, re3data

Dataverse – helping researchers make data findable & reusable
Gking.harvard.edu/data

DCC guidance

Up-skilling for data
Choice of RDM training materials for librarians
http://dataintelligence.3tu.nl/en/home/
http://datalib.edina.ac.uk/mantra/libtraining.html

What data to keep

The Data Deluge is upon us
Sensors’ ability to produce data outstrips IT’s ability to process it

Roles and Responsibilities
What data to keep

IDCC15 – London, Feb 9-12 2015
The 10th International Digital Curation Conference
http://www.dcc.ac.uk/events/idcc15

My message to researchers
• The credit belongs to you
• The data belongs to all of us
• Share, and we all reap the benefits
• The story doesn’t end with a publication
