Use and reuse: research data locally & globally #esipfed
1. USE AND REUSE
Research data locally and globally
Kevin Ashley
Digital Curation Centre
www.dcc.ac.uk
@kevingashley
kevin.ashley@ed.ac.uk
Reusable with attribution: CC-BY
The DCC is supported by Jisc & FP7
2. Why does this matter?
• Research quality – how close can we get to the truth?
• Research speed – how quickly can we get to the truth?
• Research finance – how much does the truth cost?
• Improving one or more of these is of interest to all actors:
– Researchers as data creators
– Researchers as data reusers
– Research institutions
– Funders – hence government and society
2014-01-08
Kevin Ashley – ESIP Winter 2014 - CC-BY
3. The Data Deluge is upon us
Sensors' ability to produce data outstrips IT's ability to process it
4. Funders are making demands
5. EPSRC expects all those institutions it funds to develop a roadmap that aligns … with EPSRC's expectations by 1st May 2012; and to be fully compliant … by 1st May 2015.
http://www.epsrc.ac.uk/about/standards/researchdata/Pages/expectations.aspx
6. EPSRC expectations include:
• Awareness of regulatory environment
• Data access statement
• Policies and processes
• Data storage
• Structured metadata descriptions
• DOIs for data
• Securely preserved for a minimum of 10 years from last use
7. Where are funders making demands?
• USA – NSF, NEH, some philanthropic funders
• UK
• Germany – DFG
• Europe – European Commission (H2020)
Often tied to requirements on open access to research publications – though data requirements are not yet as common.
8. To universities, that looks like a problem
• Funder requirements exist for a reason: that data is valuable
– Value to funder and society from reuse
– Value to the institution as well
BIS business case: a £1.5m investment in research data services pays back 2.5 times after 5 years
9. Research Data Centres – the solution!
Many areas of research have no data centre to serve them
10. Data centres deliver value
Want a 400% to 1200% return on your investment? Try BADC!
http://www.jisc.ac.uk/whatwedo/programmes/di_directions/strategicdirections/badc.aspx
11. Data reuse from Hubble
14. Cloud – sorted!
• Sorry, but it isn't.
• High-use datasets and the long tail present different economic and technical challenges
• See David Rosenthal's analysis of the economics of Amazon for preservation: "Distributed digital preservation in the cloud", IJDC 8(1), 2013, doi:10.2218/ijdc.v8i1.248
17. National responses – supporting universities
• USA – NSF initiatives (DataONE, SEAD, Data Conservancy et al.)
• Australia – ANDS, RDSI
• UK – DCC, Jisc 'Managing Research Data' programmes
• Netherlands – Research Data Netherlands
• Canada – Research Data Canada
• Also grassroots or funder-led work in Finland, Denmark, Germany
18. UK- Jisc acts through DCC to help
19. DCC 'institutional engagement'
The DCC support team helps institutions to:
• Assess needs – DAF & CARDIO assessments, workflow assessment
• Make the case – advocacy with senior management
• Develop support and services – guidance and training, pilot RDM tools, institutional data catalogues, RDM policy development, customised Data Management Plans
…and support policy implementation
21. Australian National Data Service
National service, backed with university-level initiatives
22. Excuses – and responses
• “People will ask questions”
– So use a data centre or repository
• “It will be misinterpreted”
– Stuff happens. Also, openness encourages correction
• “It's not interesting”
– Let others be the judge – your noise is my signal
• “I might get another paper out of it”
– Up to a point. We might get more research out of it
• “I don't have permission”
– A real problem. But solvable at senior level
• “It's too bad/complicated”
– See above
• “It's not a priority”
– Unfortunately, funders are making it so. But if you looked at the evidence, it would be your priority as well
See e.g. Carly Strasser's blog:
http://datapub.cdlib.org/2013/04/24/closed-data-excuses-excuses/
23. These excuses bear a strong resemblance to those used by politicians and civil servants who argue against the release of government records.
This is not a group you want to be compared with.
24. Integrity
• Not everyone publishes here
• Almost all fraud connected to unavailable data
• People suffer & die due to research fraud
• When your research is reproducible – it gets cited
25. Integrity – not without data
• Cyril Burt
– Twin studies on intelligence. Questioned 1976; now discredited
• Duke case
– Data hiding leads to wasted treatments, clinical trials, probable death & huge lawsuits
• Dutch cases
– Stapel – 55 publications – "fictitious data"
– Poldermans – fabricated data or negligence?
"The case for open data: the Duke Clinical Trials" – blog post, Kevin Ashley, http://www.dcc.ac.uk/news/case-open-data-duke-clinical-trials
"Lies, Damned Lies and Research Data: Can Data Sharing Prevent Data Fraud?" – Doorn, Dillo, van Horik, IJDC 8(1); doi:10.2218/ijdc.v8i1.256
26. Should all data be open?
• NO
• Many reasons – most to do with human subjects
• But data existence should always be open
• Allows discovery & negotiation on use
• Avoids pointless replication
27. Gentleman’s data centres
• Some data centres have club-like behaviour
– Barriers to access
– Only for contributors
– Territorial
• Not without value, but barriers to progress
28. Citability
• Making data available increases citations
• Everyone – academic, funder, institution – loves citations
• Want evidence?
– Alter, Pienta, Lyle – 240%, social sciences *
– Piwowar, Vision – 9% (microarray data) †
– Henneken, Accomazzi – 20% (astronomy) #
* Amy Pienta, George Alter, Jared Lyle (2010) The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. http://hdl.handle.net/2027.42/78307
† Heather Piwowar, Todd Vision (2013) Data Reuse & the Open Data Citation Advantage. PeerJ PrePrints 1:e1v1. http://dx.doi.org/10.7287/peerj.preprints.1v1
# Edwin Henneken, Alberto Accomazzi (2011) Linking to Data – Effect on Citation Rates in Astronomy. http://arxiv.org/abs/1111.3618
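Since citable DOIs for data come up repeatedly in this talk, here is a minimal sketch of assembling a dataset citation in the DataCite-recommended form, Creator (PublicationYear): Title. Publisher. Identifier. The function and field names are illustrative, not any real library's API.

```python
# Minimal sketch: build a data citation string in the DataCite-recommended
# form "Creator (PublicationYear): Title. Publisher. Identifier".
# Helper name and parameters are illustrative, not a real API.

def format_data_citation(creators, year, title, publisher, doi):
    """Return a citation string for a dataset identified by a DOI."""
    names = "; ".join(creators)
    # DOIs are conventionally rendered as resolvable https://doi.org/ links
    return f"{names} ({year}): {title}. {publisher}. https://doi.org/{doi}"

citation = format_data_citation(
    creators=["Piwowar, H.", "Vision, T. J."],
    year=2013,
    title="Data reuse and the open data citation advantage",
    publisher="PeerJ",
    doi="10.7287/peerj.preprints.1v1",
)
print(citation)
```

The point of the fixed form is that a machine-actionable identifier travels with every citation, which is what makes the citation counts in the studies above measurable at all.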
29. Can we find it?
• Data must be discoverable to be reused
• Alone, or in conjunction with a publication
• Institutional catalogues, national data registries, national and international domain-specific services
30. Data discovery around the world
• Research Data Australia
• UK data registry pilot & Gateway2Research
• Research Data Netherlands
• World Data System
• re3data.org & databib.org – discovering repositories
35. Other global work of note
• Domain initiatives such as the Belmont Forum
• International generic groups – RDA, CODATA
• Problem-specific services – DataCite, EZID, …
39. Data reuse stories
• The palaeontologist who saved years of work with archaeological data
• The 'noise' from research radar that mapped dust from Eyjafjallajökull
• The 19th-century logs and photographs that help us model climate change
Often your data tells stories that your publications do not.
41. Thanks for your attention
kevin.ashley@ed.ac.uk
www.dcc.ac.uk
@kevingashley
Editor's Notes
I'm from the Digital Curation Centre in the UK – for those of you who haven't heard of us I'll be explaining a little of what we do later on in this talk. I'm here today to talk about one of the things that is central to ESIP's existence, the effective management and reuse of research data. I'm not going to talk about the earth sciences specifically, though, but look more generally at what is happening at local, national and global level to ensure that research data is used and reused effectively.
For an audience such as this, I shouldn't have to explain why data reuse is important. But just in case, and to explain why some things have happened the way they have, I'll describe some of the drivers. Ensuring that all research data is discoverable and reusable increases the quality of the research that we do. It can add to the data we collect ourselves and can improve the statistical rigour of our results. Exposing data to scrutiny makes it more straightforward to validate or challenge the findings of others. Making data available also improves the speed with which we can do research. If someone else has already gathered the data we need (perhaps for a different end use), we can move directly to the analysis stage of our work, saving both time and money. And saving money increases the efficiency of research. We hope that the money saved lets us do more research, but even if it doesn't society as a whole will gain. There's evidence behind this that I'll come to later, but it is an effective counter to those in some universities who feel that increasing funder requirements for data management simply lead to additional costs with no gain. There is a gain in all these areas, and hence every one of the actors – researchers, their employers, their funders, and society – should be motivated to make this happen.
Getting better at managing research data isn't just about keeping more stuff for longer. It's also about being more selective about what we do keep and documenting the decision-making process that we use. Reports such as this make clear that technological advances mean that the cost of producing data is dropping more rapidly than the cost of retention. Some arguments show that if we attempt to retain everything it won't be long before we're spending the entire GDP purely on data storage. That's an extreme analysis, but the problem is real, as CERN know well. In some disciplines it really is wiser to just generate the data again when it is needed. But for many observational disciplines, that opportunity isn't open to us.
Funders are aware of this and making increasingly stringent requirements about what researchers do with data and how and when they document it. The UK’s NERC runs a network of data centres to capture much of the data from research which it funds as well as providing data from the larger instruments it is responsible for. It also requires data management plans – an outline with the proposal and a more fully worked-up plan if the proposal is successful. The details differ with other funders but all are moving or have moved in this direction.
Some, such as EPSRC in the UK, have taken a slightly different tack. They place the burden of compliance on the university rather than the researcher. They expect universities to provide appropriate services to researchers to enable them to do the Right Thing, whatever that is. Looming deadlines in 2012 and in 2015 got the attention of senior university management. EPSRC are the biggest research funder in the UK. No one wants to put that funding stream at risk.
The expectations that universities need to sign up are listed here – their roadmaps need to demonstrate how they are going to deliver on these expectations by 2015. They include a commitment to keep data for 10 years after its last use – note, not just after the project ends. Some worry that this means they need to keep data for 100 years. I say that if your data is still being used (and cited) 100 years later you should break out the champagne, not worry about paying for it.
Those are UK examples and many of you will be familiar with parallel requirements in the USA. In Germany, DFG is making similar requirements and the European Commission’s new Horizon 2020 programme has also included requirements about research data for the first time. Government funders are often driven by the same ideas that are pushing increasing openness with data from all areas of public activity, much of it administrative. The G8 have issued strongly-worded statements in this area. But health funders in particular are often charitable rather than government funded. They often began with requirements about open access to publications arising from publicly-funded research. Data is an obvious next step.
They all have similar motivations, but the key one is value. Data costs money – it's an asset. And we want to sweat that asset to the greatest extent possible. The business case for some of the DCC's activity, accepted by the UK Treasury, foresaw a return on a modest investment of 2.5 times after 5 years – which then continues indefinitely.
Some people felt that disciplinary data centres were the answer to all this. There are lots of them after all – these are just a few UK examples. But many disciplines don’t have them and they aren’t easy to create. There is therefore an ongoing role for someone else to have custodial responsibility for much research data and universities and other research institutions are the natural home for much of it. National libraries in some countries also see a role for themselves.
Some recent studies have used rigorous methodologies to examine the cost-effectiveness of disciplinary data archives or repositories. This most recent one shows impressive returns on the amount spent on BADC – rates of return that would be highly attractive were this a commercial venture. But it isn't, of course. The financial benefit flows to the community as a whole, not to the data centre, which is simply a cost we bear to save overall. This observation, incidentally, is equally applicable no matter how you choose to spell 'center'.
Many of you may be familiar with this graph from the Hubble Space telescope data archive. It tells the same story in a different way, and also tells a story about the transformation of astronomy as a discipline. In the days of photographic plates, sharing (analogue) astronomical data was difficult. Digital instruments transformed this, and some time around 2000, more research was being done with old data than with new data. I could be more specific about this if the data behind this graph was made available, incidentally!
But we do need to beware of dependence on a single custodian for any set of data. This recent news story contains speculation that political motives are behind the loss of much material from research libraries in Canada, which includes much pre-digital data. The story isn't without controversy, but it is only one example from many around the world.
Commercial actors are also entering the scene, either to provide services to universities or research groups or direct to researchers. Arkivum falls into the former group; figshare began with the latter but is also now moving into an institutional offering. Digital Science, the people behind figshare, themselves owned by NPG, clearly believe they can extract value from the data they will end up with.
Those worried about where we'll store all this data sometimes point to cloud solutions as a panacea. They do have a useful role to play, and I see we'll be hearing about some of the success stories later on in this meeting. But David Rosenthal's analysis shows clearly that the cloud isn't effective for the long-term storage of even little-used data.
I urge you to read David’s blog and his article in IJDC to get a better understanding of his arguments. For the moment you’ll have to take it on trust. This graph compares costs of storing data for 100 years in either Amazon S3 or local systems, using different values of Kryder’s law which describes the change in unit storage cost over time. S3 loses out by a very large margin for all values, yet also has exit costs that make it almost impossible to get out of it cost effectively once you have opted in.
It’s still true even for Amazon Glacier, the low-cost option supposedly aimed at long-term preservation. The gap is smaller, but still there. Worse, every use of the data dramatically impacts the cost. This graph assumes that the only access is for periodic verification of the data, perhaps only once every 2 years.
Some countries have mounted national efforts to support universities to deal with the issues more effectively. In the US, this has primarily been through NSF programmes and projects such as DataONE. You'll be hearing more about these so I'll say no more myself. In Australia, the parallel initiatives of ANDS and RDSI provide national infrastructure backed with funded action within universities. In the UK, the DCC performs a similar role to ANDS and Jisc's MRD programmes fund the university-level action, often with partners such as publishers and international groups such as CODATA or CASRAI. The Netherlands and Canada both have similarly-named national initiatives, although that in the Netherlands is already delivering based on a grassroots joint model between a data centre and universities. There is also activity in Finland, Denmark and Germany – and possibly elsewhere.
In the UK, the DCC provides a mixture of guidance, events, current awareness, online services for tasks like data management planning, and embedded work in universities.
The embedded work contains multiple components which help with everything from initially making the case for action through training support staff and researchers and designing and delivering services to researchers that work with national and international infrastructure.
Funded work in universities, subject to competitive bidding, complements this work. These are examples of training programmes developed for research disciplines and for library staff in effective data management.
The Australian National Data Service has a similar remit, but substantially more funding. It has a clear goal to increase data reuse in Australia and of Australian research and a simple vision of the change it intends to bring about. I’ll say more about some of its services later.
Yet some researchers still aren’t convinced by the rhetoric. Carly Strasser at CDL has listed some of the reasons for not sharing data that she’s encountered – and here are some of my one-line responses. I’m not saying that the concerns aren’t sincere or reasonable but they can all be dealt with and some are positively misguided. The purpose of data centres, for instance, is to make data independently reusable (as stated in the OAIS standard) which relieves researchers of the burden of dealing with questions about it, at the same time as increasing the likelihood that their data will be cited.
It's unfortunate that I find many of these excuses familiar from the time that I ran services for the UK national archive dealing with government data. They are nearly all the same – although it's true that politicians rarely argue that they want to get one more paper out of the data before it is released. Either way, these people aren't company that you want to be in.
Much as I enjoy the JIR, it isn't the publication most of us aim for. But it brings home one compelling argument for making data available, that of research integrity. Almost all fraud, and other less clear-cut cases of bad research, can be associated with the unavailability of research data. There are real consequences, including human suffering and death – of which more later. And did I mention that making your data available makes it more likely to be cited? Don't worry, I will again.
These are just a few examples, some of outright fraud and others of simply dodgy research, all of which would have been uncovered far more quickly had the data been made routinely available. The Duke case in particular roused the suspicions of many in the field but took many years to get to the bottom of because the data was locked away. It is just one example of a set of practices described very clearly by Ben Goldacre in 'Bad Pharma'. Missing data is the largest section in his book, although he has other justified concerns with research relating to medical treatments. It has led to a global movement to ensure that all clinical trial data is made available. But medicine is by no means the only area affected.
Medicine does, however, provide some clear reasons why we can’t just stick all research data on the internet for anyone to trawl through. When human subjects are involved there are real concerns about confidentiality. Yet what alltrials.net and other initiatives make clear is that the *existence* of the data should never be hidden. That allows it to be discovered and for negotiations to take place about its use. It avoids costly replication, which can delay scientific discovery and involve human suffering when the replication takes the form of a clinical trial.
There are other concerns that I have with the way some data centres behaved historically. Some feel like gentleman's clubs – only available to members and with significant and sometimes abstruse barriers to membership. Many are moving away from this model but there is still some way to go.
Did I mention that making data available increases citations? This is a win all round. If you don’t believe me, here are three studies from three different areas that all show robust, positive correlations. The effect size varies with discipline, but we have enough evidence now that anyone who says that their area is different needs to come up with evidence to show why.
It’s not enough that data is preserved – it must be findable, both from the publication that describes it and as an entity in itself. Not all data goes with a publication. Services at many different levels have a role to play, particularly to ensure that data reuse can happen in a cross-disciplinary way.
ANDS were the first to do something at national level and in a generic way with Research Data Australia. We in the UK are following much of their model. Meanwhile the government funders are merging what were funder-level discovery services for all their research outputs. There are broad multi-disciplinary services such as the World Data System and two services at present tackling a related problem – finding an appropriate place to put your data. This is a real problem for many.
Re3data is a DFG-funded project trying to tackle this, building in part on work done by the DCC along with BioMed Central and … The brief descriptions have handy icons expressing things like usage conditions in a compact way, but there's lots more detail available as these screenshots show. Links to the terms and conditions of use and the standards employed are particularly valuable.
Here in the USA the databib project is undertaking a similar initiative – this is a record for the same archive, the Archaeology Data Service in York, UK. It's briefer, but tells us much of what we need to know. Of note is the fact that all this data is available as RDF, making it easy for others to build services on top of this registry. That's going to be important. I hope re3data will do something similar.
Services like this make it easy when we want to locate two datasets, perhaps from two sub-disciplines, to combine – a common enough requirement.
But increasingly we want to undertake combinations of hundreds or even thousands of individual datasets and to do so in a relatively automated way. In general, we don’t yet have services that make this straightforward.
There’s more taking place at global level, far more than I have time to discuss here. Many of you will be familiar with the work of the Belmont forum on climate data. There are many other groups working in large domains such as this. There are also more generic intitatives, some of them long-lived such as CODATA and others much newer such as the RDA. Both are working particularly hard to identify generic solutions to many problems that relate to research data management in ways that individual disciplines will find hard to do. Both achieve much more through global coordination than any national or even continental initiative can achieve. One generic issue is that of providing permanent, citable identifiers for data. EZID, from the California Digital Library, is one solution. Datacite is another, backed by national libraries in many countries.
All this work is aimed at making data reuse simpler. We’re all familiar with research lifecycle models such as this one, where ideas lead to projects, data and publications.
We know that these lifecycles can connect, with one group building on the work of another
But research that collects data doesn’t always provide publishable results in a useful timeframe, or at all. Much of this work is aimed at making sure that we can still benefit from the data of others even when parts of the lifecycle are broken. And those others will benefit from the citations we provide to their source data.
There are many such stories of unexpected data reuse; these are a few examples. The last, exemplified in the Old Weather project, is seeing the original data being reused for at least the third time and in doing so is helping both climatologists and family historians through a single piece of transcription work. An impressive result.
The message from colleagues at 3TU in the Netherlands is one I would like to leave you with. See your data as a treasure, but one you only gain value from when it is shared.