USE AND REUSE
Research data locally and globally
Kevin Ashley
Digital Curation Centre
www.dcc.ac.uk
@kevingashley
Kevin.ashley@ed.ac.uk
Reusable with attribution: CC-BY

The DCC is supported by Jisc & FP7
Why does this matter?
• Research quality
– How close can we get to the truth?

• Research speed
– How quickly can we get to the truth?

• Research finance
– How much does the truth cost?

2014-01-08

• Improving one or more of these is of interest to all actors:
– Researchers as data creators
– Researchers as data reusers
– Research institutions
– Funders – hence government and society

Kevin Ashley – ESIP Winter 2014 - CC-BY

The Data Deluge is upon us
Sensors’ ability to produce data outstrips IT’s ability to process it

Funders are making demands

EPSRC expects all those institutions it funds
to develop a roadmap that aligns … with
EPSRC’s expectations by 1st May 2012;
to be fully compliant … by 1st May 2015.

http://www.epsrc.ac.uk/about/standards/researchdata/Pages/expectations.aspx

• Awareness of regulatory environment
• Data access statement
• Policies and processes
• Data storage
• Structured metadata descriptions
• DOIs for data
• Securely preserved for a minimum of 10 years from last use

Where are funders making demands?
• USA – NSF, NEH, some philanthropic funders
• UK
• Germany – DFG
• Europe – European Commission (H2020)

Often tied to requirements on open access to
research publications – but not as common.

To universities, that looks like a problem
• Funder requirements exist for a reason:
– That data is valuable

• Value to funder, society from reuse
• Value to the institution is there also

BIS business case: £1.5m investment in research data
services pays back 2.5 times after 5 years
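
As a rough illustration of the BIS figure above, the arithmetic can be sketched in Python. This assumes that "pays back 2.5 times" means the total return after five years equals 2.5 times the initial investment; the slide does not define the term, and the variable names are illustrative only.

```python
# Figures from the slide; the reading of "pays back 2.5 times" is an assumption.
investment_gbp = 1_500_000   # £1.5m investment in research data services
payback_multiple = 2.5       # assumed to mean: total return / investment
years = 5

total_return_gbp = investment_gbp * payback_multiple
net_gain_gbp = total_return_gbp - investment_gbp

print(f"Total return after {years} years: £{total_return_gbp:,.0f}")
print(f"Net gain over the investment: £{net_gain_gbp:,.0f}")
```

Under that reading, the total return is £3.75m and the net gain is £2.25m. A different definition of payback would change both numbers.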

Research Data Centres – the solution!

MANY AREAS OF RESEARCH HAVE NO DATA CENTRE TO SERVE THEM

Data centres deliver value
Want a 400% -> 1200% return on your investment?
Try BADC!

http://www.jisc.ac.uk/whatwedo/programmes/di_directions/strategicdirections/badc.aspx

Data reuse from Hubble

Don’t trust government

http://thetyee.ca/News/2013/12/23/Canadian-Science-Libraries/

Commercial services

Cloud – sorted!
• Sorry, but it isn’t.
• High-use datasets and long tail present different economic and technical challenges
• See David Rosenthal’s analysis of the economics of Amazon for preservation: “Distributed digital preservation in the cloud”, IJDC 8(1), 2013, doi:10.2218/ijdc.v8i1.248

Cost of data for 100 years – local vs Amazon S3
Data from blog.dshr.org/2013/01/talk-at-idcc2013.html
© David Rosenthal, used under CC-BY-SA licence

Cost of data for 100 years – local vs Amazon S3 AND Glacier
Data from blog.dshr.org/2013/01/talk-at-idcc2013.html
© David Rosenthal, used under CC-BY-SA licence

National responses – supporting universities
• USA – NSF initiatives (DataONE, SEAD, Data Conservancy et al.)
• Australia – ANDS, RDSI
• UK – DCC, Jisc ‘Managing Research Data’ programmes
• Netherlands – Research Data Netherlands
• Canada – Research Data Canada
• Also grassroots or funder-led work in Finland, Denmark, Germany

UK – Jisc acts through DCC to help

DCC ‘institutional engagement’
[Diagram: the DCC support team and its activities]
• Assess needs
• DAF & CARDIO assessments
• Workflow assessment
• Institutional data catalogues
• Make the case
• Advocacy with senior management
• Pilot RDM tools
• Guidance and training
• Develop support and services
• RDM policy development
• Customised Data Management Plans
• …and support policy implementation

http://datalib.edina.ac.uk/mantra/libtraining.html

Up-skilling for data

Choice of RDM training materials for librarians

http://dataintelligence.3tu.nl/en/home/

Australian National Data Service

National Service, backed with university-level initiatives
Excuses – and responses
• “People will ask questions”
– So use a data centre or repository

• “It will be misinterpreted”
– Stuff happens. Also, openness encourages correction

• “It’s not interesting”
– Let others be the judge – your noise is my signal

• “I might get another paper out of it”
– Up to a point. We might get more research out of it

• “I don’t have permission”
– A real problem. But solvable at senior level

• “It’s too bad/complicated”
– See above

• “It’s not a priority”
– Unfortunately, funders are making it so. But if you looked at the evidence, it would be your priority as well

See e.g. Carly Strasser’s blog:
http://datapub.cdlib.org/2013/04/24/closed-data-excuses-excuses/
These excuses bear a strong resemblance to those used by politicians and civil servants who argue against the release of government records

This is not a group you want to be compared with

Integrity
• Not everyone publishes here
• Almost all fraud connected to unavailable data
• People suffer & die due to research fraud
• When your research is reproducible – it gets cited

Integrity – not without data
• Cyril Burt
– Twin studies on intelligence.
– Questioned 1976; now discredited

• Duke case
– Data hiding leads to wasted treatments, clinical trials, probable death & huge lawsuits

• Dutch cases
– Stapel – 55 publications – “fictitious data”
– Poldermans – fabricated data or negligence?
“The case for open data: the Duke Clinical Trials” – blog post, Kevin Ashley, http://www.dcc.ac.uk/news/case-open-data-duke-clinical-trials
“Lies, Damned Lies and Research Data: Can Data Sharing Prevent Data Fraud?” – Doorn, Dillo, van Horik, IJDC 8(1); doi:10.2218/ijdc.v8i1.256

Should all data be open?
• NO
• Many reasons – most to do with human subjects
• But data existence should always be open
• Allows discovery & negotiation on use
• Avoids pointless replication

Gentleman’s data centres
• Some data centres have club-like behaviour
– Barriers to access
– Only for contributors
– Territorial

• Not without value, but barriers to progress

Citability
• Making data available increases citations
• Everyone – academic, funder, institution – loves citations
• Want evidence?
– Alter, Pienta, Lyle – 240%, social sciences *
– Piwowar, Vision – 9% (microarray data) †
– Henneken, Accomazzi – 20% (astronomy) #
* Amy Pienta, George Alter, Jared Lyle (2010). The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. http://hdl.handle.net/2027.42/78307
† Piwowar H, Vision TJ (2013). Data reuse & the open data citation advantage. PeerJ PrePrints 1:e1v1. http://dx.doi.org/10.7287/peerj.preprints.1v1
# Edwin Henneken, Alberto Accomazzi (2011). Linking to Data - Effect on Citation Rates in Astronomy. http://arxiv.org/abs/1111.3618
Can we find it?
• Data must be discoverable to be reused
• Alone, or in conjunction with publication
• Institutional catalogues, national data registries, national and international domain-specific services

Data discovery around the world
• Research Data Australia
• UK data registry pilot & Gateway2Research
• Research Data Netherlands
• World Data System
• re3data.org & databib.org – discovering repositories

Repository finders

A re3data record

A databib record

Other global work of note
• Domain initiatives such as the Belmont Forum
• International generic groups – RDA, CODATA
• Problem-specific services – DataCite, EZID,…

[Diagram: research cycle with the stages Idea, Read, Develop, Publish, Fund, Process, Plan, Record]

[Diagram: two copies of the research cycle, each with the stages Idea, Read, Develop, Publish, Fund, Process, Plan, Record]

[Diagram: research cycle with the stages Idea, Read, Develop, Publish, Fund, Process, Plan, Record]

Data reuse stories
• The palaeontologist who saved years of work with archaeological data
• The ‘noise’ from research radar that mapped dust from Eyjafjallajökull
• The 19th-century logs and photographs that help us model climate change

Often your data tells stories that your publications do not

3TU treasure chest

Thanks for your attention
kevin.ashley@ed.ac.uk
www.dcc.ac.uk
@kevingashley

 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 

Recently uploaded (20)

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 

Use and reuse: research data locally & globally #esipfed

  • 1. USE AND REUSE Research data locally and globally Kevin Ashley Digital Curation Centre www.dcc.ac.uk @kevingashley Kevin.ashley@ed.ac.uk Reusable with attribution: CC-BY The DCC is supported by Jisc & FP7
  • 2. Why does this matter? • Research quality – How close can we get to the truth? • Research speed – How quickly can we get to the truth? • Research finance – How much does the truth cost? 2014-01-08 • Improving one or more of these is of interest to all actors: • Researchers as data creators • Researchers as data reusers • Research institutions • Funders – hence government and society Kevin Ashley – ESIP Winter 2014 - CC-BY 2
  • 3. The Data Deluge is upon us Sensor’s ability to produce data outstrips IT’s ability to process it 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 3
  • 4. Funders are making demands 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 4
  • 5. EPSRC expects all those institutions it funds to develop a roadmap that aligns … with EPSRC’s expectations by 1st May 2012; to be fully compliant … by 1st May 2015. http://www.epsrc.ac.uk/about/standards/researchdata/Pages/expectations.aspx 2014-01-08 Kevin Ashley – ESIP Winter 2014 CC-BY 5
  • 6. • Awareness of regulatory environment • Data access statement • Policies and processes • Data storage • Structured metadata descriptions • DOIs for data • Securely preserved for a minimum of 10 years from last use 2014-01-08 Kevin Ashley – ESIP Winter 2014 CC-BY 6
  • 7. Where are funders making demands? • USA – NSF, NEH, some philanthropic funders • UK • Germany – DFG • Europe – European Commission (H2020) Often tied to requirements on open access to research publications – but not as common. 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 7
  • 8. To universities, that looks like a problem • Funder requirements exist for a reason: – That data is valuable • Value to funder, society from reuse • Value to the institution is there also BIS business case: £1.5m investment in research data services pays back 2.5 times after 5 years 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 8
  • 9. Research Data Centres – the solution! MANY AREAS OF RESEARCH HAVE NO DATA CENTRE TO SERVE THEM 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 9
  • 10. Data centres deliver value Want a 400% -> 1200% return on your investment? Try BADC! http://www.jisc.ac.uk/whatwedo/programmes/di_directions/strategicdirections/badc.aspx 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 10
  • 11. Data reuse from Hubble 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 11
  • 13. Commercial services 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 13
  • 14. Cloud – sorted! • Sorry, but it isn’t. • High-use datasets and long tail present different economic and technical challenges • See David Rosenthal’s analysis of the economics of Amazon for preservation “Distributed digital preservation in the cloud” IJDC 8(1), 2013 doi:10.2218/ijdc.v8i1.248 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 14
  • 15. Cost of data for 100 years – local vs Amazon S3 Data from blog.dshr.org/2013/01/talk-at-idcc2013.html © David Rosenthal, used under CC-BY-SA licence 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 15
  • 16. Cost of data for 100 years – local vs Amazon S3 AND Glacier Data from blog.dshr.org/2013/01/talk-at-idcc2013.html © David Rosenthal, used under CC-BY-SA licence 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 16
  • 17. National responses – supporting universities • USA – NSF initiatives (DataONE, SEAD, Data Conservancy et al) • Australia – ANDS, RDSI • UK – DCC, Jisc ‘Managing Research Data’ programmes • Netherlands – Research Data Netherlands • Canada – Research Data Canada • Also grassroots or funder-led work in Finland, Denmark, Germany 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 17
  • 18. UK- Jisc acts through DCC to help 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 18
  • 19. DCC ‘institutional engagement’ Assess needs Institutional data catalogues Workflow assessment DAF & CARDIO assessments Advocacy with senior management Make the case Pilot RDM tools DCC support team Guidance and training Develop support and services RDM policy development Customised Data Management Plans …and support policy implementation 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 19
  • 20. http://datalib.edina.ac.uk/mantra/libtraining.html Up-skilling for data Choice of RDM training materials for librarians 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY http://dataintelligence.3tu.nl/en/home/ 20
  • 21. Australian National Data Service 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY National Service, backed with university-level initiatives 21
  • 22. Excuses – and responses • “People will ask questions” – So use a data centre or repository • “It will be misinterpreted” – Stuff happens. Also, openness encourages correction • “It’s not interesting” – Let others be the judge – your noise is my signal • “I might get another paper out of it” – Up to a point. We might get more research out of it • “I don’t have permission” – A real problem. But solvable at senior level • “It’s too bad/complicated” – see above • “It’s not a priority” – Unfortunately, funders are making it so. But if you looked at the evidence, it would be your priority as well See e.g. Carly Strasser’s blog: http://datapub.cdlib.org/2013/04/24/closed-data-excuses-excuses/ 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 22
  • 23. These excuses bear a strong resemblance to those used by politicians and civil servants who argue against the release of government records This is not a group you want to be compared with 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 23
  • 24. Integrity • Not everyone publishes here • Almost all fraud connected to unavailable data • People suffer & die due to research fraud • When your research is reproducible – it gets cited 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 24
  • 25. Integrity – not without data • Cyril Burt – Twin studies on intelligence. – Questioned 1976; now discredited • Duke case – Data hiding leads to wasted treatments, clinical trials, probable death & huge lawsuits • Dutch cases – Stapel – 55 publications – “fictitious data” – Poldermans – fabricated data or negligence? “The case for open data: the Duke Clinical Trials “– blog post, Kevin Ashley, http://www.dcc.ac.uk/news/case-open-data-duke-clinical-trials “Lies, Damned Lies and Research Data: Can Data Sharing Prevent Data Fraud?” – Doorn, Dillo, van Horik, IJDC 8(1); doi:10.2218/ijdc.v8i1.256 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 25
  • 26. Should all data be open? • NO • Many reasons – most to do with human subjects • But data existence should always be open • Allows discovery & negotiation on use • Avoids pointless replication 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 26
  • 27. Gentleman’s data centres • Some data centres have club-like behaviour – Barriers to access – Only for contributors – Territorial • Not without value, but barriers to progress 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 27
  • 28. Citability • Making data available increases citations • Everyone – academic, funder, institution – loves citations • Want evidence? – Alter, Pienta, Lyle – 240%, social sciences * – Piwowar, Vision – 9% (microarray data)† – Henneken, Accomazzi – 20% (astronomy) # # Edwin Henneken, Alberto Accomazzi, (2011) Linking to Data - Effect on Citation Rates in Astronomy. http://arxiv.org/abs/1111.3618 * Amy Pienta, George Alter, Jared Lyle, (2010) The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. http://hdl.handle.net/2027.42/78307 † Piwowar H, Vision TJ. (2013) Data reuse & the open data citation advantage. PeerJ PrePrints 1:e1v1 http://dx.doi.org/10.7287/peerj.preprints.1v1 2014-01-08 28 Kevin Ashley – ESIP Winter 2014 - CC-BY
  • 29. Can we find it? • Data must be discoverable to be reused • Alone, or in conjunction with publication • Institutional catalogues, national data registries, national and international domain-specific services 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 29
  • 30. Data discovery around the world • Research Data Australia • UK data registry pilot & Gateway2Research • Research Data Netherlands • World Data System • re3data.org & databib.org – discovering repositories 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 30
  • 31. Repository finders A re3data record 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 31
  • 32. A databib record 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 32
  • 33. 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 33
  • 34. 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 34
  • 35. Other global work of note • Domain initiatives such as Belmont forum • International generic groups – RDA, CODATA • Problem-specific services – Datacite, EZID,… 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 35
  • 39. Data reuse stories • The palaeontologist who saved years of work with archaeological data • The ‘noise’ from research radar that mapped dust from Eyjafjallajökull • The 19th-century logs and photographs that help us model climate change Often your data tells stories that your publications do not 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 39
  • 40. 3TU treasure chest 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 40
  • 41. Thanks for your attention kevin.ashley@ed.ac.uk www.dcc.ac.uk @kevingashley 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 41

Editor's Notes

  1. I’m from the Digital Curation Centre in the UK – for those of you who haven’t heard of us I’ll be explaining a little of what we do later on in this talk. I’m here today to talk about one of the things that is central to ESIP’s existence, the effective management and reuse of research data. I’m not going to talk about the earth sciences specifically, but will look more generally at what is happening at local, national and global level to ensure that research data is used and reused effectively.
  2. For an audience such as this, I shouldn’t have to explain why data reuse is important. But just in case, and to explain why some things have happened the way they have, I’ll describe some of the drivers. Ensuring that all research data is discoverable and reusable increases the quality of the research that we do. It can add to the data we collect ourselves and can improve the statistical rigour of our results. Exposing data to scrutiny makes it more straightforward to validate or challenge the findings of others. Making data available also improves the speed with which we can do research. If someone else has already gathered the data we need (perhaps for a different end use), we can move directly to the analysis stage of our work, saving both time and money. And saving money increases the efficiency of research. We hope that the money saved lets us do more research, but even if it doesn’t, society as a whole will gain. There’s evidence behind this that I’ll come to later, but it is an effective counter to those in some universities who feel that increasing funder requirements for data management simply lead to additional costs with no gain. There is a gain in all these areas, and hence every one of the actors – researchers, their employers, their funders, and society – should be motivated to make this happen.
  3. Getting better at managing research data isn’t just about keeping more stuff for longer. It’s also about being more selective about what we do keep and documenting the decision-making process that we use. Reports such as this make clear that technological advances mean that the cost of producing data is dropping more rapidly than the cost of retention. Some arguments show that if we attempt to retain everything it won’t be long before we’re spending the entire GDP purely on data storage. That’s an extreme analysis, but the problem is real, as CERN know well. In some disciplines it really is wiser to just generate the data again when it is needed. But for many observational disciplines, that opportunity isn’t open to us.
  4. Funders are aware of this and making increasingly stringent requirements about what researchers do with data and how and when they document it. The UK’s NERC runs a network of data centres to capture much of the data from research which it funds as well as providing data from the larger instruments it is responsible for. It also requires data management plans – an outline with the proposal and a more fully worked-up plan if the proposal is successful. The details differ with other funders but all are moving or have moved in this direction.
  5. Some, such as EPSRC in the UK, have taken a slightly different tack. They place the burden of compliance on the university rather than the researcher. They expect universities to provide appropriate services to researchers to enable them to do the Right Thing, whatever that is. Looming deadlines in 2012 and in 2015 got the attention of senior university management. EPSRC are the biggest research funder in the UK. No one wants to put that funding stream at risk.
  6. The expectations that universities need to sign up to are listed here – their roadmaps need to demonstrate how they are going to deliver on these expectations by 2015. They include a commitment to keep data for 10 years after its last use – note, not just after the project ends. Some worry that this means they need to keep data for 100 years. I say that if your data is still being used (and cited) 100 years later you should break out the champagne, not worry about paying for it.
  7. Those are UK examples and many of you will be familiar with parallel requirements in the USA. In Germany, DFG is making similar requirements and the European Commission’s new Horizon 2020 programme has also included requirements about research data for the first time. Government funders are often driven by the same ideas that are pushing increasing openness with data from all areas of public activity, much of it administrative. The G8 have issued strongly-worded statements in this area. But health funders in particular are often charitable rather than government funded. They often began with requirements about open access to publications arising from publicly-funded research. Data is an obvious next step.
  8. They all have similar motivations, but value alone is key. Data costs money – it’s an asset. And we want to sweat that asset to the greatest extent possible. The business case for some of the DCC’s activity, accepted by the UK Treasury, foresaw a return on a modest investment of 2.5 times after 5 years – which then continues indefinitely.
  9. Some people felt that disciplinary data centres were the answer to all this. There are lots of them after all – these are just a few UK examples. But many disciplines don’t have them and they aren’t easy to create. There is therefore an ongoing role for someone else to have custodial responsibility for much research data and universities and other research institutions are the natural home for much of it. National libraries in some countries also see a role for themselves.
  10. Some recent studies have used rigorous methodologies to examine the cost-effectiveness of disciplinary data archives or repositories. This most recent one shows impressive returns on the amount spent on BADC – rates of return that would be highly attractive were this a commercial venture. But it isn’t, of course. The financial benefit flows to the community as a whole, not to the data centre, which is simply a cost we bear to save overall. This observation, incidentally, is equally applicable no matter how you choose to spell ‘center’.
  11. Many of you may be familiar with this graph from the Hubble Space telescope data archive. It tells the same story in a different way, and also tells a story about the transformation of astronomy as a discipline. In the days of photographic plates, sharing (analogue) astronomical data was difficult. Digital instruments transformed this, and some time around 2000, more research was being done with old data than with new data. I could be more specific about this if the data behind this graph was made available, incidentally!
  12. But we do need to be beware of dependence on a single custodian for any set of data. This recent news story contains speculation that political motives are behind the loss of much material from research libraries in Canada, which includes much pre-digital data. The story isn’t without controversy, but it is only one example from many around the world.
  13. Commercial actors are also entering the scene, either to provide services to universities or research groups or direct to researchers. Arkivum falls into the former group; figshare began with the latter but is also now moving into an institutional offering. Digital Science, the people behind figshare, themselves owned by NPG, clearly believe they can extract value from the data they will end up with.
  14. Those worried about where we’ll store all this data sometimes point to cloud solutions as a panacea. They do have a useful role to play, and I see we’ll be hearing about some of the success stories later on in this meeting. But David Rosenthal’s analysis shows clearly that the cloud isn’t effective for the long-term storage of even little-used data.
  15. I urge you to read David’s blog and his article in IJDC to get a better understanding of his arguments. For the moment you’ll have to take it on trust. This graph compares costs of storing data for 100 years in either Amazon S3 or local systems, using different values of Kryder’s law which describes the change in unit storage cost over time. S3 loses out by a very large margin for all values, yet also has exit costs that make it almost impossible to get out of it cost effectively once you have opted in.
  16. It’s still true even for Amazon Glacier, the low-cost option supposedly aimed at long-term preservation. The gap is smaller, but still there. Worse, every use of the data dramatically impacts the cost. This graph assumes that the only access is for periodic verification of the data, perhaps only once every 2 years.
  17. Some countries have mounted national efforts to support universities to deal with the issues more effectively. In the US, this has primarily been through NSF programmes and projects such as DataONE. You’ll be hearing more about these so I’ll say no more myself. In Australia, the parallel initiatives of the ANDS and RDSI provide national infrastructure backed with funded action within universities. In the UK, the DCC performs a similar role to ANDS and Jisc’s MRD programmes fund the university-level action, often with partners such as publishers and international groups such as CODATA or CASRAI. The Netherlands and Canada both have similarly-named national initiatives, although that in the Netherlands is already delivering, based on a grassroots joint model between a data centre and universities. There is also activity in Finland, Denmark and Germany – and possibly elsewhere.
  18. In the UK, the DCC provides a mixture of guidance, events, current awareness, online services for tasks like data management planning, and embedded work in universities.
  19. The embedded work contains multiple components which help with everything from initially making the case for action through training support staff and researchers and designing and delivering services to researchers that work with national and international infrastructure.
  20. Funded work in universities, subject to competitive bidding, complements this work. These are examples of training programmes developed for research disciplines and for library staff in effective data management.
  21. The Australian National Data Service has a similar remit, but substantially more funding. It has a clear goal to increase data reuse in Australia and of Australian research and a simple vision of the change it intends to bring about. I’ll say more about some of its services later.
  22. Yet some researchers still aren’t convinced by the rhetoric. Carly Strasser at CDL has listed some of the reasons for not sharing data that she’s encountered – and here are some of my one-line responses. I’m not saying that the concerns aren’t sincere or reasonable but they can all be dealt with and some are positively misguided. The purpose of data centres, for instance, is to make data independently reusable (as stated in the OAIS standard) which relieves researchers of the burden of dealing with questions about it, at the same time as increasing the likelihood that their data will be cited.
  23. It’s unfortunate that I find many of these excuses familiar from the time that I ran services for the UK national archive dealing with government data. They are nearly all the same – although it’s true that politicians rarely argue that they want to get one more paper out of the data before it is released. Whatever, these people aren’t company that you want to be in.
  24. Much as I enjoy the JIR, it isn’t the publication most of us aim for. But it brings home one compelling argument for making data available, that of research integrity. Almost all fraud, and other less clear-cut cases of bad research, can be associated with the unavailability of research data. There are real consequences, including human suffering and death – of which more later. And did I mention that making your data available makes it more likely to be cited? Don’t worry, I will again.
  25. These are just a few examples, some of outright fraud and others of simply dodgy research, all of which would have been uncovered far more quickly had the data been made routinely available. The Duke case in particular roused the suspicions of many in the field but took many years to get to the bottom of because data was locked away. It is just one example of a set of practices described very clearly by Ben Goldacre in ‘Bad Pharma’. Missing data is the largest section in his book, although he has other justified concerns with research relating to medical treatments. It has led to a global movement to ensure that all clinical trial data is made available. But medicine is by no means the only area affected.
  26. Medicine does, however, provide some clear reasons why we can’t just stick all research data on the internet for anyone to trawl through. When human subjects are involved there are real concerns about confidentiality. Yet what alltrials.net and other initiatives make clear is that the *existence* of the data should never be hidden. That allows it to be discovered and for negotiations to take place about its use. It avoids costly replication, which can delay scientific discovery and involve human suffering when the replication takes the form of a clinical trial.
  27. There are other concerns that I have with the way some data centres behaved historically. Some feel like gentlemen’s clubs – only available to members and with significant and sometimes abstruse barriers to membership. Many are moving away from this model but there is still some way to go.
  28. Did I mention that making data available increases citations? This is a win all round. If you don’t believe me, here are three studies from three different areas that all show robust, positive correlations. The effect size varies with discipline, but we have enough evidence now that anyone who says that their area is different needs to come up with evidence to show why.
  29. It’s not enough that data is preserved – it must be findable, both from the publication that describes it and as an entity in itself. Not all data goes with a publication. Services at many different levels have a role to play, particularly to ensure that data reuse can happen in a cross-disciplinary way.
  30. ANDS were the first to do something at national level and in a generic way with Research Data Australia. We in the UK are following much of their model. Meanwhile the government funders are merging what were funder-level discovery services for all their research outputs. There are broad multi-disciplinary services such as the World Data System and two services at present tackling a related problem – finding an appropriate place to put your data. This is a real problem for many.
  31. Re3data is a DFG-funded project trying to tackle this, building in part on work done by the DCC along with biomedcentral and … The brief descriptions have handy icons expressing things like usage conditions in a compact way, but there’s lots more detail available as these screenshots show. Links to the terms and conditions of use and the standards employed are particularly valuable.
  32. Here in the USA the Databib project is undertaking a similar initiative – this is a record for the same archive, the Archaeology Data Service in York, UK. It’s briefer, but tells us much of what we need to know. Of note is the fact that all this data is available as RDF, making it easy for others to build services on top of this registry. That’s going to be important. I hope re3data will do something similar.
  33. Services like this make it easy when we want to locate two datasets, perhaps from two sub-disciplines, to combine – a common enough requirement.
  34. But increasingly we want to undertake combinations of hundreds or even thousands of individual datasets and to do so in a relatively automated way. In general, we don’t yet have services that make this straightforward.
  35. There’s more taking place at global level, far more than I have time to discuss here. Many of you will be familiar with the work of the Belmont Forum on climate data. There are many other groups working in large domains such as this. There are also more generic initiatives, some of them long-lived such as CODATA and others much newer such as the RDA. Both are working particularly hard to identify generic solutions to many problems that relate to research data management in ways that individual disciplines will find hard to do. Both achieve much more through global coordination than any national or even continental initiative can achieve. One generic issue is that of providing permanent, citable identifiers for data. EZID, from the California Digital Library, is one solution. DataCite is another, backed by national libraries in many countries.
  36. All this work is aimed at making data reuse simpler. We’re all familiar with research lifecycle models such as this one, where ideas lead to projects, data and publications.
  37. We know that these lifecycles can connect, with one group building on the work of another.
  38. But research that collects data doesn’t always provide publishable results in a useful timeframe, or at all. Much of this work is aimed at making sure that we can still benefit from the data of others even when parts of the lifecycle are broken. And those others will benefit from the citations we provide to their source data.
  39. There are many such stories of unexpected data reuse; these are a few examples. The last, exemplified in the Old Weather project, is seeing the original data being reused for at least the third time and in doing so is helping both climatologists and family historians through a single piece of transcription work. An impressive result.
  40. The message from colleagues at 3TU in the Netherlands is one I would like to leave you with. See your data as a treasure, but one you only gain value from when it is shared.