Inverting the Pyramid: 
Maximising the value of 
research data to society 
Kevin Ashley 
Digital Curation Centre 
www.dcc.ac.uk 
@kevingashley 
Kevin.ashley@ed.ac.uk 
Reusable with attribution: CC-BY 
The DCC is supported by Jisc
My home – the DCC 
• Mission – to 
increase capability 
and capacity for 
research data 
services in UK 
institutions 
• Not just a UK 
problem – an 
international one 
• Training, shared 
services, guidance, 
policy, standards, 
futures 
2014-11-25 Kevin Ashley –IMCW/ICKM-2014, Antalya - CC-BY 2
DCC networks and partnerships 
Original Slide: 
Martin Donnelly, 
DCC 
About me 
• 35 years ago – a mathematician in medical 
research 
• Acquired a skill for rescuing old data: 
– Lost code books 
– Lost programs 
– Bad or obsolete media or systems 
• It was fun – but it should not have been 
necessary 
Generic science data lifecycle 
PLAN, COLLECT, INTEGRATE/TRANSFORM, PUBLISH, DISCOVER, ARCHIVE/DISCARD
Adapted from: “Harnessing the Power of Digital Data: Taking the Next Step.” Scientific Data Management (SDM) for Government Agencies: Report from the Workshop to Improve SDM.
E-Science curation report - 2003 
Hervé L’Hours’ analysis
• Data lifecycles are linear, cyclical or spiral 
(sometimes all three) 
• See more at http://www.dcc.ac.uk/events/research-data-management-forum-rdmf/rdmf11 (workflows & research data management)
• Linear cycles are project-based or repository-based 
Traditional knowledge management 
view of data 
Image © John Curran @ 
designedforlearning.co.uk 
Image from forwardmotion.eu 
But in research… 
"DIKW-diagram" by RobOnKnowledge - Own work. Licensed under 
Creative Commons Attribution-Share Alike 3.0 via Wikimedia Commons - 
http://commons.wikimedia.org/wiki/File:DIKW-diagram.png#mediaviewer/File:DIKW-diagram.png
I ♥ your data! 
I don’t ♥ what you said 
about it.
LIDAR & RADAR images of ice cloud – 
H. Ruschennberg 
The Old Weather project
Data for research, not from research
Data reuse stories 
• The palaeontologist who saved years of work 
with archaeological data 
• The 19th-century ships’ logs that help us model
climate change
• The ‘noise’ from research radar that mapped 
dust from Eyjafjallajökull 
Data reuse - messages 
• Often your data tells stories that your publications do not
• Not all data comes from other researchers
• Discipline-bounded data discovery doesn’t give us all we need or want
• One person’s noise is another person’s signal
Understanding Biodiversity 
• We don’t understand what drives it 
• What helps or hinders speciation?
• No one project or data source is enough 
• Biology, geology, climate science, chemistry… 
• Big and small problems 
• Reanalysis & gap analysis
Research on Biodiversity… 
• Requires many different data sources 
• Not all will be published 
• Not all publications are for similar research 
reasons, so… 
• Citing the publication is irrelevant 
• Some is research data; some is government or
reference data
Why care? 
• Data is expensive – an investment 
• Reuse: 
– More research 
– Teaching & Learning 
– Planning 
• Impact – with or without publication 
• Accountability 
• Legal & regulatory requirements 
Why does this matter? 
• Research quality 
– How close can we get to 
the truth? 
• Research speed 
– How quickly can we get 
to the truth? 
• Research finance 
– How much does the 
truth cost? 
• Improving one or more 
of these is of interest to 
all actors: 
• Researchers as data 
creators 
• Researchers as data 
reusers 
• Research institutions 
• Funders – hence 
government and society 
Creative data reuse 
• http://vimeo.com/38402965 
Integrity – not without data 
• Cyril Burt 
– Twin studies on intelligence. 
– Questioned 1976; now discredited 
• Duke case 
– Data hiding leads to wasted treatments, clinical 
trials, probable death & huge lawsuits 
• Dutch cases 
– Stapel – 55 publications – “fictitious data” 
– Poldermans – fabricated data or negligence? 
“The case for open data: the Duke Clinical Trials” – blog post, Kevin Ashley, http://www.dcc.ac.uk/news/case-open-data-duke-clinical-trials
“Lies, Damned Lies and Research Data: Can Data Sharing Prevent Data Fraud?” – Doorn, Dillo, van Horik, IJDC 8(1); doi:10.2218/ijdc.v8i1.256 
2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 23
Without data reuse: 
• We can waste billions
• People suffer & die 
Data reuse from Hubble 
Data reuse is already 
happening – and 
researchers can change 
Where can it happen?
• Global, international
• Nationally
• By subject
• Institution
• Research group
Research data centres are good value! 
• See Jisc reports on ADS, BADC, UKDA: 
• Returns on investment between 400% and 
1200% 
• Unfortunately – many research domains have 
no relevant data centres 
http://www.jisc.ac.uk/whatwedo/programmes/di_directions/strategicdirections/badc.aspx
“Provision for data management, for 
curation and long-term preservation, and 
for the sharing and re-use of data, varies 
wildly between subject areas.” 
“The data management needs of many 
researchers are little considered or catered for.” 
If greater provision is to be 
made, a shortfall in 
infrastructure (both technical 
and human) must be 
overcome. 
Policy makers are aware that 
in many areas of enquiry, 
researchers’ access to well-managed, 
open and reusable 
data opens up significant 
opportunities. 
All from JISC MRD 2 
call, 2010
The library as custodian 
• Increasing role for library to provide access to 
institutional assets 
• See Lorcan Dempsey’s thoughts on the inside-out 
library vs outside-in library 
– http://www.slideshare.net/lisld/the-inside-out-library 
• Build on library strengths – preservation, 
access, curation, selection 
G8UK - Endorses 
OA 
Open Data 
Charter 
Policy Paper 
18 June 2013 
Funder requirements 
http://www.epsrc.ac.uk/about/standards/researchdata/Pages/policyframework.aspx
• UK – RCUK
• Canada
• USA – NSF, NEH, etc
• Denmark
• USA – non-government funders (Sloan, Gates,…)
• Europe
RCUK policy - The 1-minute version 
• Research data are a public good – make openly 
available in timely & responsible way 
• Have policies & plans. Data with long-term value 
should be preserved & usable 
• Metadata for discovery & reuse. Link publications & 
data 
• Sometimes law, ethics get in the way. We understand. 
• Limited embargos OK. Recognition is important – 
always cite data sources 
• OK to use public money to do this. Do it efficiently. 
EPSRC policy points 
• Awareness of regulatory environment
• Data access statement
• Policies and processes
• Data storage
• Structured metadata descriptions
• DOIs for data
• Securely preserved for a minimum of 10 years from last use
Compliance expected by 2015
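The "minimum of 10 years from last use" point above can be read as a simple date calculation. What follows is a minimal Python sketch of that reading, not EPSRC's own wording or tooling. The `DatasetRecord` fields, the function names and the example DOI are hypothetical, and rolling 29 February forward to 1 March is an assumption about how a leap day might be handled.

```python
from dataclasses import dataclass
from datetime import date

# From the slide: "Securely preserved for a minimum of 10 years from last use".
RETENTION_YEARS = 10


@dataclass
class DatasetRecord:
    """Hypothetical minimal description of a preserved dataset."""
    title: str
    doi: str          # placeholder identifier, not a real DOI
    last_used: date   # date of the most recent recorded access


def earliest_disposal_date(record: DatasetRecord,
                           years: int = RETENTION_YEARS) -> date:
    """Return the first date on which the retention period has elapsed."""
    d = record.last_used
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # last_used was 29 February and the target year is not a leap year;
        # assumption: roll forward to 1 March so the period is never shortened.
        return date(d.year + years, 3, 1)


def must_still_be_preserved(record: DatasetRecord, today: date) -> bool:
    """True while the retention period is still running."""
    return today < earliest_disposal_date(record)


if __name__ == "__main__":
    rec = DatasetRecord("Example survey data", "10.1234/example",
                        date(2014, 11, 25))
    print(earliest_disposal_date(rec))
    print(must_still_be_preserved(rec, date(2020, 1, 1)))
```

Because the clock runs from last use rather than from deposit, every new access to a dataset pushes the earliest disposal date forward, so a service following this reading would need to record access dates.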
DCC Policy Summary
http://www.dcc.ac.uk/resources/policy-and-legal
Helping make data reuse possible – 
experience from the DCC 
Some lessons – a summary 
• Data reuse is rarely as simple as people think it is 
• It is already happening 
• It is good for research, for researchers, for funders, for 
universities 
• Without senior management attention and researcher 
involvement, your initiative will fail 
• Research data management services cannot involve the 
library alone 
• Researchers need to know your services exist 
• Training for young researchers in good data practice is 
valuable 
DCC ‘institutional engagement’ 
Assess needs
Make the case
Develop support and services
RDM policy development
Customised Data Management Plans
DAF & CARDIO assessments
Guidance and training
Workflow assessment
DCC support team
Advocacy with senior management
Institutional data catalogues
Pilot RDM tools
…and support policy implementation
Original Slide: Graham Pryor, DCC
Some institutional roles 
• Leadership – coordinate action 
• Audit – who has what, where does it go? 
• Advice on access – data, wherever it is 
• Preservation – permanence 
• Citability 
• Data/publication linking 
• Promoting data in teaching 
• Selection 
• Education – early career researchers 
Who (in the UK) is leading RDM work? 
RESEARCHERS 
Library 
IT 
Research Office
INSTITUTIONAL SERVICES 
Some example services 
• Storage – persistent, shareable 
• Permanent, citable identifiers
• Database as a service (e.g. Oxford ORDS)
• Embed tools in Excel – DataUp, others
• Workflow management – Taverna 
• Training for early career researchers 
Make data creation easier 
Make data citable 
• Making data available increases citations 
• Everyone – academic, funder, institution – 
loves citations 
• Want evidence? 
– Alter, Pienta, Lyle – 240%, social sciences * 
– Piwowar, Vision – 9% (microarray data)† 
– Henneken, Accomazzi – 20% (astronomy) # 
# Edwin Henneken, Alberto Accomazzi, (2011) Linking to Data - Effect on Citation Rates in Astronomy. http://arxiv.org/abs/1111.3618 
* Amy Pienta, George Alter, Jared Lyle, (2010) The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. 
http://hdl.handle.net/2027.42/78307 
† Piwowar H, Vision TJ. (2013) Data reuse & the open data citation advantage. PeerJ PrePrints 1:e1v1 
http://dx.doi.org/10.7287/peerj.preprints.1v1 
Make data discoverable 
• Data must be discoverable to be reused 
• Alone, or in conjunction with publication 
• Services include: 
– Institutional catalogues 
– National data registries
– Repository registries – Databib, re3data
Dataverse – helping researchers make data findable & reusable
gking.harvard.edu/data
DCC guidance 
http://dataintelligence.3tu.nl/en/home/ 
Choice of RDM training 
materials for librarians 
Up-skilling 
for data 
http://datalib.edina.ac.uk/mantra/libtraining.html 
What data to keep 
2014-05-14 Kevin Ashley – Eurocris2014 - CC-BY 52
The Data Deluge is upon us 
Sensors’ ability to produce data outstrips IT’s ability to process it
Roles and 
Responsibilities 
What data to keep 
IDCC15 – London, Feb 9-12 2015 
The 10th 
International 
Digital 
Curation 
Conference 
http://www.dcc.ac.uk/events/idcc15 
My message to researchers 
• The credit belongs to you 
• The data belongs to all of us 
• Share, and we all reap the 
benefits 
• The story doesn’t end with a 
publication 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 

Inverting the data pyramid: maximising the value of data reuse (IMCW2014/ICKM2014 keynote)

  • 1. Inverting the Pyramid: Maximising the value of research data to society Kevin Ashley Digital Curation Centre www.dcc.ac.uk @kevingashley Kevin.ashley@ed.ac.uk Reusable with attribution: CC-BY The DCC is supported by Jisc
  • 2. My home – the DCC • Mission – to increase capability and capacity for research data services in UK institutions • Not just a UK problem – an international one • Training, shared services, guidance, policy, standards, futures 2014-11-25 Kevin Ashley –IMCW/ICKM-2014, Antalya - CC-BY 2
  • 3. DCC networks and partnerships Original Slide: Martin Donnelly, DCC
  • 4. About me • 35 years ago – a mathematician in medical research • Acquired a skill for rescuing old data: – Lost code books – Lost programs – Bad or obsolete media or systems • It was fun – but it should not have been necessary
  • 5. My home – the DCC • Mission – to increase capability and capacity for research data services in UK institutions • Not just a UK problem – an international one • Training, shared services, guidance, policy, standards, futures
  • 6. Generic science data lifecycle: PLAN → COLLECT → INTEGRATE/TRANSFORM → PUBLISH → DISCOVER → ARCHIVE/DISCARD. Adapted from: “Harnessing the Power of Digital Data: Taking the Next Step.” Scientific Data Management (SDM) for Government Agencies: Report from the Workshop to Improve SDM.
  • 7. E-Science curation report - 2003
  • 8.
  • 9.
  • 10. Herve L’Hour’s analysis • Data lifecycles are linear, cyclical or spiral (sometimes all three) • See more at http://www.dcc.ac.uk/events/research-data-management-forum-rdmf/rdmf11 – workflows & research data management • Linear cycles are project-based or repository-based
  • 11. Traditional knowledge management view of data Image © John Curran @ designedforlearning.co.uk Image from forwardmotion.eu
  • 12. But in research… "DIKW-diagram" by RobOnKnowledge - Own work. Licensed under Creative Commons Attribution-Share Alike 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:DIKW-diagram.png#mediaviewer/File:DIKW-diagram.png
  • 13. I ♥ your data! I don’t ♥ what you said about it.
  • 14. LIDAR & RADAR images of ice cloud – H. Russchenberg
  • 15. The Old Weather project: data for research, not from research
  • 16. Data reuse stories • The palaeontologist who saved years of work with archaeological data • The 19th-century ships’ logs that help us model climate change • The ‘noise’ from research radar that mapped dust from Eyjafjallajökull
  • 17. Data reuse - messages • Often your data tells stories that your publications do not • Not all data comes from other researchers • Discipline-bounded data discovery doesn’t give us all we need or want • One person’s noise is another person’s signal
  • 18. Understanding Biodiversity • We don’t understand what drives it • What helps, hinders speciation • No one project or data source is enough • Biology, geology, climate science, chemistry… • Big and small problems • Reanalysis & gap analysis
  • 19. Research on Biodiversity… • Requires many different data sources • Not all will be published • Not all publications are for similar research reasons, so… • Citing the publication is irrelevant • Some is research data, other government or reference data
  • 20. Why care? • Data is expensive – an investment • Reuse: – More research – Teaching & Learning – Planning • Impact – with or without publication • Accountability • Legal & regulatory requirements
  • 21. Why does this matter? • Research quality – How close can we get to the truth? • Research speed – How quickly can we get to the truth? • Research finance – How much does the truth cost? • Improving one or more of these is of interest to all actors: • Researchers as data creators • Researchers as data reusers • Research institutions • Funders – hence government and society
  • 22. Creative data reuse • http://vimeo.com/38402965
  • 23. Integrity – not without data • Cyril Burt – Twin studies on intelligence. – Questioned 1976; now discredited • Duke case – Data hiding leads to wasted treatments, clinical trials, probable death & huge lawsuits • Dutch cases – Stapel – 55 publications – “fictitious data” – Poldermans – fabricated data or negligence? “The case for open data: the Duke Clinical Trials” – blog post, Kevin Ashley, http://www.dcc.ac.uk/news/case-open-data-duke-clinical-trials “Lies, Damned Lies and Research Data: Can Data Sharing Prevent Data Fraud?” – Doorn, Dillo, van Horik, IJDC 8(1); doi:10.2218/ijdc.v8i1.256 2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 23
  • 24. Without data reuse: • We can waste billions • People suffer & die
  • 25. Data reuse from Hubble
  • 26. Data reuse is already happening – and researchers can change
  • 27. Where can it happen • Global, international • Nationally • By subject • Institution • Research group
  • 28.
  • 29. Research data centres are good value! • See Jisc reports on ADS, BADC, UKDA: • Returns on investment between 400% and 1200% • Unfortunately – many research domains have no relevant data centres http://www.jisc.ac.uk/whatwedo/programmes/di_directions/strategicdirections/badc.aspx
  • 30. “Provision for data management, for curation and long-term preservation, and for the sharing and re-use of data, varies wildly between subject areas.” “The data management needs of many researchers are little considered or catered for.” If greater provision is to be made, a shortfall in infrastructure (both technical and human) must be overcome. Policy makers are aware that in many areas of enquiry, researchers’ access to well-managed, open and reusable data opens up significant opportunities. All from JISC MRD 2 call, 2010
  • 31.
  • 32.
  • 33. The library as custodian • Increasing role for library to provide access to institutional assets • See Lorcan Dempsey’s thoughts on the inside-out library vs outside-in library – http://www.slideshare.net/lisld/the-inside-out-library • Build on library strengths – preservation, access, curation, selection
  • 34. G8UK - Endorses OA Open Data Charter Policy Paper 18 June 2013
  • 35. Funder requirements • UK – RCUK • Canada • USA – NSF, NEH, etc • Denmark • USA – non-government funders (Sloan, Gates,…) • Europe http://www.epsrc.ac.uk/about/standards/researchdata/Pages/policyframework.aspx
  • 36. RCUK policy - The 1-minute version • Research data are a public good – make openly available in timely & responsible way • Have policies & plans. Data with long-term value should be preserved & usable • Metadata for discovery & reuse. Link publications & data • Sometimes law, ethics get in the way. We understand. • Limited embargos OK. Recognition is important – always cite data sources • OK to use public money to do this. Do it efficiently.
  • 37. EPSRC policy points • Awareness of regulatory environment • Data access statement • Policies and processes • Data storage • Structured metadata descriptions • DOIs for data • Securely preserved for a minimum of 10 years from last use • Compliance expected by 2015
  • 38. DCC Policy Summary http://www.dcc.ac.uk/resources/policy-and-legal
  • 39. Helping make data reuse possible – experience from the DCC
  • 40. Some lessons – a summary • Data reuse is rarely as simple as people think it is • It is already happening • It is good for research, for researchers, for funders, for universities • Without senior management attention and researcher involvement, your initiative will fail • Research data management services cannot involve the library alone • Researchers need to know your services exist • Training for young researchers in good data practice is valuable
  • 41. DCC ‘institutional engagement’ • Assess needs • Make the case • Develop support and services • RDM policy development • Customised Data Management Plans • DAF & CARDIO assessments • Guidance and training • Workflow assessment • DCC support team • Advocacy with senior management • Institutional data catalogues • Pilot RDM tools • …and support policy implementation Original Slide: Graham Pryor, DCC
  • 42. Some institutional roles • Leadership – coordinate action • Audit – who has what, where does it go? • Advice on access – data, wherever it is • Preservation – permanence • Citability • Data/publication linking • Promoting data in teaching • Selection • Education – early career researchers
  • 43. Who (in the UK) is leading RDM work? • RESEARCHERS • Library • IT • Research Office
  • 44. INSTITUTIONAL SERVICES
  • 45. Some example services • Storage – persistent, shareable • Permanent, citeable identifiers • Database as a service (e.g. Oxford ORDS) • Embed tools in Excel – Dataup, others • Workflow management – Taverna • Training for early career researchers
  • 46. Make data creation easier
  • 47. Make data citable • Making data available increases citations • Everyone – academic, funder, institution – loves citations • Want evidence? – Alter, Pienta, Lyle – 240%, social sciences * – Piwowar, Vision – 9% (microarray data)† – Henneken, Accomazzi – 20% (astronomy) # # Edwin Henneken, Alberto Accomazzi, (2011) Linking to Data - Effect on Citation Rates in Astronomy. http://arxiv.org/abs/1111.3618 * Amy Pienta, George Alter, Jared Lyle, (2010) The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. http://hdl.handle.net/2027.42/78307 † Piwowar H, Vision TJ. (2013) Data reuse & the open data citation advantage. PeerJ PrePrints 1:e1v1 http://dx.doi.org/10.7287/peerj.preprints.1v1
  • 48. Make data discoverable • Data must be discoverable to be reused • Alone, or in conjunction with publication • Services include: – Institutional catalogues – national data registries – Repository registries – databib, re3data
  • 49. Dataverse – helping researchers make data findable & reusable Gking.harvard.edu/data
  • 50. DCC guidance
  • 51. http://dataintelligence.3tu.nl/en/home/ Choice of RDM training materials for librarians Up-skilling for data http://datalib.edina.ac.uk/mantra/libtraining.html
  • 52. What data to keep 2014-05-14 Kevin Ashley – Eurocris2014 - CC-BY 52
  • 53. The Data Deluge is upon us Sensors’ ability to produce data outstrips IT’s ability to process it
  • 54. Roles and Responsibilities What data to keep
  • 55. IDCC15 – London, Feb 9-12 2015 The 10th International Digital Curation Conference http://www.dcc.ac.uk/events/idcc15
  • 56. My message to researchers • The credit belongs to you • The data belongs to all of us • Share, and we all reap the benefits • The story doesn’t end with a publication

Editor's Notes

  1. I’m Kevin Ashley; I run an organisation called the Digital Curation Centre (DCC) in the UK, and I’ve been invited here today to talk about research data management.
  2. My home – the DCC – is a national service whose role is to increase the capability and capacity of UK research institutions – mainly universities – to run their own research data services. Where it makes sense, we also run some national services which those universities use. Because this is not just a national problem, we work alongside many partners and colleagues around the world.
  3. This collection of logos illustrates many, though not all, of the partnerships we have been or are involved with.
  4. But there’s something about me you should know as well. 35 years ago I began my first job, a mathematician supporting clinical researchers in a large research institution. I acquired many skills there, learning from older researchers and other staff who were happy to pass their knowledge on to me. In particular, I got a reputation as someone who was good at rescuing old data that might otherwise be lost. Lost because coding systems had been forgotten, because the programs required had been lost or no longer worked, or because the media or systems involved were now obsolete. It was great fun – technical detective work. But even then I knew something was wrong. Much of this data was irreplaceable; some involved experiments on human subjects that had involved considerable suffering. Data like that should be capable of being used more than once.
  5. So, via many other jobs involving HPC, networks and digital archives, I ended up running the DCC. We provide training, shared services, guidance and policy; we develop standards and we look at possible future directions in this area. But the strapline – the phrase at the head of our web pages – bears closer examination. “Because good research needs good data” is behind all of what we do and much of what I’ll say today. The data we use isn’t always ours, but it always needs to be good.
  6. There are many possible views of how research takes place. This generic one is typical – a plan leads to data collection, some type of transformation, publication, discovery by others of your results, and a later decision to keep or discard the data. That last step rarely appears – it is only in this diagram because it was written by people whose main concern is taking care of scientific data.
  7. Some views are more complex. This, from an e-science curation report in 2003, is not quite as linear, but it still involves the same basic processes. It is notable for its description of the eventual consumers – not just other researchers, but also industry and the public.
  8. And some are just too complicated to be of any use at all.
  9. This is from my own organisation. You will notice that it is a loop – where data, once produced and retained, can go on to inform other research.
  10. Herve L’Hour analysed many research lifecycles and gave a presentation earlier this year about them – I encourage you to read it to learn more. He found that lifecycles are linear, cyclical (like the DCC’s) or spiral, or sometimes a hybrid of all three. Linear cycles tend to be produced by those with a project view or a repository view – where the repository is the final step in acquisition of data from a researcher. Yet it isn’t the final step – we’ll see later why not.
  11. We have similar linear views in the world of knowledge management. The pyramids have different levels, but they all view data as a large, raw, underlying substrate which is successively refined to produce insight of some form. Again, these diagrams encourage the view that data is used once to produce a few shining nuggets of wisdom. Yet knowledge managers know better. Your skills are directed at getting many insights out of large data collections – but diagrams such as this don’t reflect that. I worry because these diagrams can affect our thinking about problems and strategies.
  12. In research we don’t want our insight to be an end point. We hope that it leads others to new investigations – if nothing else it will give us citations! But it isn’t always our publications, where insights are recorded, that are the target of reuse. Sometimes it is our data.
  13. You see, sometimes our knowledge is irrelevant. What someone else wants is our data, and information about how we collected it. They will learn something completely new from it. What we knew does not help them.
  14. Here are some examples of data reuse. This one comes from researchers at TU Delft in the Netherlands. They use a technology called LIDAR to measure raindrops and ice crystals in clouds. They are interested in knowing more about how rain, snow and hail form. Their detectors produce many gigabytes of data an hour and most of it is not of interest to them. They filter the data to discard the noise and retain what they see as signal. But in early 2010 a volcano in Iceland caused severe disruption to air travel in Europe. One problem was that no one could be really sure where the dust from the volcano had travelled to. These researchers realised that the data they were discarding might be able to give answers – and they were right. The raw data was rescued just before it was due to be deleted, two weeks after collection.
  15. The Old Weather project is using crowdsourcing to capture data from digitised ships’ logs from the 19th century and earlier. Because the logs contain precise recordings of temperature, wind speed and so on as well as accurate location information, this provides us with a comprehensive picture of weather around the world in the 19th century, information which we can use to calibrate today’s climate models. Before this, the logs were used by economic historians studying trade patterns. And finally, the crowd-sourcers are also capturing data about the names of people who left and joined the ships, data that is of interest to family historians. That is three separate uses for this original material, which wasn’t even created for research purposes.
  16. And there are many other tales, including the palaeontologist – someone looking at dinosaur bones – who found what he needed to know in an archaeological data archive.
  17. So there are four lessons I would draw from these examples. Our data can tell stories that our publications do not – think of the archaeological data, which also tells us that discipline-bounded data discovery – chemical data for chemists, or even neurochemical data for neurochemists – does not always meet our needs. The Old Weather project reminds us that not all data used by researchers was produced for researchers, and the final example tells us that signal and noise are just two views of the same thing.
  18. There are bigger questions that data reuse can answer. Biodiversity is a big area of research. Although we believe it is a good thing, we don’t understand enough about what causes it. Simplistic views – such as the idea that hot, wet environments promote biodiversity – are known not to be universally true. Trying to understand biodiversity can’t be done by the data from one project or one discipline. We need data from many sources to investigate potential theories and data from projects that are large and small, from examinations of species in a single pond to studies on a global scale. When we integrate all this data, we may find gaps – reasons to collect data to fill them.
  19. Not all of that data will have led to publications. Even where it did, the publication may have little to do with our interest in biodiversity. We need to be able to cite this data as data, not cite the (possibly non-existent) publication it originally produced. Not all of that data will come from other researchers – some will be government data, some reference data, some from a commercial environment such as agriculture and forestry.
  20. Kevin Ashley, DCC, UKSG Glasgow. CC-BY
  21. For an audience such as this, I shouldn’t have to explain why data reuse is important. But just in case, and to explain why some things have happened the way they have, I’ll describe some of the drivers. Ensuring that all research data is discoverable and reusable increases the quality of the research that we do. It can add to the data we collect ourselves and can improve the statistical rigour of our results. Exposing data to scrutiny makes it more straightforward to validate or challenge the findings of others. Making data available also improves the speed with which we can do research. If someone else has already gathered the data we need (perhaps for a different end use), we can move directly to the analysis stage of our work, saving both time and money. And saving money increases the efficiency of research. We hope that the money saved lets us do more research, but even if it doesn’t, society as a whole will gain. There’s evidence behind this that I’ll come to later, but it is an effective counter to those in some universities who feel that increasing funder requirements for data management simply lead to additional costs with no gain. There is a gain in all these areas, and hence every one of the actors – researchers, their employers, their funders, and society – should be motivated to make this happen.
  22. For an example of creative data re-use in a teaching context, see the work of globe4D. A simple device allows us to visualise data about the earth, asking what-if questions about changing sea-levels and temperatures. But we can move time back and forth as well, looking at the continents as they were 200 million years ago and asking the same what-if questions then. When we’re bored with the Earth, we can do the same things with Mars or Venus – what would Mars look like with oceans of the same volume as Earth? Again, this requires integrating data from many open sources with some simple technology (and some very good visualisation). It creates a tool which allows us to ask deep questions easily and quickly see the answers; from my own experience, it is capable of turning a group of adults into children with ease. This is a good thing – we rediscover curiosity and enthusiasm. But it’s also a great teaching tool for children, if the adults get out of the way for long enough!
  23. These are just a few examples, some of outright fraud and others of simply dodgy research, all of which would have been uncovered far more quickly had the data been made routinely available. The Duke case in particular roused the suspicions of many in the field but took many years to get to the bottom of because data was locked away. It is just one example of a set of practices described very clearly by Ben Goldacre in ‘Bad Pharma’. Missing data is the largest section in his book, although he has other justified concerns with research relating to medical treatments. It has led to a global movement to ensure that all clinical trial data is made available. But medicine is by no means the only area affected.
  24. So there’s one powerful argument for exposing the existence of data and enabling re-use – without doing so, we waste billions on ineffective treatments, and people suffer and die. I could just stop here, but I won’t. Other arguments are available.
  25. Many of you may be familiar with this graph from the Hubble Space telescope data archive. It tells the same story in a different way, and also tells a story about the transformation of astronomy as a discipline. In the days of photographic plates, sharing (analogue) astronomical data was difficult. Digital instruments transformed this, and some time around 2000, more research was being done with old data than with new data.
  26. Which leads to our second lesson. Some people say data sharing and reuse is a difficult change for researchers. In some disciplines, it is. But many have been doing it for some time, and those that have changed have benefited as a result.
  27. Research happens at many different scales – internationally, nationally, in small groups and many scales in between. Taking care of data, curating it, needs to reflect all these scales. We have existing examples at every one of them.
  28. 2014-11-25
  29. We know that these data centres are good value – this study by Jisc shows that the return on investment they generate is between 400% and 1200% - rates of return that would make them very valuable in the commercial world. But the benefit generated is for society as a whole, not a set of shareholders. And worse, many areas of research don’t have data centres to cater for them.
  30. So we have a position where some of the infrastructure exists to enable data sharing, but not all. It is good in some domains of research and not others; good in some countries, some universities, and not others. This was part of the motivation for Jisc’s Managing Research Data programme in 2010 onwards – some selective quotes from the call are here. We see recognition from policy makers of the value of data reuse; that provision is unequal across subject areas; that many researchers are poorly catered for; and that infrastructure needs to be created. That infrastructure is not just technical. The human element – training, skills, changing attitudes – is equally important.
  31. … and that means that, whether you think a library should look like this….
  32. … or like this….
  33. … there is a role for the library to play, in providing access to and caring for institutional assets of all types, including research data. This fits with Lorcan Dempsey’s view of the inside-out library – a shift from a library whose role is to acquire material from outside for the benefit of those inside, to one that showcases what is produced inside to achieve impact outside. Providing services around research data builds on the traditional strengths of libraries and librarians – preservation, access, curation, selection, as well as good researcher relationships.
  34. Governments around the world recognise this, along with the value of public data. This statement suddenly made RDM something that government ministers cared about – not something I thought I would see in my lifetime.
  35. But this is happening in many other countries. The USA was another early adopter, and in Europe Horizon 2020 has increasingly strict rules about data sharing. Denmark, Canada and others are also acting. Most policies place the burden of compliance on the researcher; some place it on the organisation where the research takes place. Typically we are seeing policies about open access to data come a few years after policies about open access to publications, though the gap is narrowing. And large non-government funders are also beginning to act, with the Wellcome Trust leading the way in medical research.
  36. RCUK is an umbrella body for government research funders in the UK. It has a set of general principles, summarised here, about data from research that it funds. They are not onerous, and read like common sense.
  37. EPSRC, which funds engineering and physical sciences, interprets these requirements and chooses to make requirements of the university, not the individual researcher. It sees the duty of the university as being to assist the researcher to share data – by being able to preserve it securely, to expose metadata about it, to provide permanent identifiers for data, and so on.
  38. I give these examples as illustrations of how funders approach this, as it affects how universities and researchers respond. If you are interested in more details about policy, the DCC has a series of web pages describing research data policies from around the world.
  39. Enough about the case for data reuse. Enough people in the UK were convinced that the DCC received additional funding to work with universities to accelerate the development of research data services. I’ll summarise some lessons that we learnt from this experience.
  40. They are summarised here. Some I’ve covered already; some are worth noting before you begin to do something in your own university. Senior management commitment, from more than one individual, is necessary to sustain change. So is the involvement of your researchers from the outset. Don’t develop policy or design services without them. Although the library is a key player in research data curation, it cannot act alone – other service providers within the university must be involved. Whatever you do develop, awareness-raising is key and needs to be repeated regularly. Researchers need to know what is available to them to make use of it.
  41. The DCC is now 10 years old, but the lessons I will speak of are informed by work we began in 2011 to put much of our guidance into practice. We worked closely initially with 20 universities, to help them establish research data management (RDM) services. We transferred what we learnt from doing that to other universities in order to build up capacity nationally in the UK. Naturally, we learnt from others as well and we hope to pass on that knowledge outside the UK also. We called this work ‘institutional engagement’. We behaved much like consultants, and the work we did depended on what was most needed in a particular organisation. This diagram, produced by my past colleague Graham Pryor, illustrates the range of activities involved, beginning with advocacy – helping to make the case for doing work – to establishing particular services and developing policy.
  42. Different universities will choose to organise RDM services in different ways. But there are some common roles which can be identified. It is useful to look at this list and decide which of these roles could be taken by your library. This list is not exhaustive and I won’t have time to cover all these roles today. I mention here two tools which help with two of these activities. CARDIO helps with needs assessment, and deciding what actions to take next to establish RDM services. Where are your existing strengths and weaknesses? CARDIO helps to answer these questions, and helps you assess progress in future years. DAF helps to answer the simple questions “What have we got already? Where is it? Who is responsible for it?”
  43. But we can begin with leadership. This diagram shows who is taking the lead role in defining RDM services within UK universities. The library is leading in most cases and is involved regardless of who’s championing the cause. Research offices are often the lead partner – seemingly for strategic reasons of senior buy-in and financial commitment. IT are only leading in 2 out of the 20 cases and are disengaged / absent in a few others. Researchers are always involved, but are never the lead.
  44. So what services can be provided?
  45. These are some examples – I will speak about a few of them.
  46. DataUp is a plugin for Microsoft Excel, developed by the California Digital Library in collaboration with Microsoft. Many people say serious researchers should not use Excel for data analysis. Others recognise that researchers will do it whatever we say, so the best thing to do is to make Excel a better tool.
  47. Making data citable is a simple service that brings great benefits. Here are links to three studies that show very positive effects that arise from making data available, citable and connected to papers. You can use a repository to provide identifiers such as Handles, or work with an organisation like DataCite, which can help you provide DOIs.
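To make the idea of a citable dataset concrete, here is a minimal sketch of how a repository might turn a dataset's metadata and DOI into a citation string. It assumes the common "Creator (Year): Title. Publisher. Identifier" layout for data citations and the public https://doi.org/ resolver. The `Dataset` class, the function names, and every metadata value (including the DOI) are hypothetical and for illustration only.

```python
from dataclasses import dataclass

# Public DOI resolver; a DOI appended to this prefix gives a persistent link.
DOI_RESOLVER = "https://doi.org/"


@dataclass
class Dataset:
    """Hypothetical minimal metadata record for one dataset."""
    creators: list[str]
    year: int
    title: str
    publisher: str
    doi: str


def doi_url(doi: str) -> str:
    """Turn a bare DOI into a resolvable URL."""
    return DOI_RESOLVER + doi.strip()


def format_citation(ds: Dataset) -> str:
    """Format a citation as 'Creator (Year): Title. Publisher. Identifier'."""
    authors = "; ".join(ds.creators)
    return f"{authors} ({ds.year}): {ds.title}. {ds.publisher}. {doi_url(ds.doi)}"


# All values below are invented placeholders, not a real dataset or DOI.
example = Dataset(
    creators=["Researcher, A.", "Colleague, B."],
    year=2014,
    title="Example survey responses",
    publisher="Example University Repository",
    doi="10.1234/example-dataset",
)

print(format_citation(example))
```

The point of the sketch is only that a persistent identifier plus a handful of metadata fields is enough to produce a citation a paper can reference. A real service would take these fields from the repository's own records.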
  48. But as we saw at the beginning of my talk, the data must be discoverable – we cannot assume that people will find our data via publications. An institutional repository can help; experience in Australia shows that a national service which aggregates metadata about datasets can have a much greater impact. Hence we are copying their approach in the UK. We are aware of similar initiatives in a few other countries.
  49. Harvard, with NSF funding, developed a service called Dataverse, which is now the basis of a national service in the Netherlands. It makes it easy for researchers to upload and describe their data and to update the description over time. This example has one interesting feature which shows how researcher behaviour can change – this page shows that the data was made available before the associated paper was published. The author thus got reaction and publicity for their work in advance of publication.
  50. We used our guidance documents extensively in this work, and produced case studies and new guidance as a result. This guidance includes material you might find useful, such as training materials on RDM for librarians. I have a small number of copies of our documents with me today, but they are all freely available from our website, or you can purchase print copies. You can also adapt them or translate them – all use Creative Commons Attribution licences.
  51. One service area I mentioned is training for early career researchers – here are some examples of freely-available training materials and online courses, accompanied by a course aimed at librarians. Why not work with others to adapt and improve these training materials for use in Spain?
  52. Some guidance is aimed directly at researchers – here are two examples on data citation and writing data management plans. We have also produced a freely-available tool for writing such plans (DMPonline) but the guide doesn’t assume that you are using it. Other tools exist – DMPTool from a USA-based consortium is the best known.
  53. Getting better at managing research data isn’t just about keeping more stuff for longer. It’s also about being more selective about what we do keep and documenting the decision-making process that we use. Reports such as this make clear that technological advances mean that the cost of producing data is dropping more rapidly than the cost of retention. Some arguments show that if we attempt to retain everything it won’t be long before we’re spending the entire GDP purely on data storage. That’s an extreme analysis, but the problem is real, as CERN know well. In some disciplines it really is wiser to just generate the data again when it is needed. But for many observational disciplines, that opportunity isn’t open to us.
  54. This guide – now accompanied by a simple checklist – is particularly relevant. It is about what archivists call appraisal and what librarians often call selection.
  55. I’ll pause for a brief advert for our conference next year. If you want the chance to take part in far more in-depth discussions about these issues, do register to attend.
  56. And I end with the message I give to researchers about their data – accompanied by a similar message from the 3TU Datacentrum in the Netherlands, a collaboration between three universities. The credit for your data belongs to you, the researcher – but the data belongs to all of us and should be shared.