
My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

My keynote talk for Eurocris2014, Rome. I make the case for reuse of research data, discuss the barriers and look at ways we are trying to overcome them.

Published in: Data & Analytics, Technology

  1. My Data, Our Data, Your Data: data reuse through data management
     Kevin Ashley
     Digital Curation Centre
     www.dcc.ac.uk
     @kevingashley
     Kevin.ashley@ed.ac.uk
     Reusable with attribution: CC-BY
     The DCC is supported by Jisc
  2. A summary
     • Why data reuse?
     • What stops us?
     • How data management helps
     • Harmonising the goals of research administration and research
     • Barriers again
     • The case for reuse - again
     2014-05-14 Kevin Ashley – Eurocris2014 - CC-BY 2
  3. My home – the DCC
     • Mission – to increase capability and capacity for research data services in UK institutions
     • Not just a UK problem – an international one
     • Training, shared services, guidance, policy, standards, futures
  4. What is data curation?
     • “Maintaining, preserving and adding value to research data throughout its lifecycle”
     • More than preservation:
       – Active management – dealing with change
     • Less than preservation:
       – Lifecycle sometimes involves destruction
  5. DCC guidance
  6. SWEDEN, DENMARK, CANADA
  7. Data reuse stories
     • The palaeontologist who saved years of work with archaeological data
  8. What a palaeontologist looks at
     [Timeline: now to 100 million years ago, marked at 1m, 25m, 50m and 75m]
  9. What a palaeontologist looks at
     [Timeline: now to 100 million years ago, marked at 1m, 25m, 50m and 75m; second timeline: now to 1 million years, marked at 100,000, 500,000 and 750,000]
  10. What an archaeologist looks at
      [Timeline: now to 1 million years, marked at 100,000, 500,000 and 750,000; second timeline: to 100,000 years ago, marked at 25,000, 50,000 and 75,000]
  11. Data reuse stories
      • The palaeontologist who saved years of work with archaeological data
      • The 19th-century ships’ logs that help us model climate change
  12. The Old Weather project
      Data for research, not from research
  13. (no text on slide)
  14. Data reuse stories
      • The palaeontologist who saved years of work with archaeological data
      • The 19th-century ships’ logs that help us model climate change
      • The ‘noise’ from research radar that mapped dust from Eyjafjallajökull
  15. Data reuse - messages
      • Often your data tells stories that your publications do not
      • Not all data comes from other researchers
      • One person’s noise is another person’s signal
      • Discipline-bounded data discovery doesn’t give us all we need or want
  16. Why care?
      • Data is expensive – an investment
      • Reuse:
        – More research
        – Teaching & Learning
        – Planning
      • Impact – with or without publication
      • Accountability
      • Legal & regulatory requirements
  17. Why does this matter?
      • Research quality
        – How close can we get to the truth?
      • Research speed
        – How quickly can we get to the truth?
      • Research finance
        – How much does the truth cost?
      • Improving one or more of these is of interest to all actors:
        – Researchers as data creators
        – Researchers as data reusers
        – Research institutions
        – Funders – hence government and society
  18. G8UK - Endorses OA
      Open Data Charter
      Policy Paper, 18 June 2013
      German text on slide, translated: G8UK endorses open access; an Open Data Charter; policy paper.
  19. Funder requirements
      • UK
      • USA – NSF, NEH, NIH
      • Europe
      • Most place burden on researcher – some on the institution
      http://www.epsrc.ac.uk/about/standards/researchdata/Pages/policyframework.aspx
  20. RCUK policy - The 1-minute version
      • Research data are a public good – make openly available in timely & responsible way
      • Have policies & plans. Data with long-term value should be preserved & usable
      • Metadata for discovery & reuse. Link publications & data
      • Sometimes law, ethics get in the way. We understand.
      • Limited embargoes OK. Recognition is important – always cite data sources
      • OK to use public money to do this. Do it efficiently.
  21. EPSRC policy points
      • Awareness of regulatory environment
      • Data access statement
      • Policies and processes
      • Data storage
      • Structured metadata descriptions
      • DOIs for data
      • Securely preserved for a minimum of 10 years from last use
      Compliance expected by 2015
  22. DCC Policy Summary
      http://www.dcc.ac.uk/resources/policy-and-legal
  23. Findable, citable data has value
      • Important to link publications to data (and vice versa)
      • Increases citations – of data & publication
      • Increases reuse (hence value)
      • But effects exist even without publication, if data is:
        – Archived
        – Citable
        – Discoverable
      MORAL: build a data registry
  24. What stops data reuse
      • Loss
      • Destruction
      • Pride
      • Gluttony
      • Ineptitude
      • Concealment
      • Bureaucracy
      • Complexity
      • Procrastination
      • Lack of potential
  25. “Departments don’t have guidelines or norms for personal back-up and researcher procedure, knowledge and diligence varies tremendously. Many have experienced moderate to catastrophic data loss”
      Incremental Project Report, June 2010
      http://www.flickr.com/photos/mattimattila/3003324844/
  26. What stops data reuse
      • Loss
      • Destruction
      • Pride
      • Gluttony
      • Ineptitude
      • Concealment
      • Bureaucracy
      • Complexity
      • Procrastination
      • Lack of potential
  27. How people talk about data
      • I put my data in figshare and I got a DOI for it
      • Not our data; the university’s data; my funder’s data; the data; the people’s data; your data.
  28. Data ownership – it’s messy
      • You need ownership to make data free
      • Governments may assert this
      • Industrial collaborators – understanding role of public funding
      • Research admin tracks the rules
  29. ON METADATA
  30. Disciplines – current state
      • Typically specialised
      • Focussed on discipline-specific concerns
      • Frequently embedded – hence processing required to expose independently
      • Historic failure to express generic concepts generically
        – Place
        – Time
  31. (no text on slide)
  32. Understanding Data Requirements
      http://www.dcc.ac.uk/
  33. (no text on slide)
  34. Data centres are good value!
      • See Jisc reports on ADS, BADC, UKDA:
      • Returns on investment between 400% and 1200%
  35. (no text on slide)
  36. Integrity
      • Not everyone publishes here
      • Almost all fraud connected to unavailable data
      • People suffer & die due to research fraud
      • When your research is reproducible – it gets cited
  37. Integrity – not without data
      • Cyril Burt
        – Twin studies on intelligence.
        – Questioned 1976; now discredited
      • Duke case
        – Data hiding leads to wasted treatments, clinical trials, probable death & huge lawsuits
      • Dutch cases
        – Stapel – 55 publications – “fictitious data”
        – Poldermans – fabricated data or negligence?
      “The case for open data: the Duke Clinical Trials” – blog post, Kevin Ashley, http://www.dcc.ac.uk/news/case-open-data-duke-clinical-trials
      “Lies, Damned Lies and Research Data: Can Data Sharing Prevent Data Fraud?” – Doorn, Dillo, van Horik, IJDC 8(1); doi:10.2218/ijdc.v8i1.256
  38. Citability
      • Making data available increases citations
      • Everyone – academic, funder, institution – loves citations
      • Want evidence?
        – Alter, Pienta, Lyle – 240%, social sciences *
        – Piwowar, Vision – 9% (microarray data) †
        – Henneken, Accomazzi – 20% (astronomy) #
      * Amy Pienta, George Alter, Jared Lyle (2010) The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. http://hdl.handle.net/2027.42/78307
      † Piwowar H, Vision TJ (2013) Data reuse & the open data citation advantage. PeerJ PrePrints 1:e1v1 http://dx.doi.org/10.7287/peerj.preprints.1v1
      # Edwin Henneken, Alberto Accomazzi (2011) Linking to Data - Effect on Citation Rates in Astronomy. http://arxiv.org/abs/1111.3618
  39. How to cite data
      What data to keep
  40. The Data Deluge is upon us
      Sensors’ ability to produce data outstrips IT’s ability to process it
  41. (no text on slide)
  42. Roles and Responsibilities
      What data to keep
  43. Excuses – and responses
      • “People will ask questions”
        – So use a data centre or repository
      • “It will be misinterpreted”
        – Stuff happens. Also, openness encourages correction
      • “It’s not interesting”
        – Let others be the judge – your noise is my signal
      • “I might get another paper out of it”
        – Up to a point. We might get more research out of it
      • “I don’t have permission”
        – A real problem. But solvable at senior level
      • “It’s too bad/complicated”
        – see above
      • “It’s not a priority”
        – Unfortunately, funders are making it so. But if you looked at the evidence, it would be your priority as well
      See e.g. Carly Strasser’s blog: http://datapub.cdlib.org/2013/04/24/closed-data-excuses-excuses/
  44. Should all data be open?
      • NO
      • Many reasons – most to do with human subjects
      • But data existence should always be open
      • Allows discovery & negotiation on use
      • Avoids pointless replication
  45. Some conundrums
      • Releasing genome data is OK when it’s:
        – An identified human subject
        – An anonymous human subject
        – Your pet dog
        – Another mammal
        – An insect
        – A plant
        – A virus
  46. It’s amazing what people will share…
  47. Data reuse from Hubble
  48. (no text on slide)
  49. Pimp your data – make it findable & reusable
      Gking.harvard.edu/data
      2014-04-25 Kevin Ashley, DCC – SocSciScot14 - CC-BY 49
  50. Data is variable
      • Not always textual
      • Not always tabular
      • Not always fixed – continual change
      • Not always clearly authored – think of archival provenance
      • Not always associated with publication
      • Often with indistinct boundaries
      • Multi-dimensional and non-linear
  51. Some messages for you
      • Some things we need to know about data:
        – When/where/what is it about?
        – Who owns it
        – What rights apply
        – What it is derived from & how
        – What software may be associated
        – What data management plan applies
        – How do I gain access?
        – Where is it?
        – When was/will it be destroyed?
  52. What about your data?
      • If administrative data isn’t freely available, why not?
      • Expose it in bulk – not just as a web page
      • Gain the value from your overheads!
  53. What about collaboration?
      • Collaborate within the university
      • Collaborate with partners
      • Collaborate with regional, national services
      • Not everything can be done well locally
      • Some examples…
  54. http://dataintelligence.3tu.nl/en/home/
      Choice of RDM training materials for librarians
      Up-skilling for data
      http://datalib.edina.ac.uk/mantra/libtraining.html
  55. My message to researchers
      • The credit belongs to you
      • The data belongs to all of us
      • Share, and we all reap the benefits
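
The questions on slide 51 (what a dataset is about, who owns it, what rights apply, where it is, when it will be destroyed) map fairly directly onto a registry record of the kind that slide 23's moral, "build a data registry", points towards. What follows is only a minimal Python sketch under stated assumptions: the `DatasetRecord` class, its field names, the `REQUIRED_FOR_DISCOVERY` tuple and both helper functions are hypothetical and do not come from the talk, the DCC, or any funder's schema. `earliest_disposal` treats the EPSRC point on slide 21 (preservation for a minimum of 10 years from last use) as plain date arithmetic.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical registry entry; field names are illustrative only."""
    title: str
    subject: str             # what the data is about
    coverage_period: str     # when it is about
    coverage_place: str      # where it is about
    owner: str               # who owns it
    rights: str              # what rights apply
    derived_from: List[str] = field(default_factory=list)  # source datasets
    software: List[str] = field(default_factory=list)      # associated software
    management_plan: Optional[str] = None  # which data management plan applies
    access_procedure: str = ""             # how to gain access
    location: str = ""                     # where the data is held
    last_used: Optional[date] = None
    destroyed_on: Optional[date] = None    # when it was or will be destroyed


# Assumed minimum for a record to support discovery and negotiation on use.
REQUIRED_FOR_DISCOVERY = ("owner", "rights", "access_procedure", "location")


def missing_fields(record: DatasetRecord) -> List[str]:
    """Return the names of discovery fields that are still empty."""
    return [name for name in REQUIRED_FOR_DISCOVERY if not getattr(record, name)]


def earliest_disposal(record: DatasetRecord, retention_years: int = 10) -> Optional[date]:
    """Earliest disposal date under a 'minimum N years from last use' rule."""
    if record.last_used is None:
        return None
    last = record.last_used
    try:
        return last.replace(year=last.year + retention_years)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year.
        return last.replace(year=last.year + retention_years, day=28)
```

A record for, say, transcribed ships' logs with an owner and rights statement but no location or access procedure would be reported by `missing_fields` as lacking `access_procedure` and `location`. That is roughly the gap slide 44 warns about when it says data existence should always be open, even where the data itself cannot be.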
