Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

342 views

Published on

“Open Research Data: Implications for Science and Society”, Warsaw, Poland, May 28–29, 2015, conference organized by the Open Science Platform — an initiative of the Interdisciplinary Centre for Mathematical and Computational Modelling at the University of Warsaw. pon.edu.pl @OpenSciPlatform #ORD2015

Published in: Science
  • Be the first to comment

Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

  1. 1. Open data: Benefits for the researcher, Benefits for Society Kevin Ashley Digital Curation Centre www.dcc.ac.uk @kevingashley Kevin.ashley@ed.ac.uk Reusable with attribution: CC-BY The DCC is supported by Jisc
  2. 2. A summary • Why data reuse ? • What stops us ? • Related issues – software & methods • The case for reuse - again 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 2
  3. 3. An alternative summary Being Selfish 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 3 What’s possible now … and still benefiting others Being Just Good Enough Thanks to: Neil Chue Hong (@npch), Software Sustainability Institute ORCID: 0000-0002-8876-7606 David Flanders (@dfflanders), Dr Steven Manos (DrStevenManos) University of Melbourne. All my colleagues at the DCC Cameron Neylon (@CameronNeylon)
  4. 4. My home – the DCC • Mission – to increase capability and capacity for research data services in UK institutions • Not just a UK problem – an international one • Training, shared services, guidance, policy, standards, futures 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 4
  5. 5. DATA REUSE HAPPENS – AND NOT ALWAYS IN THE WAY YOU EXPECT 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 5
  6. 6. 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 6 The Old weather project Data for research, not from research
  7. 7. Data reuse stories • The palaeontologist who saved years of work with archaeological data • The 19th-century ships logs that help us model climate change • The ‘noise’ from research radar that mapped dust from Eyjafjallajökull 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 7
  8. 8. Should all data be open? • NO • Many reasons – most to do with human subjects • But data existence should always be open • Allows discovery & negotiation on use • Avoids pointless replication 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 8
  9. 9. Kevin Ashley –ORD2015 - CC- BY 9 Some conundrums • Releasing genome data is OK when it’s: – An identified human subject – An anonymous human subject – Your pet dog – Another mammal – An insect – A plant – A virus 2015-05-28
  10. 10. Data reuse - messages 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 10 Often your data tells stories that your publications do not Not all data comes from other researchers One person’s noise is another person’s signal Discipline-bounded data discovery doesn’t give us all we need or want
  11. 11. 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 11 Why care? • Data is expensive – an investment • Reuse: – More research – Teaching & Learning – Planning • Impact – with or without publication • Accountability • Legal & regulatory requirements
  12. 12. Why does this matter? • Research quality – How close can we get to the truth? • Research speed – How quickly can we get to the truth? • Research finance – How much does the truth cost? • Improving one or more of these is of interest to all actors: • Researchers as data creators • Researchers as data reusers • Research institutions • Funders – hence government and society 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 12
  13. 13. G8UK - Endorses OA Open Data Charter Policy Paper 18 June 2013 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 13
  14. 14. 164 universities in UK* *2011 HESA data 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 14 71 (43%) > 5% research income 115 (70%) > £1m income from research
  15. 15. £4.4 billion total research grants =~PLN 26.6 billion 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 15
  16. 16. Business case for UK investment in data reuse • National infrastructure costs £1.5m/year • 5 years before data reuse is fully active • 10,000 datasets per year captured • 1 in 100 datasets reused each year • £30,000 saved each time data is reused • Saving: £3m/year – twice the running cost 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 16
  17. 17. http://www.flickr.com/photos/sethw/113073189/ 95% of research results are never published Slide: Cameron Neylon2015-05-28 Kevin Ashley –ORD2015 - CC-BY 17
  18. 18. http://flickr.com/photos/heymans/480396810/ If a million postdocs repeat a million experiments… Slide: Cameron Neylon2015-05-28 Kevin Ashley –ORD2015 - CC-BY 18
  19. 19. http://flickr.com/photos/cliche/120070310/ And 25% of those don’t work… Slide: Cameron Neylon2015-05-28 Kevin Ashley –ORD2015 - CC-BY 19
  20. 20. …how much taxpayer’s money is that? http://flickr.com/photos/luismimunoznajar/2093185804/ Slide: Cameron Neylon2015-05-28 Kevin Ashley –ORD2015 - CC-BY 20
  21. 21. More benefits: patient safety 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 21
  22. 22. … and institutional reputation 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 22
  23. 23. BUT WHAT ABOUT ME BEING SELFISH? 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 23
  24. 24. Funders are making demands 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 24
  25. 25. 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 25 Findable, citable data has value • Important to link publications to data (and vice versa) • Increases citations – of data & publication • Increases reuse (hence value) • But effects exist even without publication, if data is: – Archived – Citable – Discoverable • All benefit – researcher; institution; publisher
  26. 26. Citability • Making data available increases citations • Everyone – academic, funder, institution – loves citations • Want evidence? – Alter, Pienta, Lyle – 240%, social sciences * – Piwowar, Vision – 9% (microarray data)† – Henneken, Accomazzi – 20% (astronomy) # 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 26 † Piwowar H, Vision TJ. (2013) Data reuse & the open data citation advantage. PeerJ PrePrints 1:e1v1 http://dx.doi.org/10.7287/peerj.preprints.1v1 * Amy Pienta, George Alter, Jared Lyle, (2010) The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. http://hdl.handle.net/2027.42/78307 # Edwin Henneken, Alberto Accomazzi, (2011) Linking to Data - Effect on Citation Rates in Astronomy. http://arxiv.org/abs/1111.3618
  27. 27. Traditional skills can win • Google Flu gets it wrong: • Laze, D., Kennedy, R., King, G., and Vespignani, A. (2014). The Parable of Google Flu: Traps in Big Data Analysis. Science, 343, Forthcoming. • The data tells us why: • Lazer, David; Kennedy, Ryan; King, Gary; Vespignani, Alessandro, 2014, "Replication data for: The Parable of Google Flu: Traps in Big Data Analysis", http://dx.doi.org/10.7910/DVN/24823 UNF:5:BJh9WzZQNEeSEpV3EWs+xg== IQSS Dataverse Network [Distributor] V1 [Version] • Personalisation ; suggested searches; other UI changes 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 27 41 datasets – none bigger than 1 Mbyte Data made available before paper was published – result was immediate impact
  28. 28. What stops data reuse • Loss • Destruction • Pride • Gluttony • Ineptitude • Concealment • Bureaucracy • Complexity • Procrastination • Lack of potential 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 28
  29. 29. Excuses – and responses • “People will ask questions” – So use a data centre or repository • “It will be misinterpreted” – Stuff happens. Also, openness encourages correction • “It’s not interesting” – Let others be the judge – your noise is my signal • “I might get another paper out of it” – Up to a point. We might get more research out of it • “I don’t have permission” – A real problem. But solvable at senior level • “It’s too bad/complicated” –see above • “It’s not a priority” – Unfortunately, funders are making it so. But if you looked at the evidence, it would be your priority as well 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 29 See e.g. Carly Strasser’s blog: http://datapub.cdlib.org/2013/04/24/closed-data-excuses-excuses/
  30. 30. Why open software? • Quicker start • Better flexibility • Improved robustness • Increases collaborators • Greater research impact • Easier to work with industry • No added cost – Caveat: over what you should already be doing 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 30
  31. 31. What’s software got to do with my research? 31 Slide: Neil Chue Hong
  32. 32. The research community relies on software Do you use research software? What would happen to your research without software Survey of researchers from 15 UK Russell Group universities conducted by SSI between August - October 2014. DOI: 10.5281/zenodo.14809 56% Develop their own software 71% Have no formal software training2015-05-28 Kevin Ashley –ORD2015 - CC-BY 32 Slide: Neil Chue Hong
  33. 33. The modern researcher… • … worries about: – Data management and analysis – Reproducible research – Scalable simulations – Integration of models and workflows – Collaboration Picture of Otto Stern from Emilio Segre Visual Archives. Copyright American Institute of Physics. Used with permission 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 33 Slide: Neil Chue Hong
  34. 34. Open software is good for science and good for you • Benefits – More collaborators – More citations – More benefit to others – Increased robustness – Increased reuse – Reduced replication of effort • Far more than the drawbacks – More structured collaboration 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 34 Slide: Neil Chue Hong
  35. 35. Improve your research impact Vandewalle (2012) DOI: 10.1109/MCSE.2012.632015-05-28 Kevin Ashley –ORD2015 - CC-BY 35 Slide: Neil Chue Hong
  36. 36. , it’ Victoria Stodden, AMP 2011 http://www.stodden.net/AMP2011/, Special Issue Reproducible Research Computing in Science and Engineering July/August 2012, 14(4) Howison and Herbsleb (2013) "Incentives and Integration In Scientific Software Production" CSCW 2013. 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 36
  37. 37. software engineering (vs) software/data carpentry Software carpenters craft their research atop the digital infrastructure to produce novel science. Software engineers maintain, own and operate digital infrastructure. Teaching researchers to code Community exemplar: #SWCarpentry F
  38. 38. Publishing data & software papers is easy http://openresearchsoftware.metajnl.com http://bit.ly/softwarejournals http://dx.doi.org/10.6084/m9.figshare.942289 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 38 Slide: Neil Chue Hong
  39. 39. 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 39
  40. 40. THERE’S HELP FOR DATA SHARING AS WELL 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 40
  41. 41. 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 41
  42. 42. Roles and Responsibilities What data to keep 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 42
  43. 43. 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 43 How to cite data What data to keep
  44. 44. Acquire research data skills 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 44
  45. 45. Finally… • Sharing data is good for you • It’s good for all of us • It isn’t as hard as you think – start today! 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 45
  46. 46. It’s amazing what people will share… 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 46
  47. 47. Data reuse from Hubble 2015-05-28 Kevin Ashley –ORD2015 - CC-BY 47

×