Better Software, Better Research

Unconference on Software Sustainability in Denmark (Novo Nordisk Foundation)
25-26 March 2019, Favrholm Campus, Hillerod

Published in: Science

  1. 1. Better Software Better Research Carole Goble Software Sustainability Institute UK ELIXIR, ELIXIR-UK Head of Node The University of Manchester, UK carole.goble@manchester.ac.uk Unconference on Software Sustainability in Denmark (Novo Nordisk Foundation) 25-26 March 2019, Favrholm Campus, Hillerod
  2. 2. We produce lots of open source software used by other people over a long time… …different languages, dev communities, cultural norms and different licenses…
  3. 3. Users, Collaborators, Contributors developed in multiple consortia used by people who are not us (and sometimes redeveloped by strangers)
  4. 4. Open Source Software: • widespread use and adoption • contributions • citation, academic credit • funding partnerships http://doi.ieeecomputersociety.org/10.1109/MIC.2014.88 2014
  5. 5. European Research Infrastructure for Life Sciences sustainable European infrastructure for biological information supporting life-science research and its translation to society, the bio-industries, environment and medicine. act global, think global FAIR Data for Life http://elixir-europe.org 23 Nodes, 220 organisations
  6. 6. European Research Infrastructure for Life Sciences http://elixir-europe.org
  7. 7. European Research Infrastructure for Life Sciences http://elixir-europe.org Scientific Data 3, 160018 (2016) doi:10.1038/sdata.2016.18
  8. 8. Bio.Tools Registries Packaging & Containers Clouds Integration Workflows Benchmarking Standards Software, Policy Best Practice Training http://elixir-europe.org Biohackathons 4OSSGuides What does ELIXIR do? EDAM
  9. 9. Biohackathon 2018 Bio.tools Galaxy Europe
  10. 10. The Software Sustainability Institute cultivating better, more sustainable, research software to enable world-class research seed an international movement act local think global Est 2010
  11. 11. The research community relies on software Do you use research software? What would happen to your research without software? Survey of researchers from 15 UK Russell Group universities conducted by SSI between August - October 2014. 406 respondents covering representative range of funders, discipline and seniority.
  12. 12. The Research community produces software using scientific software is important for their own research 91% developing scientific software is important for their own research 84% claimed to spend more time developing scientific software than they did 10 years ago 53% spend at least one fifth of their time developing software 38% 2000 scientists. J.E. Hannay et al., “How Do Scientists Develop and Use Scientific Software?” Proc. ICSE Workshop Software Eng. for Computational Science and Eng., 2009, pp. 1–8.
  13. 13. Investment across UK Research Councils into software use £840m Investment in 2013-2014 financial year, an amount that has risen by 3% on average over last four years 30% Of total research investment has been spent on research which relies on software over the last four financial years Analysis of data from 49,650 grant titles and abstracts published on Gateway to Research covering 2010-2014.
  14. 14. Software in research papers
  15. 15. Shared and sharable (data &) software key to reproducibility & productivity Improve transparency, understanding, trust Eliminate errors Encourage collaboration, Ease take up “Scholarship is the full software environment, code and data, that produced the result” - Claerbout
  16. 16. http://science.sciencemag.org/content/314/5807/1856.full “Chang’s data are good… but the faulty software threw everything off” “a homemade data-analysis program had flipped two columns” Geoff Chang
  17. 17. recomputation.org sciencecodemanifesto.org Open Science, Software Policy, Digital Objects
  18. 18. Editorial | Published: 27 February 2019 Nature Methods 16, page 207 (2019)
  19. 19. Hey, I found some great looking software! I can’t get hold of it it doesn’t work for me it’s too hard to use. It doesn’t work with my tools. where is the documentation? the developers don’t have resources to help or don’t want to help or have gone. who else uses it? will it be maintained? can I trust it? I don’t want to be a software provider! I don’t have the time to document it or answer queries It’s really bad code I only made it for me I won’t be able to keep up to date my supervisor won’t let me, It’s my special sauce Yeah, so I used my software in a paper……and now people want it
  20. 20. Culture change is hard Stodden, Seiler, Ma. An empirical analysis of journal policy effectiveness for computational reproducibility, PNAS March 13, 2018. 115 (11) 2584-2589; https://doi.org/10.1073/pnas.1708290115 “Thank you for your interest in our paper. For the [redacted] calculations I used my own code, and there is no public version of this code, which could be downloaded. Since this code is not very user-friendly and is under constant development I prefer not to share this code.” Since 2011 code must be available
  21. 21. I didn’t know about it I like to invent my own wheels Faster for me to code my own I only get funding for making new software I’m not funded or rewarded for reusing I don’t trust others’ software It’s what is fun about my job!, It’s how I’ll learn I’ve no time or capacity to take it on Yeah, so there is some software I could reuse … how do I …. get it to be widely used? have folks contribute to it? make it sustainable? get folk who use it to credit me? make it usable by more folk than me? Get the time and money to make it FAIR? Hey, I have some great software!
  22. 22. Not fit for take-on…needs help, guides, documentation, manuals, examples, content, portability, migration / legacy support, easy installation, virtual machines, testing, stability, version control, release cycle, roadmap, sustainability prospect, way of introducing or integrating my favourite, component/data/environment, documented and managed dependencies. Don’t know how Too Risky Not good enough
  23. 23. [Norman Morrison] Software Stewardship Debt
  24. 24. Barriers to Sharing Victoria Stodden, AMP 2011 http://www.stodden.net/AMP2011/, Special Issue Reproducible Research Computing in Science and Engineering July/August 2012, 14(4) Howison and Herbsleb (2013) "Incentives and Integration In Scientific Software Production" CSCW 2013.
  25. 25. Software is the infrastructure Free software is not Free. Like Free puppies. Tell your PIs And funders [Scott McNealy, 2005] http://www.zdnet.com/open-source-is-free-like-a-puppy-is-free-says-sun-boss-3039202713/ s/w engineers are cute too
  26. 26. Software is not all the same Not all software is valued the same way Not all software should be sustained Nangia and Katz: https://arxiv.org/pdf/1706.06527.pdf January – March 2016, 173 pieces of software mentioned in 32 papers Is it key to re-computing results? Could it be reused? Is it more than a one run shot? Is it obsolete? Does anybody care about it?
  27. 27. Software Ecosystem Patchworks and Spectrums Not all software is equal and worth sustaining. It’s all worth being good. Invisible Domain generic Visible Domain specific Tools Services Workflows Scripts Libraries Frameworks platforms Teams Individuals
  28. 28. Software Ecosystem Patchworks and Spectrums Not all software is equal and worth sustaining. It’s all worth being good. Intentional Side-effect Full fledged for reuse Throw-away Code Algorithm
  29. 29. Adoption intentions Adoptive community Control intentions Contribution intentions Producer Consumer Incidental Familial Fundamental Me Family & Friends Strangers Self Collaborate Contribute Incidental Cooperative Autocratic Software Intentions
  30. 30. Software Ecosystem All software is “legacy code”. Maintenance = Evolution. If it’s used it will evolve Sustain the form Reproducibility by Inspection Read It, Maintain It Sustain the function Reproducibility by Invocation Port it, Run It, Preserve it EDAM service
  31. 31. Describe computational workflows to be portable, scalable & interoperable with different workflow systems and containerised tools Description of tools, inputs and outputs. Ontology markup using EDAM and bioschemas. CWL files in GitHub Export from native platforms Bundle the CWL workflow descriptions + rich context, provenance using multi-tiered descriptions Snapshot workflow. Relate it to other objects. Software components are containerised
  32. 32. Five steps to better software better research Get and develop Expert Help Publish code Get and give credit Develop a Software Management Plan Code, document and deploy for Strangers Get and offer Training
  33. 33. Advice everywhere …. https://github.com/SoftDev4LS
  34. 34. provenance portability good enough practices access documentation adopt a licence make it discoverable make source code accessible respect 3rd party licenses version your releases document well use citation metadata validation docs provide test data provide example data use version control, use automated build and test, have code reviews, modularise, use community standards, be your own user don't reinvent the wheel, make common operations easy to control, design for maintainability have clear and transparent contribution, governance and communication processes use package managers and containers do not require special privileges to install or run eliminate hard-coded paths log parameters and versions dependencies …in a nutshell… ids steps
  35. 35. …maintainability & maturity…. Maintainability Checklist https://software.ac.uk/resources/guides/developing-maintainable-software Can I make a change with only a low risk of breaking existing features? Corrective -fixing faults Preventative - increasing maintainability Adaptive - adapting to changes in environment Perfective - meeting new/different user requirements Keeping the Show on the Road Dealing with change
  36. 36. People say they want flexibility. They prefer the simplicity of order and will adapt to adopt Don’t tweak standards or standard systems "it's better, initially, to make a small number of users really love you than a large number kind of like you" Paul Buchheit paulbuchheit.blogspot.com Do not underestimate the power of the sprint / *-athon KISS A good interface beats out most things Beware the Developer Egoist…
  37. 37. SSI Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014. 406 respondents covering representative range of funders, discipline and seniority. 56% Of UK researchers develop their own research software or scripts 73% Of UK researchers have had no formal software engineering training 140K UK researchers rely on their own coding skills Training 47% Of scientists have a good understanding of software testing 34% Of scientists think that formal training in developing software is important Zeeya Merali, Nature 467, 775-777 (2010) | doi:10.1038/467775a Computational science: ...Error…why scientific programming does not compute. J.E. Hannay et al., “How Do Scientists Develop and Use Scientific Software?” Proc. ICSE Workshop Software Eng. for Computational Science and Eng., 2009, pp. 1–8. 2000 scientists
  38. 38. Basic training for kitchen chef: 3-4 years Head chef: 10 years Basic training for s/w engineer: 3-4 years Architect: 10 years Photo by ZagatBuzz Training in S/W Dev in UG Physics: 140 hours Training in S/W Dev in UG Geography: 0 hours Software Sustainability Institute
  39. 39. Training the 95% • Software, Data, Library Carpentry • teach foundational computational and data science skills to researchers • communities of instructors, trainers, maintainers, helpers, and supporters • train researchers, train the trainers 1st European CarpentryConnect Manchester UK, 25-27 June 2019 https://carpentries.org/ 4500 researchers 140 workshops 137 instructors 15 TtT workshops 227 instructors 12 nodes
  40. 40. https://coderefinery.org/
  41. 41. Expert help – open call Biomolecular systems and protein modelling codes BoneJ: suite of open-source plug-ins for bone shape analysis based on ImageJ Community assessment and building Improved testing f/work Packaging and installation Improved coding standards Improved web site Community web portal ionomic data on over 300,000 plant and yeast samples Rehosted service Migration of portal from Purdue to Nottingham Technical analysis of the service + a migration process Changes to ensure the long-term sustainability User assessment Re-architect and scale One-man, small-scale software project into multi-developer programme Chris Wood David Salt Michael Doube
  42. 42. Expert help – A community of fellows • Career Building • Championing, Influencing • Topic specific workshops • Annual Collaborations Workshop 1-3 April 2019 https://www.software.ac.uk/cw19 112 Fellows
  43. 43. Scaling Expert help – Campaigning for careers & Professionalisation of research software est 2012 at a SSI Collaborations Workshop http://rse.ac.uk
  44. 44. Make a worldwide movement www.de-rse.org https://rse.ac.uk/conf2019/ University of Birmingham, 17-19 September 2019. 1500 members
  45. 45. Get a plan and publish… Developed and versioned using code repository Published via code repository or website Registered for discovery Citation metadata Deposited in digital repository with paper / for preservation develop share preserve CodeMeta bio.tools
  46. 46. Campaign for Software Recognition J. Howison and J. Bullard. Software in the scientific literature: Problems with seeing, finding, and using software mentioned in the biology literature. JASIST 2015. http://dx.doi.org/10.1002/asi.23538 7 different ways software mentioned 18% offered preferred citation 32% who cited ignored it 90 biology articles Credit is like not $$$$$ Secret credit = no credit = no sustainability 24% journals had a citation policy
  47. 47. [1960s Boeing 747-100 Software Configuration] http://scienceblogs.com/pontiff/2008/05/27/the-weight-of-software/ especially software that is widely used, infrastructural, components or cross-discipline Invisibility Scholarly value when a means to an end and when an end in itself
  48. 48. Means for Software Recognition https://cite.research-software.org/ Principles Metadata Guidelines Citation File Format (CFF) CodeMeta.json DataCite Metadata Schema v4.1 Force11 Software citation principles https://peerj.com/articles/cs-86/ When and how should I cite? How do I deal with components and teams? Can there be transitive or fractional credit? How do I cite versions? Be a better reviewer Tools? Dan Katz Talk: https://doi.org/10.6084/m9.figshare.7054478.v1
  49. 49. Personal Responsibility A Manifesto for Personal Responsibility in the Engineering of Academic Software A. Recognition of academic software B. Academic software development processes C. The intellectual content of academic software https://www.dagstuhl.de/16252 June 19–24, 2016, Dagstuhl Perspectives Workshop 16252
  50. 50. A. Recognition of academic software 1. I will properly cite software used to produce my research results. 2. I will point out improper or missing citations to software when I am reviewing publications. 3. I will make explicit how to cite the software I make available. 4. I will recommend software experts for funding agencies to include in their review processes. 5. I will invite developers of software that enables my research to be co-authors on my papers. 6. I will recognize software contributions in hiring and promotion within my institution. 7. I will recognize software contributions at conferences, e.g. dedicated sessions, and prizes. 8. I will support and publish in journals that recognise software contributions. 9. I will contribute to sustaining the software I rely on for my research. B. Academic software development processes 10. I will develop software as open source right from the start whenever possible. 11. I will document my academic software for users with instructions and examples. 12. I will package, release and archive versions of my software. 13. I will consider and document the sustainability of my research software. 14. I will publish how I organize and run my software projects. 15. I will match software engineering practices I recommend to the needs and resources of projects. 16. I will help scientists improve the quality of their software without passing judgment. C. The intellectual content of academic software 17. I will acknowledge that source code is a legitimate part of the academic discourse. 18. I will publish the intellectual contributions of my research software. 19. I will distinguish the intellectual contribution of my software from its service contribution. 20. I will examine the source code of academic software contributions and encourage others to do so as well.
  51. 51. Take personal responsibility for FAIR Software Don’t wait for funders and policy makers and publishers to catch up.
  52. 52. Start by filling out this survey! https://goo.gl/forms/dOT4RrgyK5NEqvhG3 https://blog.codeforscience.org/identifying-systemic-challenges-to-the-sustainability-of-data-driven-tooling
  53. 53. Talk Acknowledgements All my colleagues at SSI since 2010 All my colleagues in ELIXIR Fellow Dagstuhl attendees Special thanks: SSI: Neil Chue Hong, Simon Hettrick, Steve Crouch, Aleks Nenadic, Raniere Silva, Shoaib Sufi, Caroline Jay, David De Roure, Les Carr, Aleks Pawlik SSI fellows: Mike Crouch, Rob Haines Manchester colleagues: Stian Soiland-Reyes, Ian Cottam ELIXIR: Michael Crusoe, Björn Grüning, Frederik Coppens, Rob Finn, Salvador Capella Colleagues: Tim Clark, Dan Katz, James Howison, Kristian Garza
  54. 54. Funder Acknowledgements European Union Horizon 2020 program under grant agreement 676559 Implementation Studies CWL and Bioschemas European Union Horizon 2020 program under grant agreement 675728. European Union Horizon 2020 program under grant agreement 654248. European Union Horizon 2020 program under grant agreement 739563.
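
Two of the "good enough practices" listed on slide 34, "eliminate hard-coded paths" and "log parameters and versions", can be illustrated with a small script. This is only a minimal sketch and not taken from the talk: the tool version string, the `run_record.json` file name, and the `--threshold` parameter are all hypothetical, chosen for illustration. It uses only the Python standard library.

```python
import argparse
import json
import platform
from datetime import datetime, timezone
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Take paths and parameters from the command line instead of hard-coding them."""
    parser = argparse.ArgumentParser(description="Hypothetical analysis step")
    parser.add_argument("--input", type=Path, required=True, help="input data file")
    parser.add_argument("--outdir", type=Path, default=Path("results"),
                        help="directory for outputs and the run record")
    parser.add_argument("--threshold", type=float, default=0.5,
                        help="illustrative analysis parameter (hypothetical)")
    return parser


def run_record(args: argparse.Namespace, tool_version: str) -> dict:
    """Collect the parameters and versions needed to repeat this run later."""
    return {
        "tool_version": tool_version,
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "started_utc": datetime.now(timezone.utc).isoformat(),
        "parameters": {
            "input": str(args.input),
            "outdir": str(args.outdir),
            "threshold": args.threshold,
        },
    }


def write_run_record(record: dict, outdir: Path) -> Path:
    """Write the record next to the outputs so the two travel together."""
    outdir.mkdir(parents=True, exist_ok=True)
    log_path = outdir / "run_record.json"  # file name is an assumption
    log_path.write_text(json.dumps(record, indent=2) + "\n", encoding="utf-8")
    return log_path


def main(argv=None) -> Path:
    args = build_parser().parse_args(argv)
    record = run_record(args, tool_version="0.1.0")  # hypothetical version string
    return write_run_record(record, args.outdir)
```

Calling `main(["--input", "data.csv", "--outdir", "out"])` writes `out/run_record.json`. Keeping the record beside the results means a stranger who receives the output directory can see which inputs, parameters and versions produced it.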
