1. Better Software, Better Research
Carole Goble
Software Sustainability Institute UK
ELIXIR, ELIXIR-UK Head of Node
The University of Manchester, UK
carole.goble@manchester.ac.uk
Unconference on Software Sustainability in Denmark (Novo Nordisk Foundation)
25-26 March 2019, Favrholm Campus, Hillerod
2. We produce lots of open source software
used by other people over a long time…
…different languages, dev communities, cultural
norms and different licences…
4. Open Source Software:
• widespread use and adoption
• contributions
• citation, academic credit
• funding partnerships
http://doi.ieeecomputersociety.org/10.1109/MIC.2014.88 (2014)
5. European Research Infrastructure
for Life Sciences
sustainable European infrastructure
for biological information
supporting life-science research and
its translation to society, the bio-industries, environment and
medicine.
act global, think global
FAIR Data for Life
http://elixir-europe.org
23 Nodes, 220 organisations
8. What does ELIXIR do?
Bio.Tools
Registries
Packaging & Containers
Clouds
Integration
Workflows
Benchmarking
Standards
Software, Policy
Best Practice
Training
Biohackathons
4OSS Guides
EDAM
10. The Software Sustainability Institute
cultivating better, more sustainable
research software to enable
world-class research
seed an international movement
act local, think global
Est. 2010
13. The research community relies on software
Do you use research software?
What would happen to your research without software?
Survey of researchers from 15 UK Russell Group universities, conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, disciplines and seniority.
14. The research community produces software
91% say using scientific software is important for their own research
84% say developing scientific software is important for their own research
53% claim to spend more time developing scientific software than they did 10 years ago
38% spend at least one fifth of their time developing software
Survey of 2000 scientists. J.E. Hannay et al., “How Do Scientists Develop and Use Scientific Software?” Proc. ICSE Workshop Software Eng. for Computational Science and Eng., 2009, pp. 1–8.
15. Investment across UK Research Councils into software use
£840m invested in the 2013-2014 financial year, an amount that has risen by 3% on average over the last four years
30% of total research investment over the last four financial years has been spent on research that relies on software
Analysis of data from 49,650 grant titles and abstracts published on Gateway to Research covering 2010-2014.
17. Shared and sharable (data &) software: key to reproducibility & productivity
Improve transparency, understanding, trust
Eliminate errors
Encourage collaboration, ease take-up
“Scholarship is the full software environment, code and data, that produced the result” - Claerbout
21. Hey, I found some great-looking software!
I can’t get hold of it
it doesn’t work for me
it’s too hard to use; it doesn’t work with my tools; where is the documentation?
the developers don’t have the resources to help,
or don’t want to help, or have gone
who else uses it? will it be maintained?
can I trust it?
I don’t want to be a software provider!
I don’t have the time to document it or answer queries
it’s really bad code
I only made it for me
I won’t be able to keep it up to date
my supervisor won’t let me; it’s my special sauce
Yeah, so I used my software in a paper… and now people want it
22. Culture change is hard
Stodden, Seiler, Ma. An empirical analysis of journal policy effectiveness for computational reproducibility. PNAS, 13 March 2018, 115(11): 2584-2589. https://doi.org/10.1073/pnas.1708290115
“Thank you for your interest in our paper. For the [redacted] calculations I used my own code, and there is no public version of this code, which could be downloaded. Since this code is not very user-friendly and is under constant development I prefer not to share this code.”
Since 2011, code must be available
23. I didn’t know about it
I like to invent my own wheels
Faster for me to code my own
I only get funding for making new software
I’m not funded or rewarded for reusing
I don’t trust other people’s software
It’s what is fun about my job! It’s how I’ll learn
I’ve no time or capacity to take it on
Yeah, so there is some software I could reuse…
Hey, I have some great software! How do I…
get it to be widely used?
have folks contribute to it?
make it sustainable?
get folk who use it to credit me?
make it usable by more folk than me?
get the time and money to make it FAIR?
25. Not fit for take-on…
needs help, guides, documentation, manuals, examples, content, portability, migration / legacy support, easy installation, virtual machines, testing, stability, version control, release cycle, roadmap, sustainability prospect, a way of introducing or integrating my favourite component/data/environment, documented and managed dependencies.
Don’t know how
Too risky
Not good enough
27. Barriers to Sharing
Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011/
Special Issue on Reproducible Research, Computing in Science and Engineering, July/August 2012, 14(4)
Howison and Herbsleb (2013) "Incentives and Integration In Scientific Software Production" CSCW 2013.
28. Software is the infrastructure
Free software is not free.
Like free puppies.
Tell your PIs and funders
[Scott McNealy, 2005]
http://www.zdnet.com/open-source-is-free-like-a-puppy-is-free-says-sun-boss-3039202713/
s/w engineers are cute too
29. Software is not all the same
Not all software is valued the same way
Not all software should be sustained
Nangia and Katz: https://arxiv.org/pdf/1706.06527.pdf
January – March 2016: 173 pieces of software mentioned in 32 papers
Is it key to re-computing results?
Could it be reused?
Is it more than a one-shot run?
Is it obsolete?
Does anybody care about it?
30. Software Ecosystem
Patchworks and Spectrums
Not all software is equal or worth sustaining. It’s all worth being good.
Spectrums: invisible, domain-generic to visible, domain-specific; teams to individuals
Tools, services, workflows, scripts, libraries, frameworks, platforms
31. Software Ecosystem
Spectrums: intentional to side-effect; full-fledged for reuse to throw-away; code to algorithm
33. Software Ecosystem
All software is “legacy code”. Maintenance = evolution. If it’s used, it will evolve.
Sustain the form: reproducibility by inspection. Read it, maintain it.
Sustain the function: reproducibility by invocation. Port it, run it, preserve it.
34. Describe computational workflows to be portable, scalable & interoperable with different workflow systems and containerised tools
Description of tools, inputs and outputs. Ontology markup using EDAM and Bioschemas.
CWL files in GitHub
Export from native platforms
Bundle the CWL workflow descriptions + rich context, provenance using multi-tiered descriptions
Snapshot workflow. Relate it to other objects.
Software components are containerised
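A minimal sketch of the "description of tools, inputs and outputs" step, assuming a hypothetical tool that counts lines with `wc -l`: the Python below builds a CWL v1.0 CommandLineTool description as a dictionary and writes it out as JSON, which YAML-based CWL runners should also accept because JSON is valid YAML. The tool, the `debian:stable` container image, the file name `wc-lines.cwl` and the EDAM term `format_1964` (assumed here to be plain text) are illustrative assumptions; workflow bundling and provenance capture are not shown.

```python
import json
from pathlib import Path


def describe_line_counter(image: str = "debian:stable") -> dict:
    """Return a CWL v1.0 CommandLineTool description for `wc -l`.

    The wrapped tool and the container image are illustrative assumptions.
    """
    return {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        # Prefix so ontology terms can be written as edam:...
        "$namespaces": {"edam": "http://edamontology.org/"},
        "baseCommand": ["wc", "-l"],
        # Run inside a container so the tool is portable across platforms.
        "hints": {"DockerRequirement": {"dockerPull": image}},
        "inputs": {
            "infile": {
                "type": "File",
                # EDAM format term for plain text (assumed: format_1964).
                "format": "edam:format_1964",
                "inputBinding": {"position": 1},
            }
        },
        # Capture standard output as the tool's single output file.
        "outputs": {"line_count": {"type": "stdout"}},
        "stdout": "line_count.txt",
    }


def write_description(path: Path) -> Path:
    """Serialise the description as JSON (a subset of YAML) and return the path."""
    path.write_text(json.dumps(describe_line_counter(), indent=2))
    return path


if __name__ == "__main__":
    print(write_description(Path("wc-lines.cwl")))
```

Keeping the description as plain data makes it easy to version in GitHub next to the code, and the same file can then be referenced from a workflow that is exported or bundled later.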
35. Five steps to better software, better research
Get and develop Expert Help
Publish code
Get and give credit
Develop a Software Management Plan
Code, document and deploy for Strangers
Get and offer Training
37. provenance
portability
good enough practices
access documentation
adopt a licence
make it discoverable
make source code accessible
respect third-party licences
version your releases
document well
use citation metadata
validation docs
provide test data
provide example data
use version control, use automated build and test,
have code reviews, modularise, use community standards, be your own user
don't reinvent the wheel, make common operations easy to control, design for maintainability
have clear and transparent contribution, governance and communication processes
use package managers and containers
do not require special privileges to install or run
eliminate hard-coded paths
log parameters and versions
dependencies
…in a nutshell…
ids steps
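Two items from this list, "eliminate hard-coded paths" and "log parameters and versions", can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical analysis script with an `input` file, an `--output-dir` and a `--threshold` parameter; the dependency name `numpy` is only an example of a package whose version you might record.

```python
import argparse
import logging
import platform
import sys
from importlib import metadata
from pathlib import Path

__version__ = "0.1.0"  # hypothetical version of this analysis script

log = logging.getLogger("analysis")


def parse_args(argv=None) -> argparse.Namespace:
    """Take input and output locations from the command line, not hard-coded paths."""
    parser = argparse.ArgumentParser(description="Hypothetical analysis step.")
    parser.add_argument("input", type=Path, help="input data file")
    parser.add_argument("--output-dir", type=Path, default=Path.cwd(),
                        help="where to write results (default: current directory)")
    parser.add_argument("--threshold", type=float, default=0.5)
    return parser.parse_args(argv)


def package_version(name: str) -> str:
    """Return the installed version of a dependency, or 'not installed'."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def log_provenance(args: argparse.Namespace, dependencies=("numpy",)) -> dict:
    """Log the parameters and software versions used for this run."""
    record = {
        "script_version": __version__,
        "python": platform.python_version(),
        "parameters": {k: str(v) for k, v in vars(args).items()},
        "dependencies": {d: package_version(d) for d in dependencies},
    }
    log.info("provenance: %s", record)
    return record


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, stream=sys.stderr)
    log_provenance(parse_args())
```

Running the script as, say, `python analysis.py data.csv --threshold 0.7` would log the script version, the Python version, every parameter and each dependency's installed version, so a result can later be traced back to the configuration that produced it.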
38. …maintainability & maturity…
Maintainability Checklist
https://software.ac.uk/resources/guides/developing-maintainable-software
Can I make a change with only a low risk of breaking existing features?
Corrective - fixing faults
Preventative - increasing maintainability
Adaptive - adapting to changes in environment
Perfective - meeting new/different user requirements
Keeping the Show on the Road
Dealing with change
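The checklist question "Can I make a change with only a low risk of breaking existing features?" is usually answered with automated tests. A minimal sketch using Python's standard `unittest` module, assuming a hypothetical research function `normalise_counts`: the tests pin its current behaviour, so a corrective, adaptive or perfective change that alters the results fails loudly instead of silently.

```python
import unittest


def normalise_counts(counts):
    """Hypothetical research function: scale counts so they sum to 1.

    Raises ValueError for empty input or a zero total.
    """
    total = sum(counts)
    if not counts or total == 0:
        raise ValueError("counts must be non-empty with a non-zero total")
    return [c / total for c in counts]


class TestNormaliseCounts(unittest.TestCase):
    """Regression tests pinning existing behaviour before a change is made."""

    def test_sums_to_one(self):
        self.assertAlmostEqual(sum(normalise_counts([1, 2, 3, 4])), 1.0)

    def test_known_values(self):
        # A small worked example whose answer is easy to check by hand.
        self.assertEqual(normalise_counts([1, 1, 2]), [0.25, 0.25, 0.5])

    def test_rejects_zero_total(self):
        with self.assertRaises(ValueError):
            normalise_counts([0, 0])

    def test_rejects_empty_input(self):
        with self.assertRaises(ValueError):
            normalise_counts([])


if __name__ == "__main__":
    unittest.main()
```

Running the file directly executes the tests through `unittest.main()`; wiring the same command into an automated build means every proposed change is checked against the pinned behaviour.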
39. People say they want flexibility. They prefer the simplicity of order and will adapt to adopt.
Don’t tweak standards or standard systems
"it's better, initially, to make a small number of users really love you than a large number kind of like you"
Paul Buchheit
paulbuchheit.blogspot.com
Do not underestimate the power of the sprint / *-athon
KISS
A good interface beats out most things
Beware the Developer Egoist…
40. SSI survey of researchers from 15 Russell Group universities, conducted between August and October 2014. 406 respondents covering a representative range of funders, disciplines and seniority.
56% of UK researchers develop their own research software or scripts
73% of UK researchers have had no formal software engineering training
140K UK researchers rely on their own coding skills
Training
47% of scientists have a good understanding of software testing
34% of scientists think that formal training in developing software is important
Zeeya Merali, “Computational science: ...Error… why scientific programming does not compute”, Nature 467, 775-777 (2010), doi:10.1038/467775a
J.E. Hannay et al., “How Do Scientists Develop and Use Scientific Software?” Proc. ICSE Workshop Software Eng. for Computational Science and Eng., 2009, pp. 1–8. 2000 scientists
41. Basic training for kitchen chef: 3-4 years
Head chef: 10 years
Basic training for s/w engineer: 3-4 years
Architect: 10 years
Photo by ZagatBuzz
Training in s/w development in undergraduate Physics: 140 hours
Training in s/w development in undergraduate Geography: 0 hours
42. Training the 95%
• Software, Data, Library Carpentry
• teach foundational computational and data science skills to researchers
• communities of instructors, trainers, maintainers, helpers, and supporters
• train researchers, train the trainers
1st European CarpentryConnect, Manchester, UK, 25-27 June 2019
https://carpentries.org/
4500 researchers
140 workshops
137 instructors
15 TtT workshops
227 instructors
12 nodes
44. Expert help – open call
Biomolecular systems and protein modelling codes
BoneJ: suite of open-source plug-ins for bone shape analysis based on ImageJ
Community assessment and building
Improved testing framework
Packaging and installation
Improved coding standards
Improved web site
Community web portal: ionomic data on over 300,000 plant and yeast samples
Rehosted service
Migration of portal from Purdue to Nottingham
Technical analysis of the service + a migration process
Changes to ensure the long-term sustainability
User assessment
Re-architect and scale
One-man, small-scale software project into multi-developer programme
Chris Wood, David Salt, Michael Doube
45. Expert help – a community of fellows
• Career Building
• Championing, Influencing
• Topic-specific workshops
• Annual Collaborations Workshop
1-3 April 2019
https://www.software.ac.uk/cw19
112 Fellows
46. Scaling expert help – campaigning for careers & professionalisation of research software
Est. 2012 at an SSI Collaborations Workshop
http://rse.ac.uk
47. Make a worldwide movement
www.de-rse.org
https://rse.ac.uk/conf2019/, University of Birmingham, 17-19 September 2019
1500 members
48. Get a plan and publish…
Developed and versioned using a code repository
Published via a code repository or website
Registered for discovery
Citation metadata
Deposited in a digital repository with paper / for preservation
develop, share, preserve
CodeMeta
bio.tools
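For the "citation metadata" step, a CodeMeta record is a small JSON-LD file, conventionally named `codemeta.json`, kept alongside the code. A minimal sketch, assuming a hypothetical package name, repository URL and author; only a handful of CodeMeta 2.0 terms are used here, and real records usually carry more (keywords, programming language, identifiers).

```python
import json


def codemeta(name, version, repository, licence_spdx, authors):
    """Build a minimal codemeta.json record using the CodeMeta 2.0 context.

    `authors` is a list of (given_name, family_name) tuples.
    """
    return {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
        "name": name,
        "version": version,
        "codeRepository": repository,
        # The licence is given as a URL; SPDX licence URLs are a common choice.
        "license": f"https://spdx.org/licenses/{licence_spdx}",
        "author": [
            {"@type": "Person", "givenName": given, "familyName": family}
            for given, family in authors
        ],
    }


if __name__ == "__main__":
    record = codemeta(
        name="example-tool",  # hypothetical package
        version="1.0.0",
        repository="https://github.com/example/example-tool",  # hypothetical
        licence_spdx="Apache-2.0",
        authors=[("Ada", "Lovelace")],  # hypothetical author
    )
    with open("codemeta.json", "w", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2)
```

Generating the record from the same place that sets the release version helps keep the metadata and the deposited code in step.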
49. Campaign for Software Recognition
J. Howison and J. Bullard. Software in the scientific literature: Problems with seeing, finding, and using software mentioned in the biology literature. JASIST, 2015. http://dx.doi.org/10.1002/asi.23538
Software mentioned in 7 different ways
18% offered a preferred citation
32% who cited ignored it
90 biology articles
Credit is like not $$$$$
Secret credit = no credit = no sustainability
24% of journals had a citation policy
50. [1960s Boeing 747-100 Software Configuration]
http://scienceblogs.com/pontiff/2008/05/27/the-weight-of-software/
especially software that is widely used, infrastructural, components or cross-discipline
Invisibility
Scholarly value
when a means to an end and when an end in itself
51. Means for Software Recognition
https://cite.research-software.org/
Principles
Metadata
Guidelines
Citation File Format (CFF)
CodeMeta.json
DataCite Metadata Schema v4.1
Force11 Software citation principles
https://peerj.com/articles/cs-86/
When and how should I cite?
How do I deal with components and teams?
Can there be transitive or fractional credit?
How do I cite versions?
Be a better reviewer
Tools?
Dan Katz Talk: https://doi.org/10.6084/m9.figshare.7054478.v1
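The Citation File Format answers "How do I cite versions?" by putting a `CITATION.cff` file in the repository root. A minimal sketch that renders a CFF 1.2.0 file from Python without extra dependencies; the title, version, date and author in the usage are hypothetical, and strings are quoted with `json.dumps`, relying on JSON strings also being valid YAML double-quoted scalars. A YAML library would be more robust for richer metadata.

```python
import json
from datetime import date


def citation_cff(title, version, released, authors, doi=None):
    """Render a minimal CITATION.cff (Citation File Format 1.2.0) as YAML text.

    `authors` is a list of (given_names, family_names) tuples; `released` is a date.
    """
    q = json.dumps  # JSON strings are valid YAML double-quoted scalars
    lines = [
        "cff-version: 1.2.0",
        f"message: {q('If you use this software, please cite it as below.')}",
        f"title: {q(title)}",
        f"version: {q(version)}",
        f"date-released: {released.isoformat()}",
        "authors:",
    ]
    for given, family in authors:
        lines.append(f"  - family-names: {q(family)}")
        lines.append(f"    given-names: {q(given)}")
    if doi:
        # Optional version-specific DOI, e.g. one minted when a release is archived.
        lines.append(f"doi: {q(doi)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = citation_cff(
        title="example-tool",  # hypothetical software
        version="1.0.0",
        released=date(2019, 3, 25),
        authors=[("Ada", "Lovelace")],  # hypothetical author
    )
    with open("CITATION.cff", "w", encoding="utf-8") as fh:
        fh.write(text)
```

Updating `version` and `date-released` at each release gives every version its own citable record.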
52. Personal Responsibility
A Manifesto for Personal Responsibility in the Engineering of Academic Software
A. Recognition of academic software
B. Academic software development processes
C. The intellectual content of academic software
https://www.dagstuhl.de/16252
19-24 June 2016, Dagstuhl Perspectives Workshop 16252
53. A. Recognition of academic software
1. I will properly cite software used to produce my research results.
2. I will point out improper or missing citations to software when I am reviewing publications.
3. I will make explicit how to cite the software I make available.
4. I will recommend software experts for funding agencies to include in their review processes.
5. I will invite developers of software that enables my research to be co-authors on my papers.
6. I will recognize software contributions in hiring and promotion within my institution.
7. I will recognize software contributions at conferences, e.g. dedicated sessions, and prizes.
8. I will support and publish in journals that recognise software contributions.
9. I will contribute to sustaining the software I rely on for my research.
B. Academic software development processes
10. I will develop software as open source right from the start whenever possible.
11. I will document my academic software for users with instructions and examples.
12. I will package, release and archive versions of my software.
13. I will consider and document the sustainability of my research software.
14. I will publish how I organize and run my software projects.
15. I will match software engineering practices I recommend to the needs and resources of projects.
16. I will help scientists improve the quality of their software without passing judgment.
C. The intellectual content of academic software
17. I will acknowledge that source code is a legitimate part of the academic discourse.
18. I will publish the intellectual contributions of my research software.
19. I will distinguish the intellectual contribution of my software from its service contribution.
20. I will examine the source code of academic software contributions and encourage others to do so as well.
54. Take personal responsibility for FAIR Software
Don’t wait for funders, policy makers and publishers to catch up.
55. Start by filling out this survey!
https://goo.gl/forms/dOT4RrgyK5NEqvhG3
https://blog.codeforscience.org/identifying-systemic-challenges-to-the-sustainability-of-data-driven-tooling
56. Talk Acknowledgements
All my colleagues at SSI since 2010
All my colleagues in ELIXIR
Fellow Dagstuhl attendees
Special thanks:
SSI: Neil Chue Hong, Simon Hettrick, Steve Crouch, Aleks Nenadic, Raniere Silva,
Shoaib Sufi, Caroline Jay, David De Roure, Les Carr, Aleks Pawlik
SSI fellows: Mike Crouch, Rob Haines
Manchester colleagues: Stian Soiland-Reyes, Ian Cottam
ELIXIR: Michael Crusoe, Björn Grüning, Frederik Coppens, Rob Finn, Salvador Capella
Colleagues: Tim Clark, Dan Katz, James Howison, Kristian Garza
57. Funder Acknowledgements
European Union Horizon 2020 program under grant agreement 676559.
Implementation Studies: CWL and Bioschemas
European Union Horizon 2020 program under grant agreement 675728.
European Union Horizon 2020 program under grant agreement 654248.
European Union Horizon 2020 program under grant agreement 739563.