1. Scientific Software:
sustainability, skills & sociology
Neil Chue Hong, N.ChueHong@software.ac.uk
Director, Software Sustainability Institute
US/IAEA Workshop on Software Sustainability
for Safeguards Instrumentation, Vienna
Institute
Software
Sustainability
www.software.ac.uk
2. The Software Sustainability Institute
A national facility for cultivating world-class
research through software
• Better software enables better research
• Software reaches boundaries in its
development cycle that prevent
improvement, growth and adoption
• Providing the expertise and services
needed to negotiate to the next stage
• Developing the policy and tools to
support the community developing and
using research software
Supported by EPSRC
Grant EP/H043160/1
3. Anatomy of my talk
Software is…
• everywhere
• hard to define
• long-lived
Context, reasons and people are important
7. Tamiflu binding to mutant influenza
A water-swap reaction coordinate for the calculation of
absolute protein-ligand binding free energies
Woods CJ, Malaisree M, Hannongbua S, Mulholland AJ
J. Chem. Phys. (2011) vol. 134, pp. 054114
http://dx.doi.org/10.1063/1.3519057
8. Favouring of disease risk alleles
Selection at pleiotropic loci underlies disease
co-occurrence in human populations.
Navarro, Haley, Karosas et al.
Submitted to Nature Genetics
9. Behind every great piece of science…
#go through each SNP of interest
for(my $x = 0; $x < scalar @pos; $x++)
{
    #and then each downstream SNP of interest
    for(my $y = $x+1; $y < scalar @pos; $y++)
    {
        #if SNPs are for different traits, within our chosen distance (500kb) and both present in the haplotypes file
        if((!($trait[$x] eq $trait[$y])) && (abs($pos[$x] - $pos[$y]) <= 500000) && (exists($legArr
        {
            my $snp1ArrayPos = "";
            my $snp2ArrayPos = "";
            my $snp1All = "";
            my $snp2All = "";
            #create output file for this SNP pair
            my $filename = "ConditionedResults2/$chr[$x].$pos[$x]-$pos[$y].EHH.GBR.2.txt";
            print "$filename\n";
            unless (-e $filename) {
                open(OUT, '>', $filename) or die "Cannot open $filename: $!";
                #####################CHANGE THESE IF NOT FOCUSING ON SECOND SNP####################
                my $start = $pos[$y]-500000;
                if ($start < 1) {
                    $start = 1;
                }
                my $end = $pos[$y]+500000;
                if ($end > $chrLengths{$chr[$x]}) {
                    $end = $chrLengths{$chr[$x]};
                }
10. Software is long-lived
(and outlasts computational hardware)
12. Computational Chemistry - CASTEP
From the first implementation of a DFT algorithm to a
completely new code to community supported software
• Individual
• Group
• Consortium
• W/ industry
• Community
• Active
Software advances
< hardware speedup
http://www.castep.org/
13. LOTAR: storing aeronautical models
Life of CAD System: 10 years
Time between CAD Versions: 6 months
Life of Product: 70 years +
[Figure: product timeline over 70+ years (axis marked at 10-year intervals) showing production, CAD obsolete, CAD forgotten, services, modifications, spares and legal liability]
Image courtesy PDES Inc
Slide from Sean Barker, BAE SYSTEMS, DPC Designed to Last
14. So we have to maintain it…
• “The modification of a software product after
delivery to correct faults, to improve
performance or other attributes, or to adapt the
product to a modified environment” – IEEE definition
– Corrective maintenance: fixing faults
– Adaptive maintenance: adapting to changes in
environment
– Perfective maintenance: meeting new/different user
requirements
– Preventative maintenance: increasing maintainability
15. … because we cannot change this
with process and practice alone …
• “Many of us have tried to discover
ways to prevent code from
becoming legacy. But …
prevention is imperfect. Even the
most disciplined development
team, knowing the best principles,
using the best patterns, and
following the best practices will
create messes from time to time.
The rot still accumulates. It’s not
enough to prevent the rot – you
have to be able to reverse it.”
16. … so we work with what we have
• Identify change points
• Find test points
• Break dependencies
• Write tests
• Make changes and refactor
Testing, infrastructure, documentation are key
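As a minimal illustration of the "write tests" step, the sketch below re-expresses the window calculation from the Perl excerpt on slide 9 in Python and pins its current behaviour with characterisation tests before any refactoring. The function name `analysis_window`, the test class and the example chromosome length are hypothetical; only the ±500 kb flank and the clamping to 1 and to the chromosome length come from the excerpt.

```python
import unittest


def analysis_window(pos, chrom_length, flank=500_000):
    """Window of +/- flank bases around pos, clamped to [1, chrom_length].

    Hypothetical Python rendering of the start/end logic in the Perl excerpt.
    """
    start = max(pos - flank, 1)             # Perl: if ($start < 1) { $start = 1 }
    end = min(pos + flank, chrom_length)    # Perl: if ($end > length) { $end = length }
    return start, end


class CharacteriseAnalysisWindow(unittest.TestCase):
    # Characterisation tests record what the code does *now*, so that a
    # later refactoring can be checked against the same expectations.

    def test_interior_position(self):
        self.assertEqual(analysis_window(1_000_000, 5_000_000), (500_000, 1_500_000))

    def test_clamped_at_chromosome_start(self):
        self.assertEqual(analysis_window(200_000, 5_000_000), (1, 700_000))

    def test_clamped_at_chromosome_end(self):
        self.assertEqual(analysis_window(4_800_000, 5_000_000), (4_300_000, 5_000_000))
```

Saved to a file, the tests run with `python -m unittest`. Once they pass against the existing behaviour, the code can be changed and refactored with the tests acting as a safety net.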
17. Software is hard to define
(and thus hard to sustain)
18. What do we sustain:
- Workflow?
- Software that runs workflow?
- Software referenced by workflow?
19. Novel reuse of public sector data
http://www.mysociety.org
What do we sustain:
- Map?
- Software that creates map?
22. Comb badge, Museum of London
• Without context, objects have no meaning
What’s this item?
32 × 28 mm, lead alloy, late medieval, 14th to 15th century
23. What about repositories?
re⋅pos⋅i⋅tor⋅y
/noun/ [ri-poz-i-tawr-ee]
• 1. a receptacle or place where things are
deposited, stored, or offered for sale.
• 2. a burial place; sepulchre.
24. The Zombie Effect
• Software not always fully alive
when you reanimate it!
• Complex set of dependencies
– Significant Properties of Software
– Purposes and benefits of
software preservation
http://www.jisc.ac.uk/media/documents/programmes/preservation/significantpropertiesofsoftware-final.doc
http://softwarepreservation.jiscinvolve.org/wp/
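One small, partial defence against the zombie effect is to record the software's dependency context at the time it is archived. A minimal sketch for Python-based software, assuming the standard-library `importlib.metadata` module (Python 3.8+); `environment_manifest` is a hypothetical helper name. It captures only the interpreter, the OS string and the installed Python distributions, not compilers, system libraries, hardware or data.

```python
import json
import platform
from importlib import metadata


def environment_manifest():
    """Snapshot of the interpreter, platform and installed Python distributions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs that report no name
            packages[name] = dist.version
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        # Sort case-insensitively so manifests from different machines diff cleanly.
        "packages": dict(sorted(packages.items(), key=lambda kv: kv[0].lower())),
    }


if __name__ == "__main__":
    print(json.dumps(environment_manifest(), indent=2))
```

Storing this JSON alongside the archived code gives whoever reanimates it later a starting list of what the software expected to find around it.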
25. Reasons are important
(so you take the right approach)
26. Why are you considering
software sustainability?
Purpose:
• Achieve legal compliance
• Create heritage value
• Enable continued access to data and services
• Encourage software reuse
27. How are you going to choose the right
approach?
Approach:
• Preservation (techno-centric)
• Emulation (data-centric)
• Migration (functionality-centric)
• Transition (process-centric)
• Hibernation (knowledge-centric)
28. Preservation vs sustainability
Image courtesy of RBG Kew – not for reuse
Image courtesy of London Permaculture under CC-by-nc-sa license
Preservation?
Sustainability?
30. Sustainable Communities
• Cohesion and Identity: Creating
a community
• Tolerance and Diversity: Smart
growth through collaboration
• Efficient use of resources:
Leveraging infrastructure
• Adaptability to change:
Governing sustainably
31. Cultivate Contributors – R project
• Basics: Website, mailing list, code repository, issue
resolution
• Remove barriers to participation, increase efficiency
• 1993: First public release; 2 devs
• 1995: Code open sourced; 3 devs
• 1996: r-testers list set up
• 1997: lists split: r-announce, r-help,
r-devel; public CVS; 11 devs
• 2000: CRAN split and mirror
• 2001: BioConductor
• 2003: Namespaces
• 2005: i18n, l10n (internationalisation, localisation)
• 2007: R-Forge
• Today: BioConductor (33 core devs),
R-Forge (532 projects, 1562 devs),
CRAN (1400+ packages)
http://cran.r-project.org/doc/html/interface98-paper/paper_2.html
32. We under-appreciate training
• Basic training for
kitchen chef: 3-4 years
• Head chef: 10 years
• Basic training for s/w
engineer: 3-4 years
• Architect: 10 years
Photo by ZagatBuzz
• Training in S/W Dev in UG Physics: 140 hours
• Training in S/W Dev in UG Geography: 0 hours
33. Software Carpentry
• Lab skills for scientific
computing
– http://software-carpentry.org
– International initiative to
teach basics of software
engineering to
researchers
• The “why” more than
the “how”
– We ran 13 workshops
in 2013 for 600+ learners
34. Incentives are important
Courtesy of James Howison and James Herbsleb
Incentives and Integration In Scientific Software Production
Rewrite by original team:
address fragility
Fork to add specific functionality
Maintained separately
Optimised for
hardware
Facilitate hardware
sales
Exploit new techniques /
architectures
35. And money isn’t everything
[Figure: funding/staffing over time for a physics experiment, showing experiment running, analysis of data, new experiment design starts, next experiment running, and maintenance of software to process data from the physics experiment]
36. So beware your bus factor
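The bus factor (or truck factor) is, informally, how many people would have to leave before a project stalls. A rough, hypothetical way to estimate it, loosely following the greedy heuristic used in some truck-factor studies: keep removing the person who knows the most files until more than half of the files have nobody left who knows them. The 50% threshold, the function name `bus_factor` and the file-to-people mapping are all assumptions; in practice the mapping might be derived from version-control history.

```python
from collections import Counter


def bus_factor(file_authors, threshold=0.5):
    """Greedy estimate: how many people can leave before more than
    `threshold` of the files have nobody left who knows them."""
    remaining = {f: set(people) for f, people in file_authors.items()}
    total = len(remaining)
    removed = 0
    while True:
        orphaned = sum(1 for people in remaining.values() if not people)
        if orphaned > threshold * total:
            return removed
        # How many files does each remaining person know?
        coverage = Counter(p for people in remaining.values() for p in people)
        if not coverage:
            return removed
        top, _ = coverage.most_common(1)[0]
        for people in remaining.values():
            people.discard(top)  # the most knowledgeable person "leaves"
        removed += 1
```

For example, `bus_factor({"a.pl": {"alice"}, "b.pl": {"alice"}, "c.pl": {"alice"}, "d.pl": {"bob"}})` returns 1: once alice leaves, three of the four files have nobody who knows them.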
37. Summary of my talk
Software is…
• everywhere
• hard to define
• long-lived
Context, reasons and people are important