Where does it go from here? The role of software in digital repositories
The open repositories community has made great strides in recent years in addressing interoperability and policy, and in making the case for open access and sharing. One aspect of open research that has come to prominence is the importance of software as a fundamental part of reproducible research, which in turn raises issues around the preservation of software.
In this short presentation, I will describe some of the work that the Software Sustainability Institute (SSI) has been doing to address the structural and policy issues which currently present a barrier to the deposit and use of software in open repositories.
Steven Gray here at CASA has produced a proof of concept showing the last hour's snowfall in the UK as Tweets, and the last 24 hours by postcode district (the important part here is the data underneath, not the Tweets as such). Based on Ben Marsh's work.
I ended up doing this because we needed to fix the basics:
- Reproducible research
- Software credit / career paths
- Software skills
- Drawing on a pool of specialists to drive the continued improvement and impact of research software developed by and for researchers
- Providing services for research software users and developers
- Developing research community interactions and capacity
- Promoting research software best practice and capability
Clarifying the Purposes and Benefits of Software Preservation: http://softwarepreservation.jiscinvolve.org/wp/about/
There is a spectrum of approaches
Statistics from Greg Wilson.
- Are academics software developers?
- Can research consortia manage production?
- Are timing constraints different?
- What is the role of the PI in software development management?
- Are the skills for software and research the same?
Cf. the work of James Howison.
Based on a study done for Cameron Neylon's Beyond Impact workshop.
Is it more important to sustain the software that this workflow references, or the workflow itself?
At what level do you reference, at what level do you deposit?
This is harder than it is for data because software development is fluid and collaborative: contributors do not just add to the pool, they keep changing what is already there.
Based on OR2012 workshop outputs
We want to move towards OSI-approved licenses that are similar in spirit to CC-BY, e.g. BSD or Apache.
Cf. 5 Stars of Linked Data (Berners-Lee): available with an open license, machine-readable, non-proprietary format, open standards, linked to provide context.
5 Stars of Online Journals (Shotton): peer review, open access, enriched content, available datasets, machine-readable metadata.
What about community?
Where does it go from here? The Place of Software in Digital Repositories
Neil Chue Hong (@npch), N.ChueHong@software.ac.uk
Software Sustainability Institute, www.software.ac.uk
OR2012, Edinburgh, 12 July 2012
Software is pervasive in research
The Software Sustainability Institute
A national facility for building better software
• Better software enables better research
• Software reaches boundaries in its development cycle that prevent improvement, growth and adoption
• Providing the expertise and services needed to reach the next stage
• Software reviews and refactoring, collaborations to develop your project, guidance and best practice on software development, project management, community building, publicity and more…
Supported by EPSRC Software Sustainability Institute Grant EP/H043160/1
Software Sustainability: preservation vs sustainability
Sustainability? Preservation?
(Images courtesy of London Permaculture under a CC-BY-NC-SA license and Mortati under CC-BY-NC-ND)
Why are you considering software sustainability?
Purpose:
• Achieve legal compliance
• Create heritage value
• Enable continued access to data
• Encourage software reuse
JISC-funded, with Curtis+Cartwright: http://www.software.ac.uk/resources/preserving-software-resources
How are you going to choose the right approach?
Approach:
• Preservation (techno-centric)
• Emulation (data-centric)
• Migration (functionality-centric)
• Transition (process-centric)
• Hibernation (knowledge-centric)
• Deprecation
Software Carpentry
• Helping scientists be more productive by teaching them basic computing skills
• How to use repositories properly is a key skill
• http://software-carpentry.org
Just the Nature of the problem?
Hacking is fun. Maintenance is not fun.
Statistics courtesy of Greg Wilson, Software Carpentry, from a Nature article published online 13 October 2010: Nature 467, 775-777 (2010), doi:10.1038/467775a
“Re-” is the new black
[Figure: slide from Carole Goble, JCDL 2012, mapping the "Re-" activities (Reuse, Review, Refresh, Rerun, Repeat, Reproduce with new data, Reproduce with new method, Replay, Repurpose, Recover, Reconstruct, Repair) against what each needs: publication, method, data, documentation, provenance, and execution (linking data and code)]
Drummond C, Replicability is not Reproducibility: Nor is it Good Science (online).
Peng RD, Reproducible Research in Computational Science, Science, 2 Dec 2011: 1226-1227.
The most important: Reward
• How do we reward people for important software contributions?
• Traditionally: publish a research paper that happens to mention the software. Can we provide more direct, acceptable software citations?
• A Research Software Impact Manifesto: http://www.software.ac.uk/blog/2011-05-02-publish-or-be-damned-alternative-impact-manifesto-research-software
NB: Authorship is hard
Isn't software just data? http://beyond-impact.org/?p=175
Boundary
What do we choose to keep:
- Workflow?
- Software that runs the workflow?
- Software referenced by the workflow?
- Software dependencies?
What's the minimum citable part?
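One reason the boundary question is hard: "keep the software dependencies" usually means the transitive closure, not just what the workflow calls directly. A minimal sketch, assuming a hypothetical dependency graph stored as a Python dict (all component names are invented for illustration):

```python
from collections import deque

# Hypothetical dependency graph: each component maps to the components it directly needs.
DEPENDS_ON: dict[str, list[str]] = {
    "workflow": ["analysis-script"],
    "analysis-script": ["stats-lib", "plot-lib"],
    "stats-lib": ["numeric-core"],
    "plot-lib": ["numeric-core", "font-lib"],
    "numeric-core": [],
    "font-lib": [],
}


def keep_set(root: str, graph: dict[str, list[str]]) -> set[str]:
    """Return root plus everything it depends on, directly or transitively (breadth-first)."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for dep in graph.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


print(sorted(DEPENDS_ON["workflow"]))         # ['analysis-script']
print(sorted(keep_set("workflow", DEPENDS_ON)))
```

The direct dependency list has one entry, but the full keep set has six, which is why "what's the minimum citable part?" and "what must be deposited?" can have very different answers.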
Granularity
Function, Algorithm, Program, Library / Suite / Package …
Versioning
Why do we version?
- To indicate a change
- To allow sharing
- To confer special status
[Figure: a public line of versions (v1, v2, v3) alongside personal copies, including personal variants v2a and v3a that are never made public]
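The public/personal split in the versioning figure can be modelled very simply: every version records whether it has been shared, and only shared versions are candidates for deposit and citation. This is a hypothetical sketch; the `Version` class and the `citable` helper are illustrative names, not part of any repository system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    label: str
    public: bool  # True once the version has been shared beyond its developer


# Hypothetical history mirroring the figure: v2a and v3a stay personal.
history = [
    Version("v1", True),
    Version("v2", True),
    Version("v2a", False),
    Version("v3", True),
    Version("v3a", False),
]


def citable(versions: list[Version]) -> list[str]:
    """Return the labels of versions that have been made public, in order."""
    return [v.label for v in versions if v.public]


print(citable(history))  # ['v1', 'v2', 'v3']
```

Marking a version public is the "confer special status" step: it is the point at which a version becomes something others can reference.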
Backup, Sharing, Archiving
Differing roles, different repositories
Roles: backup, sharing, archiving
They differ in: timescales, ingest, policy, metadata, licensing, assurance
Software Metapapers
• Create a complete scholarly record including the "standard" publication, method, dataset and models, and software, e.g. for modelling and simulation or statistical analysis
  - Enable replay, reproduction and reuse
• The pragmatic approach is to create a metadata record for the software, and link it to a copy of the software in some storage infrastructure
  - This is a software metapaper
  - Peer-review the metadata, not the software
• Journal of Open Research Software: http://openresearchsoftware.metajnl.com/
  - See: http://openresearchsoftware.metajnl.com/faq/
  - and the work by B. Matthews et al: The Significant Properties of Software: A Study
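The metapaper approach separates the reviewed record from the stored code. A minimal sketch of such a record, assuming hypothetical field names (this is not the Journal of Open Research Software schema): the record holds descriptive metadata plus a pointer to where the code actually lives, and reviewers check the metadata.

```python
from dataclasses import dataclass, field


@dataclass
class SoftwareMetapaper:
    """Hypothetical metadata record: describes the software and points at a stored copy."""
    title: str
    authors: list[str]
    version: str
    license_id: str        # e.g. an SPDX identifier such as "Apache-2.0"
    repository_url: str    # where the deposited code lives; the record does not hold the code
    identifier: str = ""   # persistent identifier, if one has been minted
    keywords: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Return required fields that are empty (the 'required' set is an assumption)."""
        required = {
            "title": self.title,
            "authors": self.authors,
            "version": self.version,
            "license_id": self.license_id,
            "repository_url": self.repository_url,
        }
        return [name for name, value in required.items() if not value]

    def citation(self) -> str:
        """Build a simple citation string; the format is illustrative only."""
        names = ", ".join(self.authors)
        ref = self.identifier or self.repository_url
        return f"{names}. {self.title} (version {self.version}). {ref}"


record = SoftwareMetapaper(
    title="ExampleModel",
    authors=["A. Researcher", "B. Developer"],
    version="1.2.0",
    license_id="BSD-3-Clause",
    repository_url="https://repository.example.org/examplemodel",
)
print(record.missing_fields())  # []
print(record.citation())
```

Including the version in the citation ties credit to a specific deposited instance, which matters because the stored copy and the record can otherwise drift apart.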
An acceptable repository
• The metapaper references an instance of the software, stored in a "suitable" repository:
  - Clear access / deposit / preservation policy
  - Adherence to standards
  - Ability to easily "transfer"
  - Sustainability of hosting organisation
  - Ability to monitor and check integrity (obsolescence?)
• We may be storing binaries, source code (as text or archived), virtual machines(!)
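"Ability to monitor, check integrity" is commonly implemented as a fixity check: record a cryptographic digest at deposit time and recompute it later. A minimal sketch using SHA-256 from Python's standard `hashlib`; the file name and the idea of storing the digest next to the metadata are assumptions for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded_digest: str) -> bool:
    """Return True if the stored object still matches the digest recorded at deposit."""
    return sha256_of(path) == recorded_digest


# Usage: record a digest at deposit, then re-check it later.
with tempfile.TemporaryDirectory() as tmp:
    deposit = Path(tmp) / "model-1.2.0.tar.gz"  # hypothetical deposited archive
    deposit.write_bytes(b"pretend this is an archived source release")
    recorded = sha256_of(deposit)  # kept alongside the metadata at ingest
    print(verify_fixity(deposit, recorded))  # True
    deposit.write_bytes(b"silently corrupted")
    print(verify_fixity(deposit, recorded))  # False
```

A fixity check only detects bit-level change; it says nothing about obsolescence, i.e. whether the binaries or virtual machines can still be run.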
Potential for confusion
• 'The right license for all parts of the scholarly record': Victoria Stodden, Enabling Reproducible Research: Open Licensing for Scientific Innovation
• Commonly used OSI-approved licenses include:
  - Apache License, 2.0 (Apache-2.0)
  - BSD 3-Clause "New" or "Revised" license (BSD-3-Clause)
  - BSD 2-Clause "Simplified" or "FreeBSD" license (BSD-2-Clause)
  - GNU General Public License (GPL)
  - GNU Library or "Lesser" General Public License (LGPL)
  - MIT license (MIT)
  - Mozilla Public License 2.0 (MPL-2.0)
  - Common Development and Distribution License (CDDL-1.0)
  - Eclipse Public License (EPL-1.0)
• Does enabling the deposit of software just confuse those already depositing publications/data?
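One way a repository could reduce the confusion is to triage the license identifier declared at deposit. A hypothetical sketch using the identifiers from the list above; GPL and LGPL are left out because the list does not say which version is meant, and anything unrecognised (for example a Creative Commons identifier used out of habit by data depositors) is flagged for a human.

```python
# SPDX-style identifiers from the slide, stored lowercase because SPDX
# identifiers are matched case-insensitively.
PERMISSIVE = {"apache-2.0", "bsd-3-clause", "bsd-2-clause", "mit"}
WEAK_COPYLEFT = {"mpl-2.0", "cddl-1.0", "epl-1.0"}


def classify(license_id: str) -> str:
    """Roughly triage a declared license identifier at deposit time (illustrative only)."""
    key = license_id.strip().lower()
    if key in PERMISSIVE:
        return "permissive"
    if key in WEAK_COPYLEFT:
        return "weak copyleft"
    return "unknown: needs review"


for lid in ["Apache-2.0", "mit", "EPL-1.0", "CC-BY-4.0"]:
    print(lid, "->", classify(lid))
```

The permissive group (Apache, BSD, MIT) is the one closest in spirit to CC-BY, which is the direction the notes suggest moving towards.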
5 Stars of Software?
• Do we need a 5 stars for software?
  - Existence – there is accurate metadata that defines the software
  - Availability – you can access and run the software
  - Openness – the software has an open, permissive license
  - Assured – the software provides ways of assuring its correctness
  - Linked – the related data, dependencies and papers are indicated
Cf. 5 Stars of Linked Data (Berners-Lee) and 5 Stars of Online Journals (Shotton)
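If a 5-star scheme for software were adopted, a repository could compute a rating from five yes/no checks. A minimal sketch; the cumulative rule (a star counts only if all earlier stars are earned) is an assumption borrowed from how the Linked Data stars are usually read, not something the slide specifies.

```python
from dataclasses import dataclass, fields


@dataclass
class SoftwareStars:
    existence: bool     # accurate metadata defines the software
    availability: bool  # you can access and run it
    openness: bool      # open, permissive license
    assured: bool       # ways of assuring correctness are provided
    linked: bool        # related data, dependencies and papers are indicated

    def rating(self) -> int:
        """Count stars in declaration order, stopping at the first one not earned."""
        stars = 0
        for f in fields(self):
            if not getattr(self, f.name):
                break
            stars += 1
        return stars


print(SoftwareStars(True, True, True, False, True).rating())  # 3
```

Under a simple (non-cumulative) count the same software would score 4; which rule is more useful is exactly the open question the slide poses.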
Take home points
1) Researchers are developing more software than ever, and trying to do it better
2) They want to be rewarded for creating a complete scholarly record: this includes software
3) We still don't know the best way to shift from one repository role to another (backup -> sharing -> archiving) when it comes to software!