
EclipseCon France 2015 - Science Track



Software plays an increasingly large part in scientific research, but in most cases its growth is organic. The lifetime of research software is often as short as a postdoctoral contract: once the researcher moves on, custom-written niche code is frequently left poorly documented, its components are not reusable, and the overall development effort is likely lost.

This is a case study of how research software in the field of genomics evolved within my research group at the Department of Genetics at Cambridge University. As our research questions changed over the past decade, we moved from Perl code and regular expressions to R and statistical analysis, and from there to agent-based simulations in Java. I will discuss the languages and tools we used and how our processes have evolved over the years. I will also cover the factors that influence the nature of that growth, such as funding, and how 'open source' as a default has changed our development work. Finally, I will look ahead at how we predict software usage will grow.

In presenting these problems and discussing possible solutions, this talk will also look at the role institutions play in helping to address them, in particular the Software Sustainability Institute (SSI), which works in the UK to promote the development, maintenance and (re)use of research software.

The Eclipse Foundation, with the Science Working Group, works to facilitate software sharing and reuse. How can organisations like the SSI and Eclipse align their strategies and activities for maximum effect?

Published in: Science


  1. 1. Better Software, Better Research Dr. Boris Adryan @BorisAdryan @SoftwareSaved Software Sustainability Institute
  2. 2. brief bio & experience ‣ since 2015: Fellow of the SSI ‣ since 2013: IoT entrepreneur ‣ 2008-2016: Royal Society research group leader at University of Cambridge ‣ 2011-2015: Scientific advisor to FlyBase ‣ 2012-2015: MPhil Director for Computational Biology
  3. 3. ‣ a UK government-funded “virtual institute” for building better, sustainable software ‣ primarily focussed on academic software but very inclusive to industry partners ‣ distributed team with a few members at universities in Southampton, Oxford, Manchester and Edinburgh plus a vast network of independent fellows “in the field”
  4. 4. software: ‣ good, reusable code ‣ well documented; people: ‣ recognition and reward ‣ career paths; values: ‣ reproducibility ‣ openness; policy: ‣ raise awareness ‣ establish facts
  5. 5. Survey results: without-software-say-7-out-10-uk-researchers ‣ do you use research software? yes 92%, no 8% ‣ do you develop research software? yes 56%, no 44% ‣ have you received training in software development? yes 79%, no 21% ‣ impact of not having research software: no difference 10%, not be practical 21%, more effort 69% Software Sustainability Institute
  6. 6. ‣ Software reaches boundaries that prevent improvement, growth and adoption ‣ Providing the expertise and services needed to negotiate to the next stage: ✓ software reviews and refactoring ✓ collaborations between stakeholders (Hi, Eclipse!) ✓ guidance and best practice on software development ✓ training (e.g. Software Carpentry) ✓ project management ✓ community building ✓ publicity etc…
  7. 7. Software Sustainability Institute Work better. Together.
  8. 8. Issues with research software Exemplified by the honest account and anecdotes of ‣bad coding, ‣bad design decisions, and ‣bad practice of a humble biologist.
  9. 9. coding skills school Turbo Pascal Turbo Prolog independent developer Borland Delphi undergraduate and PhD student postdoc Perl, R, SQL hobbyist + entrepreneur Python, C, node.js, Clojure, noSQL CTRL+F1 1992 1995 2005-present 2010-present
  10. 10. ‣unsupervised undergraduate project ‣inspired by the need of a PhD student ‣no software manual or help ‣requests for code: 0 ‣URL is long dead, no idea about the whereabouts of code very generous for the time!
  11. 11. ‣addressed my own needs as biologist (“got the job done”) ‣horrible mix of object-oriented and spaghetti code ‣required complex manipulations in the source to update quickly outdated information ‣requests for code: many; but too embarrassed to put on sourceforge “If you would like to adapt GO-Cluster to your personal needs and want the source code (only fairly commented), please contact my group leader Dr. Reinhard Schuh.”
  12. 12. ‣there’s virtually no Objective C adoption in the scientific community
  13. 13. BAD SCIENCE “All other data analyses were performed using custom-written Perl scripts or publicly available websites.” “All downstream analyses were performed with custom-made Perl scripts.” “All data analysis was performed using custom-written Perl scripts and statistical tests were performed with R.” Embarrassingly unscientific quotes from a few of my data analytical papers between 2005-2008 i.e.: “f$@k you, I can’t be asked telling you what I did!” in combination with mostly uncommented write-only and execute-once type scripts
  14. 14. OPEN DATA, OPEN SOURCE, OPEN ACCESS, OPEN SCIENCE since early 2010s: increased pressure in the community not only to release data, but also tools ‣sometimes requested by journals ‣often required to appease reviewers ‣frequent naming and shaming on Twitter
  15. 15. simple Perl CGI script with MySQL backend ‣easy to update content :-) ‣no analytical capability :-( using InterMine framework, based on Java, ASP, Ajax and PostgreSQL ‣fancy features and looks :-) ‣requires a specialist to do any update :-( FlyTF is a gold standard, but has never been funded! Technical upgrade (feature-rrhea) was motivated because content-only updates are hard to publish.
  16. 16. ‣Java ‣hardware- and OS-independent ‣GUI and config files ‣extensive documentation for end-users and programmers ‣code refactored regularly to ease readability for novices ‣all source on Github
  17. 17. Issues with (academic) software development ‣ typically little or no dedicated budget for software development on scientific grants ‣ even if funded, resources are often too little to adhere to best practices (e.g. lack of a planning phase) ‣ often very ad-hoc with a focus on getting ‘one job done’, not with reuse and sustainability in hindsight ‣ there’s no credit for writing good software ‣ code generated by ‘amateurs’ with a high turnover of people with skills ‣ academic salaries are poor compared to industry salaries - it’s hard to get professional software developers
  18. 18. Software Sustainability Institute Work better. Together. This presentation is on Slideshare: For the community. Driven by individuals. Us.