Doing Science Properly In The Digital Age - Rutgers Seminar


Seminar given at Rutgers University on 2nd October 2012.

Published in: Technology
  • For thousands of years, research was empirical, using observation and experiment to describe natural phenomena. In the last few hundred years, theory developed using models and generalisations. In the last decades, computational simulation has made it possible to model complex phenomena. In the last few years, data exploration – digital research – has unified experimental data, theory, and computational simulation to analyse the vast amounts of collected and generated information.
  • Images courtesy of projects from the ENGAGE programme
  • Statistics from Greg Wilson. Are academics software developers? Can research consortia manage production? Are timing constraints different? What is the role of the PI in software development management? Are the skills for software and research the same?
    - More and more researchers use computer software and hardware in their day to day research, not just those researchers who could be classed as being computational scientists, yet they find it increasingly difficult to exploit due to a lack of coordination ([Gob10], also observed in [Han09])
    - There is a wide variance in the levels of experience in scientific computing and software development, and hence their use of computing, which is present across all domains and levels of seniority ([Har09], also ongoing as our result with the DIRAC consortium)
    - Software is often treated as if it was disposable, rather than the subject of a £9m per year investment by EPSRC [SaaI12]
  • Software reviews and refactoring, collaborations to develop your project, guidance and best practice on software development, project management, community building, publicity and more…
    - Drawing on a pool of specialists to drive the continued improvement and impact of research software developed by and for researchers
    - Providing services for research software users and developers
    - Developing research community interactions and capacity
    - Promoting research software best practice and capability
  • Transferring software knowledge is not easy. Fused pairs of different MR sequences are modulated in red-green colour space, which enhances tissue discrimination.
  • Collaboration helps sustainability
  • Update slide for surveymapper?
  • CPD?
  • Ultimately the Software Sustainability Institute would like to see basic scientific computing taught in the same way that statistics is a fundamental part of any researcher's toolbox. Likewise an understanding of software programming should be seen as equivalent to the understanding of presenting and disseminating your work which is expected of graduates. A basic syllabus and list of recognised teaching providers ensures there is a way of providing excellent foundation training in scientific computing via the CDTs. Specialist interdisciplinary scientific computing CDTs which concentrate on instilling the best computational, data analysis and software development techniques in their doctoral students will provide the UK with the next generation of world-class scientists.
  • Being submitted to PNAS
  • C.f. work of James Howison
  • C.f. 5 Stars of Linked Data (Berners-Lee): available w/ open license, machine-readable, non-proprietary format, open standards, linked to provide context. 5 Stars of Online Journals (Shotton): Peer Review, Open Access, Enriched Content, Available Datasets, Machine-readable metadata. What about community?
  • Become our next collaborator – email

    1. Science Properly in the Digital Age
       2 October 2012, Rutgers University
       Neil Chue Hong (@npch), Software Sustainability Institute
    2. Four Paradigms of Research
    3. Software is pervasive in research
    4. Just the Nature of the problem?
       • Maintenance is not fun
       • Hacking new stuff is fun
       Published online 13 October 2010 | Nature 467, 775-777 (2010) doi:10.1038/467775a
       Courtesy of Jo Hannay et al, “How Do Scientists Develop and Use Scientific Software?”
    5. The Software Sustainability Institute
       A national facility for cultivating world-class research through software
       • Better software enables better research
       • Software reaches boundaries in its development cycle that prevent improvement, growth and adoption
       • Providing the expertise and services needed to negotiate to the next stage
       • Developing the policy and tools to support the community developing and using research software
       Supported by EPSRC Grant EP/H043160/1
    6. UK Research Computing Ecosystem
       [Diagram labels: People Computing Software Communities Data Centres … Network/Collaboration Instruments]
    7. SSI Organisation
       • Community Engagement (Shoaib Sufi)
         - Fellowship Programme
       • Consultancy (Steve Crouch)
         - Open Call for Projects
         - Software Evaluation
       • Policy (Simon Hettrick)
         - Guides and Case Studies
       • Training (Mike Jackson)
         - Software Carpentry
         - Software Surgeries
       • Collaboration between the universities of Edinburgh, Manchester, Oxford and Southampton
    8. Case Study: Ligand Binding
       • Centre for Computational Chemistry, Bristol
         - New methods for rapid MC sampling of biomolecular systems modelled using QM/MM
         - Developed two codes: ProtoMS (F77) + Sire (C++)
         - Water-Swap Reaction Coordinate method to calculate absolute protein-ligand binding free energies
       • SSI’s work is helping to scale development
         - ProtoMS and Sire are both single developer codes
         - ASPIRE/ACQUIRE framework has multiple devs
           • Split architecture between ASPIRE (adaptive multiresolution hybrid MD simulation) and ACQUIRE (WorkPacket scheduling system with optimisation for time to result vs “green-ness”)
    9. Case Study: Brain Imaging
       • Brain Research Imaging Centre, Edinburgh
         - Develop PrivacyGuard software, a DICOM image de-identification toolkit
         - Created software to support new multispectral colouring modulation and variance identification technique (“MCMxxxVI”) to identify white matter lesions that are indicative of declining cognitive ability
         - BRIC are not principally software developers, but do provide software to other researchers
       • SSI’s work means the software has been reviewed and refactored
         - Looked at exploitation
           • Usability review, naming/trademark review
         - Made it easier for BRIC staff to maintain and develop
           • Move to standard repositories, testing and documentation processes
           • Examination of licensing for MCMxxxVI
           • Extraction and refactoring to create standalone tools
    10. Case Study: Climate Policy Modelling
        • CIAS team at Tyndall Centre for Climate Change Research, University of East Anglia
          - Develop linked climate and economic models for detailed analysis
          - Their software was not ready to be used by other groups
            • One researcher/developer at UEA, several users
        • SSI’s work means the software is robust enough that it can be installed and used by others
          - Enabled use of the software by the WWFN’s Climascope project and James Cook University
            • Documented software to allow extensions by contributors
            • Made it easier to maintain and back up
            • Added job scheduling to improve modelling throughput
            • New modelling framework enables new models, i.e. new science
    11. Case Study: Textual Studies
        • TextVRE team at CeRCH, King’s College London
          - Developed an environment which is used to integrate various tools used in the e-Humanities textual studies lifecycle
          - Builds on the German TextGrid project, and many other existing tools
        • SSI’s work means the software can be run “out of the box” – an important requirement for the researchers
          - Developed a VM image containing the TextVRE installation
            • Improve installation instructions
            • Develop tests to check each installed component
            • Improve modularisation to allow others to contribute and maintain
          - Feeding back work to TextGrid
    12. The modern researcher…
        • … worries about:
          - Data management and analysis
          - Reproducible research
          - Scalable simulations
          - Integration of models and workflows
          - Collaboration
        Picture of Otto Stern (Emilio Segre Visual Archives)
    13. Observation 1: Software is across research
        Corollary: software is bleeding edge and long-tail
        Demanding users are coming from arts + humanities, economics, and social science as well as sciences
    14. Observation 2: A culture of re-use rather than re-invention is not widespread
        Corollary: we have wasted effort and increased siloing
    15. Observation 3: People are “embarrassed” about software
        Corollary: something is broken in the way we regard, recognise and reward software
    16. SSI Drivers and Themes
        • Two key drivers which cause people to seek the SSI’s advice:
          - They want to be more productive in their research
          - They don’t want to be embarrassed by appearing worse than their peers
        • Broadly, our work falls into a few key themes:
          - The role and reward of software in research
          - Recognition of software career paths
          - Developing the scientific computing / software development skill base
    17. The Foundations of Digital Research
        [Diagram. Research: Re-usable, Re-producible. Foundations: Software Careers; Software Recognition / Reward; Software Skills and Capability]
    18. Gap 1: Software Skills Training
        [Diagram. Axes: Research Focussed (methods) to Programming Focussed (Tools); Basic to Advanced. Labels: Software Carpentry, Summer Schools, HPC Short Courses, MSc in HPC / scientific computing, Advanced HPC Training, Programming 101, Programming 201, “Who fills this gap?”]
    19. Software philosophy as part of the process
        • Foundations of scientific computing in undergraduate courses
          - Like presentation skills
        • Methods of scientific computing in postgraduate courses
          - Like statistics and ethics
        • Show the benefits from the knowledge and methods of digital research
          - Not just programming 101
    20. Best Practices for Scientific Computing
        1. Write programs for people, not computers
        2. Automate repetitive tasks
        3. Use the computer to record history
        4. Make incremental changes
        5. Use version control
        6. Don’t repeat yourself (or others)
        7. Plan for mistakes
        8. Optimise software only after it works correctly
        9. Document the design and purpose of the code, rather than its mechanics
        10. Conduct code reviews
        Paper (including the evidence) being submitted to arXiv and PNAS
    21. Gap 2: Lack of recognition and reward
        • There is an anachronism in the way we conduct and recognise research?
          - REF references software as an output but it is still not easy to get recognition – peer review fails
        • Software careers
          - Researchers who use software
          - Researcher-Developers
          - Research Software Engineers
          - Research Software Support
          - Research Systems Providers
    22. No recognition without reward, no reward without reproducibility?
        • How do we reward people for important software contributions?
        • Traditionally: publish a research paper that happens to mention software
          - Can we provide more direct, acceptable software citations?
        • A Research Software Impact Manifesto
          - alternative-impact-manifesto-research-software
          - NB Authorship is hard
        • It works for data!
          - C.f. Heather Piwowar’s work
          - 000308
    23. Software Metapapers
        • Create a complete scholarly record including “standard” publication, method, dataset and models, and software
          - e.g. modelling and simulation, statistical analysis
          - Enable replay, reproduction and reuse
        • Pragmatic approach is to create a metadata record for the software, and link it to a copy of the software in some storage infrastructure
          - This is a software metapaper
          - Peer-review the metadata, not the software
        • Journal of Open Research Software:
        and the work by B. Matthews et al: The Significant Properties of Software: A Study
    24. Gap 3: Lack of support infrastructure
        • For example: no digital repository which satisfies the criteria:
          - Open to anyone in the UK to archive software
          - Software associated with an OSI license
          - Provide a unique, permanent identifier
          - Publishes a preservation/curation/sustainability plan
        • This is just deposit, not even preservation or sustainability
    25. 5 Stars of Software?
        • Do we need a 5 stars for software?
          - Existence – there is accurate metadata that defines the software
          - Availability – you can access and run the software
          - Openness – the software has an open permissible license
          - Assured – the software provides ways of assuring its correctness
          - Linked – the related data, dependencies and papers are indicated
        c.f. 5 Stars of Linked Data (Berners-Lee), 5 Stars of Online Journals (Shotton)
    26. Gap 4: Software Maturity and Management
        • Not all software should make it to the next stage
        • Management changes through time, requiring planning
        [Diagram. Axes: Software proliferation against Time. Stages: Innovation, Consolidation, Customisation]
    27. A More Manageable Ecosystem
        • Discourage duplicative software development in research grants by rewarding reuse and long-term development
          - Need to change perceptions so that software is seen as valuable
          - But understand when it should not proceed to next stage
        • Different stages should be managed and funded separately
          - Maintenance vs. research vs. development
        • A skilled researcher base is the key in the digital age
          - Create a larger proportion of enabled researchers and provide the ramps to go from desktop to high-end infrastructure
          - Allow and encourage specialism and collaboration
    28. Take home points
        1) Researchers are developing more software than ever, and trying to do it better
        2) We are not adequately providing the training, recognition and reward, and career paths to enable a step change improvement in research software
        3) This is hindering digital research
        4) The only people who can change this situation are people like you!
    29. A national facility for cultivating world-class research through software
        Current collaborations
        Become our next collaborators!
        Website:
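
The "software metapaper" idea on slide 23 (a metadata record for the software, linked to an archived copy, with peer review of the metadata rather than the code) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name, every field name, the example identifier and the example URL are hypothetical and do not reflect any actual journal or repository format.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class SoftwareMetapaper:
    """Hypothetical metadata record describing one piece of research software."""
    title: str
    authors: List[str]
    license: str            # e.g. the name of an OSI-approved license
    identifier: str         # unique, permanent identifier for the archived copy
    archive_location: str   # where the copy of the software is stored
    related_outputs: List[str] = field(default_factory=list)  # papers, datasets, models

    def missing_fields(self) -> List[str]:
        """Return the names of required fields that are empty, for a reviewer to check."""
        required = ("title", "authors", "license", "identifier", "archive_location")
        return [name for name in required if not getattr(self, name)]


# All values below are made-up placeholders.
record = SoftwareMetapaper(
    title="Example simulation code",
    authors=["A. Researcher"],
    license="MIT",
    identifier="example-id-0001",
    archive_location="https://repository.example.org/software/0001",
    related_outputs=["example paper", "example dataset"],
)

# A reviewer would look at this metadata, not at the source code itself.
print(json.dumps(asdict(record), indent=2))
print("Missing fields:", record.missing_fields())
```

The required fields loosely mirror the repository criteria on slide 24 (a license, a unique permanent identifier, an archived copy); which fields a real metapaper would need is a policy decision the slides leave open.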