This talk was given by Jonathan Peirce as part of the Open Science Symposium at ECVP 2012, organised by Lee de Wit. It may be distributed freely under Creative Commons (CC BY 3.0): http://creativecommons.org/licenses/by/3.0/
See also Stodden (2009) The Legal Framework for Reproducible Scientific Research. IEEE Computing in Science & Engineering.
Reproducible conditions:
- The full compendium is available on the Internet
- The media components, including the original selection and arrangement of the data, are licensed under CC BY or released to the public domain under CC0
- The code components are licensed under one of Apache 2.0, the MIT License, or the Modified BSD license, or released to the public domain under CC0
- The data have been released into the public domain according to the Science Commons Open Data Protocol
OPEN EXPERIMENTS AND OPEN-SOURCE SOFTWARE
Jonathan Peirce, University of Nottingham
OPEN-SOURCE SOFTWARE
- Is free
- Is often more feature-rich/advanced
- Allows us to examine/change all the code
- Buggy? Young packages can be, but developers are usually very responsive to fixing bugs. Mature packages aren’t (e.g. see Firefox, Thunderbird, GIMP, Linux, Python…)
- Unsustainable? Not once they reach critical mass
Open-source software is good for science. What about open-sourcing experiments?
A REPOSITORY OF OPEN EXPERIMENTS?
Goals:
- Reproducibility: rather than running your interpretation of a study from its methods section, fetch the actual experiment
- Publicity: draw attention to your experiment as people browse the repository
- Education: a starting point to build an experiment for new users of a piece of software
- One-stop location to up/download entire experiments or components of them
- Platform/package independent (PsychoPy, PTB, Presentation…)
- Easy to upload, easy to browse
SIMILAR REPOSITORIES
- MatlabCentral File Exchange (proprietary)
- Figshare.com (for data)
- Viperlib (for demos)
- RunMyCode.org (computational economics): create a companion website for your paper; the model code can be run directly on the site(!)
- OpenScienceFramework.org (new): "The Open Science Framework (OSF) is an infrastructure for documenting, archiving, logging, sharing, and registering scientific projects. Tools are being designed to integrate open practices with a scientist's daily workflow rather than appending them ex post facto." See the OSF goals here: http://openscienceframework.org/project/4znZP/wiki/home
POTENTIAL CONCERNS
- People don’t want others to see their code
- People might run studies that they didn’t actually understand
- Errors in studies might propagate more
- Why should someone else benefit from the hours I spent coding that experiment/stimulus?
- We don’t need this resource; we can make web pages and use code repositories (e.g. github)
- People will never use such a resource
- Someone will have to set it up and run it
PEOPLE DON’T WANT OTHERS TO SEE THEIR CODE
Why not?
- Most people write code for themselves, not for others to see
- Cleaning/documenting your code takes time
- Maybe you’re a little worried about someone finding a bug in your code?
On the other hand:
- Writing neat, clear code is good; it means fewer bugs and more-reusable code for yourself!
- Although we don’t like people finding our bugs, it is actually a good thing for science
- Some tools provide graphical interfaces, which should reduce the anxiety
PEOPLE MIGHT RUN STUDIES THAT THEY DIDN’T UNDERSTAND
How?
- They might not realise some critical part of the setup (e.g. a calibrated monitor)
- They might make an inappropriate change or use settings that aren’t possible
On the other hand:
- Should we really be setting programming ability as a hurdle to running studies?
- Providing the base code (and some notes including some of the caveats) will reduce this problem
- Maybe the resource should point out that code does not replace the need for good supervision/education
ERRORS IN STUDIES MIGHT PROPAGATE MORE
How?
- If a study contains a bug in its code and is re-used by another lab, the bug will tend to remain. If they re-wrote the code from scratch it would be gone
On the other hand:
- In reality, if the latter study finds a different result to the former, it just fails to get published because we don’t know why the two studies differ. No advantage.
- If there were a bug and the code were available, we would stand some chance of finding it
WHY SHOULD SOMEONE ELSE BENEFIT?
- You’ve put a lot of effort into building your study. Why should someone else just download it and use it for free?! Let them think of their own study!
On the other hand:
- (Thank goodness the open-source developers don’t think like that!)
- You would get to benefit from other people’s work. Science benefits
- You should want people to build on your studies. That is in your interest
WE DON’T NEED THIS RESOURCE
Why not?
- We could use code repositories (e.g. sourceforge, github etc.) or our institutional websites
But recall the goals:
- Replicability
- Publicity
- Education
Open-source repositories are mostly designed for the technically very literate, which limits the contributors
PEOPLE WILL NEVER USE SUCH A RESOURCE
Really?
- Lots of do-gooders have set up data repositories, but they’re empty
OK, so how would we get people to use an open-science repository?
- Encourage people that it really is good for them if people can extend their study easily
- Make it compulsory (e.g. via the journals)?
- Provide a kite-mark, via the journals, for articles that can be fully replicated
[Since giving the talk I have discovered that "kite-mark" is a purely British concept. It refers to a non-compulsory badge, from the British Standards Institution, showing that a product meets high quality standards.]
REPRODUCIBLE RESEARCH STANDARD
Stodden (2009) Enabling reproducible research: licensing scientific innovation. International Journal of Communications Law and Policy.
Potentially different levels of compliance with the standard:
- Verified: has already been verified in an independent lab
- Verifiable: the compendium (full set of research materials) is available to fully reproduce the study
- Semi-verifiable: not all materials have been released but the description of the work should allow replication
- Non-verifiable: the work requires materials or apparatus that are not typically available
“Efforts are currently under way for the RRS to be an official mark of Science Commons. This would provide an easily identifiable logo and a clear definition for each level of reproducibility.”
IT WILL TAKE TIME AND EFFORT TO IMPLEMENT
- There will be some development time in building a site
- There might be further time needed to manage/screen the contributions (I’m too busy with PsychoPy)
On the other hand:
- There are open-source tools already available to build academic repositories
- We might be able to piggy-back on another site
- Maybe the Open Science Framework will do all we want
SUMMARY
Open-source software has improved scientific:
- Productivity
Open-source experiments could improve scientific:
- Reproducibility
- Education
- Productivity
But we need:
- buy-in from the scientists (and possibly the journals)
- user-friendly resources