Executable Papers: publishing science that works
Anita de Waard, Elsevier Labs
HCLS Scientific Discourse Group
June 20, 2011
Elsevier Challenges

Goals:
- Invite and survey ideas in innovative science publishing
- Create a community of people working on similar issues, from different backgrounds/viewpoints

Rules:
- Open submission; very interdisciplinary panel of judges; open publication of submissions
- IPR stays with the author; in case of commercial development, Elsevier has right of first refusal

Challenges so far:
- 2008/9: Elsevier Grand Challenge for knowledge enhancement in the life sciences: http://www.elseviergrandchallenge.com
- 2010/11: ISMB Killer App award: rewarding bioinformatics apps that work for biologists: http://killerapp.iscb.org/
- 2011: Elsevier Executable Paper challenge: http://www.executablepapers.com/
Executable Paper Challenge

Driven by issues in publishing computational science:
- How can we develop a model for executable files that is compatible with the user's operating system and architecture, and adaptable to future systems?
- How do we manage very large file sizes?
- How do we validate data and code, and decrease the reviewer's workload?
- How do we support registering and tracking of actions taken on the 'executable paper'?

Co-organised with the International Conference on Computational Science (http://www.iccs-meeting.org):
- Aimed at high-performance computing and the (geo/eco/bio/chem-)informatics fields
- In practice, the challenge participants came from a different community!
The Finalists: http://www.executablepapers.com/finalists.html
1. SHARE - a web portal for creating and sharing executable research papers: http://sites.google.com/site/executablepaper/
2. A data and code model for reproducible research and executable papers: http://dirac.cnrs-orleans.fr/~hinsen/executable_paper_challenge.tar.gz
3. A-R-E: The Author-Review-Execute environment: http://iwb.fluidops.com:7878/resource/AREpaper
4. Planetary System: Web 3.0 and Active Documents: https://trac.mathweb.org/planetary/wiki/EPCDemo
5. Paper Mache: Creating Dynamic Reproducible Science: http://oware.cse.tamu.edu:8080/
6. A Provenance Based Infrastructure for Creating Executable Papers: http://www.vistrails.org/index.php/ExecutablePapers
7. Universal Identifier for Computation Results: http://vcr.stanford.edu
8. R2 Platform for Reproducible Research: http://rsquared.stat.uni-muenchen.de/
9. The Collage Authoring Environment: http://collage.cyfronet.pl
SHARE - a web portal for creating and sharing executable research papers
http://sites.google.com/site/executablepaper/
- Built to house the submissions to the Transformation Tool Contest (TTC)
- Provides an environment where all software and data related to the paper are optimally installed and ready for (temporary and secure) evaluation
- A specific virtual machine image can be instantiated within the paper
- Supports multiple operating systems, both at the level of the remote virtual machines and at the level of the connecting clients running on the user's machine
- More than 100 heterogeneous images have been contributed by different research communities so far
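The SHARE model - contributed VM images, matched to a paper and instantiated for a temporary evaluation session - can be illustrated with a small sketch. Everything here is hypothetical: SHARE is a web portal, and the registry, image records, and session-handle format below are invented stand-ins for its behaviour, not its API.

```python
# Illustrative sketch only: the registry and session handle are
# hypothetical stand-ins for SHARE's portal behaviour.

from dataclasses import dataclass

@dataclass
class VMImage:
    image_id: str   # identifier of the contributed VM image
    paper_doi: str  # paper whose software and data the image contains
    guest_os: str   # operating system installed inside the image

class ShareRegistry:
    """Maps papers to ready-to-run VM images, as the SHARE portal does."""

    def __init__(self):
        self._images = {}

    def contribute(self, image: VMImage):
        """A research community contributes a pre-installed image."""
        self._images[image.paper_doi] = image

    def instantiate(self, paper_doi: str) -> str:
        """Return a (hypothetical) handle for a temporary, secure
        evaluation session on the paper's virtual machine image."""
        image = self._images[paper_doi]
        return f"session://{image.image_id}/{image.guest_os}"
```

Usage under these assumptions: `ShareRegistry().contribute(...)` once per paper, then `instantiate(doi)` each time a reviewer opens the paper; because the guest OS lives in the image, the connecting client's own OS does not matter.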
A-R-E: The Author-Review-Execute environment
http://iwb.fluidops.com:7878/resource/AREpaper
- A data-driven, loosely coupled, and distributed approach to support the life cycle of an (executable) paper: authoring, reviewing, publication and study:
  - find out which paragraph provides the piece of information pertinent to a reference
  - navigate from data points in a plot to the data items in the raw experimental data that produced them (e.g. point to an Excel sheet column with experimental data)
  - navigate into the program code that produced a specific data set
- Based on a semantic wiki
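The second bullet - navigating from a plotted point back to the raw data item behind it - can be sketched as follows. The classes and the file/sheet/column addressing are invented for illustration; A-R-E's actual semantic-wiki data model is not shown here.

```python
# Hypothetical sketch of linking a plotted data point back to the raw
# experimental data it came from, in the spirit of A-R-E. The classes
# and the sheet/column addressing are illustrative only.

from dataclasses import dataclass

@dataclass(frozen=True)
class DataOrigin:
    file: str    # raw data file, e.g. an Excel workbook
    sheet: str   # sheet within the workbook
    column: str  # column holding the experimental values
    row: int     # row of the individual measurement

class LinkedPlot:
    """A plot whose points each carry a link to their raw-data origin."""

    def __init__(self):
        self._points = {}  # (x, y) -> DataOrigin

    def add_point(self, x: float, y: float, origin: DataOrigin):
        self._points[(x, y)] = origin

    def navigate(self, x: float, y: float) -> DataOrigin:
        """Jump from a point in the figure to the raw data item."""
        return self._points[(x, y)]
```

The same link structure works in the other direction for the third bullet: replace `DataOrigin` with a reference into the program code that produced the data set.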
A Provenance Based Infrastructure for Creating Executable Papers
http://www.vistrails.org/index.php/ExecutablePapers
- VisTrails provides a mechanism to store provenance for workflows
- Code and plug-ins for LaTeX, Wiki, Microsoft Word, and PowerPoint
- CrowdLabs (http://www.crowdlabs.org) allows papers to point to results that can be executed on a remote server and interactively explored from a Web browser
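The core idea of workflow provenance - recording which modules ran, with which parameters, to produce a result - can be sketched minimally. This is not the VisTrails API; the `Workflow` class and its record format are invented for illustration.

```python
# Minimal sketch of workflow provenance in the spirit of VisTrails:
# record each module execution and its parameters alongside the result.
# Not the VisTrails API; the class and record format are illustrative.

import datetime

class Workflow:
    def __init__(self):
        self.steps = []       # (name, function, parameters) in run order
        self.provenance = []  # one record per executed module

    def add_step(self, name, func, **params):
        self.steps.append((name, func, params))

    def run(self, value):
        """Pipe a value through every step, logging provenance as we go."""
        for name, func, params in self.steps:
            value = func(value, **params)
            self.provenance.append({
                "module": name,
                "parameters": params,
                "when": datetime.datetime.now().isoformat(),
            })
        return value
```

With the provenance log stored next to the paper, a reader (or a CrowdLabs-style server) can re-execute the exact module sequence that produced a figure.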
Universal Identifier for Computation Results
http://vcr.stanford.edu
- Verifiable Computational Result (VCR): a computational result (e.g. table, figure, chart, dataset), together with metadata describing in detail the computations that created it. Every computation automatically generates a detailed chronicle of its inputs and outputs as part of the process execution; the chronicle is automatically stored in a standard format in a VCR repository for later access
- Verifiable Result Repository (Repository): a web-services provider that archives VCRs and later serves up views of specific computational results
- Verifiable Result Identifier (VRI): a URL (web address) that universally and permanently identifies a repository and causes it to serve up views of a specific VCR; a DOI-like string that permanently and uniquely identifies the chronicle associated with that result and the repository that can serve views of that chronicle
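The chronicle-plus-identifier idea can be sketched in a few lines: run a computation, record its inputs and outputs, and derive a stable identifier from a hash of that record. The repository address and identifier format below are invented for illustration and are not the actual VRI scheme.

```python
# Sketch of the VCR idea: a computation writes a chronicle of its inputs
# and outputs, and a hash of the chronicle yields a stable, DOI-like
# identifier. Repository address and id format are invented, not the
# actual VRI scheme.

import hashlib
import json

REPOSITORY = "vcr://example-repository"  # hypothetical repository address

def run_with_chronicle(func, *args, **kwargs):
    """Run a computation, record its chronicle, and mint a VRI-like id."""
    result = func(*args, **kwargs)
    chronicle = {
        "function": func.__name__,
        "inputs": {"args": args, "kwargs": kwargs},
        "output": result,
    }
    # Canonical serialization so identical computations hash identically.
    digest = hashlib.sha256(
        json.dumps(chronicle, sort_keys=True, default=str).encode()
    ).hexdigest()[:16]
    return result, chronicle, f"{REPOSITORY}/{digest}"
```

Because the identifier is derived from the chronicle's content, re-running the same computation on the same inputs yields the same identifier - which is what lets a reader verify a published result against its chronicle.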
The Collage Authoring Environment
http://collage.cyfronet.pl
- An environment that enables authors to seamlessly embed chunks of executable code (called assets) into scientific publications:
  - input forms: used by the user to feed input data into the running experiment
  - visualizations: render an experiment result that can be visualized directly in the research paper
  - code snippets: embed an editable view of the code that enacts a specific computation and may be used to generate additional assets
- Allows repeated execution of these assets on underlying computing and data storage resources
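How the asset types interact - an input form feeding parameters into an editable code snippet whose result can then be rendered - can be sketched as follows. The `InputForm` and `CodeSnippet` classes are hypothetical illustrations of the asset model, not Collage's actual implementation.

```python
# Illustrative sketch of Collage-style assets: an input form feeds
# parameters into an editable code snippet, whose result can then be
# rendered in the paper. The classes are hypothetical, not Collage's
# actual implementation.

class InputForm:
    """Input-form asset: collects parameter values from the reader."""
    def __init__(self, **values):
        self.values = values

class CodeSnippet:
    """Code-snippet asset: editable code enacting a specific computation."""
    def __init__(self, source: str):
        self.source = source  # shown (and editable) in the paper

    def execute(self, form: InputForm):
        """Run the snippet against the form's values; repeatable on demand."""
        scope = dict(form.values)
        exec(self.source, scope)
        return scope["result"]

snippet = CodeSnippet("result = sum(range(n))")
form = InputForm(n=10)
print(snippet.execute(form))  # prints 45
```

Editing the snippet text and re-running `execute` is exactly the "repeated execution" the slide describes; a visualization asset would render the returned `result`.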
Next step: The Executable Journal?
- Ideally, we'd like all these tools to work together
- In fact, we'd like that to be how we communicate informatics/computer science!
- Submit a paper with a piece of working code
- The code works on the platform
- The code stays on the platform, and is available for other papers to run on, too!
- Advantages:
  - Clearer communication of software
  - Less reinvention of the wheel
  - More collaboration
In other words:
"I like the idea of [...] a research object corresponding to a PhD thesis sitting on the (digital) library shelf and then being re-executed as new data comes along. So the thesis sits there and new results (or papers, or research objects) pop out. I like this example because it involves tying down the method and letting the data flow, instead of the widely held view that the data sits there and methods are applied to it. [...] These papers then become a way of distributing data and methods in a highly usable and user-centric way [...]. So scientists don't need to download and install tools and learn user interfaces. They just interact with the published executable papers..."
Dave De Roure, email to the Wf4ever group
What does this have to do with HCLS?
- Might be a good area to explore this in?
- E.g. the interchange of annotations that we are exploring with Tim Clark's group...
- Next steps:
  - Funding?
  - Format?
  - Platform?
- Thoughts??