Creating Executable Research Compendia to Improve Reproducibility in the Geosciences

The project Opening Reproducible Research (http://o2r.info) aims to lower the barrier to reproducible research by developing a convention and supporting tools for Executable Research Compendia (ERCs, https://doi.org/10.1045/january2017-nuest), which include (i) the article, (ii) data, (iii) code, and (iv) the runtime environment needed to reproduce the study. The ERC provides a well-structured container that serves the needs of journals (the ERC as the item under review), archives (suitable metadata and packaging formats), and researchers (literally everything needed to re-do an analysis is there). It relies on Docker to define and store the runtime environment. ERCs should be simple enough to be created manually and should absorb best practices for organizing digital workspaces. In addition, for less experienced users, an online creation service automatically creates ERCs, including Dockerfile and Docker image, from typical user workspaces. A validation and manipulation service will allow (a) users to create an ERC for their workflows with minimal required input, (b) users to interact with published ERCs, e.g. to (peer) review the contents, or to manipulate parameters of the workflow and explore interactive graphics, and (c) platform providers (e.g. journals, data repositories, archives, universities) to integrate o2r building blocks and extend their procedures with executable containers. The reference implementation focuses on the geoscience domain and the R language.

We show which steps and aspects of publishing and properly archiving computational research with containers can or cannot be automated for a specific community of practice, and point to future challenges. We will share the concepts behind the ERC (http://o2r.info/erc-spec) and the state of the o2r architecture (http://o2r.info/architecture) and software (https://github.com/o2r-project).
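To make the four ERC parts named above concrete, here is a minimal Python sketch of a completeness check for a workspace. It is not part of the o2r tools: the function name `missing_erc_parts` and the file patterns in `ERC_PARTS` are assumptions chosen for illustration, and the actual ERC specification (http://o2r.info/erc-spec) may define the contents differently.

```python
import tempfile
from pathlib import Path

# Hypothetical mapping from the four ERC parts named in the abstract to file
# patterns. These patterns are illustrative assumptions, not the ERC spec.
ERC_PARTS = {
    "article": ["*.Rmd", "*.pdf", "*.html"],
    "data": ["*.csv", "*.tif", "*.shp", "*.RData"],
    "code": ["*.R", "*.Rmd"],
    "runtime environment": ["Dockerfile"],
}


def missing_erc_parts(workspace):
    """Return the names of ERC parts for which no matching file is found."""
    root = Path(workspace)
    missing = []
    for part, patterns in ERC_PARTS.items():
        # A part counts as present if any pattern matches a regular file
        # anywhere below the workspace root.
        found = any(
            path.is_file()
            for pattern in patterns
            for path in root.rglob(pattern)
        )
        if not found:
            missing.append(part)
    return missing


if __name__ == "__main__":
    # Small demonstration on a throwaway workspace without a Dockerfile.
    with tempfile.TemporaryDirectory() as tmp:
        ws = Path(tmp)
        (ws / "paper.Rmd").write_text("text and code\n")
        (ws / "data.csv").write_text("x,y\n1,2\n")
        print(missing_erc_parts(ws))  # ['runtime environment']
```

A check like this only says whether candidate files exist. Whether the analysis actually runs inside the captured environment is a separate question.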

Published in: Technology

Creating Executable Research Compendia to Improve Reproducibility in the Geosciences

  1. Creating Executable Research Compendia to Improve Reproducibility in the Geosciences. Daniel Nüst | University of Münster | @nordholmen. C4RR workshop, June 28 2017, Cambridge, UK
  2. Contents: Creating ERCs; Technical background; A world with ERCs
  3. Executable Research Compendium
  4. (no text)
  5. Key features of ERCs: Nested containers (BagIt, Docker); Librarian-ready; Reproducibility range of 5 to 10 years (still worth integrating, target users are not science historians); Desktop-size data and algorithms - closed and complete; “Geo-stuff” and R for the “last 10 %”; Remain understandable for scientists
  6. Creating & Inspecting ERC: How far can we reduce overhead for scientists?
  7. (no text)
  8. (no text)
  9. ERC creation process: ❏ Submit workspace (“Scripters”/“Coders”) ❏ Extract metadata ❏ Execute analysis ❏ Check syntax ❏ Capture runtime environment (manifest + image) ❏ Check metadata (user!)
  10. containerit: https://github.com/o2r-project/containerit
  11. containerit (cont.) ...
  12. meta toolsuite: extract, map, harvest, validate. Highlights: Automatically extract several kinds of metadata from the workspace, including spatial information; Facilitate metadata management with schema translation maps
  13. (no text)
  14. https://sandbox.zenodo.org/communities/o2r
  15. Technical Background
  16. ERC specification: GitHub dev. Development steps: version 0, practical evaluation; version 0.5, expert evaluation; version 0.6, architect evaluation; version 1 (mid 2017) > ref. impl. Content: http://o2r.info/erc-spec
  17. ERC specification - key features & structure: base directory; main document & display file; runtime image & runtime manifest; yml configuration file (control statements, metadata); 5 files + x
  18. Architecture for ERC-based publication process: http://o2r.info/architecture/
  19. A world with ERCs
  20. Manipulate, Validate, Interact, ...
  21. Integration Hacks: Chrome Extension
  22. Geocontainer labels study project; Badges API; Chrome Extension
  23. Summary: Executable Research Compendia are fun and … help us learn a lot about reproducibility work including a domain-specific “last mile”; take into consideration requirements of libraries and preservation; re-use and integrate, are not “a platform”; don’t solve all problems (R, geo, 1/5 Vs, no HPC, comp. reproducibility). Reproducibility service makes ERC work in geosciences for the current publication workflow and services.
  24. Outlook: “A lot of glue work around the edges” (M. Hartley); ERCs are post-hoc glue for minimal reproducibility; Catching up with reference implementation and demo ERCs; Spin-out of tools; Follow-ups & collaborations (production mode in cloud? special issues?)
  25. Unconcealed ad: Reproducible GEOBIA, doi: 10.3390/rs9030290, http://www.mdpi.com/2072-4292/9/3/290
  26. Unconcealed ad II: Docker for RR lesson, https://github.com/nuest/docker-reproducible-research, https://nuest.github.io/docker-reproducible-research
  27. Thanks! What are your questions? @o2r_project, github.com/o2r-project, o2r.info
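
The "nested containers (BagIt, Docker)" feature from the slides can be illustrated with a minimal Python sketch of the outer BagIt layer: a payload directory, a bag declaration, and a checksum manifest. This uses only the standard library and is not the o2r implementation; the function names `make_minimal_bag` and `verify_bag` are hypothetical, and a real ERC would also carry the Docker image and the ERC-specific files, which are omitted here.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def make_minimal_bag(workspace, bag_dir):
    """Copy a workspace into bag_dir/data and write the BagIt tag files.

    Assumes bag_dir does not exist yet and is not located inside workspace.
    """
    workspace, bag_dir = Path(workspace), Path(bag_dir)
    payload = bag_dir / "data"
    shutil.copytree(workspace, payload)

    # Bag declaration as required by the BagIt specification (RFC 8493).
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # Payload manifest: one "<checksum>  <path relative to the bag>" line
    # per payload file.
    lines = []
    for path in sorted(p for p in payload.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(bag_dir).as_posix()}")
    (bag_dir / "manifest-sha256.txt").write_text(
        "".join(line + "\n" for line in lines), encoding="utf-8"
    )
    return bag_dir


def verify_bag(bag_dir):
    """Return payload paths whose current checksum differs from the manifest."""
    bag_dir = Path(bag_dir)
    mismatches = []
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != digest:
            mismatches.append(rel_path)
    return mismatches


if __name__ == "__main__":
    src = Path(tempfile.mkdtemp())
    (src / "analysis.R").write_text("print(1)\n")
    bag = make_minimal_bag(src, Path(tempfile.mkdtemp()) / "bag")
    print(verify_bag(bag))  # [] while the payload is unchanged
```

The checksum manifest is what makes the package "librarian-ready" in a narrow sense: an archive can detect silent changes to the payload without understanding the analysis inside it.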
