A Guide for Reproducible Research

Reproducibility in research is the ability to replicate the final product of academic
research, so that others can reproduce the results and build on them. The main entities of
academic research are data, scripts/software for processing and analysis, the workflow of
the research process, and the research output (Figure 1). Documenting workflow, data, and
code during the active phase of scientific research is important for communicating the
scholarship and replicating the results. When researchers submit scientific papers or build
on their own work, they face the challenge of remembering every detail of that work if they
have not documented it well. To sustain the integrity of reproducibility in scientific
research and advance the research process, this poster presents guidelines that help
researchers manage the research entities during the active phase of the research process.
Yasmin AlNoamany
University of California, Berkeley
yasminal@berkeley.edu
Introduction
The main entities of the scientific research
Research Software
Research Software – source code or executables that researchers generate or integrate
into the workflow of the scientific research.
What to document:
•  The experimental environment (e.g., hardware, operating system)
•  The computing platform and prerequisites
•  Scripts and libraries
•  Input and output parameters
•  The functionality of each script
•  Dependencies of the software, including their versions
•  The structure of the code/software and details about individual components
Good practices in managing your software:
•  Write custom scripts to automate research analysis.
•  Attach examples of how the code works.
•  Generate a list of all scripts, how to run them, and in what order.
•  Use tools that capture the experimental environment, such as Docker and ReproZip.
•  Use metadata standards for each generated module. Each module should have at least
the following:
Ø  Name of the module
Ø  Name of the project
Ø  Name of the author
Ø  Input and output
Ø  Purpose of the module
Ø  A brief description
File names should be descriptive and consistent!
Tools
•  Docker
•  Apache Ivy
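As a rough illustration of the software documentation points above, the sketch below pairs a module header carrying the suggested metadata fields with a small function that records the operating system, hardware, Python version, and dependency versions. The module name, project name, and the function `capture_environment` are hypothetical choices made for this example, not part of any standard. It does not replace tools such as Docker or ReproZip.

```python
"""
Module:      environment_snapshot.py  (hypothetical name)
Project:     example-project          (hypothetical name)
Author:      <your name>
Input:       a list of package names to report
Output:      a JSON-serializable dict describing the environment
Purpose:     record the experimental environment next to the results
Description: collects OS, hardware, Python version, and package versions
"""
import json
import platform
import sys
from importlib import metadata


def capture_environment(packages):
    """Return a snapshot of the environment and the versions of `packages`."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed
    return {
        "operating_system": platform.platform(),
        "machine": platform.machine(),
        "python_version": sys.version.split()[0],
        "packages": dict(sorted(versions.items())),
    }


if __name__ == "__main__":
    # The package names here are placeholders; list your own dependencies.
    print(json.dumps(capture_environment(["numpy", "pandas"]), indent=2))
```

Saving the printed JSON alongside each set of results gives a dated record of the environment that produced them.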
Research Output
Scientific paper(s) along with graphs/tables – document(s) that contain the results of
the scientific research as well as all the accompanying graphs and tables. This could be:
•  Compiled files (e.g., PDF)
•  Source files (e.g., .tex files, figures, .bib file)
•  Packages/libraries/styles installed (e.g., graphics)
•  Graphs and tables
Good practices in managing output files:
•  Document the environment and the file structure.
•  Track versions of produced papers, graphs, etc.
•  Document any problems you encounter with the computing environment.
•  Back up your files regularly.
•  Save your files on Dropbox or another cloud storage service to keep track of your
versions.
•  For writing your manuscript, use LaTeX and BibTeX for these reasons:
Ø  LaTeX is free and open source.
Ø  A .tex file can be edited in any text editor.
Ø  The content is separated from the style.
Ø  With a few lines and a style file, you can change how your PDF looks.
Ø  LaTeX source files remain usable for a longer time.
Ø  The output document looks better.
Tools
•  LaTeX
•  BibTeX
Data
Data – files that were used or produced during the scientific research process. These files
can be raw data or different versions of processed data.
Good practices in managing data:
•  Include a README file in the directory that contains the data.
•  Write a data management plan, which many funding agencies now require.
•  Provide a detailed description of the data, the data source(s), and how the data will be used.
•  Describe the process of capturing the data.
•  Describe all the steps of data preprocessing.
•  Describe each new version of the data and what changed.
•  Provide details about the software/code used for preprocessing the data.
•  Adopt metadata standards for describing the data.
•  Back up your files regularly.
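One possible way to act on the README and versioning points above is to generate the README from a script, so that the description, the source, the preprocessing steps, and a checksum for every data file are written together. This is a minimal sketch under assumed conventions: the function names `sha256_of` and `write_data_readme`, the file name `README.txt`, and the layout of the README are all choices made for this example.

```python
import hashlib
from datetime import date
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_data_readme(data_dir, description, source, preprocessing_steps):
    """Write README.txt in data_dir describing the data and listing each file."""
    data_dir = Path(data_dir)
    lines = [
        f"Description: {description}",
        f"Source: {source}",
        f"README generated: {date.today().isoformat()}",
        "",
        "Preprocessing steps:",
    ]
    lines += [f"  {i}. {step}" for i, step in enumerate(preprocessing_steps, 1)]
    lines += ["", "Files (SHA-256, size in bytes, name):"]
    for path in sorted(data_dir.iterdir()):
        # Skip subdirectories and the README itself.
        if path.is_file() and path.name != "README.txt":
            lines.append(f"  {sha256_of(path)}  {path.stat().st_size}  {path.name}")
    readme = data_dir / "README.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

Re-running the function after each new version of the data changes the recorded checksums, which makes it easy to see which files differ between versions.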
Tools
•  DMPTool
•  DASH
•  Figshare
•  EZID
•  Box and Drive
•  Merritt repository
Source: http://data-archive.ac.uk/create-manage/life-cycle
References
1.  AlNoamany, Yasmin. "How to make your research reproducible." http://guides.lib.berkeley.edu/reproducibility-guide (2017).
2.  Stodden, Victoria. "Enabling reproducible research: Open licensing for scientific innovation." (2009).
3.  Bailey, David H., Jonathan M. Borwein, and Victoria Stodden. "Facilitating reproducibility in scientific computing:
Principles and practice." Reproducibility: Principles, Problems, Practices, and Prospects (2014): 205-232.
4.  Stodden, Victoria, et al. "Enhancing reproducibility for computational methods." Science 354.6317 (2016): 1240-1241.
Workflow
Workflow documentation – detailed steps of the workflow
that capture the process of the scientific research.
•  Weekly/daily notes on the project's stages
•  Documentation for the steps of the workflow
For managing the research workflow, document:
•  The steps of the research, from the design, through fetching the data, to producing
the graphs and tables in the scientific output.
•  All adopted libraries and integrated algorithms.
•  Citations for, and information about, all code and data used.
•  The input and the output of each step.
Electronic notebooks, such as Jupyter, help document the workflow!
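Outside a notebook, the same information can be kept in a small log that records each workflow step with its inputs and outputs, in the order the steps ran. The sketch below is only one way to do this; the class `WorkflowLog`, its field names, and the file paths in the usage example are hypothetical.

```python
import json
from datetime import datetime, timezone


class WorkflowLog:
    """Collect one record per workflow step, in the order the steps ran."""

    def __init__(self):
        self.steps = []

    def record(self, name, inputs, outputs, note=""):
        """Append a step with its inputs, outputs, and an optional note."""
        self.steps.append({
            "order": len(self.steps) + 1,
            "step": name,
            "inputs": list(inputs),
            "outputs": list(outputs),
            "note": note,
            "recorded_at": datetime.now(timezone.utc).isoformat(),
        })

    def to_json(self):
        """Serialize the whole log so it can be saved with the project."""
        return json.dumps(self.steps, indent=2)


if __name__ == "__main__":
    # Placeholder step names and paths, for illustration only.
    log = WorkflowLog()
    log.record("fetch data", inputs=["data source"], outputs=["data/raw.csv"])
    log.record("preprocess", inputs=["data/raw.csv"], outputs=["data/clean.csv"],
               note="removed duplicate rows")
    log.record("plot", inputs=["data/clean.csv"], outputs=["figures/fig1.pdf"])
    print(log.to_json())
```

Keeping the saved log under version control alongside the scripts ties each graph or table in the output back to the steps that produced it.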
Tools
•  Jupyter
•  knitr
•  Overleaf
•  ShareLaTeX
•  GitHub
•  Zenodo
Sponsored in part through grants from the Alfred P. Sloan Foundation #G-2014-13746 and from the National Science
Foundation NSF ACI #1349002
