Reproducible research: practice

Tobin Magle, PhD
Bioinformationist
Health Science Library
University of Colorado Anschutz Medical Campus

Reproducibility
is the practice of distributing all data,
software source code, and tools required
to reproduce the results discussed in a
research publication.
https://www.ctspedia.org/do/view/CTSpedia/ReproducibleResearchStandards

Replication vs. Reproducibility
• Replication: The confirmation of results and conclusions from one study
obtained independently in another is considered the scientific gold standard.
• “Again, and Again, and Again …” BR Jasny et al. Science, 2011. 334(6060) p. 1225 DOI: 10.1126/science.334.6060.1225
• Some studies can’t be replicated: too big, too costly, too time consuming, one
time event, rare samples
• Reproducibility: minimum standard for assessing the value of scientific claims,
particularly when full independent replication of a study is not feasible
• “Reproducible Research in Computational Science” RD Peng. Science, 2011. 334(6060) pp. 1226-1227 DOI: 10.1126/science.1213847

Research Lifecycle:
[Figure: cycle diagram with the stages Form hypothesis, Design experiment, Collect data, Analyze data, Write manuscript, Publish research]
Complications
1. Technological advances:
• Huge, complex digital datasets
• Computational power
• Ability to share
2. Human error:
• Poor reporting
• Flawed analyses

Complicated Research Lifecycle
[Figure: cycle diagram with the stages Form hypothesis, Design experiment, Plan for data storage, Collect data, Clean data, Analyze data, Write manuscript, Curate data, Share data, Publish research]

Requires new expertise and infrastructure
[Figure: the same lifecycle diagram (Form hypothesis, Design experiment, Plan for data storage, Collect data, Clean data, Analyze data, Write manuscript, Curate data, Share data, Publish research), annotated with reproducible research tools]
Reproducible research tools
• Data management plans
• Version control
• Literate statistical computing

DMPTool
• Developed by the California Digital Library to help researchers write
data management plans
• https://dmptool.org/user_sessions/institution
• Select University of Colorado Anschutz Medical Campus

Create an account* or sign in
*We’re working with OIT to allow us to log in with CU passport credentials. Stay tuned.

Data management exercise
• Create a DMPTool account
• Pick a template and create a DMP
• Take 5 minutes to click through the template and think about how
these questions relate to your research

Version control
Version control is a system that records changes to a file or set of files
over time so that you can recall specific versions later.
https://git-scm.com/doc

Intuitive version control
[Figure: copies of one file saved as Original (V1), V2, and V3]
But what if you save a new file into the wrong version?

Local version control system
Figure 1-1. Local version control.
https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
• Keeps files in one place
• No copies
• Keeps track of changes
• Like Apple’s Time Machine
But what if you need to collaborate?

Centralized version control
Figure 1-2. Centralized version control.
• Keeps files on a server
• No copies
• Can work simultaneously on different files
But what if the server goes down? What if you can’t get online?

Distributed version control
Figure 1-3. Distributed version control.
Examples: Git, Mercurial, Bazaar, or Darcs
• Keeps files locally AND on a server
• Changes are synced among computers and the server
• Can work simultaneously

What is Git?
• Distributed version control system developed by the Linux community
• A stream of snapshots
Figure 1-5. Storing data as snapshots of the project over time.
https://git-scm.com/book/en/v2/Getting-Started-Git-Basics

3 states of repository files
• Modified – the file is altered but not committed
• Staged – the file is altered and marked to go to the next commit
• Committed – the file is altered and stored in your local DB
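
The three states above can be observed from the command line. This is a minimal sketch, assuming git is installed; the throwaway directory, the file name notes.txt, and the "Example User" identity are placeholders invented for the demo, not part of the workshop materials.

```shell
#!/bin/sh
set -e

# Work in a throwaway directory (hypothetical demo repository).
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

# Committed: the file is stored in the local Git database.
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
committed_status=$(git status --porcelain)  # empty: nothing to report

# Modified: the working copy differs from the last commit.
echo "second draft" >> notes.txt
modified_status=$(git status --porcelain)   # " M notes.txt"

# Staged: the change is marked to go into the next commit.
git add notes.txt
staged_status=$(git status --porcelain)     # "M  notes.txt"

printf 'committed: [%s]\nmodified:  [%s]\nstaged:    [%s]\n' \
  "$committed_status" "$modified_status" "$staged_status"
```

In the two-column status code, the left column describes the staging area and the right column describes the working directory, which is why the M moves from right to left after git add.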

3 Sections of your directory
Figure 1-6. Working directory, staging area, and Git directory.
https://git-scm.com/book/en/v2/Getting-Started-Git-Basics
• Working directory: modified files
• Staging area: staged files
• Git directory: committed files

Important git commands
• Init (Initialize) – start a git repository
• Add – add files to the git repository (for the initial add and for staging); for
files that are already tracked, staging can be skipped with git commit -a
• Commit – safely store the files in your git repository
• Clone – make a copy of someone else’s git repository
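
The four commands above might be used together as follows. This is a sketch under stated assumptions: the project name analysis-project, the file analysis.R, and the "Example User" identity are hypothetical, and the clone source is a local folder so the example runs offline (a GitHub URL is used the same way).

```shell
#!/bin/sh
set -e

work_dir=$(mktemp -d)
cd "$work_dir"

# init: start a git repository in a new folder (hypothetical project name).
git init -q analysis-project
cd analysis-project
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

# add + commit: stage a new file, then store the snapshot.
echo 'x <- 1' > analysis.R
git add analysis.R
git commit -q -m "Add analysis script"

# commit -a: stage and commit changes to already-tracked files in one step.
echo 'y <- 2' >> analysis.R
git commit -q -a -m "Extend analysis script"

# clone: make a full copy of a repository, history included.
cd "$work_dir"
git clone -q analysis-project analysis-copy
commit_count=$(git -C analysis-copy rev-list --count HEAD)
echo "commits in clone: $commit_count"
```

Note that git commit -a only picks up files git already tracks; a brand-new file still needs git add first.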

File statuses and how they change
Figure 2-1. The lifecycle of the status of your files.
https://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-to-the-Repository

GitHub Desktop
[Screenshot labels: repositories list, visual history of the repository, altered files, commit notes]

Git your hands on git
• Create a GitHub account
• Go to the repository: https://github.com/maglet/hands-on-git
• Clone the repository

Log in to GitHub Desktop
• Hands-on-git should be in the left-hand panel under GitHub

When you change a file…
GitHub Desktop automatically adds changed files and alterations to the commit

After commit
The commit adds a “bubble” to the visual history; click there to revert

Cloning/Branching/Forking
• Cloning: making a local copy of a repository that lives online or elsewhere
• Branching: creating a separate stream to test new features, so you
don’t affect the “trunk”; branches depend on the trunk
• Collaboration
• Forking: making a separate copy of a repository that does not depend on the original
• Using others’ work as a starting point; preserving for yourself things that the
owner might delete
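
Branching can also be tried from the command line. The sketch below is an assumption-laden stand-in for the GitHub workflow: it uses a local merge where GitHub would use an approved pull request, and the branch name add-name, the file names, and the "Example User" identity are all hypothetical.

```shell
#!/bin/sh
set -e

repo_dir=$(mktemp -d)
cd "$repo_dir"
git init -q
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

# Start the trunk with one commit.
echo "trunk work" > README.txt
git add README.txt
git commit -q -m "Initial commit"
trunk=$(git symbolic-ref --short HEAD)      # "master" or "main", depending on setup

# Branching: do new work on a separate stream.
git checkout -q -b add-name
mkdir name
echo "Example User" > name/example-user.txt
git add name/example-user.txt
git commit -q -m "Add my name"

# Back on the trunk, the new file is not there yet.
git checkout -q "$trunk"
before_merge=$(ls)

# Merging brings the branch back into the trunk (a local analogue of an
# approved pull request); the unused branch can then be deleted.
git merge -q add-name
git branch -d add-name > /dev/null
after_merge=$(git ls-files)

printf 'before merge:\n%s\nafter merge:\n%s\n' "$before_merge" "$after_merge"
```

Until the merge, the trunk holds only README.txt, which is the point of working on a branch: experiments stay out of the main line until they are ready.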

Pull request
The branch meets back up with “master”

Approve Pull request
• The branch meets back up with “master”; the merge can be reverted
• Unused branches can be deleted

Pull request approved
Back on one track

Exercise
• Go to the repository you cloned earlier
• Create a text file with your name on it
• Add it to the name folder
• Submit a pull request
• Look at what happens to the visual representation

Literate (statistical) programming
• Resulting report is a stream of text (human readable) and code
(machine readable)
• Alternate text and code
• Sweave
• R markdown

R Markdown
• Open
• Write
• Embed
• Render
https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf

Install knitr and markdown packages
• Tools > Install Packages
• Enter the package name (will autocomplete)
• knitr
• markdown
• OR install.packages("knitr")
• If it fails, try again

Open/Create a markdown document

Write: useful syntax
• Plain text
• *italics* -> italics
• **bold** -> bold
• # Header -> Header (each additional # decreases the size)
• Can also draw:
• Insert pictures
• Ordered and unordered list
• Tables

Embed code
• Inline – Use variables in the human readable text
• `r 2 + 2`
• Code chunks - Include working code that generates output
• ```{r}
• #Code goes here
• ```
• Display Options –

Render
• Won’t render unless the code runs with no errors
• You know it should be reproducible
• Render using the knit function
• Output Formats
• Knit HTML
• Knit PDF – requires LaTeX
• Knit Word

Exercise
• Edit the markdown document using the cheat sheet to see what you
can do
• Try to knit it after creating a typo in the code
• Insert other pictures from the web
• Try to make a table
• Make some bulleted lists
• Insert a block quote
• Make the graph prettier
• Play around!
