Reproducible research: Practice
Tobin Magle, PhD
Bioinformationist
Health Science Library
University of Colorado Anschutz Medical Campus
Reproducibility is the practice of distributing all data, software source code, and tools required to reproduce the results discussed in a research publication.
https://www.ctspedia.org/do/view/CTSpedia/ReproducibleResearchStandards
Replication vs. Reproducibility
• Replication: the confirmation of results and conclusions from one study, obtained independently in another, is considered the scientific gold standard.
• “Again, and Again, and Again …” BR Jasny et al. Science, 2011. 334(6060) p. 1225. DOI: 10.1126/science.334.6060.1225
• Some studies can’t be replicated: they are too big, too costly, or too time-consuming, or they involve one-time events or rare samples
• Reproducibility: the minimum standard for assessing the value of scientific claims, particularly when full independent replication of a study is not feasible
• “Reproducible Research in Computational Science.” RD Peng. Science, 2011. 334(6060) pp. 1226-1227. DOI: 10.1126/science.1213847
Research Lifecycle:
Form hypothesis → Design experiment → Collect data → Analyze data → Write manuscript → Publish research
Complications
1. Technological advances:
• Huge, complex digital datasets
• Computational power
• Ability to share
2. Human error:
• Poor reporting
• Flawed analyses
Complicated Research Lifecycle
Form hypothesis → Design experiment → Collect data → Analyze data → Write manuscript → Publish research, now with added steps:
• Plan for data storage
• Clean data
• Curate data
• Share data
Requires new expertise and infrastructure
Reproducible research tools
• Data Management Plans
• Version control
• Literate Statistical Computing
DMPTool
• Developed by the California Digital Library to help researchers write data management plans
• https://dmptool.org/user_sessions/institution
• Select University of Colorado Anschutz Medical Campus to get CU Anschutz-specific content
• Create an account* or sign in
*We’re working with OIT to allow log-in with CU passport credentials. Stay tuned.
Data management exercise
• Create a DMPTool account
• Pick a template and create a DMP
• Take 5 minutes to click through the template and think about how
these questions relate to your research
Version control
Version control is a system that records changes to a file or set of files
over time so that you can recall specific versions later.
https://git-scm.com/doc
Intuitive version control
• Save a new copy of the file for each version: Original (V1), V2, V3
• But what if you save a new file into the wrong version?
Local version control system
Figure 1-1. Local version control.
https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
• Keeps files in one place
• No copies
• Keeps track of changes
• Like Apple’s Time Machine
But what if you need to collaborate?
Centralized version control
Figure 1-2. Centralized version control.
https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
• Keeps files on a server
• No copies
• Keeps track of changes
• Can work simultaneously on different files
But what if the server goes down? What if you can’t get online?
Distributed version control
Figure 1-3. Distributed version control.
https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Examples: Git, Mercurial, Bazaar, or Darcs
• Keeps files locally AND on a server
• Changes are synchronized among computers and the server
• Keeps track of changes
• Can work simultaneously
What is Git?
• Distributed version control system developed by the Linux community
• Stores data as a stream of snapshots of the project over time
Figure 1-5. Storing data as snapshots of the project over time.
https://git-scm.com/book/en/v2/Getting-Started-Git-Basics
3 states of repository files
• Modified – the file has changed, but the change is not yet committed
• Staged – the modified file is marked to go into the next commit
• Committed – the change is safely stored in your local database (the Git directory)
3 sections of your directory
Figure 1-6. Working directory, staging area, and Git directory.
https://git-scm.com/book/en/v2/Getting-Started-Git-Basics
• Working directory – modified files
• Staging area – staged files
• Git directory – committed files
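The three states can also be watched from the command line. This is a minimal sketch, assuming `git` is installed; the throwaway repository, the file `notes.txt`, and the example identity are hypothetical.

```shell
set -e
repo=$(mktemp -d)                         # throwaway repository for the demo
cd "$repo"
git init -q
git config user.name "Example User"       # hypothetical identity, local to this repo
git config user.email "user@example.com"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"              # notes.txt is now committed

echo "second draft" >> notes.txt          # modified: changed in the working directory
git status --short                        # prints " M notes.txt"

git add notes.txt                         # staged: marked to go into the next commit
git status --short                        # prints "M  notes.txt"

git commit -q -m "Update notes"           # committed: stored in the Git directory
git status --short                        # prints nothing: the working directory is clean
```

In the short status format, the left column describes the staging area and the right column the working directory, which is why the `M` moves from right to left once the file is staged.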
Important git commands
• Init (initialize) – start a Git repository
• Add – start tracking new files and stage changes for the next commit; for files Git already tracks, this step can be skipped with git commit -a
• Commit – safely store a snapshot of the staged changes in your Git repository
• Clone – make a local copy of someone else’s Git repository
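The four commands above can be tried in sequence. A minimal sketch, assuming `git` is installed; the repository names `analysis` and `analysis-copy` and the R script are hypothetical, and the clone is made from a local folder rather than from GitHub so the example runs offline.

```shell
set -e
work=$(mktemp -d)
cd "$work"

git init -q analysis                      # init: create a new repository in ./analysis
cd analysis
git config user.name "Example User"       # hypothetical identity, local to this repo
git config user.email "user@example.com"

echo 'x <- c(1, 2, 3)' > analysis.R       # hypothetical analysis script
git add analysis.R                        # add: start tracking the file and stage it
git commit -q -m "Add analysis script"    # commit: store a snapshot of the staged changes

echo 'mean(x)' >> analysis.R
git commit -q -a -m "Compute the mean"    # -a stages changes to tracked files, so no separate add

cd ..
git clone -q analysis analysis-copy       # clone: a full copy, history included
git -C analysis-copy log --oneline        # lists both commits
```

Note that `-a` only picks up files Git already tracks; a brand-new file still needs `git add` before its first commit.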
File statuses and how they change
Figure 2-1. The lifecycle of the status of your files.
https://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-to-the-Repository
GitHub Desktop
• Repositories
• Visual representation of the history
• Altered files
• Commit notes
Git your hands on git
• Create a GitHub account
• Go to the repository: https://github.com/maglet/hands-on-git
• Clone the repository
Log in to GitHub Desktop
• hands-on-git should be in the left-hand panel under GitHub
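When you change a file…
• GitHub Desktop automatically adds new files and alterations to the list of changes
• Write commit notes and commit
• After the commit, a new “bubble” is added to the visual history
• Click there to revert
Reverting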
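Reverting in GitHub Desktop roughly corresponds to `git revert`, which records a new commit that undoes an earlier one instead of erasing history. A minimal sketch, assuming `git` is installed; `params.txt` and its contents are hypothetical.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"       # hypothetical identity, local to this repo
git config user.email "user@example.com"

echo "alpha = 0.05" > params.txt          # hypothetical analysis parameter
git add params.txt
git commit -q -m "Set alpha"

echo "alpha = 0.10" > params.txt
git commit -q -a -m "Loosen alpha"        # a change we later decide to undo

git revert --no-edit HEAD                 # adds a new commit that undoes "Loosen alpha"
cat params.txt                            # prints "alpha = 0.05"
git log --oneline                         # three commits; nothing was erased
```

Because the undo is itself a commit, the record of what was tried and abandoned stays in the history, which is useful when you later need to explain how an analysis evolved.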
Cloning/Branching/Forking
• Cloning: make a local copy of a repository hosted online or elsewhere
• Branching: create a separate line of development to test new features without affecting the “trunk”; branches depend on the trunk
  • Useful for collaboration
• Forking: make a separate, independent copy of a repository
  • Use others’ work as a starting point, or keep your own copy of material the owner might delete
Branching
• Create a branch: it splits off from “master”
• Edit the branch
• Open a pull request to bring the changes back to “master”
• Approve the pull request: the branch meets back up with “master” (the merge can be reverted)
• Delete unused branches
• Pull request approved: the history is back on one track
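The branch-and-merge cycle above can be reproduced locally. A minimal sketch, assuming `git` is installed; the branch `add-my-name` and the file `names/ada.txt` are hypothetical, and a local `git merge --no-ff` stands in for approving a pull request on GitHub, which is a web feature rather than a Git command.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"       # hypothetical identity, local to this repo
git config user.email "user@example.com"
echo "# Names" > README.md
git add README.md
git commit -q -m "Initial commit"
trunk=$(git symbolic-ref --short HEAD)    # "master" or "main", depending on your Git setup

git checkout -q -b add-my-name            # branch: split off from the trunk
mkdir names
echo "Ada Lovelace" > names/ada.txt       # hypothetical contribution
git add names
git commit -q -m "Add my name"

git checkout -q "$trunk"                  # back on the trunk; names/ does not exist here yet
git merge -q --no-ff --no-edit add-my-name  # merge commit, roughly what approving a pull request does
git branch -d add-my-name                 # delete the merged, now unused branch
git log --oneline --graph                 # shows the split and the merge back into one track
```

`--no-ff` forces a merge commit even when a fast-forward would be possible, so the split and rejoin stay visible in the graph, much like the visual history in GitHub Desktop.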
Exercise
• Go to the repository you cloned earlier
• Create a text file with your name on it
• Add it to the name folder
• Submit a pull request
• Look at what happens to the visual representation
Literate (statistical) programming
• Resulting report is a stream of text (human readable) and code
(machine readable)
• Alternate text and code
• Sweave
• R markdown
R Markdown
• Open
• Write
• Embed
• Render
https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf
Install knitr and markdown packages
• Tools > Install Packages…
• Enter the package name (it will autocomplete)
• knitr
• markdown
• OR run install.packages(c("knitr", "markdown"))
• If it fails, try again
Open/create an R Markdown document
Write: useful syntax
• Plain text
• *italics* -> italics
• **bold** -> bold
• # Header -> Header (more # decreases size)
• Can also:
• Insert pictures
• Make ordered and unordered lists
• Make tables
Embed code
• Inline – use variables in the human-readable text
  • `r 2 + 2`
• Code chunks – include working code that generates output
    ```{r}
    # Code goes here
    ```
• Display options – chunk options such as echo=FALSE (hide the code) or eval=FALSE (show the code without running it)
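An R Markdown file is plain text, so a complete minimal example can be written from the shell. This sketch assumes a POSIX shell; the file name `report.Rmd`, its title, and the chunk labels are hypothetical. The backtick kept in a variable only exists so the chunk fences don't clash with this listing; in RStudio you would type the file directly.

```shell
set -e
dir=$(mktemp -d)
cd "$dir"
tick='`'                                  # one literal backtick, kept in a variable so the
fence="$tick$tick$tick"                   # chunk fences below don't clash with this listing

# Unquoted heredoc: ${tick} and ${fence} expand; the R code contains no "$" to expand.
cat > report.Rmd <<EOF
---
title: "Example report"
output: html_document
---

# Results

Two plus two is ${tick}r 2 + 2${tick}.

${fence}{r summary}
x <- c(2, 4, 6)
mean(x)
${fence}

${fence}{r plot, echo=FALSE}
plot(x)
${fence}
EOF
```

Knitting this file (the Knit button in RStudio, or `rmarkdown::render("report.Rmd")` from R) would replace the inline expression with 4, show the first chunk's code and output, and show only the plot for the second chunk because of `echo=FALSE`.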
Render
• Won’t render unless the code runs with no errors
• So if it renders, you know the code runs from start to finish, a good first check of reproducibility
• Render using the knit function (the Knit button in RStudio)
• Output formats
• Knit HTML
• Knit PDF – requires LaTeX
• Knit Word
Exercise
• Edit the markdown document using the cheat sheet to see what you
can do
• Try to knit it after creating a typo in the code
• Insert other pictures from the web
• Try to make a table
• Make some bulleted lists
• Insert a block quote
• Make the graph prettier
• Play around!

 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 

Reproducible research: practice

  • 1. Reproducible research: Practice Tobin Magle, PhD Bioinformationist Health Science Library University of Colorado Anschutz Medical Campus
  • 2. Reproducibility is the practice of distributing all data, software source code, and tools required to reproduce the results discussed in a research publication. https://www.ctspedia.org/do/view/CTSpedia/ReproducibleResearchStandards
  • 3. Replication vs. Reproducibility • Replication: The confirmation of results and conclusions from one study obtained independently in another is considered the scientific gold standard. • “Again, and Again, and Again …” BR Jasny et al. Science, 2011. 334(6060) pp. 1225 DOI: 10.1126/science.334.6060.1225 • Some studies can’t be replicated: too big, too costly, too time-consuming, one-time event, rare samples • Reproducibility: minimum standard for assessing the value of scientific claims, particularly when full independent replication of a study is not feasible • “Reproducible Research in Computational Science”. RD Peng. Science, 2011. 334(6060) pp. 1226-1227 DOI: 10.1126/science.1213847
  • 4. Research Lifecycle: Form hypothesis → Design experiment → Collect data → Analyze data → Write manuscript → Publish research • Complications: 1. Technological advances: • Huge, complex digital datasets • Computational power • Ability to share 2. Human error: • Poor reporting • Flawed analyses
  • 6. Requires new expertise and infrastructure • Expanded lifecycle: Form hypothesis → Design experiment → Plan for data storage → Collect data → Curate data → Clean data → Analyze data → Write manuscript → Publish research → Share data • Supporting practices: Data management plans, Version control, Literate statistical computing, Reproducible research tools
  • 7. DMPTool • Developed by the California Digital Library to help researchers write data management plans • https://dmptool.org/user_sessions/institution • Select University of Colorado Anschutz Medical Campus
  • 8. Create an account* or sign in • *We’re working with OIT to allow us to log in with CU passport credentials. Stay tuned.
  • 10. Data management exercise • Create a DMPTool account • Pick a template and create a DMP • Take 5 minutes to click through the template and think about how these questions relate to your research
  • 11. Version control Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later. https://git-scm.com/doc
  • 12. Intuitive version control • [Figure: file copies labeled Original (V1), V2, V3] • But what if you save a new file into the wrong version?
  • 13. Local version control system Figure 1-1. Local version control. https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control • Keeps files in one place • No copies • Keeps track of changes • Like Apple’s Time Machine • But what if you need to collaborate?
  • 14. Centralized version control https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control Figure 1-2. Centralized version control. • Keeps files on a server • No copies • Keeps track of changes • Can work simultaneously on different files • But what if the server goes down? What if you can’t get online?
  • 15. Distributed version control https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control Figure 1-3. Distributed version control. Examples: Git, Mercurial, Bazaar, or Darcs • Keeps files locally AND on a server • Changes are shared among computers and the server • Keeps track of changes • Can work simultaneously
  • 16. What is Git? • Distributed version control system developed by the Linux community • A stream of snapshots Figure 1-5. Storing data as snapshots of the project over time. https://git-scm.com/book/en/v2/Getting-Started-Git-Basics
  • 17. 3 states of repository files • Modified – the file is altered but not committed • Staged – the file is altered and marked to go into the next commit • Committed – the file is altered and stored in your local database
  • 18. 3 sections of your directory • Working directory (modified files), staging area (staged files), and Git directory (committed files) • Figure 1-6. Working directory, staging area, and Git directory. https://git-scm.com/book/en/v2/Getting-Started-Git-Basics
  • 19. Important git commands • Init (initialize) – start a git repository • Add – add files to the git repository (for the initial add and for staging); for files that are already tracked, this step can be skipped with git commit -a • Commit – safely store the files in your git repository • Clone – make a copy of someone else’s git repository
  • 20. File statuses and how they change Figure 2-1. The lifecycle of the status of your files. https://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-to-the-Repository
  • 22. Git your hands on git • Create a GitHub account • Go to the repository: https://github.com/maglet/hands-on-git • Clone the repository
  • 23. Log in to GitHub Desktop • Hands-on-git should be in the left-hand panel under GitHub
  • 24. When you change a file… • GitHub Desktop automatically adds the new files/alterations to the next commit
  • 27. Cloning/Branching/Forking • Cloning: making a local copy of a repository online or elsewhere • Branching: creating a separate stream to test new features so you don’t affect the “trunk”; branches depend on the trunk • Useful for collaboration • Forking: making a separate copy of a repository that does not depend on the original • Uses: taking others’ work as a starting point; preserving for yourself things that the owner might delete
  • 30. Pull request • The branch meets back up with “master”
  • 31. Approve pull request • The branch meets back up with “master”; the merge can be reverted • Can delete unused branches
  • 33. Exercise • Go to the repository you cloned earlier • Create a text file with your name in it • Add it to the name folder • Submit a pull request • Look at what happens to the visual representation
  • 34. Literate (statistical) programming • The resulting report is a stream of text (human-readable) and code (machine-readable) • Alternates text and code • Sweave • R Markdown
  • 35. R Markdown • Open • Write • Embed • Render https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf
  • 36. Install knitr and markdown packages • Tools > Install Packages • Enter the package name (will autocomplete) • knitr • markdown • OR install.packages("knitr") • If it fails, try again
  • 38. Write: useful syntax • Plain text • *italics* -> italics • **bold** -> bold • # Header -> Header (more #s make a smaller header) • Can also: • Insert pictures • Make ordered and unordered lists • Make tables
  • 39. Embed code • Inline – Use variables in the human readable text • `r 2 + 2` • Code chunks - Include working code that generates output • ```{r} • #Code goes here • ``` • Display Options –
  • 40. Render • Won’t render unless the code runs with no errors, so you know it should be reproducible • Render using the knit function • Output formats • Knit HTML • Knit PDF – requires LaTeX • Knit Word
  • 41. Exercise • Edit the markdown document using the cheat sheet to see what you can do • Try to knit it after creating a typo in the code • Insert other pictures from the web • Try to make a table • Make some bulleted lists • Insert a block quote • Make the graph prettier • Play around!
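Slide 15's distributed model can be tried on a single machine. The script below is a minimal sketch, not part of the original workshop: it assumes Git is installed, uses a local bare repository as a stand-in for the server, and uses hypothetical names (alice, bob, notes.txt, and a throwaway commit identity) chosen only for illustration.

```shell
#!/usr/bin/env bash
# Sketch of distributed version control: a local bare repository stands in
# for the server, and two clones stand in for two collaborators' computers.
set -euo pipefail

# Commit identity for this demo only (hypothetical values).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

root=$(mktemp -d)
git init -q --bare "$root/server.git"        # the shared "server" copy

# Alice starts the project and pushes it to the server.
git init -q "$root/alice"
cd "$root/alice"
git symbolic-ref HEAD refs/heads/master      # name the first branch "master"
echo "first line" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git remote add origin "$root/server.git"
git push -q origin master

# Bob clones a complete copy, commits locally, then shares his change.
git clone -q -b master "$root/server.git" "$root/bob"
cd "$root/bob"
echo "second line" >> notes.txt
git commit -q -a -m "Extend notes"
git push -q origin master

# Alice fetches Bob's change; each copy now holds the full history,
# so commands such as `git log` work without contacting the server.
cd "$root/alice"
git pull -q --ff-only origin master
git log --oneline
```

Because every clone carries the whole history, Alice and Bob could keep committing while the server is unreachable and push later.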
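The three file states on slides 17 and 18 show up directly in `git status --short`. A minimal sketch, assuming Git is installed; the file analysis.R and the commit identity are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: one file moving through the modified -> staged -> committed states.
set -euo pipefail

# Commit identity for this demo only (hypothetical values).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'x <- 1' > analysis.R                   # hypothetical file name
git add analysis.R
git commit -q -m "Add analysis script"

echo 'y <- x + 1' >> analysis.R
git status --short    # " M analysis.R": modified in the working directory
git add analysis.R
git status --short    # "M  analysis.R": staged in the staging area
git commit -q -m "Extend analysis script"
git status --short    # no output: the change is committed to the Git directory
```

The two-letter code in `git status --short` reads left to right as staging area, then working directory, which is why the M moves from the second column to the first after `git add`.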
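Slide 19's commands, including the `-a` shortcut, can be sketched as a short shell session. Assumptions as before: Git is installed, and the file names and commit identity are hypothetical. Note that `git commit -a` only stages files Git already tracks, so a brand-new file still needs `git add`:

```shell
#!/usr/bin/env bash
# Sketch of init, add, commit, and clone, plus the `commit -a` shortcut.
set -euo pipefail

# Commit identity for this demo only (hypothetical values).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

base=$(mktemp -d)
git init -q "$base/project"                  # init: start a repository
cd "$base/project"
echo 'x <- 1' > script.R
git add script.R                             # add: track and stage the file
git commit -q -m "First version"             # commit: store a snapshot

echo 'y <- 2' >> script.R                    # change a tracked file
echo 'to do' > notes.txt                     # create a new, untracked file
git commit -q -a -m "Second version"         # -a stages tracked changes only
git status --short                           # "?? notes.txt": still untracked

git clone -q "$base/project" "$base/copy"    # clone: copy the whole repository
```

The clone receives both commits but not notes.txt, since only committed content is copied.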
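Branching from slide 27 can be sketched with the Git command line instead of GitHub Desktop. The branch name new-feature and the files are hypothetical, and the trunk is explicitly named master to match the slides:

```shell
#!/usr/bin/env bash
# Sketch: a branch holds experimental work without touching the trunk.
set -euo pipefail

# Commit identity for this demo only (hypothetical values).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # call the trunk "master"
echo "Project readme" > README.md
git add README.md
git commit -q -m "Initial commit"

git checkout -q -b new-feature               # branch off the trunk
echo "experimental idea" > feature.txt
git add feature.txt
git commit -q -m "Try a new feature"

git checkout -q master                       # return to the trunk
ls                                           # only README.md: master is unaffected
git log --oneline --graph --all              # both lines of history
```

A fork, by contrast, is a whole separate copy of the repository (on GitHub, under another account), so it keeps working even if the original is deleted.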
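Pull requests (slides 30 and 31) are a GitHub feature rather than part of Git itself. Approving one with GitHub's "Create a merge commit" option roughly corresponds to the local merge below. This is a sketch under assumptions, with hypothetical branch and file names loosely following the slide 33 exercise, not a description of what GitHub runs internally:

```shell
#!/usr/bin/env bash
# Sketch: the local Git operations behind approving a pull request,
# deleting the merged branch, and reverting the merge.
set -euo pipefail

# Commit identity for this demo only (hypothetical values).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # call the trunk "master"
echo "# Names" > README.md
git add README.md
git commit -q -m "Initial commit"

# Work on a branch (hypothetical branch, folder, and file names).
git checkout -q -b add-my-name
mkdir -p names
echo "Your Name" > names/your-name.txt
git add names
git commit -q -m "Add my name"

# "Approve pull request": the branch meets back up with master.
git checkout -q master
git merge -q --no-ff --no-edit add-my-name
git branch -d add-my-name                    # delete the merged branch

# The merge can be reverted later; -m 1 keeps master's side of the merge.
git revert --no-edit -m 1 HEAD
git log --oneline --graph
```

Reverting adds a new commit that undoes the merge instead of rewriting history, so collaborators who already pulled the merge are not disrupted.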
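Slides 35 to 41 write, embed code in, and render an R Markdown file from RStudio. The same steps can be sketched from a shell, assuming R with the rmarkdown package and Pandoc are installed; the file name, title, and use of R's built-in cars dataset are illustrative choices. `rmarkdown::render()` is broadly what RStudio's Knit button runs. The backtick fences are built from a variable because literal backticks inside double quotes would trigger shell command substitution:

```shell
#!/usr/bin/env bash
# Sketch: write a minimal R Markdown file, then render it if R is available.
set -euo pipefail

dir=$(mktemp -d)
rmd="$dir/example.Rmd"                       # hypothetical file name
bt='`'                                       # one backtick, to build fences
fence="$bt$bt$bt"

{
  printf '%s\n' '---' 'title: "Example report"' 'output: html_document' '---' ''
  printf '%s\n' '# Header' '' 'Plain text with *italics* and **bold**.' ''
  printf '%s\n' "Two plus two is ${bt}r 2 + 2${bt}." ''   # inline code
  printf '%s\n' "${fence}{r}" 'summary(cars)' "$fence"    # code chunk
} > "$rmd"

# rmarkdown::render() knits the file with knitr, then converts it with Pandoc.
# Skip rendering when R, rmarkdown, or Pandoc is missing.
check='quit(status = if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) 0 else 1)'
if command -v Rscript >/dev/null 2>&1 && Rscript -e "$check" >/dev/null 2>&1; then
  Rscript -e "rmarkdown::render('$rmd')"
fi
```

Introducing a typo into `summary(cars)` and rerunning is one way to see the slide 40 point that a document with failing code does not render.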

Editor's Notes

  1. What issues do you see with the feasibility of this process?
  2. These services span the research data lifecycle: plan what you’re going to do with your data before you generate it; curate and manage it during collection; temporary storage; prepare for long-term storage; sharing is optional (for now).
  3. These services span the research data lifecycle: plan what you’re going to do with your data before you generate it; curate and manage it during collection; temporary storage; prepare for long-term storage; sharing is optional (for now). Expertise and infrastructure.