Reproducible research: Practice
Tobin Magle, PhD
Bioinformationist
Health Science Library
University of Colorado Anschutz Medical Campus
Reproducibility is the practice of distributing all data, software source code, and tools required to reproduce the results discussed in a research publication.
https://www.ctspedia.org/do/view/CTSpedia/ReproducibleResearchStandards
Replication vs. Reproducibility
• Replication: The confirmation of results and conclusions from one study obtained independently in another is considered the scientific gold standard.
• "Again, and Again, and Again …" BR Jasny et al. Science, 2011. 334(6060) p. 1225. DOI: 10.1126/science.334.6060.1225
• Some studies can't be replicated: too big, too costly, too time-consuming, one-time events, rare samples
• Reproducibility: the minimum standard for assessing the value of scientific claims, particularly when full independent replication of a study is not feasible
• "Reproducible Research in Computational Science." RD Peng. Science, 2011. 334(6060) pp. 1226-1227. DOI: 10.1126/science.1213847
Research Lifecycle:
Form hypothesis → Design experiment → Collect data → Analyze data → Write manuscript → Publish research
Complications
1. Technological advances:
• Huge, complex digital datasets
• Computational power
• Ability to share
2. Human error:
• Poor reporting
• Flawed analyses
Complicated Research Lifecycle
• Original steps: Form hypothesis, Design experiment, Collect data, Analyze data, Write manuscript, Publish research
• Added steps: Plan for data storage, Clean data, Share data, Curate data
Requires new expertise and infrastructure
Reproducible research tools
• Data Management Plans
• Version control
• Literate statistical computing
DMPTool
• Developed by the California Digital Library to help researchers write data management plans
• https://dmptool.org/user_sessions/institution
• Select University of Colorado Anschutz Medical Campus
• Create an account* or sign in
• CU Anschutz-specific content
*We're working with OIT to allow us to log in with CU passport credentials. Stay tuned.
Data management exercise
• Create a DMPTool account
• Pick a template and create a DMP
• Take 5 minutes to click through the template and think about how
these questions relate to your research
Version control
Version control is a system that records changes to a file or set of files
over time so that you can recall specific versions later.
https://git-scm.com/doc
Intuitive version control
• Original (V1), V2, V3 saved as separate files
But what if you save a new file into the wrong version?
Local version control system
• Keeps files in one place
• No copies
• Keeps track of changes
• Like Apple's Time Machine
Figure 1-1. Local version control.
https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
But what if you need to collaborate?
Centralized version control
• Keeps files on a server
• No copies
• Keeps track of changes
• Can work simultaneously on different files
Figure 1-2. Centralized version control.
https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
But what if the server goes down? What if you can't get online?
Distributed version control
• Examples: Git, Mercurial, Bazaar, or Darcs
• Keeps files locally AND on a server
• Changes are synced among computers and the server
• Keeps track of changes
• Can work simultaneously
Figure 1-3. Distributed version control.
https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
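A minimal sketch of the distributed model, assuming git is installed. A local bare repository stands in for the server (in the workshop that role is played by GitHub), and "alice", "bob" and the file names are hypothetical. Each collaborator holds a full repository, commits locally, and syncs through the shared copy.

```shell
#!/usr/bin/env bash
# Sketch: two collaborators sync through a shared "server" repository.
# Assumptions: git is installed; a local bare repo stands in for the server;
# all names and paths are hypothetical.
set -euo pipefail

top=$(mktemp -d)
cd "$top"
git init -q --bare server.git                 # the shared "server" copy

git clone -q server.git alice 2>/dev/null     # silence the empty-repository warning
cd alice
git config user.name "Alice"
git config user.email "alice@example.com"
echo "sample,dose" > samples.csv
git add samples.csv
git commit -q -m "Add sample sheet"           # recorded locally, works offline
git push -q -u origin HEAD                    # publish to the server, set upstream

cd "$top"
git clone -q server.git bob                   # Bob gets the full history
cd bob
git config user.name "Bob"
git config user.email "bob@example.com"
echo "S1,10" >> samples.csv
git commit -q -a -m "Add first sample"
git push -q                                   # send Bob's commit to the server

cd "$top/alice"
git pull -q --ff-only                         # Alice picks up Bob's change
cat samples.csv
```

If the server disappeared after the last pull, both clones would still hold the complete history, which is the advantage over the centralized model.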
What is Git?
• Distributed version control system, originally created by Linus Torvalds for Linux kernel development
• Stores data as a stream of snapshots
Figure 1-5. Storing data as snapshots of the project over time.
https://git-scm.com/book/en/v2/Getting-Started-Git-Basics
Three states of repository files
• Modified – the file has been changed but not yet committed
• Staged – the modified file is marked to go into the next commit
• Committed – the file's snapshot is safely stored in your local Git database
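To see the three states from the command line, the sketch below follows one file through them in a throwaway repository. It assumes git is installed; the file name and identity are hypothetical. `git status --short` prints a two-column code: the left column is the staging area, the right column is the working directory.

```shell
#!/usr/bin/env bash
# Follow one file through Git's three states in a throwaway repository.
# Assumes git is installed; file name and identity are hypothetical.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # local to this repo only
git config user.email "demo@example.com"

echo "raw counts" > notes.txt
git add notes.txt
git commit -q -m "Add notes"              # first snapshot committed

echo "cleaned counts" >> notes.txt
git status --short                        # " M notes.txt": modified, not staged

git add notes.txt
git status --short                        # "M  notes.txt": staged for the next commit

git commit -q -m "Update notes"
git status --short                        # no output: committed, working tree clean
```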
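Three sections of a Git project
• Working directory – where files are modified
• Staging area – where changes are staged
• Git directory (repository) – where snapshots are committed
Figure 1-6. Working directory, staging area, and Git directory.
https://git-scm.com/book/en/v2/Getting-Started-Git-Basics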
Important git commands
• init (initialize) – start a Git repository
• add – start tracking new files and stage changes for the next commit; for files Git already tracks, this step can be skipped with git commit -a
• commit – safely store the staged snapshot in your Git repository
• clone – make a copy of someone else's Git repository, including its history
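The four commands above can be tried locally without a GitHub account. This is a sketch that assumes git is installed; the directory names, file contents and identity are hypothetical. Note that `commit -a` only picks up changes to files Git already tracks, so a brand-new file still needs `git add`.

```shell
#!/usr/bin/env bash
# Sketch of init, add, commit and clone, run against a local throwaway repo.
# Assumptions: git is installed; paths, data and identity are hypothetical.
set -euo pipefail

work=$(mktemp -d)
cd "$work"

git init -q project                      # init: start a new repository
cd project
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'id,value\n1,3.2\n' > data.csv
git add data.csv                         # add: start tracking / stage the file
git commit -q -m "Add raw data"          # commit: store the snapshot

printf '2,4.8\n' >> data.csv
git commit -q -a -m "Add second row"     # -a stages changes to tracked files only

cd "$work"
git clone -q project project-copy        # clone: full copy, including history
git -C project-copy log --oneline
```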
File statuses and how they change
Figure 2-1. The lifecycle of the status of your files.
https://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-to-the-Repository
GitHub Desktop
• Repositories
• Visual
• Altered files
• Commit notes
Git your hands on git
• Create a GitHub account
• Go to the repository: https://github.com/maglet/hands-on-git
• Clone the repository
Log in to GitHub Desktop
• hands-on-git should appear in the left-hand panel under GitHub
When you change a file…
• GitHub Desktop automatically adds the files/alterations
To commit
After commit
• Added a "bubble"
• Click there to revert
Reverting
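Reverting in GitHub Desktop corresponds roughly to `git revert` on the command line, which adds a new commit that undoes an earlier one instead of erasing it. A minimal sketch, assuming git is installed; the parameter file and its values are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: the command-line counterpart of reverting a commit.
# Assumes git is installed; repo, file names and identity are hypothetical.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "threshold = 0.05" > params.txt
git add params.txt
git commit -q -m "Set threshold"

echo "threshold = 0.5" > params.txt
git commit -q -a -m "Loosen threshold (mistake)"

# git revert adds a NEW commit that undoes the chosen one,
# so the mistake stays visible in the history.
git revert --no-edit HEAD >/dev/null
cat params.txt                 # threshold = 0.05
git rev-list --count HEAD      # 3
```

Because the history keeps both the mistake and its reversal, a reader can still see what was tried, which suits reproducibility better than deleting commits.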
Cloning/Branching/Forking
• Cloning: making a local copy of a repository hosted online or elsewhere
• Branching: creating a separate line of development to test new features without affecting the "trunk"; branches depend on the trunk
• Useful for collaboration
• Forking: making a separate copy of a repository that does not depend on the original
• Lets you use others' work as a starting point, and preserves for yourself things the owner might delete
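Branching and merging can be sketched locally as well. On GitHub the merge step usually happens when a pull request is approved; here it is done directly with `git merge`. Forking has no local equivalent because it is a copy made on the hosting site. The branch name `new-plot` and the files are hypothetical, and the trunk may be called `master` or `main` depending on your git configuration.

```shell
#!/usr/bin/env bash
# Sketch: create a branch, work on it, and merge it back into the trunk.
# Assumes git is installed; branch and file names are hypothetical.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "# Analysis" > README.md
git add README.md
git commit -q -m "Initial commit"
trunk=$(git symbolic-ref --short HEAD)   # "master" or "main", depending on config

git checkout -q -b new-plot              # branch: split off from the trunk
echo "plot(cars)" > plot.R
git add plot.R
git commit -q -m "Add plot script"

git checkout -q "$trunk"                 # the trunk does not see plot.R yet
git merge -q --no-ff -m "Merge new-plot" new-plot   # meet back up with the trunk
git branch -d new-plot                   # delete the branch once merged
git log --oneline --graph
```

`--no-ff` forces a merge commit so the split and rejoin stay visible in the graph, similar to the visual representation in GitHub Desktop.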
Branching
• Splits off
• Editing a branch
• Pull request
• Meets back up with "master"
Approve pull request
• Meets back up with "master"; can be reverted
• Can delete unused branches
Pull request approved
• Back on one track
Exercise
• Go to the repository you cloned earlier
• Create a text file with your name on it
• Add it to the name folder
• Submit a pull request
• Look at what happens to the visual representation
Literate (statistical) programming
• The resulting report is a stream of text (human-readable) and code (machine-readable)
• Alternate text and code
• Sweave
• R Markdown
R Markdown
• Open
• Write
• Embed
• Render
https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf
Install knitr and rmarkdown packages
• Tools > Install Packages…
• Enter the package name (it will autocomplete)
• knitr
• rmarkdown
• OR install.packages(c("knitr", "rmarkdown"))
• If it fails, try again
Open/create an R Markdown document (File > New File > R Markdown…)
Write: useful syntax
• Plain text
• *italics* -> italics
• **bold** -> bold
• # Header -> Header (more # signs make the header smaller)
• Can also include:
• Pictures
• Ordered and unordered lists
• Tables
Embed code
• Inline – use R results in the human-readable text
• `r 2 + 2` renders as 4
• Code chunks - Include working code that generates output
• ```{r}
• #Code goes here
• ```
• Display Options –
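As a sketch of how the writing and embedding syntax fits together, the script below writes a small R Markdown file. The file name, title and chunk label are hypothetical; `echo = TRUE` is one of knitr's standard chunk options (show the code alongside its output). The chunk delimiter is kept in a shell variable only to avoid quoting trouble with backticks.

```shell
#!/usr/bin/env bash
# Write a small R Markdown file that combines text syntax, an inline
# expression and a code chunk. Names and contents are hypothetical.
set -euo pipefail
cd "$(mktemp -d)"

fence='```'   # the three backticks that open and close an R code chunk
{
  printf '%s\n' '---' 'title: "Example report"' 'output: html_document' '---' ''
  printf '%s\n' '# Header' '' 'Some *italic* and **bold** text.' ''
  printf '%s\n' '- first bullet' '- second bullet' '' '> A block quote.' ''
  printf '%s\n' 'Two plus two is `r 2 + 2`.' ''
  printf '%s\n' "${fence}{r cars-summary, echo = TRUE}" 'summary(cars)' "$fence"
} > example.Rmd

cat example.Rmd
```

Opening `example.Rmd` in RStudio and knitting it should show the inline expression replaced by its value and the chunk's code followed by its output.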
Render
• Won't render unless the code runs without errors
• So a document that renders is a good sign the analysis is reproducible
• Render using the Knit button (which calls rmarkdown::render())
• Output formats
• Knit HTML
• Knit PDF – requires LaTeX
• Knit Word
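Rendering does not require the Knit button; the same `rmarkdown::render()` call can be run from a terminal, which makes it easy to re-run a whole report in one step. This sketch assumes R, the rmarkdown package and pandoc are installed, and simply skips the render with a message if they are not; the file name is hypothetical.

```shell
#!/usr/bin/env bash
# Render an R Markdown file from the command line, the same job the Knit
# button does through rmarkdown::render(). Assumes R, rmarkdown and pandoc;
# the file name is hypothetical. The render is skipped if they are missing.
set -euo pipefail
cd "$(mktemp -d)"

printf '%s\n' '---' 'title: "Render test"' '---' '' 'Two plus two is `r 2 + 2`.' > report.Rmd

can_render() {
  command -v Rscript >/dev/null 2>&1 &&
    Rscript -e 'ok <- requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available(); quit(status = if (ok) 0 else 1)'
}

if can_render; then
  # "html_document" matches Knit HTML; "word_document" and "pdf_document"
  # (which needs LaTeX) match Knit Word and Knit PDF.
  Rscript -e 'rmarkdown::render("report.Rmd", output_format = "html_document", quiet = TRUE)'
else
  echo "R, rmarkdown or pandoc not found; skipping render" >&2
fi
```

As with the Knit button, an error in any chunk stops the render, so a successful run means the code executed from top to bottom.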
Exercise
• Edit the markdown document using the cheat sheet to see what you
can do
• Try to knit it after creating a typo in the code
• Insert other pictures from the web
• Try to make a table
• Make some bulleted lists
• Insert a block quote
• Make the graph prettier
• Play around!

SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
 
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
 
Bios of leading Astrologers & Researchers
Bios of leading Astrologers & ResearchersBios of leading Astrologers & Researchers
Bios of leading Astrologers & Researchers
 
Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...
Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...
Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...
 

Reproducible research: practice

  • 1. Reproducible research: Practice. Tobin Magle, PhD, Bioinformationist, Health Science Library, University of Colorado Anschutz Medical Campus
  • 2. Reproducibility is the practice of distributing all data, software source code, and tools required to reproduce the results discussed in a research publication. https://www.ctspedia.org/do/view/CTSpedia/ReproducibleResearchStandards
  • 3. Replication vs. Reproducibility • Replication: "The confirmation of results and conclusions from one study obtained independently in another is considered the scientific gold standard." • “Again, and Again, and Again …” BR Jasny et al. Science, 2011. 334(6060) p. 1225. DOI: 10.1126/science.334.6060.1225 • Some studies can’t be replicated: too big, too costly, too time-consuming, one-time events, rare samples • Reproducibility: the minimum standard for assessing the value of scientific claims, particularly when full independent replication of a study is not feasible • “Reproducible Research in Computational Science”. RD Peng. Science, 2011. 334(6060) pp. 1226–1227. DOI: 10.1126/science.1213847
  • 4. Research lifecycle: Form hypothesis → Design experiment → Collect data → Analyze data → Write manuscript → Publish research. Complications: 1. Technological advances: • Huge, complex digital datasets • Computational power • Ability to share 2. Human error: • Poor reporting • Flawed analyses
  • 6. Requires new expertise and infrastructure. Lifecycle steps: Plan for data storage, Form hypothesis, Design experiment, Collect data, Clean data, Curate data, Analyze data, Write manuscript, Publish research, Share data. Supporting tools: Data management plans, Version control, Literate statistical computing, Reproducible research tools
  • 7. DMPTool • Developed by the California Digital Library to help researchers write data management plans • https://dmptool.org/user_sessions/institution • Select University of Colorado Anschutz Medical Campus
  • 8. Create an account* or sign in. *We’re working with OIT to let you log in with your CU passport credentials. Stay tuned.
  • 10. Data management exercise • Create a DMPTool account • Pick a template and create a DMP • Take 5 minutes to click through the template and think about how these questions relate to your research
  • 11. Version control Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later. https://git-scm.com/doc
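To make "recall specific versions later" concrete, here is a minimal command-line sketch, assuming Git is installed and a POSIX shell is available. It records two versions of a hypothetical file, notes.txt, in a throwaway folder and then recalls the first version; the file name, identity, and commit messages are made up for illustration.

```shell
# Sketch: record two versions of a file, then recall the earlier one.
# Assumes git is installed; notes.txt and the identity below are hypothetical.
set -e
work=$(mktemp -d)                 # throwaway folder for the demo
cd "$work"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Version 1"

echo "second draft" > notes.txt
git commit -q -a -m "Version 2"

git log --oneline                 # list the recorded versions, newest first

# Look at the file as it was one commit ago, without changing anything.
old=$(git show HEAD~1:notes.txt)

# Bring that earlier version back into the working directory.
git checkout HEAD~1 -- notes.txt
current=$(cat notes.txt)
echo "$current"
```

Unlike the "save a copy as V2, V3" approach, only one notes.txt ever exists on disk; the history lives inside the repository.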
  • 12. Intuitive version control: saving successive copies (Original (V1), V2, V3). But what if you save a new file into the wrong version?
  • 13. Local version control system • Keeps files in one place • No copies • Keeps track of changes • Like Apple’s Time Machine. But what if you need to collaborate? Figure 1-1. Local version control. https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
  • 14. Centralized version control • Keeps files on a server • No copies • Keeps track of changes • Can work simultaneously on different files. But what if the server goes down? What if you can’t get online? Figure 1-2. Centralized version control. https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
  • 15. Distributed version control (e.g., Git, Mercurial, Bazaar, or Darcs) • Keeps files locally AND on a server • Changes are shared among computers and the server • Keeps track of changes • Can work simultaneously. Figure 1-3. Distributed version control. https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
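The distributed model can be imitated on a single machine. In this sketch, which assumes Git and a POSIX shell, a bare repository named server.git stands in for the shared server, and two hypothetical collaborators, alice and bob, each hold a full clone. Alice commits without contacting the server, pushes when ready, and Bob fetches her work. All names are made up; a real team would use a hosted server such as GitHub rather than a local folder.

```shell
# Sketch: a distributed workflow on one machine.
# Assumes git is installed. server.git, alice and bob are hypothetical names.
set -e
work=$(mktemp -d)
cd "$work"

git init -q --bare server.git      # stands in for the shared server
git clone -q server.git alice      # each collaborator gets a full local copy
git clone -q server.git bob

cd alice
git config user.name "Alice Example"
git config user.email "alice@example.com"
echo "analysis plan" > plan.txt
git add plan.txt
git commit -q -m "Add analysis plan"   # recorded locally; no server needed
branch=$(git rev-parse --abbrev-ref HEAD)
git push -q origin "$branch"           # share the commit with the server
cd ..

# The server now has the commit...
server_subject=$(git -C server.git log -1 --format=%s "$branch")

# ...and Bob can download it into his own full copy.
git -C bob fetch -q origin
bob_subject=$(git -C bob log -1 --format=%s "origin/$branch")
```

If the "server" disappeared after the push, both clones would still hold the complete history, which answers the "what if the server goes down?" question from the centralized slide.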
  • 16. What is Git? • Distributed version control system developed by the Linux community • A stream of snapshots Figure 1-5. Storing data as snapshots of the project over time. https://git-scm.com/book/en/v2/Getting-Started-Git-Basics
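The "stream of snapshots" idea can be checked from the command line. This sketch assumes Git and a POSIX shell and uses hypothetical file names. It commits two snapshots in which only one file changes; as Pro Git describes, Git does not store an unchanged file again, so both snapshots point at the same stored object for data.csv.

```shell
# Sketch: each commit is a full snapshot, but unchanged files are shared.
# Assumes git is installed; data.csv and analysis.R are hypothetical names.
set -e
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "raw data" > data.csv
echo "first analysis" > analysis.R
git add data.csv analysis.R
git commit -q -m "Snapshot 1"

echo "revised analysis" > analysis.R
git commit -q -a -m "Snapshot 2"

git ls-tree HEAD          # the second snapshot lists every file, not just the change

# Object IDs of each file in the two snapshots.
old_data=$(git rev-parse HEAD~1:data.csv)
new_data=$(git rev-parse HEAD:data.csv)
old_script=$(git rev-parse HEAD~1:analysis.R)
new_script=$(git rev-parse HEAD:analysis.R)

echo "data.csv:   $old_data -> $new_data"     # same ID: stored once
echo "analysis.R: $old_script -> $new_script" # new ID: new content stored
```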
  • 17. 3 states of repository files • Modified – the file has been changed but not yet committed • Staged – the changed file is marked to go into the next commit • Committed – the changes are stored in your local database
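One way to watch a file move through these three states is `git status --porcelain`, which prints a two-character code per file (left column: staging area, right column: working directory). This sketch assumes Git and a POSIX shell; results.txt is a hypothetical file name.

```shell
# Sketch: one file moving through modified -> staged -> committed.
# Assumes git is installed; results.txt is a hypothetical file name.
set -e
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "v1" > results.txt
git add results.txt
git commit -q -m "Add results"

echo "v2" > results.txt
modified=$(git status --porcelain)    # " M results.txt": changed, not staged

git add results.txt
staged=$(git status --porcelain)      # "M  results.txt": marked for the next commit

git commit -q -m "Update results"
committed=$(git status --porcelain)   # empty: the change is in the local database
```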
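  • 18. 3 sections of your project directory • Working directory – where files are modified • Staging area – where changes are staged for the next commit • Git directory – where commits are stored. Figure 1-6. Working directory, staging area, and Git directory. https://git-scm.com/book/en/v2/Getting-Started-Git-Basics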
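The three sections can hold three different versions of the same file at once. In this sketch (assuming Git and a POSIX shell; notes.txt is hypothetical), `git show HEAD:notes.txt` reads the Git directory, `git show :notes.txt` reads the staging area, and `cat` reads the working directory. `git diff` compares working directory to staging area, while `git diff --cached` compares staging area to the last commit.

```shell
# Sketch: three versions of one file in the three sections.
# Assumes git is installed; notes.txt is a hypothetical file name.
set -e
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'a\n' > notes.txt
git add notes.txt
git commit -q -m "Initial notes"       # Git directory holds: a

printf 'a\nb\n' > notes.txt
git add notes.txt                      # staging area holds: a b

printf 'a\nb\nc\n' > notes.txt         # working directory holds: a b c

in_repo=$(git show HEAD:notes.txt)     # committed version
in_stage=$(git show :notes.txt)        # staged version
in_work=$(cat notes.txt)               # version on disk

# Line counts (added, deleted, file) between neighbouring sections.
staged_diff=$(git diff --cached --numstat)   # staging area vs. last commit: +b
unstaged_diff=$(git diff --numstat)          # working directory vs. staging area: +c
```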
  • 19. Important git commands • init (initialize) – start a git repository • add – start tracking new files and stage changes for the next commit; for files that are already tracked, this step can be skipped with git commit -a • commit – safely store the staged files in your git repository • clone – make a copy of someone else’s git repository
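The four commands can be tried end to end in a scratch folder. This sketch assumes Git and a POSIX shell; the folder and file names are hypothetical, and the clone copies a local folder rather than a GitHub URL (for the workshop repository you would pass its URL, https://github.com/maglet/hands-on-git, instead).

```shell
# Sketch: init, add, commit (with and without -a), and clone.
# Assumes git is installed; project/ and data.csv are hypothetical names.
set -e
work=$(mktemp -d)
cd "$work"

mkdir project
cd project
git init -q                                # init: start a repository here
git config user.name "Example User"
git config user.email "user@example.com"

echo "x,y" > data.csv
git add data.csv                           # add: track the new file and stage it
git commit -q -m "Add raw data"            # commit: store the staged snapshot

echo "1,2" >> data.csv
git commit -q -a -m "Add first observation"  # -a stages tracked, modified files
cd ..

git clone -q project project-copy          # clone: full copy, including history
count=$(git -C project-copy rev-list --count HEAD)
echo "commits in the clone: $count"
```

Note that `-a` only picks up files Git already tracks; a brand-new file still needs `git add` first.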
  • 20. File statuses and how they change Figure 2-1. The lifecycle of the status of your files. https://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-to-the-Repository
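Figure 2-1 also includes an untracked state and a "remove" transition back to it. The sketch below (assuming Git and a POSIX shell; report.txt is hypothetical) walks a new file from untracked to staged to unmodified, then uses `git rm --cached` to stop tracking it while keeping the file on disk.

```shell
# Sketch: untracked -> staged -> unmodified -> untracked again.
# Assumes git is installed; report.txt is a hypothetical file name.
set -e
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "draft" > report.txt
s_new=$(git status --porcelain)        # "?? report.txt": untracked

git add report.txt
s_added=$(git status --porcelain)      # "A  report.txt": staged as a new file

git commit -q -m "Add report"
s_committed=$(git status --porcelain)  # empty: unmodified

git rm -q --cached report.txt          # stop tracking; the file stays on disk
git commit -q -m "Stop tracking report"
s_removed=$(git status --porcelain)    # "?? report.txt": untracked again
```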
  • 22. Git your hands on git • Create a GitHub account • Go to the repository: https://github.com/maglet/hands-on-git • Clone the repository
  • 23. Log in to GitHub Desktop • hands-on-git should appear in the left-hand panel under GitHub
  • 24. When you change a file… GitHub Desktop automatically adds new files and alterations to the list of changes to commit
  • 27. Cloning/Branching/Forking • Cloning: making a local copy of a repository that is online or elsewhere • Branching: creating a separate stream to test new features without affecting the “trunk”; branches depend on the trunk. Useful for collaboration • Forking: making a separate, independent copy of a repository. Useful for using others’ work as a starting point, or for preserving for yourself things that the owner might delete
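Branching can be seen locally with a few commands. This sketch assumes Git and a POSIX shell; the branch name new-plot and the files are hypothetical. It commits an experiment on a branch and shows that the trunk is untouched. Forking has no git command of its own (on GitHub it is the Fork button), so it is not shown here.

```shell
# Sketch: a branch keeps an experiment away from the trunk.
# Assumes git is installed; new-plot and the file names are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "stable analysis" > analysis.txt
git add analysis.txt
git commit -q -m "Stable version"
trunk=$(git rev-parse --abbrev-ref HEAD)   # "master" or "main", depending on setup

git checkout -q -b new-plot                # create the branch and switch to it
echo "experimental plot" > plot.txt
git add plot.txt
git commit -q -m "Try a new plot"
branch_commits=$(git rev-list --count new-plot)   # trunk's commit + the experiment

git checkout -q "$trunk"                   # back on the trunk...
if [ -e plot.txt ]; then on_trunk=yes; else on_trunk=no; fi   # ...no plot.txt here
```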
  • 30. Pull request • The branch meets back up with “master”
  • 31. Approve pull request • The branch meets back up with “master”; the merge can be reverted • Unused branches can be deleted
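Pull requests themselves live on GitHub, but the steps behind approving one have local git equivalents. The sketch below is an approximation under stated assumptions (Git and a POSIX shell; hypothetical names): `git merge --no-ff` creates an explicit merge commit, roughly what GitHub's "Create a merge commit" option does; `git branch -d` deletes the merged branch; and `git revert -m 1` undoes the merge while keeping history.

```shell
# Sketch: merge a branch back, delete it, then revert the merge.
# Assumes git is installed; new-plot and the file names are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "stable" > analysis.txt
git add analysis.txt
git commit -q -m "Stable version"
trunk=$(git rev-parse --abbrev-ref HEAD)

git checkout -q -b new-plot
echo "new plot" > plot.txt
git add plot.txt
git commit -q -m "Add new plot"

# "Approve": merge the branch into the trunk with an explicit merge commit.
git checkout -q "$trunk"
git merge -q --no-ff -m "Merge branch new-plot" new-plot
if [ -e plot.txt ]; then after_merge=yes; else after_merge=no; fi

git branch -q -d new-plot                  # the merged branch is no longer needed
remaining=$(git branch --list new-plot)

# Undo the merge later if needed; -m 1 keeps the trunk side of the merge.
git revert --no-edit -m 1 HEAD > /dev/null
if [ -e plot.txt ]; then after_revert=yes; else after_revert=no; fi
```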
  • 33. Exercise • Go to the repository you cloned earlier • Create a text file with your name on it • Add it to the name folder • Submit a pull request • Look at what happens to the visual representation
  • 34. Literate (statistical) programming • The resulting report is a stream of text (human-readable) and code (machine-readable) • Text and code alternate • Tools: Sweave, R Markdown
  • 35. R Markdown • Open • Write • Embed • Render https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf
  • 36. Install the knitr and markdown packages • In RStudio: Tools > Install Packages • Enter the package names (they will autocomplete): knitr, markdown • OR run install.packages(c("knitr", "markdown")) • If it fails, try again
  • 38. Write: useful syntax • Plain text • *italics* -> italics • **bold** -> bold • # Header -> Header (more # signs make a smaller header) • You can also: • Insert pictures • Make ordered and unordered lists • Make tables
  • 39. Embed code • Inline code – use variables in the human-readable text, e.g. `r 2 + 2` • Code chunks – include working code that generates output • ```{r} • #Code goes here • ``` • Display Options –
  • 40. Render • The document won’t render unless the code runs without errors, so if it renders you know it should be reproducible • Render using the knit function (the Knit button in RStudio) • Output formats: • Knit HTML • Knit PDF – requires LaTeX • Knit Word
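Putting Write, Embed and Render together: the sketch below writes a minimal R Markdown file (hypothetical name example.Rmd) using the syntax from these slides, with inline code and one code chunk, then renders it with `rmarkdown::render()`. It assumes a POSIX shell; rendering only runs if Rscript, the rmarkdown package and pandoc are all available, so the script still completes on a machine without R.

```shell
# Sketch: write a small R Markdown file, then render it if R is available.
# example.Rmd and its contents are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

# Three backticks, written as octal escapes (\140 is a backtick) so this
# listing stays readable when embedded in Markdown.
fence=$(printf '\140\140\140')

{
  echo '---'
  echo 'title: "Example report"'
  echo 'output: html_document'
  echo '---'
  echo ''
  echo '# Results'
  echo ''
  echo 'Some *italic* and **bold** text. Two plus two is `r 2 + 2`.'
  echo ''
  echo "${fence}{r}"
  echo 'x <- c(1, 2, 3)'
  echo 'mean(x)'
  echo "$fence"
} > example.Rmd

# Render only when R, rmarkdown and pandoc are all present.
if command -v Rscript > /dev/null 2>&1 &&
   Rscript -e 'quit(status = if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) 0 else 1)'
then
  Rscript -e 'rmarkdown::render("example.Rmd")'   # writes example.html
fi
```

If the chunk contained an error (for example a typo such as `maen(x)`), the render step would stop with an error instead of producing example.html, which is the behaviour the slide relies on.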
  • 41. Exercise • Edit the markdown document using the cheat sheet to see what you can do • Try to knit it after creating a typo in the code • Insert other pictures from the web • Try to make a table • Make some bulleted lists • Insert a block quote • Make the graph prettier • Play around!

Editor's Notes

  1. What issues do you see with the feasibility of this process?
  2. These services span the research data lifecycle: plan what you’re going to do with your data before you generate it; curate and manage it during collection; temporary storage; prepare for long-term storage; sharing (optional, for now).
  3. These services span the research data lifecycle: plan what you’re going to do with your data before you generate it; curate and manage it during collection; temporary storage; prepare for long-term storage; sharing (optional, for now). Expertise and infrastructure.