SlideShare a Scribd company logo
Reproducible
research
C. Tobin Magle, PhD
06-26-2017
10:00 – 11:30 a.m.
Morgan Library
Computer Classroom 175
Outline
• What is reproducible research?
• Why would I do that?
• How? (in R Studio)
• Automation
• Git
• R Markdown
Hypothesis Data
Experimental
design
ResultsArticle
The research cycle
1. Technological advances:
• Huge, complex digital datasets
• Ability to share
2. Human Error:
• Poor Reporting
• Flawed analyses
Complications
Hypothesis
Raw
data
Experimental
design
Tidy
Data
ResultsArticle
Processing/
Cleaning
Analysis
Open Data
Code
The research cycle
Reproducible research
is the practice of distributing all data,
software source code, and tools required
to reproduce the results discussed in a
research publication.
https://www.ctspedia.org/do/view/CTSpedia/ReproducibleResearchStandards
Reproducible research
=
Data (with metadata)
+
Code/Software
Reproducible research
=
Transparency
Hypothesis
Raw
data
Experimental
design
Tidy
Data
ResultsArticle
Cleaning
Analysis
Open Data
Code
Reproducible
Research
The research cycle
Replication vs. Reproducibility
• Replication: Same conclusion new study (gold standard)
• “Again, and Again, and Again …” BR Jasny et. al. Science, 2011. 334(6060) pp. 1225 DOI: 10.1126/science.334.6060.1225
• Replication isn’t always feasible: too big, too costly, too time
consuming, one time event, rare samples
• Reproducibility: Same results from same data and code (minimum
standard for validity)
• “Reproducible Research in Computational Science”. RD Peng Science, 2011. 334 (6060) pp. 1226-1227 DOI: 10.1126/science.1213847
Reproducibility spectrum
“Reproducible Research in Computational Science”. RD Peng Science, 2011. 334 (6060) pp. 1226-1227 DOI: 10.1126/science.1213847
Using markdown and
version control in R Studio:
Documenting analysis
• Optimal: the instructions should be an automated script
file (ie, “code”)
• Minimum: Written instructions that allow for the complete
reproduction of your analysis
Raw
Data
Processed
Data
Results
Cleaning/
processing
Analysis
Research is repetitive
• Replication
• Same assay, different samples
• Longitudinal experiments
http://xkcd.com/242/
Doing things by hand is…
• Slow
• Hard to document
• Hard to repeat
Exercise 1: Making graphs
• Download height graph XLS: http://bit.ly/2ooMyaO
• Describe how to make a bar graph in excel
Automation
• Fast
• Easy to document
• Easy to repeat
• Write scripts or save log files
Example: Making graphs
• Use a script: http://bit.ly/2oXOZUH
Details to record for processing/analysis
• What software was used? (R Studio, script)
• Does it support log files/scripts? (yes!)
• What version # and settings were used? (R version 3.3.2)
• What else does the software need to run?
• Computer architecture
• OS/Software/tool/add ons (libraries/packages)
• External databases
In R: the sessionInfo() command
Intuitive
version
control
http://phdcomics.com/comics.php?f=1531
Version Control
• A system that records changes to a
file or set of files over time so that you
can recall specific versions later
• https://git-scm.com/doc
• Saves EVERYTHING
• Records who made what change
• Identifies conflicting changes
• Not just for code
Make an empty GitHub
• Github.com
Tell R Studio where git is
*if you don’t’ know where git is installed, try
• Mac terminal: which git
• PC: where git.exe
Make a new version control project
*auto-recognizes version control if you select an existing directory
Select git, select options
Clone
Remote
Local
Add files
• Check boxes to add files to
the “staging area”
• Gets them ready to be added
(committed) to the repository
Staging area vs. repository
Commit files
• Click “commit”
• Write a commit message
• Click “commit”
• Adds the changes to the
repository
Staging area vs. repository
Exercise 2
• Make some changes to the script file in your repo
• add the changes by clicking the checkbox
• Create a new file – don’t click the checkbox next to it
• Commit the changes
• Push the changes to the remote
• Does the new file appear in your remote git repository
Push to remote
Push
Literate programming
=
human readable (text)
+
machine readable (code)
R markdown
R Markdown
• Open
• Write
• Embed
• Render
https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf
Install knitr and markdown packages
• Tools > install packages
• Enter the package name
(will autocomplete)
• Knitr
• Rmarkdown
• TeX (if you want to knit to
PDF)
• OR
install.packages("knitr”)
Open/Create a markdown document
• Make your own Markdown
doc
• Download graph markdown
doc http://bit.ly/2p89drX
Write: useful syntax
• Plain text
• *italics* -> italics
• **bold** -> bold
• #Header -> Header (more # decreases size)
• Can also draw:
• Insert pictures
• Ordered and unordered list
• Tables
Embed code
• Inline – Use variables in
the human readable text
• `r 2 + 2`
• Code chunks - Include
working code that
generates output
• ```{r}
• #Code goes here
• ```
• Display Options –
Render
• Won’t render unless the code runs
with no errors
• You know it should be reproducible
• Render using the knit function
• Output Formats
• Knit HTML
• Knit PDF – requires TeX
• Knit Word
Exercise 3
• Open a new markdown document
• Use the formatting commands from the R Markdown cheat
sheet to create a bulleted list that looks like the content of this
slide.
• https://www.rstudio.com/wp-
content/uploads/2015/02/rmarkdown-cheatsheet.pdf
SessionInfo()
Scripts
R markdown
Hypothesis
Raw
data
Experimental
design
Tidy
Data
ResultsArticle
Cleaning
Analysis
Open Data
Code
The research cycle
Version
Control
+
Scripts
Reproducible research checklist
• Think about the entire pipeline: are all the pieces reproducible?
• Is your cleaning/analysis process automated?– guarantees reproducibility
• Are you doing things “by hand”? editing tables/figures; splitting/reformatting data
• Does your software support log files or scripts?
• If no, do you have a detailed description of your process?
• Are you using version control?
• Are you keeping track of your software?
• Computer architecture;
• OS/Software/tool/add ons (libraries/packages)/external databases
• version numbers for everything (when available)
• Are you saving the right files?: if it’s not reproducible, it’s not worth saving
• Save the data and the code
• Data + Code = Output
• Are your reports human and machine readable?
Adapted from: https://github.com/DataScienceSpecialization/courses/blob/master/05_ReproducibleResearch/Checklist/Reproducible%20Research%20Checklist.pdf
Need help?
• Email: tobin.magle@colostate.edu
• Data Management Services website:
http://lib.colostate.edu/services/data-management
• Software Carpentry Reproducible Scientific Analysis Lesson:
http://swcarpentry.github.io/r-novice-gapminder/
• Git with R tutorial: http://happygitwithr.com/
• R markdown cheatsheet: https://www.rstudio.com/wp-
content/uploads/2015/02/rmarkdown-cheatsheet.pdf

More Related Content

What's hot

Data Management for librarians
Data Management for librariansData Management for librarians
Data Management for librarians
C. Tobin Magle
 
Data wrangling with dplyr
Data wrangling with dplyrData wrangling with dplyr
Data wrangling with dplyr
C. Tobin Magle
 
Scalable Data Analysis in R -- Lee Edlefsen
Scalable Data Analysis in R -- Lee EdlefsenScalable Data Analysis in R -- Lee Edlefsen
Scalable Data Analysis in R -- Lee Edlefsen
Revolution Analytics
 
Converting Metadata to Linked Data
Converting Metadata to Linked DataConverting Metadata to Linked Data
Converting Metadata to Linked Data
Karen Estlund
 
Exploratory Data Analysis
Exploratory Data AnalysisExploratory Data Analysis
Exploratory Data Analysis
thinrhino
 
useR! 2012 Talk
useR! 2012 TalkuseR! 2012 Talk
useR! 2012 Talk
rtelmore
 
Do it on your own - From 3 to 5 Star Linked Open Data with RMLio
Do it on your own - From 3 to 5 Star Linked Open Data with RMLioDo it on your own - From 3 to 5 Star Linked Open Data with RMLio
Do it on your own - From 3 to 5 Star Linked Open Data with RMLio
Open Knowledge Belgium
 
Unit 3
Unit 3Unit 3
Analytics and Access to the UK web archive
Analytics and Access to the UK web archiveAnalytics and Access to the UK web archive
Analytics and Access to the UK web archiveLewis Crawford
 
Dancing in R
Dancing in RDancing in R
Dancing in R
Simon Roy
 
Getting Started with R
Getting Started with RGetting Started with R
Getting Started with R
Sankhya_Analytics
 
Achieving time effective federated information from scalable rdf data using s...
Achieving time effective federated information from scalable rdf data using s...Achieving time effective federated information from scalable rdf data using s...
Achieving time effective federated information from scalable rdf data using s...తేజ దండిభట్ల
 
PharoDAYS 2015: Pharo Status - by Markus Denker
PharoDAYS 2015: Pharo Status - by Markus DenkerPharoDAYS 2015: Pharo Status - by Markus Denker
PharoDAYS 2015: Pharo Status - by Markus Denker
Pharo
 
Ld4 l triannon
Ld4 l triannonLd4 l triannon
Ld4 l triannon
Naomi Dushay
 
OpenRefine Class Tutorial
OpenRefine Class TutorialOpenRefine Class Tutorial
OpenRefine Class TutorialAshwin Dinoriya
 
R Introduction
R IntroductionR Introduction
R Introductionschamber
 
Federated SPARQL Query Processing ISWC2015 Tutorial
Federated SPARQL Query Processing ISWC2015 TutorialFederated SPARQL Query Processing ISWC2015 Tutorial
Federated SPARQL Query Processing ISWC2015 Tutorial
Muhammad Saleem
 
Indexing, searching, and aggregation with redi search and .net
Indexing, searching, and aggregation with redi search and .netIndexing, searching, and aggregation with redi search and .net
Indexing, searching, and aggregation with redi search and .net
Stephen Lorello
 

What's hot (20)

Data Management for librarians
Data Management for librariansData Management for librarians
Data Management for librarians
 
Data wrangling with dplyr
Data wrangling with dplyrData wrangling with dplyr
Data wrangling with dplyr
 
Scalable Data Analysis in R -- Lee Edlefsen
Scalable Data Analysis in R -- Lee EdlefsenScalable Data Analysis in R -- Lee Edlefsen
Scalable Data Analysis in R -- Lee Edlefsen
 
Converting Metadata to Linked Data
Converting Metadata to Linked DataConverting Metadata to Linked Data
Converting Metadata to Linked Data
 
Exploratory Data Analysis
Exploratory Data AnalysisExploratory Data Analysis
Exploratory Data Analysis
 
useR! 2012 Talk
useR! 2012 TalkuseR! 2012 Talk
useR! 2012 Talk
 
Do it on your own - From 3 to 5 Star Linked Open Data with RMLio
Do it on your own - From 3 to 5 Star Linked Open Data with RMLioDo it on your own - From 3 to 5 Star Linked Open Data with RMLio
Do it on your own - From 3 to 5 Star Linked Open Data with RMLio
 
Unit 3
Unit 3Unit 3
Unit 3
 
Analytics and Access to the UK web archive
Analytics and Access to the UK web archiveAnalytics and Access to the UK web archive
Analytics and Access to the UK web archive
 
Dancing in R
Dancing in RDancing in R
Dancing in R
 
Arakno
AraknoArakno
Arakno
 
Unit 5
Unit 5Unit 5
Unit 5
 
Getting Started with R
Getting Started with RGetting Started with R
Getting Started with R
 
Achieving time effective federated information from scalable rdf data using s...
Achieving time effective federated information from scalable rdf data using s...Achieving time effective federated information from scalable rdf data using s...
Achieving time effective federated information from scalable rdf data using s...
 
PharoDAYS 2015: Pharo Status - by Markus Denker
PharoDAYS 2015: Pharo Status - by Markus DenkerPharoDAYS 2015: Pharo Status - by Markus Denker
PharoDAYS 2015: Pharo Status - by Markus Denker
 
Ld4 l triannon
Ld4 l triannonLd4 l triannon
Ld4 l triannon
 
OpenRefine Class Tutorial
OpenRefine Class TutorialOpenRefine Class Tutorial
OpenRefine Class Tutorial
 
R Introduction
R IntroductionR Introduction
R Introduction
 
Federated SPARQL Query Processing ISWC2015 Tutorial
Federated SPARQL Query Processing ISWC2015 TutorialFederated SPARQL Query Processing ISWC2015 Tutorial
Federated SPARQL Query Processing ISWC2015 Tutorial
 
Indexing, searching, and aggregation with redi search and .net
Indexing, searching, and aggregation with redi search and .netIndexing, searching, and aggregation with redi search and .net
Indexing, searching, and aggregation with redi search and .net
 

Similar to Reproducible research

Reproducible research: practice
Reproducible research: practiceReproducible research: practice
Reproducible research: practice
C. Tobin Magle
 
Reproducible Computational Research in R
Reproducible Computational Research in RReproducible Computational Research in R
Reproducible Computational Research in R
Samuel Bosch
 
1
11
Reproducible Research in R and R Studio
Reproducible Research in R and R StudioReproducible Research in R and R Studio
Reproducible Research in R and R Studio
Susan Johnston
 
Reproducibility and automation of machine learning process
Reproducibility and automation of machine learning processReproducibility and automation of machine learning process
Reproducibility and automation of machine learning process
Denis Dus
 
C++ Windows Forms L01 - Intro
C++ Windows Forms L01 - IntroC++ Windows Forms L01 - Intro
C++ Windows Forms L01 - Intro
Mohammad Shaker
 
Reproducible data science: review of Pachyderm, Data Version Control and GIT ...
Reproducible data science: review of Pachyderm, Data Version Control and GIT ...Reproducible data science: review of Pachyderm, Data Version Control and GIT ...
Reproducible data science: review of Pachyderm, Data Version Control and GIT ...
Josh Levy-Kramer
 
What’s eating python performance
What’s eating python performanceWhat’s eating python performance
What’s eating python performance
Piotr Przymus
 
Top 10 python ide
Top 10 python ideTop 10 python ide
Top 10 python ide
Saravanakumar viswanathan
 
Enter Cookbook: refactoring under a microscope
Enter Cookbook: refactoring under a microscopeEnter Cookbook: refactoring under a microscope
Enter Cookbook: refactoring under a microscope
Kamil Samigullin
 
Lessons learned while building Omroep.nl
Lessons learned while building Omroep.nlLessons learned while building Omroep.nl
Lessons learned while building Omroep.nl
tieleman
 
US23_Daniels.pptx
US23_Daniels.pptxUS23_Daniels.pptx
US23_Daniels.pptx
ELGHARBAOUIMOHAMMED
 
Coding Like the Wind - Tips and Tricks for the Microsoft Visual Studio 2012 C...
Coding Like the Wind - Tips and Tricks for the Microsoft Visual Studio 2012 C...Coding Like the Wind - Tips and Tricks for the Microsoft Visual Studio 2012 C...
Coding Like the Wind - Tips and Tricks for the Microsoft Visual Studio 2012 C...
Rainer Stropek
 
Architecture presentation 4
Architecture presentation 4Architecture presentation 4
Architecture presentation 4
Anoushiravan M. Ghamsari
 
Data Science meets Software Development
Data Science meets Software DevelopmentData Science meets Software Development
Data Science meets Software Development
Alexis Seigneurin
 
Lessons learned while building Omroep.nl
Lessons learned while building Omroep.nlLessons learned while building Omroep.nl
Lessons learned while building Omroep.nlbartzon
 
Que nos espera a los ALM Dudes para el 2013?
Que nos espera a los ALM Dudes para el 2013?Que nos espera a los ALM Dudes para el 2013?
Que nos espera a los ALM Dudes para el 2013?
Bruno Capuano
 
Data Visualizations with ggplot2
Data Visualizations with ggplot2Data Visualizations with ggplot2
Data Visualizations with ggplot2
Ryan Harrington
 
IS - section 1 - modifiedFinal information system.pptx
IS - section 1 - modifiedFinal information system.pptxIS - section 1 - modifiedFinal information system.pptx
IS - section 1 - modifiedFinal information system.pptx
NaglaaAbdelhady
 

Similar to Reproducible research (20)

Reproducible research: practice
Reproducible research: practiceReproducible research: practice
Reproducible research: practice
 
Reproducible Computational Research in R
Reproducible Computational Research in RReproducible Computational Research in R
Reproducible Computational Research in R
 
1
11
1
 
Reproducible Research in R and R Studio
Reproducible Research in R and R StudioReproducible Research in R and R Studio
Reproducible Research in R and R Studio
 
Reproducibility and automation of machine learning process
Reproducibility and automation of machine learning processReproducibility and automation of machine learning process
Reproducibility and automation of machine learning process
 
C++ Windows Forms L01 - Intro
C++ Windows Forms L01 - IntroC++ Windows Forms L01 - Intro
C++ Windows Forms L01 - Intro
 
Reproducible data science: review of Pachyderm, Data Version Control and GIT ...
Reproducible data science: review of Pachyderm, Data Version Control and GIT ...Reproducible data science: review of Pachyderm, Data Version Control and GIT ...
Reproducible data science: review of Pachyderm, Data Version Control and GIT ...
 
What’s eating python performance
What’s eating python performanceWhat’s eating python performance
What’s eating python performance
 
Top 10 python ide
Top 10 python ideTop 10 python ide
Top 10 python ide
 
Enter Cookbook: refactoring under a microscope
Enter Cookbook: refactoring under a microscopeEnter Cookbook: refactoring under a microscope
Enter Cookbook: refactoring under a microscope
 
Tech talks#6: Code Refactoring
Tech talks#6: Code RefactoringTech talks#6: Code Refactoring
Tech talks#6: Code Refactoring
 
Lessons learned while building Omroep.nl
Lessons learned while building Omroep.nlLessons learned while building Omroep.nl
Lessons learned while building Omroep.nl
 
US23_Daniels.pptx
US23_Daniels.pptxUS23_Daniels.pptx
US23_Daniels.pptx
 
Coding Like the Wind - Tips and Tricks for the Microsoft Visual Studio 2012 C...
Coding Like the Wind - Tips and Tricks for the Microsoft Visual Studio 2012 C...Coding Like the Wind - Tips and Tricks for the Microsoft Visual Studio 2012 C...
Coding Like the Wind - Tips and Tricks for the Microsoft Visual Studio 2012 C...
 
Architecture presentation 4
Architecture presentation 4Architecture presentation 4
Architecture presentation 4
 
Data Science meets Software Development
Data Science meets Software DevelopmentData Science meets Software Development
Data Science meets Software Development
 
Lessons learned while building Omroep.nl
Lessons learned while building Omroep.nlLessons learned while building Omroep.nl
Lessons learned while building Omroep.nl
 
Que nos espera a los ALM Dudes para el 2013?
Que nos espera a los ALM Dudes para el 2013?Que nos espera a los ALM Dudes para el 2013?
Que nos espera a los ALM Dudes para el 2013?
 
Data Visualizations with ggplot2
Data Visualizations with ggplot2Data Visualizations with ggplot2
Data Visualizations with ggplot2
 
IS - section 1 - modifiedFinal information system.pptx
IS - section 1 - modifiedFinal information system.pptxIS - section 1 - modifiedFinal information system.pptx
IS - section 1 - modifiedFinal information system.pptx
 

More from C. Tobin Magle

Data and donuts: Data Visualization using R
Data and donuts: Data Visualization using RData and donuts: Data Visualization using R
Data and donuts: Data Visualization using R
C. Tobin Magle
 
Responsible conduct of research: Data Management
Responsible conduct of research: Data ManagementResponsible conduct of research: Data Management
Responsible conduct of research: Data Management
C. Tobin Magle
 
Collaborative Data Management using OSF
Collaborative Data Management using OSFCollaborative Data Management using OSF
Collaborative Data Management using OSF
C. Tobin Magle
 
Data Management Services at the Morgan Library
Data Management Services at the Morgan LibraryData Management Services at the Morgan Library
Data Management Services at the Morgan Library
C. Tobin Magle
 
Open access day
Open access dayOpen access day
Open access day
C. Tobin Magle
 
Data and Donuts: How to write a data management plan
Data and Donuts: How to write a data management planData and Donuts: How to write a data management plan
Data and Donuts: How to write a data management plan
C. Tobin Magle
 
Data and Donuts: The Impact of Data Management
Data and Donuts: The Impact of Data ManagementData and Donuts: The Impact of Data Management
Data and Donuts: The Impact of Data Management
C. Tobin Magle
 
Bringing bioinformatics into the library
Bringing bioinformatics into the libraryBringing bioinformatics into the library
Bringing bioinformatics into the library
C. Tobin Magle
 
Reproducible research: theory
Reproducible research: theoryReproducible research: theory
Reproducible research: theory
C. Tobin Magle
 
CU Anschutz Health Science Library Data Services
CU Anschutz Health Science Library Data ServicesCU Anschutz Health Science Library Data Services
CU Anschutz Health Science Library Data Services
C. Tobin Magle
 
Magle data curation in libraries
Magle data curation in librariesMagle data curation in libraries
Magle data curation in libraries
C. Tobin Magle
 

More from C. Tobin Magle (11)

Data and donuts: Data Visualization using R
Data and donuts: Data Visualization using RData and donuts: Data Visualization using R
Data and donuts: Data Visualization using R
 
Responsible conduct of research: Data Management
Responsible conduct of research: Data ManagementResponsible conduct of research: Data Management
Responsible conduct of research: Data Management
 
Collaborative Data Management using OSF
Collaborative Data Management using OSFCollaborative Data Management using OSF
Collaborative Data Management using OSF
 
Data Management Services at the Morgan Library
Data Management Services at the Morgan LibraryData Management Services at the Morgan Library
Data Management Services at the Morgan Library
 
Open access day
Open access dayOpen access day
Open access day
 
Data and Donuts: How to write a data management plan
Data and Donuts: How to write a data management planData and Donuts: How to write a data management plan
Data and Donuts: How to write a data management plan
 
Data and Donuts: The Impact of Data Management
Data and Donuts: The Impact of Data ManagementData and Donuts: The Impact of Data Management
Data and Donuts: The Impact of Data Management
 
Bringing bioinformatics into the library
Bringing bioinformatics into the libraryBringing bioinformatics into the library
Bringing bioinformatics into the library
 
Reproducible research: theory
Reproducible research: theoryReproducible research: theory
Reproducible research: theory
 
CU Anschutz Health Science Library Data Services
CU Anschutz Health Science Library Data ServicesCU Anschutz Health Science Library Data Services
CU Anschutz Health Science Library Data Services
 
Magle data curation in libraries
Magle data curation in librariesMagle data curation in libraries
Magle data curation in libraries
 

Recently uploaded

The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
theahmadsaood
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
correoyaya
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 

Recently uploaded (20)

The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 

Reproducible research

  • 1. Reproducible research C. Tobin Magle, PhD 06-26-2017 10:00 – 11:30 a.m. Morgan Library Computer Classroom 175
  • 2. Outline • What is reproducible research? • Why would I do that? • How? (in R Studio) • Automation • Git • R Markdown
  • 3. Hypothesis Data Experimental design ResultsArticle The research cycle 1. Technological advances: • Huge, complex digital datasets • Ability to share 2. Human Error: • Poor Reporting • Flawed analyses Complications
  • 5. Reproducible research is the practice of distributing all data, software source code, and tools required to reproduce the results discussed in a research publication. https://www.ctspedia.org/do/view/CTSpedia/ReproducibleResearchStandards
  • 6. Reproducible research = Data (with metadata) + Code/Software
  • 9. Replication vs. Reproducibility • Replication: Same conclusion new study (gold standard) • “Again, and Again, and Again …” BR Jasny et. al. Science, 2011. 334(6060) pp. 1225 DOI: 10.1126/science.334.6060.1225 • Replication isn’t always feasible: too big, too costly, too time consuming, one time event, rare samples • Reproducibility: Same results from same data and code (minimum standard for validity) • “Reproducible Research in Computational Science”. RD Peng Science, 2011. 334 (6060) pp. 1226-1227 DOI: 10.1126/science.1213847
  • 10. Reproducibility spectrum “Reproducible Research in Computational Science”. RD Peng Science, 2011. 334 (6060) pp. 1226-1227 DOI: 10.1126/science.1213847
  • 11. Using markdown and version control in R Studio:
  • 12. Documenting analysis • Optimal: the instructions should be an automated script file (ie, “code”) • Minimum: Written instructions that allow for the complete reproduction of your analysis Raw Data Processed Data Results Cleaning/ processing Analysis
  • 13. Research is repetitive • Replication • Same assay, different samples • Longitudinal experiments http://xkcd.com/242/
  • 14. Doing things by hand is… • Slow • Hard to document • Hard to repeat
  • 15. Exercise 1: Making graphs • Download height graph XLS: http://bit.ly/2ooMyaO • Describe how to make a bar graph in excel
  • 16. Automation • Fast • Easy to document • Easy to repeat • Write scripts or save log files
  • 17. Example: Making graphs • Use a script: http://bit.ly/2oXOZUH
  • 18. Details to record for processing/analysis • What software was used? (R Studio, script) • Does it support log files/scripts? (yes!) • What version # and settings were used? (R version 3.3.2) • What else does the software need to run? • Computer architecture • OS/Software/tool/add ons (libraries/packages) • External databases
  • 19. In R: the sessionInfo() command
  • 21. Version Control • A system that records changes to a file or set of files over time so that you can recall specific versions later • https://git-scm.com/doc • Saves EVERYTHING • Records who made what change • Identifies conflicting changes • Not just for code
  • 22. Make an empty GitHub • Github.com
  • 23. Tell R Studio where git is *if you don’t’ know where git is installed, try • Mac terminal: which git • PC: where git.exe
  • 24. Make a new version control project *auto-recognizes version control if you select an existing directory
  • 27. Add files • Check boxes to add files to the “staging area” • Gets them ready to be added (committed) to the repository
  • 28. Staging area vs. repository
  • 29. Commit files • Click “commit” • Write a commit message • Click “commit” • Adds the changes to the repository
  • 30. Staging area vs. repository
  • 31. Exercise 2 • Make some changes to the script file in your repo • add the changes by clicking the checkbox • Create a new file – don’t click the checkbox next to it • Commit the changes • Push the changes to the remote • Does the new file appear in your remote git repository
  • 33. Push
  • 34. Literate programming = human readable (text) + machine readable (code) R markdown
  • 35. R Markdown • Open • Write • Embed • Render https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf
  • 36. Install knitr and markdown packages • Tools > install packages • Enter the package name (will autocomplete) • Knitr • Rmarkdown • TeX (if you want to knit to PDF) • OR install.packages("knitr”)
  • 37. Open/Create a markdown document • Make your own Markdown doc • Download graph markdown doc http://bit.ly/2p89drX
  • 38. Write: useful syntax • Plain text • *italics* -> italics • **bold** -> bold • #Header -> Header (more # decreases size) • Can also draw: • Insert pictures • Ordered and unordered list • Tables
  • 39. Embed code • Inline – Use variables in the human readable text • `r 2 + 2` • Code chunks - Include working code that generates output • ```{r} • #Code goes here • ``` • Display Options –
  • 40. Render • Won’t render unless the code runs with no errors • You know it should be reproducible • Render using the knit function • Output Formats • Knit HTML • Knit PDF – requires TeX • Knit Word
  • 41. Exercise 3 • Open a new markdown document • Use the formatting commands from the R Markdown cheat sheet to create a bulleted list that looks like the content of this slide. • https://www.rstudio.com/wp- content/uploads/2015/02/rmarkdown-cheatsheet.pdf
  • 43. Reproducible research checklist • Think about the entire pipeline: are all the pieces reproducible? • Is your cleaning/analysis process automated?– guarantees reproducibility • Are you doing things “by hand”? editing tables/figures; splitting/reformatting data • Does your software support log files or scripts? • If no, do you have a detailed description of your process? • Are you using version control? • Are you keeping track of your software? • Computer architecture; • OS/Software/tool/add ons (libraries/packages)/external databases • version numbers for everything (when available) • Are you saving the right files?: if it’s not reproducible, it’s not worth saving • Save the data and the code • Data + Code = Output • Are your reports human and machine readable? Adapted from: https://github.com/DataScienceSpecialization/courses/blob/master/05_ReproducibleResearch/Checklist/Reproducible%20Research%20Checklist.pdf
  • 44. Need help? • Email: tobin.magle@colostate.edu • Data Management Services website: http://lib.colostate.edu/services/data-management • Software Carpentry Reproducible Scientific Analysis Lesson: http://swcarpentry.github.io/r-novice-gapminder/ • Git with R tutorial: http://happygitwithr.com/ • R markdown cheatsheet: https://www.rstudio.com/wp- content/uploads/2015/02/rmarkdown-cheatsheet.pdf