SlideShare a Scribd company logo
Introduction to Reproducible Research
in R and R Studio.
Susan Johnston
April 1, 2016
What is Reproducible Research?
Reproducibility is the ability of an entire experiment or
study to be reproduced, either by the researcher or by
someone else working independently, [and] is one of the
main principles of the scientific method.
Wikipedia
In the lab:
Many of us are clicking, copying and pasting...
Can you repeat all of this again. . .
. . . and would you get the same results every time?
Worst Case Scenario
Scenarios that benefit from reproducibility
New raw data becomes available.
You return to the project after a period of time.
Project gets handed to new PhD student/postdoc.
Working collaboratively.
A reviewer wants you to change a model parameter.
When you find an error, but not sure where you went wrong.
Four rules for reproducibility.
1. Create a portable project.
2. Avoid manual data manipulation steps - use code!
3. Connect results to text.
4. Version control all custom scripts and documents.
Disclaimer
Many solutions to the same problem!
The Environment: http://www.rstudio.com
Reproducible Research in
1. Creating a Portable Project (.Rproj)
2. Automate analyses - stop clicking and start typing.
3. Dynamic report writing with R Markdown and knitr
4. Version control using git
Reproducible Research in .
1. Creating a Portable Project (.Rproj)
2. Automate analyses - stop clicking and start typing.
3. Dynamic report writing with R Markdown and knitr
4. Version control using git
Structuring an R Project.
http://nicercode.github.io/blog/2013-05-17-organising-my-project/
Structuring an R Project.
http://nicercode.github.io/blog/2013-05-17-organising-my-project/
All data, scripts and output should be kept within the same project
directory.
Structuring an R Project.
http://nicercode.github.io/blog/2013-04-05-projects/
R/ Contains functions relevant to analysis.
data/ Contains raw data as read only.
doc/ Contains the paper.
figs/ Contains the figures.
output/ Contains analysis output
(processed data, logs, etc. Treat as disposable).
.R Code for the analysis.
Structuring an R Project.
http://robjhyndman.com/hyndsight/workflow-in-r/
load.R - read in data from files
clean.R - pre-processing and cleaning
functions.R - define what you need for anlaysis
do.R - do the analysis!
Bad habits can hinder portability.
https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects
setwd("C:/Users/susjoh/Desktop/SalmoAnalysis")
setwd("C:/Users/Susan Johnston/Desktop/SalmoAnalysis")
setwd("C:/Users/sjohns10/Drive/SalmoAnalysis")
source("../../OvisAnalysis/GWASplotfunctions.R")
An analysis should be contained within a directory, and it should
be easy to move it or pass on to someone new.
Solution: using Projects.
https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects
Establishes a directory with associated .Rproj file.
Automatically sets the working directory.
Can save and source .Rprofile, .Rhistory, .Rdata files.
Allows version control within R Studio.
Creating a Portable Project (.Rproj)
Creating a Portable Project (.Rproj)
Creating a Portable Project (.Rproj)
Creating a Portable Project (.Rproj)
Reproducible Research in .
1. Creating a Portable Project (.Rproj)
2. Automate analyses - stop clicking and start typing.
3. Dynamic report writing with R Markdown and knitr
4. Version control using git
This is R. There is no “if”. Only “how”.
CRAN, Bioconductor, github
Reading in data and functions
read.table(), read.csv(), read.xlsx(), source()
Reorganising data
reshape, plyr, dplyr
Generate figures
plot(), library(ggplot2)
Running external programmes with system()
Unix/Mac: system("plink -file OvGen --freq")
Windows: system("cmd", input = "plink -file OvGen --freq")
Reproducible Research in .
1. Creating a Portable Project (.Rproj)
2. Automate analyses - stop clicking and start typing.
3. Dynamic report writing with R Markdown and knitr
4. Version control using git
The knitr package allows R code and document templates to be
compiled into a single report containing text, results and figures.
Output script as Notebook
Write reports directly in R
Write reports directly in R
Creating an R Markdown Script (.Rmd).
Creating an R Markdown Script (.Rmd).
A Quick Start Guide
http://nicercode.github.io/guides/reports/
1. Type report text into .Rmd file
Lorem ipsum dolor sit amet, consectetuer adipiscing elit.
2. Enclose code to be evaluated in chunks
```{r}
model1 <- lm(speed ~ dist, data = cars)
```
3. Evaluate code inline
The slope of the model is `r coefficients(model1)[2]`
The slope of the model is 0.16557
4. Compile report as .html, .pdf or .doc
A Quick Start Guide
http://nicercode.github.io/guides/reports/
NB. PDF and Word docs require additional software.
http://rmarkdown.rstudio.com/?version=0.98.1103&mode=desktop
A Quick Start Guide
http://nicercode.github.io/guides/reports/
http://rmarkdown.rstudio.com/?version=0.98.1103&mode=desktop
Advanced Tips
Control how chunks are reported and evaluated
```{r echo = F, warning = F, fig.width = 3}
model1 <- lm(speed ~ dist, data = cars)
plot(model1)
```
spin(): compile .R files using #’, #+ and #-
http://deanattali.com/2015/03/24/knitrs-best-hidden-gem-spin/
LATEXdocuments, Presentations, Shiny, etc.
Reproducible Research in .
1. Creating a Portable Project (.Rproj)
2. Automate analyses - stop clicking and start typing.
3. Dynamic report writing with R Markdown and knitr
4. Version control using git
Version Control can revert a document to a previous
version.
Version Control can revert a document to a previous
version.
Version Control can revert a document to a previous
version.
Version Control can revert a document to a previous
version.
Version Control Using git.
https://support.rstudio.com/hc/en-us/articles/200532077-Version-Control-with-Git-and-SVN
Git can be installed on all platforms, and can be used to implement
version control within an R Studio Project.
http://git-scm.com/downloads
Version Control in R Studio
Tools >Project Options allows setup of git version control.
Version Control in R Studio
Select git as a version control system
Version Control in R Studio
Select git as a version control system
Version Control in R Studio
git information will appear in the top-right frame.
Version Control in R Studio
git information will appear in the top-right frame.
Version Control in R Studio
Select files to version control, write a meaningful commit message
>Commit
Version Control in R Studio
Select files to version control, write a meaningful commit message
>Commit
Version Control in R Studio
After modifying the file, repeat the process.
Version Control in R Studio
After modifying the file, repeat the process.
Version Control in R Studio
Previous versions can be viewed and restored from the History tab.
Version Control in R Studio
Previous versions can be viewed and restored from the History tab.
Advanced Steps: Github
Forking projects
All scripts are backed up online
Facilitates collaboration and working on different computers
Take home messages
Manage projects reproducibly: The first researcher who will
need to reproduce the results is likely to be YOU.
Time invested in learning to code pays off - do it.
Supervisors should be patient and encourage students to code.
Online Resources
RStudio: Idiot-proof guides and cheat sheets
http://www.rstudio.com/
Nice R Code: How-tos and advice on good coding practice
http://nicercode.github.io/guide.html
Ten Simple Rules for Reproducible Computational Research
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003285
Yihui Xie’s blog (knitr) http://yihui.name/en/categories/
R Bloggers: http://www.r-bloggers.com/
StackOverflow questions on R and knitr
http://stackoverflow.com/questions/tagged/r+knitr

More Related Content

What's hot

R studio practical file
R studio  practical file R studio  practical file
R studio practical file
Ketan Khaira
 
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
PyData
 
Zelditchetal workbookgeomorphoanalyses
Zelditchetal workbookgeomorphoanalysesZelditchetal workbookgeomorphoanalyses
Zelditchetal workbookgeomorphoanalysesWagner M. S. Sampaio
 
Mining Code Examples with Descriptive Text from Software Artifacts
Mining Code Examples with Descriptive Text from Software ArtifactsMining Code Examples with Descriptive Text from Software Artifacts
Mining Code Examples with Descriptive Text from Software Artifacts
Preetha Chatterjee
 
Version Control in Machine Learning + AI (Stanford)
Version Control in Machine Learning + AI (Stanford)Version Control in Machine Learning + AI (Stanford)
Version Control in Machine Learning + AI (Stanford)
Anand Sampat
 
Finding Help with Programming Errors: An Exploratory Study of Novice Software...
Finding Help with Programming Errors: An Exploratory Study of Novice Software...Finding Help with Programming Errors: An Exploratory Study of Novice Software...
Finding Help with Programming Errors: An Exploratory Study of Novice Software...
Preetha Chatterjee
 

What's hot (6)

R studio practical file
R studio  practical file R studio  practical file
R studio practical file
 
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
 
Zelditchetal workbookgeomorphoanalyses
Zelditchetal workbookgeomorphoanalysesZelditchetal workbookgeomorphoanalyses
Zelditchetal workbookgeomorphoanalyses
 
Mining Code Examples with Descriptive Text from Software Artifacts
Mining Code Examples with Descriptive Text from Software ArtifactsMining Code Examples with Descriptive Text from Software Artifacts
Mining Code Examples with Descriptive Text from Software Artifacts
 
Version Control in Machine Learning + AI (Stanford)
Version Control in Machine Learning + AI (Stanford)Version Control in Machine Learning + AI (Stanford)
Version Control in Machine Learning + AI (Stanford)
 
Finding Help with Programming Errors: An Exploratory Study of Novice Software...
Finding Help with Programming Errors: An Exploratory Study of Novice Software...Finding Help with Programming Errors: An Exploratory Study of Novice Software...
Finding Help with Programming Errors: An Exploratory Study of Novice Software...
 

Viewers also liked

Data Manipulation Using R (& dplyr)
Data Manipulation Using R (& dplyr)Data Manipulation Using R (& dplyr)
Data Manipulation Using R (& dplyr)
Ram Narasimhan
 
R-Studio, diferencia estadísticamente significativa 1
R-Studio, diferencia estadísticamente significativa 1R-Studio, diferencia estadísticamente significativa 1
R-Studio, diferencia estadísticamente significativa 1
Jose Quezada
 
Setup R and R Studio
Setup R and R StudioSetup R and R Studio
Setup R and R Studio
Robin Donatello
 
Manual r commander By Juan Guarangaa
Manual r commander By Juan GuarangaaManual r commander By Juan Guarangaa
Manual r commander By Juan Guarangaa
Juan Ete
 
WF ED 540, Class Meeting 3 - Introduction to dplyr, 2016
WF ED 540, Class Meeting 3 - Introduction to dplyr, 2016WF ED 540, Class Meeting 3 - Introduction to dplyr, 2016
WF ED 540, Class Meeting 3 - Introduction to dplyr, 2016
Penn State University
 
20160611 kintone Café 高知 Vol.3 LT資料
20160611 kintone Café 高知 Vol.3 LT資料20160611 kintone Café 高知 Vol.3 LT資料
20160611 kintone Café 高知 Vol.3 LT資料
安隆 沖
 
Rlecturenotes
RlecturenotesRlecturenotes
Rlecturenotes
thecar1992
 
Análisis espacial con R (asignatura de Master - UPM)
Análisis espacial con R (asignatura de Master - UPM)Análisis espacial con R (asignatura de Master - UPM)
Análisis espacial con R (asignatura de Master - UPM)
Vladimir Gutierrez, PhD
 
R Brown-bag seminars : Seminar-8
R Brown-bag seminars : Seminar-8R Brown-bag seminars : Seminar-8
R Brown-bag seminars : Seminar-8
Muhammad Nabi Ahmad
 
Paquete ggplot - Potencia y facilidad para generar gráficos en R
Paquete ggplot - Potencia y facilidad para generar gráficos en RPaquete ggplot - Potencia y facilidad para generar gráficos en R
Paquete ggplot - Potencia y facilidad para generar gráficos en R
Nestor Montaño
 
Learn to use dplyr (Feb 2015 Philly R User Meetup)
Learn to use dplyr (Feb 2015 Philly R User Meetup)Learn to use dplyr (Feb 2015 Philly R User Meetup)
Learn to use dplyr (Feb 2015 Philly R User Meetup)
Fan Li
 
WF ED 540, Class Meeting 3 - mutate and summarise, 2016
WF ED 540, Class Meeting 3 - mutate and summarise, 2016WF ED 540, Class Meeting 3 - mutate and summarise, 2016
WF ED 540, Class Meeting 3 - mutate and summarise, 2016
Penn State University
 
WF ED 540, Class Meeting 3 - select, filter, arrange, 2016
WF ED 540, Class Meeting 3 - select, filter, arrange, 2016WF ED 540, Class Meeting 3 - select, filter, arrange, 2016
WF ED 540, Class Meeting 3 - select, filter, arrange, 2016
Penn State University
 
R seminar dplyr package
R seminar dplyr packageR seminar dplyr package
R seminar dplyr package
Muhammad Nabi Ahmad
 
Dplyr and Plyr
Dplyr and PlyrDplyr and Plyr
Dplyr and Plyr
Paul Richards
 
Data and donuts: Data Visualization using R
Data and donuts: Data Visualization using RData and donuts: Data Visualization using R
Data and donuts: Data Visualization using R
C. Tobin Magle
 
Chunked, dplyr for large text files
Chunked, dplyr for large text filesChunked, dplyr for large text files
Chunked, dplyr for large text files
Edwin de Jonge
 
Next Generation Programming in R
Next Generation Programming in RNext Generation Programming in R
Next Generation Programming in R
Florian Uhlitz
 
R and Rcmdr Statistical Software
R and Rcmdr Statistical SoftwareR and Rcmdr Statistical Software
R and Rcmdr Statistical Software
arttan2001
 
Tokyor36
Tokyor36Tokyor36

Viewers also liked (20)

Data Manipulation Using R (& dplyr)
Data Manipulation Using R (& dplyr)Data Manipulation Using R (& dplyr)
Data Manipulation Using R (& dplyr)
 
R-Studio, diferencia estadísticamente significativa 1
R-Studio, diferencia estadísticamente significativa 1R-Studio, diferencia estadísticamente significativa 1
R-Studio, diferencia estadísticamente significativa 1
 
Setup R and R Studio
Setup R and R StudioSetup R and R Studio
Setup R and R Studio
 
Manual r commander By Juan Guarangaa
Manual r commander By Juan GuarangaaManual r commander By Juan Guarangaa
Manual r commander By Juan Guarangaa
 
WF ED 540, Class Meeting 3 - Introduction to dplyr, 2016
WF ED 540, Class Meeting 3 - Introduction to dplyr, 2016WF ED 540, Class Meeting 3 - Introduction to dplyr, 2016
WF ED 540, Class Meeting 3 - Introduction to dplyr, 2016
 
20160611 kintone Café 高知 Vol.3 LT資料
20160611 kintone Café 高知 Vol.3 LT資料20160611 kintone Café 高知 Vol.3 LT資料
20160611 kintone Café 高知 Vol.3 LT資料
 
Rlecturenotes
RlecturenotesRlecturenotes
Rlecturenotes
 
Análisis espacial con R (asignatura de Master - UPM)
Análisis espacial con R (asignatura de Master - UPM)Análisis espacial con R (asignatura de Master - UPM)
Análisis espacial con R (asignatura de Master - UPM)
 
R Brown-bag seminars : Seminar-8
R Brown-bag seminars : Seminar-8R Brown-bag seminars : Seminar-8
R Brown-bag seminars : Seminar-8
 
Paquete ggplot - Potencia y facilidad para generar gráficos en R
Paquete ggplot - Potencia y facilidad para generar gráficos en RPaquete ggplot - Potencia y facilidad para generar gráficos en R
Paquete ggplot - Potencia y facilidad para generar gráficos en R
 
Learn to use dplyr (Feb 2015 Philly R User Meetup)
Learn to use dplyr (Feb 2015 Philly R User Meetup)Learn to use dplyr (Feb 2015 Philly R User Meetup)
Learn to use dplyr (Feb 2015 Philly R User Meetup)
 
WF ED 540, Class Meeting 3 - mutate and summarise, 2016
WF ED 540, Class Meeting 3 - mutate and summarise, 2016WF ED 540, Class Meeting 3 - mutate and summarise, 2016
WF ED 540, Class Meeting 3 - mutate and summarise, 2016
 
WF ED 540, Class Meeting 3 - select, filter, arrange, 2016
WF ED 540, Class Meeting 3 - select, filter, arrange, 2016WF ED 540, Class Meeting 3 - select, filter, arrange, 2016
WF ED 540, Class Meeting 3 - select, filter, arrange, 2016
 
R seminar dplyr package
R seminar dplyr packageR seminar dplyr package
R seminar dplyr package
 
Dplyr and Plyr
Dplyr and PlyrDplyr and Plyr
Dplyr and Plyr
 
Data and donuts: Data Visualization using R
Data and donuts: Data Visualization using RData and donuts: Data Visualization using R
Data and donuts: Data Visualization using R
 
Chunked, dplyr for large text files
Chunked, dplyr for large text filesChunked, dplyr for large text files
Chunked, dplyr for large text files
 
Next Generation Programming in R
Next Generation Programming in RNext Generation Programming in R
Next Generation Programming in R
 
R and Rcmdr Statistical Software
R and Rcmdr Statistical SoftwareR and Rcmdr Statistical Software
R and Rcmdr Statistical Software
 
Tokyor36
Tokyor36Tokyor36
Tokyor36
 

Similar to Reproducible Research in R and R Studio

20150422 repro resr
20150422 repro resr20150422 repro resr
20150422 repro resr
Susan Johnston
 
Data Science - Part II - Working with R & R studio
Data Science - Part II -  Working with R & R studioData Science - Part II -  Working with R & R studio
Data Science - Part II - Working with R & R studio
Derek Kane
 
DEF CON 27 - workshop - ISAAC EVANS - discover exploit and eradicate entire v...
DEF CON 27 - workshop - ISAAC EVANS - discover exploit and eradicate entire v...DEF CON 27 - workshop - ISAAC EVANS - discover exploit and eradicate entire v...
DEF CON 27 - workshop - ISAAC EVANS - discover exploit and eradicate entire v...
Felipe Prado
 
Data Analysis and Visualization: R Workflow
Data Analysis and Visualization: R WorkflowData Analysis and Visualization: R Workflow
Data Analysis and Visualization: R Workflow
Olga Scrivner
 
Introduction to R and R Studio
Introduction to R and R StudioIntroduction to R and R Studio
Introduction to R and R Studio
Rupak Roy
 
Introduction to r
Introduction to rIntroduction to r
Introduction to r
gslicraf
 
Reproducible research: practice
Reproducible research: practiceReproducible research: practice
Reproducible research: practice
C. Tobin Magle
 
Language-agnostic data analysis workflows and reproducible research
Language-agnostic data analysis workflows and reproducible researchLanguage-agnostic data analysis workflows and reproducible research
Language-agnostic data analysis workflows and reproducible research
Andrew Lowe
 
Reproducible research (and literate programming) in R
Reproducible research (and literate programming) in RReproducible research (and literate programming) in R
Reproducible research (and literate programming) in R
liz__is
 
R tutorial
R tutorialR tutorial
.NET Recommended Resources
.NET Recommended Resources.NET Recommended Resources
.NET Recommended Resources
Greg Sohl
 
Pyhton-1a-Basics.pdf
Pyhton-1a-Basics.pdfPyhton-1a-Basics.pdf
Pyhton-1a-Basics.pdf
Mattupallipardhu
 
Intro to Reproducible Research
Intro to Reproducible ResearchIntro to Reproducible Research
Intro to Reproducible Research
C. Tobin Magle
 
BUSINESS ANALYTICS WITH R SOFTWARE DIAST
BUSINESS ANALYTICS WITH R SOFTWARE DIASTBUSINESS ANALYTICS WITH R SOFTWARE DIAST
BUSINESS ANALYTICS WITH R SOFTWARE DIAST
HaritikaChhatwal1
 
Up your data game: How to use R to wrangle, analyze, and visualize data faste...
Up your data game: How to use R to wrangle, analyze, and visualize data faste...Up your data game: How to use R to wrangle, analyze, and visualize data faste...
Up your data game: How to use R to wrangle, analyze, and visualize data faste...
Charles Guedenet
 
Study of R Programming
Study of R ProgrammingStudy of R Programming
Study of R Programming
IRJET Journal
 
Reproducible research
Reproducible researchReproducible research
Reproducible research
C. Tobin Magle
 
Why your build matters
Why your build mattersWhy your build matters
Why your build matters
Peter Ledbrook
 
What We Learned Building an R-Python Hybrid Predictive Analytics Pipeline
What We Learned Building an R-Python Hybrid Predictive Analytics PipelineWhat We Learned Building an R-Python Hybrid Predictive Analytics Pipeline
What We Learned Building an R-Python Hybrid Predictive Analytics Pipeline
Work-Bench
 

Similar to Reproducible Research in R and R Studio (20)

20150422 repro resr
20150422 repro resr20150422 repro resr
20150422 repro resr
 
Data Science - Part II - Working with R & R studio
Data Science - Part II -  Working with R & R studioData Science - Part II -  Working with R & R studio
Data Science - Part II - Working with R & R studio
 
DEF CON 27 - workshop - ISAAC EVANS - discover exploit and eradicate entire v...
DEF CON 27 - workshop - ISAAC EVANS - discover exploit and eradicate entire v...DEF CON 27 - workshop - ISAAC EVANS - discover exploit and eradicate entire v...
DEF CON 27 - workshop - ISAAC EVANS - discover exploit and eradicate entire v...
 
Data Analysis and Visualization: R Workflow
Data Analysis and Visualization: R WorkflowData Analysis and Visualization: R Workflow
Data Analysis and Visualization: R Workflow
 
Introduction to R and R Studio
Introduction to R and R StudioIntroduction to R and R Studio
Introduction to R and R Studio
 
Introduction to r
Introduction to rIntroduction to r
Introduction to r
 
Reproducible research: practice
Reproducible research: practiceReproducible research: practice
Reproducible research: practice
 
Language-agnostic data analysis workflows and reproducible research
Language-agnostic data analysis workflows and reproducible researchLanguage-agnostic data analysis workflows and reproducible research
Language-agnostic data analysis workflows and reproducible research
 
Reproducible research (and literate programming) in R
Reproducible research (and literate programming) in RReproducible research (and literate programming) in R
Reproducible research (and literate programming) in R
 
R tutorial
R tutorialR tutorial
R tutorial
 
.NET Recommended Resources
.NET Recommended Resources.NET Recommended Resources
.NET Recommended Resources
 
Pyhton-1a-Basics.pdf
Pyhton-1a-Basics.pdfPyhton-1a-Basics.pdf
Pyhton-1a-Basics.pdf
 
Django by rj
Django by rjDjango by rj
Django by rj
 
Intro to Reproducible Research
Intro to Reproducible ResearchIntro to Reproducible Research
Intro to Reproducible Research
 
BUSINESS ANALYTICS WITH R SOFTWARE DIAST
BUSINESS ANALYTICS WITH R SOFTWARE DIASTBUSINESS ANALYTICS WITH R SOFTWARE DIAST
BUSINESS ANALYTICS WITH R SOFTWARE DIAST
 
Up your data game: How to use R to wrangle, analyze, and visualize data faste...
Up your data game: How to use R to wrangle, analyze, and visualize data faste...Up your data game: How to use R to wrangle, analyze, and visualize data faste...
Up your data game: How to use R to wrangle, analyze, and visualize data faste...
 
Study of R Programming
Study of R ProgrammingStudy of R Programming
Study of R Programming
 
Reproducible research
Reproducible researchReproducible research
Reproducible research
 
Why your build matters
Why your build mattersWhy your build matters
Why your build matters
 
What We Learned Building an R-Python Hybrid Predictive Analytics Pipeline
What We Learned Building an R-Python Hybrid Predictive Analytics PipelineWhat We Learned Building an R-Python Hybrid Predictive Analytics Pipeline
What We Learned Building an R-Python Hybrid Predictive Analytics Pipeline
 

Recently uploaded

Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
James Polillo
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
theahmadsaood
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 

Recently uploaded (20)

Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 

Reproducible Research in R and R Studio

  • 1. Introduction to Reproducible Research in R and R Studio. Susan Johnston April 1, 2016
  • 2. What is Reproducible Research? Reproducibility is the ability of an entire experiment or study to be reproduced, either by the researcher or by someone else working independently, [and] is one of the main principles of the scientific method. Wikipedia
  • 4. Many of us are clicking, copying and pasting... Can you repeat all of this again. . . . . . and would you get the same results every time?
  • 6. Scenarios that benefit from reproducibility New raw data becomes available. You return to the project after a period of time. Project gets handed to new PhD student/postdoc. Working collaboratively. A reviewer wants you to change a model parameter. When you find an error, but not sure where you went wrong.
  • 7. Four rules for reproducibility. 1. Create a portable project. 2. Avoid manual data manipulation steps - use code! 3. Connect results to text. 4. Version control all custom scripts and documents.
  • 8. Disclaimer Many solutions to the same problem!
  • 10. Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses - stop clicking and start typing. 3. Dynamic report writing with R Markdown and knitr 4. Version control using git
  • 11. Reproducible Research in . 1. Creating a Portable Project (.Rproj) 2. Automate analyses - stop clicking and start typing. 3. Dynamic report writing with R Markdown and knitr 4. Version control using git
  • 12. Structuring an R Project. http://nicercode.github.io/blog/2013-05-17-organising-my-project/
  • 13. Structuring an R Project. http://nicercode.github.io/blog/2013-05-17-organising-my-project/ All data, scripts and output should be kept within the same project directory.
  • 14. Structuring an R Project. http://nicercode.github.io/blog/2013-04-05-projects/ R/ Contains functions relevant to analysis. data/ Contains raw data as read only. doc/ Contains the paper. figs/ Contains the figures. output/ Contains analysis output (processed data, logs, etc. Treat as disposable). .R Code for the analysis.
  • 15. Structuring an R Project. http://robjhyndman.com/hyndsight/workflow-in-r/ load.R - read in data from files clean.R - pre-processing and cleaning functions.R - define what you need for anlaysis do.R - do the analysis!
  • 16. Bad habits can hinder portability. https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects setwd("C:/Users/susjoh/Desktop/SalmoAnalysis") setwd("C:/Users/Susan Johnston/Desktop/SalmoAnalysis") setwd("C:/Users/sjohns10/Drive/SalmoAnalysis") source("../../OvisAnalysis/GWASplotfunctions.R") An analysis should be contained within a directory, and it should be easy to move it or pass on to someone new.
  • 17. Solution: using Projects. https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects Establishes a directory with associated .Rproj file. Automatically sets the working directory. Can save and source .Rprofile, .Rhistory, .Rdata files. Allows version control within R Studio.
  • 18. Creating a Portable Project (.Rproj)
  • 19. Creating a Portable Project (.Rproj)
  • 20. Creating a Portable Project (.Rproj)
  • 21. Creating a Portable Project (.Rproj)
  • 22. Reproducible Research in . 1. Creating a Portable Project (.Rproj) 2. Automate analyses - stop clicking and start typing. 3. Dynamic report writing with R Markdown and knitr 4. Version control using git
  • 23. This is R. There is no “if”. Only “how”. CRAN, Bioconductor, github Reading in data and functions read.table(), read.csv(), read.xlsx(), source() Reorganising data reshape, plyr, dplyr Generate figures plot(), library(ggplot2) Running external programmes with system() Unix/Mac: system("plink -file OvGen --freq") Windows: system("cmd", input = "plink -file OvGen --freq")
  • 24. Reproducible Research in . 1. Creating a Portable Project (.Rproj) 2. Automate analyses - stop clicking and start typing. 3. Dynamic report writing with R Markdown and knitr 4. Version control using git
  • 25. The knitr package allows R code and document templates to be compiled into a single report containing text, results and figures.
  • 26. Output script as Notebook
  • 27.
  • 30. Creating an R Markdown Script (.Rmd).
  • 31. Creating an R Markdown Script (.Rmd).
  • 32. A Quick Start Guide http://nicercode.github.io/guides/reports/ 1. Type report text into .Rmd file Lorem ipsum dolor sit amet, consectetuer adipiscing elit. 2. Enclose code to be evaluated in chunks ```{r} model1 <- lm(speed ~ dist, data = cars) ``` 3. Evaluate code inline The slope of the model is `r coefficients(model1)[2]` The slope of the model is 0.16557 4. Compile report as .html, .pdf or .doc
  • 33. A Quick Start Guide http://nicercode.github.io/guides/reports/ NB. PDF and Word docs require additional software. http://rmarkdown.rstudio.com/?version=0.98.1103&mode=desktop
  • 34. A Quick Start Guide http://nicercode.github.io/guides/reports/ http://rmarkdown.rstudio.com/?version=0.98.1103&mode=desktop
  • 35. Advanced Tips Control how chunks are reported and evaluated ```{r echo = F, warning = F, fig.width = 3} model1 <- lm(speed ~ dist, data = cars) plot(model1) ``` spin(): compile .R files using #’, #+ and #- http://deanattali.com/2015/03/24/knitrs-best-hidden-gem-spin/ LATEXdocuments, Presentations, Shiny, etc.
  • 36. Reproducible Research in . 1. Creating a Portable Project (.Rproj) 2. Automate analyses - stop clicking and start typing. 3. Dynamic report writing with R Markdown and knitr 4. Version control using git
  • 37. Version Control can revert a document to a previous version.
  • 38. Version Control can revert a document to a previous version.
  • 39. Version Control can revert a document to a previous version.
  • 40. Version Control can revert a document to a previous version.
  • 41. Version Control Using git. https://support.rstudio.com/hc/en-us/articles/200532077-Version-Control-with-Git-and-SVN Git can be installed on all platforms, and can be used to implement version control within an R Studio Project. http://git-scm.com/downloads
  • 42. Version Control in R Studio Tools >Project Options allows setup of git version control.
  • 43. Version Control in R Studio Select git as a version control system
  • 44. Version Control in R Studio Select git as a version control system
  • 45. Version Control in R Studio git information will appear in the top-right frame.
  • 46. Version Control in R Studio git information will appear in the top-right frame.
  • 47. Version Control in R Studio Select files to version control, write a meaningful commit message >Commit
  • 48. Version Control in R Studio Select files to version control, write a meaningful commit message >Commit
  • 49. Version Control in R Studio After modifying the file, repeat the process.
  • 50. Version Control in R Studio After modifying the file, repeat the process.
  • 51. Version Control in R Studio Previous versions can be viewed and restored from the History tab.
  • 52. Version Control in R Studio Previous versions can be viewed and restored from the History tab.
  • 53. Advanced Steps: Github Forking projects All scripts are backed up online Facilitates collaboration and working on different computers
  • 54. Take home messages Manage projects reproducibly: The first researcher who will need to reproduce the results is likely to be YOU. Time invested in learning to code pays off - do it. Supervisors should be patient and encourage students to code.
  • 55. Online Resources RStudio: Idiot-proof guides and cheat sheets http://www.rstudio.com/ Nice R Code: How-tos and advice on good coding practice http://nicercode.github.io/guide.html Ten Simple Rules for Reproducible Computational Research http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003285 Yihui Xie’s blog (knitr) http://yihui.name/en/categories/ R Bloggers: http://www.r-bloggers.com/ StackOverflow questions on R and knitr http://stackoverflow.com/questions/tagged/r+knitr