Reproducible computational
research in R
An introduction by Samuel Bosch (October 2015)
http://samuelbosch.com
Topics
– Introduction
– Version control (Git)
– Reproducible analysis in R
• Writing packages
• R Markdown
• Saving plots
• Saving data
• Packrat
Reproducible (computational) research
1. For Every Result, Keep Track of How It Was Produced
– Steps, commands, clicks
2. Avoid Manual Data Manipulation Steps
3. Archive the Exact Versions of All External Programs Used
– Packrat (Reproducible package management for R)
4. Version Control All Custom Scripts
5. Record All Intermediate Results, When Possible in Standardized Formats
6. For Analyses That Include Randomness, Note Underlying Random Seeds
– set.seed(42)
7. Always Store Raw Data behind Plots
8. Generate Hierarchical Analysis Output, Allowing Layers of Increasing Detail to Be Inspected
9. Connect Textual Statements to Underlying Results
10. Provide Public Access to Scripts, Runs, and Results
Sandve GK, Nekrutenko A, Taylor J, Hovig E (2013) Ten Simple Rules for Reproducible Computational
Research. PLoS Comput Biol 9(10): e1003285. doi:10.1371/journal.pcbi.1003285
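A minimal sketch of rule 6, assuming a hypothetical helper `draw_sample` (not from the slides): fixing the seed before a random draw makes the draw repeatable, so two runs give the same values.

```r
# Hypothetical helper: reset the seed, then draw n values from 1..100
draw_sample <- function(seed, n = 5) {
  set.seed(seed)
  sample(1:100, n)
}

first <- draw_sample(42)
second <- draw_sample(42)

# Both draws used the same seed, so they are the same
identical(first, second)
```

Storing the seed in the script (rather than in your head) is what keeps the analysis repeatable.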
Version control
• Word review on steroids
• When working alone: it’s a database of all the versions of
your files
• When collaborating: it’s a database of all the versions from all
collaborators, with one master version into which all changes
can be merged.
• When there are no conflicts then merging can be done
automatically.
• Multiple programs/protocols: git, mercurial, svn, …
• By default not suited to versioning large files (> 50 MB), but
there is a Git Large File Storage extension
• Works best with text files (code, markdown, csv, …)
Git
• Popularized by http://github.com but
supported by different providers
(http://github.ugent.be, http://bitbucket.org).
• Programs for Git on Windows:
– Standard Git Gui + command line (git-scm.com)
– GitHub Desktop for Windows
– Atlassian SourceTree
Git workflow (1 user)
Workflow:
1. Create a repository on your preferred provider
If you want a private repository then use bitbucket.org or apply for
the student developer pack (https://education.github.com/)
2. Clone the repository to your computer
git clone https://github.com/samuelbosch/sdmpredictors.git
3. Make changes
4. View changes (optional)
git status
5. Submit changes
git add .
git commit -am "Describe your changes"
git push
Git extras to explore
• Excluding files from Git with .gitignore
• Contributing to open source
– Forking
– Pull requests
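As a sketch of the `.gitignore` idea, the snippet below writes one for an R project. It is written from R only to stay self-contained; the project directory and the `*.jpeg` pattern are assumptions for illustration, while `.Rproj.user/`, `.Rhistory` and `.RData` are the files RStudio projects typically leave behind.

```r
# Hypothetical project directory, created under the session's temp folder
project_dir <- file.path(tempdir(), "demo_project")
dir.create(project_dir, showWarnings = FALSE, recursive = TRUE)

# One pattern per line: RStudio state, R history, workspace image, generated plots
ignore_patterns <- c(".Rproj.user/", ".Rhistory", ".RData", "*.jpeg")
ignore_file <- file.path(project_dir, ".gitignore")
writeLines(ignore_patterns, ignore_file)

readLines(ignore_file)
```

Files matching these patterns are left out of `git status` and are not staged by `git add`.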
DEMO
• New project on https://github.ugent.be/
• Clone
• Add file
• Status
• Commit
• Edit file
• Commit
• Push
R general
• Use RStudio
https://www.rstudio.com/products/rstudio/download/
and explore it
– Projects
– Keyboard shortcuts
– Git integration
– Package development
– R markdown
• R Short Reference Card:
https://cran.r-project.org/doc/contrib/Short-refcard.pdf
• Style guide: http://adv-r.had.co.nz/Style.html
R package development
• R packages by Hadley Wickham
(http://r-pkgs.had.co.nz/)
• Advantages:
– Can be shared easily
– One package with your data and your code
– Documentation (if you write it)
– Ease of testing
R packages: Getting started
• install.packages("devtools")
• RStudio -> new project -> new directory -> R
package
• # Build and Reload Package: 'Ctrl + Shift + B'
• # Check Package: 'Ctrl + Shift + E'
• # Test Package: 'Ctrl + Shift + T'
• # Build documentation: 'Ctrl + Shift + D'
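A sketch of what a documented package function might look like, assuming roxygen2-style comments (the format that "Build documentation" processes in a devtools setup). The function `celsius_to_kelvin` is a hypothetical example, not part of the original slides; in a package it would live in a file under `R/`.

```r
#' Convert temperatures from degrees Celsius to kelvin
#'
#' @param celsius Numeric vector of temperatures in degrees Celsius.
#' @return Numeric vector of the same length, in kelvin.
#' @examples
#' celsius_to_kelvin(c(0, 25))
#' @export
celsius_to_kelvin <- function(celsius) {
  stopifnot(is.numeric(celsius))  # fail early on non-numeric input
  celsius + 273.15
}

celsius_to_kelvin(c(0, 25))
```

The `#'` lines are plain comments to R itself, so the function also runs unchanged outside a package.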
R packages: testing
• Test whether your functions return the expected results
• Gives confidence in the correctness of your code, especially when
changing things
• http://r-pkgs.had.co.nz/tests.html
usethis::use_testthat() # one-time setup; replaces the older devtools::use_testthat()
library(stringr)
context("String length")
test_that("str_length is number of characters", {
  expect_equal(str_length("a"), 1)
  expect_equal(str_length("ab"), 2)
  expect_equal(str_length("abc"), 3)
})
R Markdown
• Easy creation of dynamic documents
– Mix of R and markdown
– Output to word, html or pdf
– Integrates nicely with version control as
markdown is a text format (easy to diff)
• RStudio: New file -> R Markdown
• Powered by knitr (alternative to Sweave)
R Markdown: example
---
title: "Numbers and their values"
output:
  word_document:
    fig_caption: yes
---
```{r, echo=FALSE, warning=FALSE, message=FALSE}
# R code block that won’t appear in the output document
three <- 1+2
```
# Chapter 1: On the value of 1 and 2
It is a well-known fact that 1 + 2 = `r three`; you can also calculate this inline: `r 1+2`.
Or show the entire calculation:
```{r}
1+2
```
Markdown basics
Headers
# Heading level 1
## Heading level 2
###### Heading level 6
*italic* and _this is also italic_
**bold** and __this is also bold__
*, + or - for (unordered) list items (bullets)
1., 2., … for ordered list items
This is an [example link](http://example.com/).
Image here: ![alt text](/path/to/img.jpg)
BibTeX references: [@RCoreTeam2014; @Wand2014], which need a link
to a BibTeX file in the header (bibliography: bibliography.bib)
More at: http://daringfireball.net/projects/markdown/basics
Also used elsewhere (GitHub, Stack Overflow, …), though sometimes in a dialect
Caching intermediate results
Official way: http://yihui.name/knitr/demo/cache/
Hand-rolled (more explicit, but it doesn’t clean up previous versions and the
cache directory is hard-coded):
library(digest)
make_or_load <- function(change_path, file_prefix, make_fn, force_make = FALSE) {
  # cache key, part 1: modification time of the input file
  changeid <- as.integer(file.info(change_path)$mtime)
  # cache key, part 2: hash of the full source of make_fn
  fn_md5 <- digest(paste(deparse(make_fn), collapse = "\n"), algo = "md5", serialize = FALSE)
  path <- paste0("D:/temp/", file_prefix, changeid, "_", fn_md5, ".RData")
  if (!file.exists(path) || force_make) {
    result <- make_fn()
    save(result, file = path)
  } else {
    result <- get(load(path))
  }
  return(result)
}
# wb holds the path of the file whose modification time invalidates the cache
df <- make_or_load(wb, "invasives_df_area_", function() { set_area(df) })
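A simplified variant of the same idea, as a sketch only: the helper `cache_rds` and its arguments are hypothetical, it uses base R `saveRDS()`/`readRDS()` instead of digest, and it takes the cache directory as a parameter rather than hard-coding it. The cache key is supplied by the caller, so this version does not detect changes to the input file or to the function.

```r
# Hypothetical helper: compute a result once, then reload it from disk
cache_rds <- function(key, make_fn, cache_dir = tempdir(), force_make = FALSE) {
  path <- file.path(cache_dir, paste0(key, ".rds"))
  if (force_make || !file.exists(path)) {
    result <- make_fn()       # cache miss: do the work
    saveRDS(result, path)     # and store the single result object
  } else {
    result <- readRDS(path)   # cache hit: read the stored object back
  }
  result
}

# Count how often the expensive step really runs
calls <- 0
make_value <- function() {
  calls <<- calls + 1
  sum(1:10)
}

a <- cache_rds("demo_sum", make_value)
b <- cache_rds("demo_sum", make_value)  # served from the cache
calls
```

Here `calls` stays at 1 after the second request, because the second result comes from the `.rds` file.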
Saving plots
save_plot <- function(filename, plotfn, outdir = "D:/temp/", ...) {
  height <- 498
  width <- 662
  invisible(capture.output(tryCatch({
    plotfn(...) # draw on the current device first
    jpeg(filename = paste0(outdir, filename, ".jpeg"), width = width, height = height, pointsize = 12, quality = 100)
    par(mar = c(2.2, 4.1, 1, 1) + 0.1) # tight margins; set after opening the device so they apply to it
    plotfn(...)
    dev.off()
    svg(filename = paste0(outdir, filename, ".svg"), width = 14, height = 7, pointsize = 12, onefile = TRUE)
    plotfn(...) # a newly opened device starts with the default margins
    dev.off()
  }, error = function(e) {
    print(e)
  }, finally = {
    while (dev.cur() > 2) dev.off() # close any device left open after an error
  })))
}
set.seed(42)
save_plot("plothist", hist, x = sample(c(1:5, 3:4), 100, replace = TRUE),
          xlab = "Random", ylab = "Density", freq = FALSE, breaks = 1:5)
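To go with rule 7 (always store raw data behind plots), here is a sketch that writes the plotted data next to the figure. The helper `save_plot_with_data`, its arguments and the file names are assumptions for illustration; it uses the `pdf()` device and `tempdir()` so it runs without a hard-coded path.

```r
# Hypothetical helper: save the data as CSV and the plot as PDF, side by side
save_plot_with_data <- function(basename, data, plotfn, outdir = tempdir()) {
  csv_path <- file.path(outdir, paste0(basename, ".csv"))
  pdf_path <- file.path(outdir, paste0(basename, ".pdf"))
  write.csv(data, csv_path, row.names = FALSE)  # raw data behind the plot
  pdf(pdf_path, width = 7, height = 5)
  on.exit(dev.off(), add = TRUE)                # close the device even if plotfn fails
  plotfn(data)
  invisible(c(csv = csv_path, pdf = pdf_path))
}

set.seed(42)
values <- data.frame(x = sample(1:5, 100, replace = TRUE))
paths <- save_plot_with_data("plothist_demo", values,
                             function(d) hist(d$x, xlab = "Random", main = ""))
file.exists(paths)
```

Anyone who later questions the figure can reload the CSV and redraw it.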
Saving tables
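• As HTML (requires the stargazer package)
library(stargazer)
stargazer(data, type = "html", summary = FALSE, out = outputpath, out.header = TRUE)
• As CSV
write.csv2(data, file = outputpath, row.names = FALSE)
data <- read.csv2(outputpath)
• As RData
save(data, file = outputpath)
load(outputpath) # restores the object under its original name (data)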
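One alternative worth knowing, shown here as a sketch with a made-up data frame: `saveRDS()` stores a single object and `readRDS()` returns it, so the result can be assigned to any name, unlike `load()`, which restores objects under their saved names.

```r
# Made-up example data
scores <- data.frame(id = 1:3, score = c(0.5, 1.25, 2))

rds_path <- file.path(tempdir(), "scores.rds")
saveRDS(scores, rds_path)

# readRDS returns the object itself, so the name is free to choose
restored <- readRDS(rds_path)
all.equal(scores, restored)
```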
Packrat
Use packrat to make your R projects more:
• Isolated: Installing a new or updated package for one
project won’t break your other projects, and vice versa.
That’s because packrat gives each project its own private
package library.
• Portable: Easily transport your projects from one computer
to another, even across different platforms. Packrat makes
it easy to install the packages your project depends on.
• Reproducible: Packrat records the exact package versions
you depend on, and ensures those exact versions are the
ones that get installed wherever you go.
Packrat
RStudio:
Packrat can be enabled when creating a project, or later in the
project settings
Manually:
install.packages("packrat")
# initialize packrat in a project directory
packrat::init("D:/temp/demo_packrat")
# install a package
install.packages("raster")
# save the changes in Packrat (by default auto-snapshot)
packrat::snapshot()
# view the list of packages that might be missing or that can be removed
packrat::status()
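A lightweight complement, not a replacement for Packrat: the sketch below records the R version and the attached package versions in a text file using base R's `sessionInfo()`. The output file name is an assumption for illustration.

```r
# Write a human-readable record of the R version and loaded packages
info_path <- file.path(tempdir(), "session_info.txt")
writeLines(capture.output(sessionInfo()), info_path)

# Inspect the first few lines of the record
head(readLines(info_path))
```

Unlike Packrat, this only documents the versions; it does not reinstall them on another machine.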
DEMO
• Package development (new, existing)
• R Markdown (new, existing)
• Packrat (new and existing project)
– packrat::init()
Learning More
https://software-carpentry.org/
Lessons on using the (Linux) shell, Git, Mercurial,
Databases & SQL, Python, R, Matlab and
automation with Make
R packages by Hadley Wickham
Advanced R by Hadley Wickham

Reproducible Computational Research in R

  • 1. Reproducible computational research in R An introduction by Samuel Bosch (October 2015) http://samuelbosch.com
  • 2. Topics – Introduction – Version control (Git) – Reproducible analysis in R • Writing packages • R Markdown • Saving plots • Saving data • Packrat
  • 3. Reproducible (computational) research 1. For Every Result, Keep Track of How It Was Produced – Steps, commands, clicks 2. Avoid Manual Data Manipulation Steps 3. Archive the Exact Versions of All External Programs Used – Packrat (Reproducible package management for R) 4. Version Control All Custom Scripts 5. Record All Intermediate Results, When Possible in Standardized Formats 6. For Analyses That Include Randomness, Note Underlying Random Seeds – set.seed(42) 7. Always Store Raw Data behind Plots 8. Generate Hierarchical Analysis Output, Allowing Layers of Increasing Detail to Be Inspected 9. Connect Textual Statements to Underlying Results 10. Provide Public Access to Scripts, Runs, and Results Sandve GK, Nekrutenko A, Taylor J, Hovig E (2013) Ten Simple Rules for Reproducible Computational Research. PLoS Comput Biol 9(10): e1003285. doi:10.1371/journal.pcbi.1003285
  • 4.
  • 5.
  • 6. Version control • Word review on steroids • When working alone: it’s a database of all the versions of your files • When collaborating: it’s a database of all the versions of all collaborators with one master version where all changes can be merged into. • When there are no conflicts then merging can be done automatically. • Multiple programs/protocols: git, mercurial, svn, … • By default not for versioning large files (> 50 mb) but there is a Git Large File Storage extension • Works best with text files (code, markdown, csv, …)
  • 7. Git • Popularized by http://github.com but supported by different providers (http://github.ugent.be, http://bitbucket.org). • Programs for Git on windows: – Standard Git Gui + command line (git-scm.com) – GitHub Desktop for Windows – Atlassian SourceTree
  • 8. Git workflow (1 user) Workflow: 1. create a repository on your preferred provider If you want a private repository then use bitbucket.org or apply for the student developer pack (https://education.github.com/) 2. Clone the repository to your computer git clone https://github.com/samuelbosch/sdmpredictors.git 3. Make changes 4. View changes (optional) git status 5. Submit changes git add git commit -am “” git push
  • 9. Git extras to explore • Excluding files from Git with .gitignore • Contributing to open source – Forking – Pull requests
  • 10. DEMO • New project on https://github.ugent.be/ • Clone • Add file • Status • Commit • Edit file • Commit • Push
  • 11. R general • Use Rstudio https://www.rstudio.com/products/rstudio/down load/ and explore it – Projects – Keyboard shortcuts – Git integration – Package development – R markdown • R Short Reference Card: https://cran.r- project.org/doc/contrib/Short-refcard.pdf • Style guide: http://adv-r.had.co.nz/Style.html
  • 12. R package development • R packages by Hadley Wickham (http://r- pkgs.had.co.nz/) • Advantages: – Can be shared easily – One package with your data and your code – Documentation (if you write it) – Ease of testing
  • 13. R packages: Getting started • install.packages(“devtools”) • Rstudio -> new project -> new directory -> R package • # Build and Reload Package: 'Ctrl + Shift + B' • # Check Package: 'Ctrl + Shift + E' • # Test Package: 'Ctrl + Shift + T' • # Build documentation: 'Ctrl + Shift + D'
  • 14. R packages: testing • Test if your functions returns the expected results • Gives confidence in the correctness of your code, especially when changing things • http://r-pkgs.had.co.nz/tests.html devtools::use_testthat() library(stringr) context("String length") test_that("str_length is number of characters", { expect_equal(str_length("a"), 1) expect_equal(str_length("ab"), 2) expect_equal(str_length("abc"), 3) })
  • 15. R Markdown • Easy creation of dynamic documents – Mix of R and markdown – Output to word, html or pdf – Integrates nicely with version control as markdown is a text format (easy to diff) • Rstudio: New file -> R Markdown • Powered by knitr (alternative to Sweave)
  • 16. R Markdown: example
    ---
    title: "Numbers and their values"
    output:
      word_document:
        fig_caption: yes
    ---

    ```{r, echo=FALSE, warning=FALSE, message=FALSE}
    # R code block that won't appear in the output document
    three <- 1+2
    ```

    # Chapter 1: On the value of 1 and 2

    It is a well known fact that 1 and 2 = `r three`, you can calculate this also inline `r 1+2`.
    Or show the entire calculation:

    ```{r}
    1+2
    ```
  • 17. Markdown basics
    Headers
    # Heading level 1
    ## Heading level 2
    ###### Heading level 6
    *italic* and _this is also italic_
    **bold** and __this is also bold__
    *, + or - for (unordered) list items (bullets)
    1., 2., … for ordered list items
    This is an [example link](http://example.com/).
    Image here: ![alt text](/path/to/img.jpg)
    BibTeX references: [@RCoreTeam2014; @Wand2014], but this needs a link to a BibTeX file in the header:
    bibliography: bibliography.bib
    More at: http://daringfireball.net/projects/markdown/basics
    Also used elsewhere (GitHub, Stack Overflow, …), but sometimes in a dialect
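  • 18. Caching intermediate results
    Official way: http://yihui.name/knitr/demo/cache/
    Hand-rolled (more explicit, but doesn't clean up previous versions and uses a hard-coded cache directory):
    library(digest)

    make_or_load <- function(change_path, file_prefix, make_fn, force_make = FALSE) {
      # Cache key, part 1: modification time of the input file
      changeid <- as.integer(file.info(change_path)$mtime)
      # Cache key, part 2: hash of the printed source of make_fn
      fn_md5 <- digest(capture.output(make_fn), algo = "md5", serialize = FALSE)
      path <- paste0("D:/temp/", file_prefix, changeid, "_", fn_md5, ".RData")
      if (!file.exists(path) || force_make) {
        # No cached result for this key (or a rebuild was forced): compute and store it
        result <- make_fn()
        save(result, file = path)
      } else {
        # load() returns the name of the restored object, get() fetches its value
        result <- get(load(path))
      }
      return(result)
    }

    # wb (passed as change_path), set_area() and df are defined elsewhere in the analysis
    df <- make_or_load(wb, "invasives_df_area_", function() {
      set_area(df)
    })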
  • 19. Saving plots
    save_plot <- function(filename, plotfn, outdir = "D:/temp/", ...) {
      height <- 498
      width <- 662
      invisible(capture.output(tryCatch({
        # Draw once on the current device as a preview
        plotfn(...)
        # JPEG version with tighter margins; par() settings are per device,
        # so they are set after the JPEG device has been opened
        jpeg(filename = paste0(outdir, filename, ".jpeg"), width = width, height = height,
             pointsize = 12, quality = 100)
        par(mar = c(2.2, 4.1, 1, 1) + 0.1)
        plotfn(...)
        dev.off()
        # SVG version with the default margins
        svg(filename = paste0(outdir, filename, ".svg"), width = 14, height = 7,
            pointsize = 12, onefile = TRUE)
        plotfn(...)
        dev.off()
      }, error = function(e) {
        print(e)
      }, finally = {
        # Close any file device left open after an error
        while (dev.cur() > 2) dev.off()
      })))
    }

    set.seed(42)
    save_plot("plothist", hist, x = sample(c(1:5, 3:4), 100, replace = TRUE),
              xlab = "Random", ylab = "Density", freq = FALSE, breaks = 1:5)
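Rule 7 from the introduction ("Always Store Raw Data behind Plots") can be combined with a plot-saving helper. The following is a minimal sketch, not the author's method: `save_plot_with_data()` is a hypothetical function, it writes to `tempdir()` instead of a fixed directory, and it uses the `pdf()` device on the assumption that this device is available on any R installation.

```r
# Hypothetical helper: write a plot and the data behind it side by side
save_plot_with_data <- function(name, data, plotfn, outdir = tempdir()) {
  plot_path <- file.path(outdir, paste0(name, ".pdf"))
  data_path <- file.path(outdir, paste0(name, ".csv"))
  # Store the raw data first, so it exists even if plotting fails
  write.csv(data, file = data_path, row.names = FALSE)
  pdf(file = plot_path, width = 7, height = 5)
  on.exit(dev.off())  # close the device even if plotfn() throws an error
  plotfn(data)
  invisible(c(plot = plot_path, data = data_path))
}

set.seed(42)
values <- data.frame(x = sample(1:5, 100, replace = TRUE))
paths <- save_plot_with_data("plothist", values, function(d) {
  hist(d$x, breaks = 0:5, xlab = "Random", main = "")
})
```

The returned paths make it easy to record, next to each figure, which file holds the numbers it was drawn from.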
  • 21. Saving tables
    • As HTML
      library(stargazer)
      stargazer(data, type = "html", summary = FALSE, out = outputpath, out.header = TRUE)
    • As CSV
      write.csv2(data, file = outputpath)
      data <- read.csv2(outputpath)
    • As RData
      save(data, file = outputpath)
      # load() restores the object under its saved name (data) and returns only the object names
      load(outputpath)
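A round trip through each format shows how the read functions differ. This is a sketch with a made-up two-row table written to `tempdir()`; `saveRDS()`/`readRDS()` are base R functions that the slide does not mention and are included here only for comparison with `save()`/`load()`.

```r
# Small example table (made up for illustration)
data <- data.frame(species = c("a", "b"), area = c(1.5, 2.25),
                   stringsAsFactors = FALSE)

csv_path <- file.path(tempdir(), "table.csv")
rds_path <- file.path(tempdir(), "table.rds")
rdata_path <- file.path(tempdir(), "table.RData")

# CSV: write.csv2() uses ";" as separator and "," as decimal mark;
# row.names = FALSE avoids an extra index column when reading back
write.csv2(data, file = csv_path, row.names = FALSE)
from_csv <- read.csv2(csv_path, stringsAsFactors = FALSE)

# RDS: stores a single object, readRDS() returns that object
saveRDS(data, file = rds_path)
from_rds <- readRDS(rds_path)

# RData: load() restores objects under their original names and
# returns those names, not the objects themselves
save(data, file = rdata_path)
restore_env <- new.env()
loaded_names <- load(rdata_path, envir = restore_env)
from_rdata <- get(loaded_names[1], envir = restore_env)
```

Loading into a separate environment, as done here with `restore_env`, keeps `load()` from silently overwriting an object of the same name in the workspace.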
  • 22. Packrat Use packrat to make your R projects more: • Isolated: Installing a new or updated package for one project won’t break your other projects, and vice versa. That’s because packrat gives each project its own private package library. • Portable: Easily transport your projects from one computer to another, even across different platforms. Packrat makes it easy to install the packages your project depends on. • Reproducible: Packrat records the exact package versions you depend on, and ensures those exact versions are the ones that get installed wherever you go.
  • 23. Packrat
    RStudio: project support for Packrat can be enabled when creating a project or later in the project settings
    Manually:
    install.packages("packrat")
    # initialize packrat in a project directory
    packrat::init("D:/temp/demo_packrat")
    # install a package
    install.packages("raster")
    # save the changes in Packrat (by default done automatically: auto-snapshot)
    packrat::snapshot()
    # view list of packages that might be missing or that can be removed
    packrat::status()
  • 24. DEMO • Package development (new, existing) • Rmarkdown (new, existing) • Packrat (new and existing project) – packrat::init()
  • 25. Learning More
    • https://software-carpentry.org/: lessons on using the (Linux) shell, Git, Mercurial, databases & SQL, Python, R, Matlab and automation with Make
    • R packages by Hadley Wickham
    • Advanced R by Hadley Wickham