R, Git, Github, and CI
Taiwan R User Group
Wush Wu
2014-09-20
DSC 2014 
● 2014 was the first year of DSC (Data Science Conference) in Taiwan.
● We (Taiwan R User Group) organized the R Tutorial Program at DSC.
● More than 100 students joined us during DSC 2014.
● The average rating was above 4.2 (on a scale of 1 to 5).
Goal of Tutorial 
● Systematically introduce the steps of an analysis with R
– Basic
– Data Manipulation (Extract, Transform, and Load)
– Analysis
– Visualization
● Based on the latest tools of R
● Reproducibility of examples
● Integration of materials
● *Well-designed exercises
About Me 
● PhD Candidate in NTU EE 
● Current research field: 
– Online Advertisement 
– Large Scale Predictive Modeling 
● Organizer of Taiwan R User Group 
● Organizer of Tutorial Program in DSC 2014
Outline 
● Share the experience of organizing the tutorial program with 16 people, using:
– Git, my favorite tool for version control
– Github, a platform for cooperation
– Jenkins, a system for automation
● I will show how to combine these tools with an R package
Why R Package 
● The examples and exercises have many dependencies
● An R package is the recommended way to share your code
● Wrap all materials in one R package, DSC2014Tutorial, so the students only need to download once.
– All slides are included.
– Customized R API
– All data
– *Installation of the packages it depends on
– Solves portability issues (Windows, Mac, and Ubuntu)
● The package is easily managed with git and released on Github
The structure of R package 
Dependencies 
● DESCRIPTION 
Package: DSC2014Tutorial
Type: Package
Title: Materials of Tutorial Program on DSC 2014
Version: 1.2
Date: 2014-08-03
Author: Taiwan R User Group
Maintainer: Wush Wu <wush978@gmail.com>
Description: This package contains the required materials of R Tutorial
    DSC2014
License: GPL (>= 3)
Depends:
    R (>= 3.1.0)
Imports:
    tools,
    ...
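A DESCRIPTION file is plain text in Debian control file (DCF) format, so the declared dependencies can be inspected from R itself. The following is a minimal sketch, not part of the original package: it writes a shortened copy of the file to a temporary location and parses it with `read.dcf()`. Only the `tools` import is reproduced, because the rest of the list is elided on the slide.

```r
# Write a shortened, illustrative copy of the DESCRIPTION file to a temp path.
desc_path <- file.path(tempdir(), "DESCRIPTION")
writeLines(c(
  "Package: DSC2014Tutorial",
  "Version: 1.2",
  "Depends:",
  "    R (>= 3.1.0)",
  "Imports:",
  "    tools"
), desc_path)

# read.dcf() returns a character matrix with one column per field.
desc <- read.dcf(desc_path)

# Split the comma-separated Imports field into individual package names.
imports <- trimws(strsplit(desc[1, "Imports"], ",")[[1]])
print(imports)
```

The same approach works on the DESCRIPTION of an installed package, for example via `system.file("DESCRIPTION", package = "tools")`.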
The structure of R package 
Data 
● data 
data(salary, package = 'DSC2014Tutorial')
The structure of R package 
cross-platform 
● configure.ac / configure
The structure of R package 
slides and external source 
system.file('Basic', package = 'DSC2014Tutorial')
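`system.file()` returns an empty string when the requested path does not exist, which is easy to miss. A small wrapper can turn that into an explicit error. This is a sketch only: `find_material()` is a hypothetical helper, not a function of DSC2014Tutorial, and the demonstration uses the base package `tools` so that it runs without the tutorial package installed.

```r
# Hypothetical helper: locate a file or directory shipped inside a package
# and fail loudly if it is missing.
find_material <- function(topic, package = "DSC2014Tutorial") {
  path <- system.file(topic, package = package)
  if (!nzchar(path)) {
    stop("'", topic, "' not found in package '", package, "'")
  }
  path
}

# With the tutorial package installed, this would locate the 'Basic' slides:
# find_material("Basic")

# Runnable demonstration against a package that ships with every R installation.
desc_file <- find_material("DESCRIPTION", package = "tools")
print(desc_file)
```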
Git, Version Control 
● Some speakers are new to git
● We used the following features:
– Personal version control: add, commit
– Repository: remote, push, pull, and merge
– Cooperation: submodule
● Git plays a fundamental role in our workflow
Why Git? 
● Speed is king 
● Local commits rock 
● Github 
● My favorite
Github 
● The most popular platform for managing git repositories
● Provides many convenient features
– Organization accounts
– Designed for cooperation
– Simple integration with many popular CI tools
– Static website hosting (sufficient for an R repository)
Release R Package on Github 
● The R package is released as:
– a git repository
– an R repository
Github and R Repository 
● How to set up an R repository on Github:
1. Create a new git repository named "R"
2. Add the content of the R repository to the git repository, in the branch gh-pages
3. Push and wait
4. The R repository is located at http://<account>.github.io/R
● Users can install the binary of DSC2014Tutorial directly via
install.packages("DSC2014Tutorial", repos = "http://TaiwanRUserGroup.github.io/R")
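The "content of the R repository" has to follow the directory layout that `install.packages()` expects under the `repos` URL. As a rough illustration, `utils::contrib.url()` shows which sub-paths R will query for each package type. The Windows path includes the running R version, so no fixed output is shown for it.

```r
repo <- "http://TaiwanRUserGroup.github.io/R"

# Source packages are looked up under <repo>/src/contrib.
src_url <- utils::contrib.url(repo, type = "source")

# Windows binaries live under <repo>/bin/windows/contrib/<R version>.
win_url <- utils::contrib.url(repo, type = "win.binary")

print(src_url)
print(win_url)
```

Each of these directories needs its own PACKAGES index file next to the package files.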
Cooperation 
● I cannot build all the slides of the tutorial myself
– There are 7 slide decks built by different groups of speakers
● Each slide deck should be managed by its author
– Each slide deck is a standalone git repository
– No branching here, because not all speakers are familiar with git
● Use git submodule to embed these slides into the R package
● We need a modern workflow to control the quality
Workflow 1 
1. Each speaker creates the slides and initializes a git repository
2. Speakers commit their changes to the git repository
3. Open a pull request
4. Review the slides and test on different platforms
5. Merge the changes into DSC2014Tutorial
Commits
Pull Requests
Review
Merge
Slide Review 
● Each speaker reviews the slides of the others
● Comments are posted to the Issues page on Github
● The speaker should resolve the posted issues
Issues
Challenge 
● After the first rehearsal at Taiwan R User Group, we noticed a serious encoding issue
– The default Chinese encoding differs between platforms
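The slides do not say how the encoding issue was resolved. One common way to avoid depending on a platform's default encoding is to fix a single encoding (UTF-8 is assumed here) and declare it explicitly whenever text is written or read. The sketch below illustrates that idea only. The sample string is written with `\u` escapes so that the script itself stays ASCII.

```r
# Two Chinese characters, written as escapes so this file is plain ASCII.
text <- "\u4e2d\u6587"

# Write the raw UTF-8 bytes, independent of the session locale.
path <- tempfile(fileext = ".txt")
con <- file(path, open = "wb")
writeBin(charToRaw(enc2utf8(text)), con)
close(con)

# Declare the encoding on input instead of trusting the platform default.
read_back <- readLines(path, encoding = "UTF-8", warn = FALSE)

print(Encoding(read_back))
print(nchar(read_back, type = "chars"))
```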
Challenge 
● We could resolve that specific issue
● The slides are evolving, so new bugs might occur
● We need to test the slides, but there are 7 slide decks and we want to test them on Windows, Ubuntu, and Mac*
Why CI 
● CI automates the following things 
– Testing 
– Integration 
– Deployment 
● CI makes my life better
● CI also introduces some problems. Let's discuss them later.
Test R Package 
● R CMD check --no-codoc --no-manual --no-vignettes --no-build-vignettes
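On a CI server the same check can be launched from an R script, so that a failed check stops the job. The sketch below is an assumption about how this could be scripted, not the setup used in the talk. `check_package()` is a hypothetical helper and `pkg_path` is whatever directory or tarball should be checked.

```r
# The flags shown on the slide.
check_flags <- c("--no-codoc", "--no-manual", "--no-vignettes",
                 "--no-build-vignettes")

# Hypothetical helper: run R CMD check with the R binary of the current
# session and stop if the command reports a failure.
check_package <- function(pkg_path, flags = check_flags) {
  r_bin <- file.path(R.home("bin"), "R")
  status <- system2(r_bin, c("CMD", "check", flags, shQuote(pkg_path)))
  if (status != 0) {
    stop("R CMD check failed for ", pkg_path)
  }
  invisible(status)
}

# Example call, assuming the package sources sit in ./DSC2014Tutorial:
# check_package("DSC2014Tutorial")
```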
Deploy R Package 
● git push 
● Commit to R Repository 
tools::write_PACKAGES(type = c("source", "mac.binary", "win.binary"))
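To make the indexing step concrete, here is a self-contained sketch that builds a tiny source-only repository in a temporary directory and indexes it. Everything in it is illustrative: `demoPkg` is a placeholder package containing only a DESCRIPTION file, and the directory names are assumptions based on the standard `src/contrib` layout.

```r
# Repository skeleton: <repo>/src/contrib holds the source tarballs.
repo <- file.path(tempdir(), "R")
contrib <- file.path(repo, "src", "contrib")
dir.create(contrib, recursive = TRUE, showWarnings = FALSE)

# Placeholder package with nothing but a DESCRIPTION file.
src <- file.path(tempdir(), "demoPkg")
dir.create(src, showWarnings = FALSE)
writeLines(c(
  "Package: demoPkg",
  "Version: 0.1",
  "Title: Placeholder Package",
  "Description: Exists only to demonstrate repository indexing.",
  "License: GPL (>= 3)"
), file.path(src, "DESCRIPTION"))

# Pack it as demoPkg_0.1.tar.gz, the file name pattern a repository expects.
old_wd <- setwd(tempdir())
utils::tar(file.path(contrib, "demoPkg_0.1.tar.gz"), files = "demoPkg",
           compression = "gzip")
setwd(old_wd)

# Write the PACKAGES index for the source type; returns the number of
# packages described.
n_indexed <- tools::write_PACKAGES(contrib, type = "source")
print(n_indexed)
print(list.files(contrib))
```

After this step the directory tree under `repo` is what would be committed to the gh-pages branch. Binary types need their own directories and a separate `write_PACKAGES()` call each.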
R and CI 
travis-ci.org
Existing work for R and Travis CI
● https://github.com/craigcitro/r-travis/wiki
.travis.yml
language: c
script: ./travis-tool.sh run_tests
after_failure:
  - ./travis-tool.sh dump_logs
before_install:
  - curl -OL http://raw.github.com/craigcitro/r-travis/master/scripts/travis-tool.sh
  - chmod 755 ./travis-tool.sh
  - ./travis-tool.sh bootstrap
  - ./travis-tool.sh r_binary_install XML Rcpp knitr brew RUnit inline highlight formatR highr markdown rgl
install:
  - ./travis-tool.sh install_deps
  - ./travis-tool.sh install_github hadley/testthat
notifications:
  email:
    on_success: change
    on_failure: change
env:
R and CI 
Jenkins
Setup Jenkins 
● Github Plugin 
– http://sanketdangi.com/post/62740311628/integrate-jenkins-github-trigger-build-process
● Github Pull Request Builder 
– http://www.kabisa.nl/building-github-pull-requests-with-jenkins/ 
● Firewall (open to 192.30.252.0/22)
Auto Testing
Result
Discussion 
● No errors vs. no warnings
● Existing problems:
– Memory issues
– Unknown bugs
– Unclear messages
Summary 
● Tutorial and R Package 
● Git and R Package 
● Github and R Package 
● CI and R Package
Q&A
Thanks for listening
