Turbocharge your R

                    Rob Zinkov


                   July 12th, 2011




Rob Zinkov ()       Turbocharge your R   July 12th, 2011   1 / 23
Outline



1   Introduction


2   .C


3   .Call


4   Rcpp




Introduction


What is the point of this talk?




Show you how to speed up your R code




Caveats




  • Please try to optimize your R code first
  • Some of these mechanisms will make coding harder




.C




• This is the most basic mechanism for calling C from R
• Explicitly copies the data into C
• Accepts only vectors of basic types (such as integer, double, and character)




Step 1. Put function in file (foo.c)



/* Squares each element of x in place; nin points to the length of x */
void foo(int *nin, double *x)
{
    int n = nin[0];
    int i;

    for (i = 0; i < n; i++)
        x[i] = x[i] * x[i];
}




• Note this is a void function
• Note arguments are passed in as pointers
• Try to limit yourself to one function per file
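The calling convention in these notes can be tried without R. Below is a minimal sketch in plain C, assuming a hypothetical helper `call_foo_like_dot_c` that mimics what `.C` is described as doing here: it copies the caller's data, passes pointers to the copies, and hands the modified copy back. The helper name and its copy-in/copy-out behaviour are illustrative assumptions, not R's actual implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same function as on the slide: squares x[0..n-1] in place. */
void foo(int *nin, double *x)
{
    int n = nin[0];
    int i;

    for (i = 0; i < n; i++)
        x[i] = x[i] * x[i];
}

/* Hypothetical stand-in for .C("foo", n, x): works on a copy of the input,
 * so the caller's array is untouched, and writes the result into out.
 * Returns 0 on success, -1 if the temporary copy cannot be allocated. */
int call_foo_like_dot_c(const double *x, int n, double *out)
{
    double *copy;

    assert(n >= 0);
    copy = malloc((n > 0 ? (size_t)n : 1) * sizeof *copy);
    if (copy == NULL)
        return -1;
    memcpy(copy, x, (size_t)n * sizeof *copy);   /* copy in */
    foo(&n, copy);                               /* even the scalar n goes in as a pointer */
    memcpy(out, copy, (size_t)n * sizeof *out);  /* copy out */
    free(copy);
    return 0;
}
```

Because `foo` returns `void`, the only way results come back is through the memory its pointer arguments refer to, which is why the helper reads the answer out of `copy` after the call.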




Step 2. Compile file with R




$ R CMD SHLIB foo.c




Step 3. Load into R




> dyn.load("foo.so")




Step 4. Call your code




 .C("foo", n=as.integer(5), x=as.double(rnorm(5)))




• Arguments to .C are the name of the function followed by its arguments
• Arguments must be the right type (hence as.integer and as.double)
• Calling into C code carries a risk of segfaults
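Because a `.C` function is `void`, one way to guard against bad arguments is to report problems through an extra pointer argument. Below is a minimal sketch, assuming a hypothetical `foo_checked` with an added `status` argument; neither the name nor the extra argument is part of the original talk.

```c
#include <stddef.h>

/* Hypothetical variant of foo: status[0] is set to 0 on success and to 1
 * when the inputs look unusable, in which case x is left untouched. */
void foo_checked(int *nin, double *x, int *status)
{
    int i, n;

    if (nin == NULL || x == NULL || nin[0] < 0) {
        status[0] = 1;
        return;
    }
    n = nin[0];
    for (i = 0; i < n; i++)
        x[i] = x[i] * x[i];
    status[0] = 0;
}
```

From R this could be called along the lines of `.C("foo_checked", n=as.integer(5), x=as.double(rnorm(5)), status=integer(1))`, reading `status` from the returned list. The check cannot detect an `n` larger than the real length of `x`, so passing the correct length from R remains the caller's job.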




.Call


Why?




 • Less copying of data structures (lower memory use)
 • Work directly with R objects (SEXP) rather than raw pointers
 • Access more kinds of R data
 • Do more in C, including allocating and returning R objects




.Call code


#include <R.h>
#include <Rinternals.h>
#include <Rmath.h>

SEXP vecSum(SEXP Rvec){
  int i, n;
  double *vec, value = 0;
  vec = REAL(Rvec);
  n = length(Rvec);
  for (i = 0; i < n; i++) value += vec[i];
  printf("The value is: %4.6f\n", value);
  return R_NilValue;
}


$ R CMD SHLIB vecSum.c
> dyn.load("vecSum.so")
> .Call("vecSum", rnorm(10))
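One way to keep code like `vecSum` easy to check is to separate the numeric kernel from the SEXP handling. Below is a minimal sketch, assuming a hypothetical `vec_sum` kernel in plain C; an R-facing wrapper would still obtain the pointer with `REAL()` and the length with `length()` as in `vecSum`.

```c
#include <stddef.h>

/* Hypothetical kernel: sums n doubles and knows nothing about R's SEXP type. */
double vec_sum(const double *vec, size_t n)
{
    double value = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        value += vec[i];
    return value;
}
```

A wrapper could then hand the result back to R, for example with `ScalarReal(vec_sum(REAL(Rvec), length(Rvec)))`, rather than printing it and returning `R_NilValue`.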




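/* Returns the integer sequence a, a+1, ..., b as an R integer vector */
SEXP ab(SEXP Ra, SEXP Rb){
   int i, a, b;
   SEXP Rval;
   /* coerceVector may allocate, so protect its results from the GC */
   PROTECT(Ra = coerceVector(Ra, INTSXP));
   PROTECT(Rb = coerceVector(Rb, INTSXP));
   a = INTEGER(Ra)[0];
   b = INTEGER(Rb)[0];
   PROTECT(Rval = allocVector(INTSXP, b - a + 1));
   for (i = a; i <= b; i++)
       INTEGER(Rval)[i - a] = i;
   UNPROTECT(3);  /* one per PROTECT above */
   return Rval;
}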
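The index arithmetic in `ab` (`b - a + 1` elements, each written at offset `i - a`) can be checked apart from R's API. Below is a minimal sketch in plain C, assuming a hypothetical `int_range` helper that uses `malloc` where `ab` uses `allocVector`. It also rejects `b < a`, a case the slide's code does not handle.

```c
#include <stdlib.h>

/* Hypothetical plain-C analogue of ab(): returns a malloc'd array holding
 * a, a+1, ..., b and stores its length in *len. Returns NULL (with *len
 * set to 0) when b < a or when allocation fails. The caller frees the result. */
int *int_range(int a, int b, size_t *len)
{
    int *out;
    size_t n, k;

    *len = 0;
    if (b < a)
        return NULL;
    /* widen before subtracting so that extreme a and b cannot overflow int */
    n = (size_t)((long long)b - (long long)a) + 1;
    out = malloc(n * sizeof *out);
    if (out == NULL)
        return NULL;
    for (k = 0; k < n; k++)
        out[k] = (int)((long long)a + (long long)k);
    *len = n;
    return out;
}
```

Unlike memory from `allocVector`, this buffer is invisible to R's garbage collector, so it needs an explicit `free` and no `PROTECT`.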



Since memory is shared with R, explicit care must be taken not to collide with R's garbage collector (hence PROTECT and UNPROTECT)




Rcpp


Why?




 • Use C++ instead of C
 • Use C++ classes that represent R objects more naturally
 • Easier to compile and load code (for example, inline from an R session)




src <- '
    IntegerVector tmp(clone(x));
    double rate = as< double >(y);
    int tmpsize = tmp.size();
    RNGScope scope;
    for (int ii = 0; ii < tmpsize; ii++) {
        tmp(ii) = Rf_rbinom(tmp(ii), rate);
    }
    return tmp;
'
require(inline)
## compile the function, inspect the process with verbose=TRUE
testfun2 = cxxfunction(signature(x='integer', y='numeric'),
                       src, plugin='Rcpp', verbose=TRUE)



require(inline)
testfun = cxxfunction(
    signature(x="numeric",
              i="integer"),
    body = '
        NumericVector xx(x);
        int ii = as<int>(i);
        xx = xx * ii;
        return( xx );
    ', plugin="Rcpp")
testfun(1:5, 3)




Conclusions




It is fairly easy to make R faster




Now go make your R code faster




References



  • http://www.stat.umn.edu/~charlie/rc/
  • http://helmingstay.blogspot.com/2011/06/efficient-loops-in-r-complexity-versus.html
  • http://www.sfu.ca/~sblay/R-C-interface.ppt
  • http://www.biostat.jhsph.edu/~bcaffo/statcomp/files/dotCall.pdf
  • http://cran.r-project.org/web/packages/Rcpp/vignettes/Rcpp-quickref.pdf
  • http://www.jstatsoft.org/v40/i08/paper




                Questions?





Los Angeles R users group - July 12 2011 - Part 2
