R and C++

Romain François

romain@r-enthusiasts.com

@romainfrancois

Topics
• Rcpp
• dplyr
• Rcpp98, Rcpp11

Rcpp 0.10.6 is the current release.

0.10.7 is out soon; perhaps it will be called 0.11.0, or perhaps 1.0.0.

172 CRAN packages directly depend on it.

97 163 lines of code (*.cpp + *.h).

int add( int a, int b){
return a + b ;
}

#include <Rcpp.h>

// [[Rcpp::export]]
int add( int a, int b){
return a + b ;
}

sourceCpp
// add.cpp
#include <Rcpp.h>

// [[Rcpp::export]]
int add( int a, int b){
return a + b ;
}

> sourceCpp( "add.cpp" )
> add( 1, 2 )
[1] 3

R data
• vectors: NumericVector, IntegerVector, …
• lists: List
• functions: Function
• environments: Environment

Key design decision
Rcpp objects are proxy objects to the underlying R data structures: the data is not copied, so no additional memory is needed.
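To make the proxy idea concrete outside of R, here is a minimal plain C++ sketch (no Rcpp involved). `DoubleView` and `double_in_place` are hypothetical names; the view borrows a pointer to storage owned elsewhere, much as an Rcpp vector wraps an existing R object. Writing through a copy of the view changes the owner's data, because only the pointer is copied.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration: a non-owning view over memory owned elsewhere.
// It never allocates or frees; copying the view copies only the pointer.
class DoubleView {
public:
    DoubleView(double* data, std::size_t n) : data_(data), n_(n) {}
    std::size_t size() const { return n_; }
    // Element access goes straight to the underlying storage: no copy.
    double& operator[](std::size_t i) { assert(i < n_); return data_[i]; }
    const double* data() const { return data_; }
private:
    double* data_;   // borrowed pointer, never freed by the view
    std::size_t n_;
};

// Takes the view by value, yet modifies the caller's data in place.
void double_in_place(DoubleView v) {
    for (std::size_t i = 0; i < v.size(); i++) v[i] *= 2.0;
}
```

The same sharing applies to Rcpp vectors passed or assigned by value: they refer to the same R object, and `Rcpp::clone()` is the way to request a deep copy when one is needed.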

Example: Vector
// [[Rcpp::export]]
double sum( NumericVector x){
int n = x.size() ;

double res = 0.0 ;
for( int i=0; i<n; i++){
res += x[i] ;
}

return res ;
}

Example: List
List res = List::create(
_["a"] = 1,
_["b"] = "foo"
) ;
res.attr( "class" ) = "myclass" ;

int a = res["a"] ;
res["b"] = 42 ;

Example: Function
Function rnorm( "rnorm" ) ;
NumericVector x = rnorm(
10,
_["mean"] = 30,
_["sd"] = 100
) ;

Benchmark
n <- length(x)
m <- 0.0
for( i in 1:n ){
m <- m + x[i]^2 / n
}

Benchmark
#include <Rcpp.h>
using namespace Rcpp ;

double square( double x ){ return x*x ; }

// [[Rcpp::export]]
double fun( NumericVector x){
int n = x.size() ;
double res = 0.0 ;
for( int i=0; i<n; i++){
res += square(x[i]) / n ;
}
return res ;
}

Benchmark
Execution times (microseconds), by length of x

                 10 000    100 000    1 000 000
Dumb R            1 008     10 214      104 000
Vectorized R         24        125        1 021
C++                  13         80          709
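For experimenting outside R, the loop in fun() can be reproduced in standalone C++. This is a sketch, not the setup that produced the timings above: `mean_of_squares` and `time_us` are hypothetical names, `std::vector` stands in for `NumericVector`, and a single `std::chrono` measurement is far noisier than a proper benchmark.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

inline double square(double x) { return x * x; }

// Same computation as fun() above: sum of x[i]^2 / n, i.e. the mean of squares.
double mean_of_squares(const std::vector<double>& x) {
    const std::size_t n = x.size();
    assert(n > 0);
    double res = 0.0;
    for (std::size_t i = 0; i < n; i++) res += square(x[i]) / n;
    return res;
}

// Wall-clock duration of one call, in microseconds (rough measure only).
long long time_us(const std::vector<double>& x, double& out) {
    auto start = std::chrono::steady_clock::now();
    out = mean_of_squares(x);
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}
```

Dividing by n inside the loop mirrors the slide's code; hoisting the division out of the loop would do one division instead of n.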

The usual bank account example
#include <Rcpp.h>
using namespace Rcpp ;

class Account {
private:
double balance ;

public:
Account( ) : balance(0){}

double get_balance(){
return balance ;
}

void withdraw(double x){
balance -= x ;
}

void deposit(double x ){
balance += x ;
}
} ;

RCPP_MODULE(BankAccount){
class_<Account>( "Account" )
.constructor()
// read-only property: only a getter is exposed
.property( "balance", &Account::get_balance )
.method( "deposit", &Account::deposit )
.method( "withdraw", &Account::withdraw )
;
}

account <- new( Account )
account$deposit( 1000 )
account$balance          # 1000
account$withdraw( 200 )
account$balance          # 800
account$balance <- 200   # error: balance is a read-only property
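The Account class itself has no dependency on R; only the RCPP_MODULE block ties it to R. A minimal sketch of exercising the same logic from plain C++ (the helper `replay_session` is a hypothetical name that replays the R session above):

```cpp
#include <cassert>

// The Account class from the slide, without the Rcpp module.
class Account {
private:
    double balance;
public:
    Account() : balance(0) {}
    double get_balance() { return balance; }
    void withdraw(double x) { balance -= x; }
    void deposit(double x) { balance += x; }
};

// Replays the R session: deposit 1000, withdraw 200, return the balance.
double replay_session() {
    Account account;
    account.deposit(1000);
    assert(account.get_balance() == 1000);
    account.withdraw(200);
    return account.get_balance();
}
```

Keeping the class R-agnostic like this makes it testable in isolation, with the module acting as a thin binding layer.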

Packages
• Rcpp.package.skeleton
• compileAttributes
• devtools::load_all

Rcpp.package.skeleton
Extension of package.skeleton that adds Rcpp-specific artefacts and code examples

> Rcpp.package.skeleton( "cph" )

Edit your .cpp files
// [[Rcpp::export]]
int add( int a,int b){
return a + b ;
}

Then run devtools::load_all
This regenerates the C++ and R glue code (RcppExports) and loads the package.

dplyr
• Package by Hadley Wickham
• plyr specialised for data frames: faster, and with remote data stores
• Great design and syntax
• Great performance thanks to C++

arrange
Example: arrange by year within each player

arrange(Batting, playerID, yearID)
Unit: milliseconds
   expr       min        lq   median        uq       max neval
     df 186.64016 188.48495 190.8989 192.42140 195.36592    10
     dt 349.25496 352.12806 357.4358 403.45465 405.30055    10
    cpp  12.20485  13.85538  14.0081  16.72979  23.95173    10
   base 181.68259 182.58014 184.6904 186.33794 189.70377    10
 dt_raw 166.94213 170.15704 170.6418 220.89911 223.42155    10
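The core of arranging by two columns can be sketched as sorting row indices with a lexicographic comparison: player first, then year within player. This is an illustration under simplifying assumptions (columns held as `std::vector`, hypothetical function name `arrange_order`), not dplyr's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Returns the row order that sorts by player, then by year within player.
std::vector<std::size_t> arrange_order(const std::vector<std::string>& player,
                                       const std::vector<int>& year) {
    assert(player.size() == year.size());
    std::vector<std::size_t> idx(player.size());
    std::iota(idx.begin(), idx.end(), 0);  // 0, 1, 2, ...
    std::stable_sort(idx.begin(), idx.end(), [&](std::size_t a, std::size_t b) {
        if (player[a] != player[b]) return player[a] < player[b];
        return year[a] < year[b];
    });
    return idx;
}
```

Sorting indices rather than rows means each column is then reordered once with the same index vector; the stable sort keeps rows with identical keys in their original order.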

filter
Find the year for which each player played the most games

filter(Batting, G == max(G))
Unit: milliseconds
 expr       min        lq    median        uq      max neval
   df 371.96066 375.98652 380.92300 389.78870 430.2898    10
   dt  47.37897  49.39681  51.23722  52.79181  95.8757    10
  cpp  34.63382  35.27462  36.48151  38.30672 106.2422    10
 base 141.81983 144.87670 147.36940 148.67299 173.8763    10
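The description ("for each player") implies Batting is grouped by playerID. Under that assumption, `G == max(G)` per group can be sketched in two passes: compute each group's maximum, then keep the rows that reach it. `filter_group_max` is a hypothetical name, and string keys in a hash map are a simplification of how groups might be represented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Returns the indices of rows where g equals the maximum g of the row's group.
std::vector<std::size_t> filter_group_max(const std::vector<std::string>& group,
                                          const std::vector<int>& g) {
    assert(group.size() == g.size());
    // Pass 1: maximum of g within each group.
    std::unordered_map<std::string, int> max_g;
    for (std::size_t i = 0; i < g.size(); i++) {
        auto it = max_g.find(group[i]);
        if (it == max_g.end()) max_g.emplace(group[i], g[i]);
        else if (g[i] > it->second) it->second = g[i];
    }
    // Pass 2: keep rows that reach their group's maximum (ties are all kept).
    std::vector<std::size_t> keep;
    for (std::size_t i = 0; i < g.size(); i++)
        if (g[i] == max_g[group[i]]) keep.push_back(i);
    return keep;
}
```

As with `G == max(G)` in R, every tied row is kept, so a player can contribute more than one year.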

summarise
Compute the average number of at bats for each player

summarise(x, ab = mean(AB))
Unit: microseconds
   expr        min         lq     median         uq        max neval
     df 470726.569 475168.481 495500.076 498223.152 502601.494    10
     dt  23002.422  23923.691  25888.191  28517.318  28683.864    10
    cpp    756.265    820.921    838.529    864.624    950.079    10
   base 253189.624 259167.496 263124.650 273097.845 326663.243    10
 dt_raw  22462.560  23469.528  24438.422  25718.549  28385.158    10

Vector Visitor
Traversing an R vector of any type with the same interface
class VectorVisitor {
public:
virtual ~VectorVisitor(){}
/** hash the element of the visited vector at index i */
virtual size_t hash(int i) const = 0 ;
/** are the elements at indices i and j equal */
virtual bool equal(int i, int j) const = 0 ;

/** creates a new vector, of the same type as the visited vector, by
* copying elements at the given indices
*/
virtual SEXP subset( const Rcpp::IntegerVector& index ) const = 0 ;
} ;

Vector Visitor
inline VectorVisitor* visitor( SEXP vec ){
switch( TYPEOF(vec) ){
case INTSXP:
if( Rf_inherits(vec, "factor" ))
return new FactorVisitor( vec ) ;
return new VectorVisitorImpl<INTSXP>( vec ) ;
case REALSXP:
if( Rf_inherits( vec, "Date" ) )
return new DateVisitor( vec ) ;
if( Rf_inherits( vec, "POSIXct" ) )
return new POSIXctVisitor( vec ) ;
return new VectorVisitorImpl<REALSXP>( vec ) ;
case LGLSXP: return new VectorVisitorImpl<LGLSXP>( vec ) ;
case STRSXP: return new VectorVisitorImpl<STRSXP>( vec ) ;
default: break ;
}
// should not happen
return 0 ;
}
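A plain C++ analogue shows why a type-erased hash/equal interface is useful: code that groups or deduplicates rows can combine columns of different types without knowing them. This is a sketch: `Visitor`, `VisitorImpl` and `count_distinct_rows` are hypothetical names, `std::vector` replaces SEXP, and `std::hash` replaces whatever hashing dplyr uses.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_set>
#include <vector>

// Interface in the spirit of VectorVisitor, without R types.
class Visitor {
public:
    virtual ~Visitor() {}
    virtual std::size_t hash(int i) const = 0;
    virtual bool equal(int i, int j) const = 0;
};

// One implementation per element type; the column is borrowed, not copied.
template <typename T>
class VisitorImpl : public Visitor {
public:
    explicit VisitorImpl(const std::vector<T>& v) : v_(v) {}
    std::size_t hash(int i) const override { return std::hash<T>()(v_[i]); }
    bool equal(int i, int j) const override { return v_[i] == v_[j]; }
private:
    const std::vector<T>& v_;
};

// Counts distinct rows by combining the per-column hash and equality.
std::size_t count_distinct_rows(const std::vector<std::unique_ptr<Visitor>>& cols, int nrows) {
    assert(!cols.empty());
    auto row_hash = [&cols](int i) {
        std::size_t h = 0;
        for (const auto& c : cols) h = h * 31 + c->hash(i);
        return h;
    };
    auto row_equal = [&cols](int i, int j) {
        for (const auto& c : cols)
            if (!c->equal(i, j)) return false;
        return true;
    };
    // The set stores row indices; two rows collide only if every column matches.
    std::unordered_set<int, decltype(row_hash), decltype(row_equal)>
        seen(16, row_hash, row_equal);
    for (int i = 0; i < nrows; i++) seen.insert(i);
    return seen.size();
}
```

The grouping code only ever sees `Visitor*`, so supporting a new column type means adding one implementation rather than touching every algorithm.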

Chunked evaluation
ir <- group_by( iris, Species)
summarise(ir,
Sepal.Length = mean(Sepal.Length)
)
• R expression to evaluate: mean(Sepal.Length)
• Sepal.Length ∊ iris
• dplyr knows mean.
• fast and memory efficient algorithm
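One memory-efficient way to compute a grouped mean is a single pass that accumulates a sum and a count per group, without materialising each group's subset. The sketch assumes group ids are already computed as 0-based integers (e.g. 0 = setosa, 1 = versicolor, 2 = virginica); `grouped_mean` is a hypothetical name, and this is not necessarily the algorithm dplyr uses. R's own mean() additionally refines the result with a second pass for accuracy, which this sketch omits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One pass over the column: accumulate sum and count per group, then divide.
// An empty group yields NaN (0.0 / 0).
std::vector<double> grouped_mean(const std::vector<double>& x,
                                 const std::vector<int>& group, int ngroups) {
    assert(x.size() == group.size());
    std::vector<double> sum(ngroups, 0.0);
    std::vector<int> count(ngroups, 0);
    for (std::size_t i = 0; i < x.size(); i++) {
        sum[group[i]] += x[i];
        count[group[i]] += 1;
    }
    std::vector<double> res(ngroups);
    for (int g = 0; g < ngroups; g++) res[g] = sum[g] / count[g];
    return res;
}
```

Only two small arrays of size ngroups are allocated, however large the column is.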

Hybrid evaluation
myfun <- function(x) x+x
ir <- group_by( iris, Species)
summarise(ir,
xxx = mean(Sepal.Length) + min(Sepal.Width) - myfun(Sepal.Length)
)

For the first group (setosa):

#1: fast evaluation of mean(Sepal.Length).
5.006 + min(Sepal.Width) - myfun(Sepal.Length)

#2: fast evaluation of min(Sepal.Width).
5.006 + 2.3 - myfun(Sepal.Length)

#3: fast evaluation of 5.006 + 2.3.
7.306 - myfun(Sepal.Length)

#4: R evaluation of 7.306 - myfun(Sepal.Length), because dplyr does not know myfun.

Hybrid Evaluation
• Handled natively: mean, min, max, sum, sd, var, n, +, -, /, *, <, >, <=, >=, &&, ||
• packages can register their own hybrid evaluation handler.
• See the hybrid-evaluation vignette.
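The idea of registering handlers can be sketched as a name-to-function table consulted before falling back to R. Everything here is hypothetical (`Handler`, `registry`, `register_handler`, `register_min`, `try_hybrid`); it is not dplyr's registration API, which the vignette documents, only an illustration of the lookup-then-fallback pattern.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// A fast handler computes a result from one group's chunk of data.
typedef std::function<double(const std::vector<double>&)> Handler;

std::map<std::string, Handler>& registry() {
    static std::map<std::string, Handler> handlers;
    return handlers;
}

void register_handler(const std::string& name, Handler h) {
    registry()[name] = h;
}

// Example registration: a C++ implementation of min().
void register_min() {
    register_handler("min", [](const std::vector<double>& v) {
        assert(!v.empty());
        double m = v[0];
        for (double d : v)
            if (d < m) m = d;
        return m;
    });
}

// Returns true and sets `out` when a handler exists; false means the caller
// should fall back to evaluating the call with R.
bool try_hybrid(const std::string& name, const std::vector<double>& chunk, double& out) {
    auto it = registry().find(name);
    if (it == registry().end()) return false;
    out = it->second(chunk);
    return true;
}
```

In the hybrid evaluation walkthrough, min would hit the table while myfun would miss it and go to R.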

Rcpp11
• Using C++11 features
• Smaller
• More memory efficient
• Clean

C++11 : lambda
Lambda: a function defined where it is used. Similar to passing an anonymous function to the apply family of functions in R.

// [[Rcpp::export]]
NumericVector foo( NumericVector v){
NumericVector res = sapply( v,
[](double x){ return x*x; }
) ;
return res ;
}
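The same pattern exists in the C++ standard library: `std::transform` applies a lambda to every element. A minimal sketch with `std::vector` in place of `NumericVector` (`squares` is a hypothetical name):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Standard-library counterpart of the sapply() call above: apply a lambda
// to every element and collect the results in a new vector.
std::vector<double> squares(const std::vector<double>& v) {
    std::vector<double> res(v.size());
    std::transform(v.begin(), v.end(), res.begin(),
                   [](double x) { return x * x; });
    return res;
}
```

The lambda lives exactly where it is used, so no separate named function like `square` has to be declared.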

C++11 : for each loop
C++98, C++03
std::vector<double> v ;
for( std::size_t i=0; i<v.size(); i++){
double d = v[i] ;
// do something with d
}

C++11
for( double d: v){
// do stuff with d
}

C++11 : init list
C++98, C++03
NumericVector x = NumericVector::create( 1, 2 ) ;

C++11
NumericVector x = {1, 2} ;
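Both features work with standard containers too. A small sketch (`doubled_sum` is a hypothetical name) that also shows a detail of the range-based loop: a reference loop variable modifies the elements, a plain one works on copies.

```cpp
#include <cassert>
#include <vector>

// Initializer list plus two range-based for loops.
double doubled_sum() {
    std::vector<double> v = {1, 2, 3};   // C++11 initializer list
    for (double& d : v) d *= 2;          // reference: modifies v in place
    double total = 0.0;
    for (double d : v) total += d;       // copy: read-only use
    return total;                        // 2 + 4 + 6
}
```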

Other changes
• Move semantics: used under the hood in Rcpp11, so less memory is used.
• Less code bloat, thanks to variadic templates.
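Both features can be shown in plain C++. This is a sketch, not Rcpp11 code: `take` and `make_vector` are hypothetical names, and `make_vector` is only loosely analogous to `NumericVector::create`, which in C++98 needs one overload per number of arguments.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Move semantics: the buffer of `src` is handed over, not copied.
std::vector<double> take(std::vector<double>&& src) {
    std::vector<double> dst(std::move(src));  // move constructor, O(1)
    return dst;
}

// Variadic template: a single definition accepts any number of arguments,
// each converted to double.
template <typename... Args>
std::vector<double> make_vector(Args... args) {
    return std::vector<double>{static_cast<double>(args)...};
}
```

After the move, the source vector is empty and the destination owns the original buffer, so no element was copied.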

Rcpp11 article
• I’m writing an article about C++11
• Explain the merits of C++11
• What’s next: C++14, C++17
• Goal is to make C++11 welcome on CRAN
• https://github.com/romainfrancois/cpp11_article

Questions
