R and C++
Romain François
romain@r-enthusiasts.com

@romainfrancois
Topics
• Rcpp
• dplyr
• Rcpp98, Rcpp11
Rcpp
54 releases since 2008
0.10.6 currently
0.10.7 out soon, and perhaps it will be called 0.11.0
172 CRAN packages depend on it
96 896 lines of code (*.cpp + *.h)
int add( int a, int b){
    return a + b ;
}
#include <Rcpp.h>

// [[Rcpp::export]]
int add( int a, int b){
    return a + b ;
}
A bridge between R and C++
sourceCpp
#include <Rcpp.h>

// [[Rcpp::export]]
int add( int a, int b){
    return a + b ;
}
> sourceCpp( "add.cpp" )
> add( 1, 2 )
[1] 3
R data structures
• vectors: NumericVector, IntegerVector, …
• lists: List
• functions: Function
• environments: Environment
Key design decision
Rcpp objects are proxy objects to
the underlying R data structure
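The proxy idea can be illustrated without R at all. The sketch below is a plain C++ analogy, not Rcpp's implementation: `Buffer` stands in for data owned by R, and `VectorProxy` and `double_in_place` are hypothetical names invented for this example. Copying the proxy copies a pointer, not the data, so a write through one proxy is visible through every other proxy to the same data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for data owned by someone else (for Rcpp, that owner is R).
// Hypothetical name, for illustration only.
typedef std::vector<double> Buffer;

// A lightweight handle: it stores a pointer to the data, never a copy.
class VectorProxy {
public:
    explicit VectorProxy(Buffer& data) : data_(&data) {}

    std::size_t size() const { return data_->size(); }

    // Element access reads and writes the underlying buffer directly.
    double& operator[](std::size_t i) {
        assert(i < data_->size());
        return (*data_)[i];
    }

    // True when both proxies refer to the same underlying data.
    bool same_data(const VectorProxy& other) const {
        return data_ == other.data_;
    }

private:
    Buffer* data_;
};

// Takes the proxy by value: the copy is cheap, and the caller's data
// is the data that gets modified.
inline void double_in_place(VectorProxy x) {
    for (std::size_t i = 0; i < x.size(); i++) {
        x[i] *= 2.0;
    }
}
```

Passing a `VectorProxy` by value to `double_in_place` changes the original `Buffer`, which is the practical consequence of the proxy design to keep in mind.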
Example: NumericVector
// [[Rcpp::export]]
double sum( NumericVector x){
    int n = x.size() ;

    double res = 0.0 ;
    for( int i=0; i<n; i++){
        res += x[i] ;
    }

    return res ;
}
Example: List
List res = List::create(
    _["a"] = 1,
    _["b"] = "foo"
) ;
res.attr( "class" ) = "myclass" ;

int a = res["a"] ;
res["b"] = 42 ;
Example: Function
Function rnorm( "rnorm" ) ;
NumericVector x = rnorm(
    10,
    _["mean"] = 30,
    _["sd"] = 100
) ;
Benchmark
n <- length(x)
m <- 0.0
for( i in 1:n ){
    m <- m + x[i]^2 / n
}
Benchmark
m <- mean( x^2 )
Benchmark

#include <Rcpp.h>
using namespace Rcpp ;

double square(double x){ return x*x ; }

// [[Rcpp::export]]
double fun( NumericVector x){
    int n = x.size() ;
    double res = 0.0 ;
    for( int i=0; i<n; i++){
        res += square(x[i]) / n ;
    }
    return res ;
}
Benchmark
Execution times (microseconds)

                  10 000    100 000    1 000 000
Dumb R             1 008     10 214      104 000
Vectorized R          24        125        1 021
C++                   13         80          709
Benchmark
m <- mean( x^2 )
C++ data structures
Modules
The usual bank account example
class Account {
private:
    double balance ;

public:
    Account( ) : balance(0){}

    double get_balance(){
        return balance ;
    }

    void withdraw(double x){
        balance -= x ;
    }

    void deposit(double x ){
        balance += x ;
    }
} ;

RCPP_MODULE(BankAccount){
    class_<Account>( "Account" )
        .constructor()

        .property( "balance", &Account::get_balance )

        .method( "deposit", &Account::deposit)
        .method( "withdraw", &Account::withdraw)
    ;
}

account <- new( Account )
account$deposit( 1000 )
account$balance
account$withdraw( 200 )
account$balance
account$balance <- 200
Packages
Rcpp.package.skeleton
compileAttributes
devtools::load_all
dplyr
dplyr
• Package by Hadley Wickham
• plyr specialised for data frames: faster & with remote datastores
• Great performance thanks to C++
arrange
ex: Arrange by year within each player

arrange(Batting, playerID, yearID)

Unit: milliseconds
  expr       min        lq   median        uq       max neval
    df 186.64016 188.48495 190.8989 192.42140 195.36592    10
    dt 349.25496 352.12806 357.4358 403.45465 405.30055    10
   cpp  12.20485  13.85538  14.0081  16.72979  23.95173    10
  base 181.68259 182.58014 184.6904 186.33794 189.70377    10
dt_raw 166.94213 170.15704 170.6418 220.89911 223.42155    10
filter

Find the year for which each player played the most games

filter(Batting, G == max(G))

Unit: milliseconds
  expr       min        lq    median        uq      max neval
    df 371.96066 375.98652 380.92300 389.78870 430.2898    10
    dt  47.37897  49.39681  51.23722  52.79181  95.8757    10
   cpp  34.63382  35.27462  36.48151  38.30672 106.2422    10
  base 141.81983 144.87670 147.36940 148.67299 173.8763    10
summarise

Compute the average number of at bats for each player

summarise(x, ab = mean(AB))

Unit: microseconds
  expr        min         lq     median         uq        max neval
    df 470726.569 475168.481 495500.076 498223.152 502601.494    10
    dt  23002.422  23923.691  25888.191  28517.318  28683.864    10
   cpp    756.265    820.921    838.529    864.624    950.079    10
  base 253189.624 259167.496 263124.650 273097.845 326663.243    10
dt_raw  22462.560  23469.528  24438.422  25718.549  28385.158    10
Vector Visitor
Traversing an R vector of any type with the same interface
class VectorVisitor {
public:
    virtual ~VectorVisitor(){}

    /** hash the element of the visited vector at index i */
    virtual size_t hash(int i) const = 0 ;

    /** are the elements at indices i and j equal */
    virtual bool equal(int i, int j) const = 0 ;

    /** creates a new vector, of the same type as the visited vector, by
     *  copying elements at the given indices
     */
    virtual SEXP subset( const Rcpp::IntegerVector& index ) const = 0 ;
} ;
Vector Visitor
inline VectorVisitor* visitor( SEXP vec ){
    switch( TYPEOF(vec) ){
        case INTSXP:
            if( Rf_inherits(vec, "factor" ))
                return new FactorVisitor( vec ) ;
            return new VectorVisitorImpl<INTSXP>( vec ) ;
        case REALSXP:
            if( Rf_inherits( vec, "Date" ) )
                return new DateVisitor( vec ) ;
            if( Rf_inherits( vec, "POSIXct" ) )
                return new POSIXctVisitor( vec ) ;
            return new VectorVisitorImpl<REALSXP>( vec ) ;
        case LGLSXP: return new VectorVisitorImpl<LGLSXP>( vec ) ;
        case STRSXP: return new VectorVisitorImpl<STRSXP>( vec ) ;
        default: break ;
    }
    // should not happen
    return 0 ;
}
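To see why `hash` and `equal` are enough for generic algorithms, here is a minimal standalone sketch of grouping rows by value through the visitor interface. It is an illustration under assumptions, not dplyr's code: it visits a `std::vector<T>` instead of an R vector, leaves out `subset()` (which needs `SEXP`), adds a `size()` method, and `group_indices` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Same idea as the slide's interface, minus subset(), which needs R's SEXP.
class VectorVisitor {
public:
    virtual ~VectorVisitor() {}
    virtual std::size_t hash(int i) const = 0;
    virtual bool equal(int i, int j) const = 0;
    virtual int size() const = 0;  // added for this sketch
};

// One template covers every element type that std::hash supports.
// The visitor keeps a reference, so the vector must outlive it.
template <typename T>
class VectorVisitorImpl : public VectorVisitor {
public:
    explicit VectorVisitorImpl(const std::vector<T>& data) : data_(data) {}
    std::size_t hash(int i) const override {
        assert(i >= 0 && i < size());
        return std::hash<T>()(data_[i]);
    }
    bool equal(int i, int j) const override { return data_[i] == data_[j]; }
    int size() const override { return static_cast<int>(data_.size()); }
private:
    const std::vector<T>& data_;
};

// Groups row indices by value using only the type-erased interface.
// Returns one vector of indices per distinct value, in order of first
// appearance.
inline std::vector<std::vector<int>> group_indices(const VectorVisitor& v) {
    std::vector<std::vector<int>> groups;
    // hash value -> positions in `groups` whose first row has that hash
    std::unordered_map<std::size_t, std::vector<std::size_t>> buckets;
    for (int i = 0; i < v.size(); i++) {
        std::vector<std::size_t>& candidates = buckets[v.hash(i)];
        bool found = false;
        for (std::size_t g : candidates) {
            // Equal hashes are only a hint; equal() has the final say.
            if (v.equal(groups[g].front(), i)) {
                groups[g].push_back(i);
                found = true;
                break;
            }
        }
        if (!found) {
            candidates.push_back(groups.size());
            groups.push_back(std::vector<int>(1, i));
        }
    }
    return groups;
}
```

`group_indices` never learns the element type: the same function groups strings, integers or doubles, which is the point of traversing any vector through one interface.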
Chunked evaluation
ir <- group_by( iris, Species)
summarise(ir,
    Sepal.Length = mean(Sepal.Length)
)
• R expression to evaluate: mean(Sepal.Length)
• Sepal.Length ∊ iris
• dplyr knows mean.
• fast and memory efficient algorithm
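The "fast and memory efficient" idea can be sketched in plain C++, outside of R. This is an illustration under assumptions, not dplyr's code: `mean_at` and `grouped_mean` are hypothetical names, and each group is represented as a list of row indices. The mean is accumulated by reading the column in place, so no per-group copy of the column is allocated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mean of column[i] for every i in group_indices, read in place.
// No per-group copy of the column is made.
inline double mean_at(const std::vector<double>& column,
                      const std::vector<std::size_t>& group_indices) {
    assert(!group_indices.empty());
    double total = 0.0;
    for (std::size_t k = 0; k < group_indices.size(); k++) {
        assert(group_indices[k] < column.size());
        total += column[group_indices[k]];
    }
    return total / group_indices.size();
}

// One result per group, in group order.
inline std::vector<double> grouped_mean(
        const std::vector<double>& column,
        const std::vector<std::vector<std::size_t>>& groups) {
    std::vector<double> out;
    out.reserve(groups.size());
    for (std::size_t g = 0; g < groups.size(); g++) {
        out.push_back(mean_at(column, groups[g]));
    }
    return out;
}
```

The only allocation is the result vector, with one element per group.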
Hybrid evaluation
myfun <- function(x) x+x
ir <- group_by( iris, Species)
summarise(ir,
    xxx = mean(Sepal.Length) + min(Sepal.Width) - myfun(Sepal.Length)
)

#1: fast evaluation of mean(Sepal.Length).
5.006 + min(Sepal.Width) - myfun(Sepal.Length)

#2: fast evaluation of min(Sepal.Width).
5.006 + 3.428 - myfun(Sepal.Length)

#3: fast evaluation of 5.006 + 3.428.
8.434 - myfun(Sepal.Length)

#4: R evaluation of 8.434 - myfun(Sepal.Length).
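The four steps can be mimicked on a toy expression tree. This is a sketch under assumptions, not dplyr's implementation: `Expr`, `Columns`, `hybrid_reduce` and the builder helpers are hypothetical names, only `mean`, `min`, `+` and `-` are handled natively, and the fallback step (R evaluation on the slide) is left out. Whatever the reducer does not recognise stays in the tree untouched.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <memory>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Columns of one group, by name.
typedef std::map<std::string, std::vector<double>> Columns;

// A tiny expression tree: a constant, a column name, or a call.
struct Expr {
    enum Kind { Constant, Symbol, Call } kind;
    double value;                             // used when kind == Constant
    std::string name;                         // column or function name
    std::vector<std::shared_ptr<Expr>> args;  // used when kind == Call
};
typedef std::shared_ptr<Expr> ExprPtr;

inline ExprPtr constant(double v) {
    return std::make_shared<Expr>(Expr{Expr::Constant, v, "", {}});
}
inline ExprPtr symbol(const std::string& n) {
    return std::make_shared<Expr>(Expr{Expr::Symbol, 0.0, n, {}});
}
inline ExprPtr call(const std::string& f, std::vector<ExprPtr> a) {
    return std::make_shared<Expr>(Expr{Expr::Call, 0.0, f, std::move(a)});
}

// Rewrites `e` in place: every sub-expression that can be computed natively
// becomes a Constant. Anything else is left as is for a fallback evaluator.
inline void hybrid_reduce(Expr& e, const Columns& group) {
    if (e.kind != Expr::Call) return;
    for (ExprPtr& a : e.args) {
        assert(a != nullptr);
        hybrid_reduce(*a, group);  // reduce the arguments first
    }
    // Known summaries of a single column.
    if ((e.name == "mean" || e.name == "min") && e.args.size() == 1 &&
        e.args[0]->kind == Expr::Symbol) {
        Columns::const_iterator it = group.find(e.args[0]->name);
        if (it == group.end() || it->second.empty()) return;
        const std::vector<double>& x = it->second;
        double v = (e.name == "mean")
            ? std::accumulate(x.begin(), x.end(), 0.0) / x.size()
            : *std::min_element(x.begin(), x.end());
        e = Expr{Expr::Constant, v, "", {}};
        return;
    }
    // Known arithmetic on two constants.
    if ((e.name == "+" || e.name == "-") && e.args.size() == 2 &&
        e.args[0]->kind == Expr::Constant &&
        e.args[1]->kind == Expr::Constant) {
        double l = e.args[0]->value, r = e.args[1]->value;
        e = Expr{Expr::Constant, e.name == "+" ? l + r : l - r, "", {}};
    }
}
```

Applied to a tree for `mean(a) + min(b) - myfun(a)`, the reducer folds the left operand into a single constant and leaves the call to `myfun` in place, which mirrors steps #1 to #3 above.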
Hybrid Evaluation
• mean, min, max, sum, sd, var, n, +, -, /, *, <, >, <=, >=, &&, ||
• packages can register their own hybrid evaluation handler.
• See hybrid-evaluation vignette
Rcpp11
Rcpp11
• Using C++11 features
• Smaller
• More memory efficient
• Clean
C++11 : lambda
Lambda: function defined where used. Similar to apply
functions in R.

// [[Rcpp::export]]
NumericVector foo( NumericVector v){
    NumericVector res = sapply( v,
        [](double x){ return x*x; }
    ) ;
    return res ;
}
C++11 : for each loop
C++98, C++03
std::vector<double> v ;
for( int i=0; i<v.size(); i++){
    double d = v[i] ;
    // do something with d
}

C++11
for( double d: v){
    // do stuff with d
}
C++11 : init list
C++98, C++03
NumericVector x = NumericVector::create( 1, 2 ) ;

C++11
NumericVector x = {1, 2} ;
Other changes

• Move semantics: used under the hood in Rcpp11. Using less memory.
• Less code bloat. Variadic templates
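Both points can be shown with the standard library alone. The sketch below is an illustration, not Rcpp11 code: `make_vector` and `steal` are hypothetical helpers. A single variadic template replaces the family of hand-written overloads (one per argument count) that a C++98 `create`-style function needs, and a move hands over a vector's storage instead of copying it.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// One variadic template instead of one overload per number of arguments.
// Hypothetical helper, not part of Rcpp or Rcpp11.
template <typename... Args>
std::vector<double> make_vector(Args... args) {
    // Each argument is converted to double and stored in order.
    return std::vector<double>{ static_cast<double>(args)... };
}

// Move semantics: the storage of `src` is handed over to `dest` rather than
// copied element by element. Returns where the data lived before the move.
inline const double* steal(std::vector<double>& src,
                           std::vector<double>& dest) {
    const double* before = src.data();
    dest = std::move(src);  // src is left valid but unspecified
    return before;
}
```

After `steal`, `dest.data()` points at the same storage `src` used before the move, so no element was copied.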
Rcpp11 article
• I’m writing an article about C++11
• Explain the merits of C++11
• What’s next: C++14, C++17
• Goal is to make C++11 welcome on CRAN
• https://github.com/romainfrancois/cpp11_article
Questions
Romain François
romain@r-enthusiasts.com

@romainfrancois

 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 

R and cpp

  • 1. R and C++ Romain François romain@r-enthusiasts.com @romainfrancois
  • 5. 0.10.6 currently; 0.10.7 out soon, and perhaps it will be called 0.11.0
  • 7. 96 896 lines of code (*.cpp + *.h)
  • 9. int add( int a, int b){ return a + b ; }
  • 11.
      #include <Rcpp.h>
      // [[Rcpp::export]]
      int add( int a, int b){
          return a + b ;
      }
  • 12. A bridge between R and C++
  • 13. sourceCpp
      #include <Rcpp.h>
      // [[Rcpp::export]]
      int add( int a, int b){
          return a + b ;
      }
      > sourceCpp( "add.cpp" )
      > add( 1, 2 )
      [1] 3
  • 14. R data structures
      • vectors: NumericVector, IntegerVector, …
      • lists: List
      • functions: Function
      • environments: Environment
  • 15. Key design decision: Rcpp objects are proxy objects to the underlying R data structure
  • 16. Example: NumericVector
      // [[Rcpp::export]]
      double sum( NumericVector x){
          int n = x.size() ;
          double res = 0.0 ;
          for( int i=0; i<n; i++){
              res += x[i] ;
          }
          return res ;
      }
  • 17. Example: List
      List res = List::create(
          _["a"] = 1,
          _["b"] = "foo"
      ) ;
      res.attr( "class" ) = "myclass" ;
      int a = res["a"] ;
      res["b"] = 42 ;
  • 18. Example: Function
      Function rnorm( "rnorm" ) ;
      NumericVector x = rnorm(
          10,
          _["mean"] = 30,
          _["sd"] = 100
      ) ;
  • 19. Benchmark
      n <- length(x)
      m <- 0.0
      for( i in 1:n ){
          m <- m + x[i]^2 / n
      }
  • 21. Benchmark
      #include <Rcpp.h>
      using namespace Rcpp ;
      double square(double x){ return x*x ; }
      // [[Rcpp::export]]
      double fun( NumericVector x){
          int n = x.size() ;
          double res = 0.0 ;
          for( int i=0; i<n; i++){
              res += square(x[i]) / n ;
          }
          return res ;
      }
  • 22. Benchmark: execution times (microseconds)
      n              10 000   100 000   1 000 000
      Dumb R           1008    10 214     104 000
      Vectorized R       24       125       1 021
      C++                13        80         709
  • 25. The usual bank account example
      class Account {
      private:
          double balance ;
      public:
          Account( ) : balance(0){}
          double get_balance(){
              return balance ;
          }
          void withdraw(double x){
              balance -= x ;
          }
          void deposit(double x ){
              balance += x ;
          }
      } ;
      // expose the class to R; member functions are passed as pointers
      RCPP_MODULE(BankAccount){
          class_<Account>( "Account" )
              .constructor()
              .property( "balance", &Account::get_balance )
              .method( "deposit", &Account::deposit)
              .method( "withdraw", &Account::withdraw)
          ;
      }
      # R side
      account <- new( Account )
      account$deposit( 1000 )
      account$balance
      account$withdraw( 200 )
      account$balance
      account$balance <- 200
  • 27. dplyr
  • 28. dplyr
      • Package by Hadley Wickham
      • plyr specialised for data frames: faster & with remote datastores
      • Great performance thanks to C++
  • 29. arrange
      ex: Arrange by year within each player
      arrange(Batting, playerID, yearID)
      Unit: milliseconds
      expr         min        lq    median        uq       max neval
      df     186.64016 188.48495  190.8989 192.42140 195.36592    10
      dt     349.25496 352.12806  357.4358 403.45465 405.30055    10
      cpp     12.20485  13.85538   14.0081  16.72979  23.95173    10
      base   181.68259 182.58014  184.6904 186.33794 189.70377    10
      dt_raw 166.94213 170.15704  170.6418 220.89911 223.42155    10
  • 30. filter
      Find the year for which each player played the most games
      filter(Batting, G == max(G))
      Unit: milliseconds
      expr       min        lq    median        uq      max neval
      df   371.96066 375.98652 380.92300 389.78870 430.2898    10
      dt    47.37897  49.39681  51.23722  52.79181  95.8757    10
      cpp   34.63382  35.27462  36.48151  38.30672 106.2422    10
      base 141.81983 144.87670 147.36940 148.67299 173.8763    10
  • 31. summarise
      Compute the average number of at bats for each player
      summarise(x, ab = mean(AB))
      Unit: microseconds
      expr          min         lq     median         uq        max neval
      df     470726.569 475168.481 495500.076 498223.152 502601.494    10
      dt      23002.422  23923.691  25888.191  28517.318  28683.864    10
      cpp       756.265    820.921    838.529    864.624    950.079    10
      base   253189.624 259167.496 263124.650 273097.845 326663.243    10
      dt_raw  22462.560  23469.528  24438.422  25718.549  28385.158    10
  • 32. Vector Visitor
      Traversing an R vector of any type with the same interface
      class VectorVisitor {
      public:
          virtual ~VectorVisitor(){}
          /** hash the element of the visited vector at index i */
          virtual size_t hash(int i) const = 0 ;
          /** are the elements at indices i and j equal */
          virtual bool equal(int i, int j) const = 0 ;
          /** creates a new vector, of the same type as the visited vector, by
           *  copying elements at the given indices */
          virtual SEXP subset( const Rcpp::IntegerVector& index ) const = 0 ;
      } ;
  • 33. Vector Visitor
      inline VectorVisitor* visitor( SEXP vec ){
          switch( TYPEOF(vec) ){
          case INTSXP:
              if( Rf_inherits(vec, "factor" ))
                  return new FactorVisitor( vec ) ;
              return new VectorVisitorImpl<INTSXP>( vec ) ;
          case REALSXP:
              if( Rf_inherits( vec, "Date" ) )
                  return new DateVisitor( vec ) ;
              if( Rf_inherits( vec, "POSIXct" ) )
                  return new POSIXctVisitor( vec ) ;
              return new VectorVisitorImpl<REALSXP>( vec ) ;
          case LGLSXP: return new VectorVisitorImpl<LGLSXP>( vec ) ;
          case STRSXP: return new VectorVisitorImpl<STRSXP>( vec ) ;
          default: break ;
          }
          // should not happen
          return 0 ;
      }
  • 34. Chunked evaluation
      ir <- group_by( iris, Species)
      summarise(ir, Sepal.Length = mean(Sepal.Length) )
      • R expression to evaluate: mean(Sepal.Length)
      • Sepal.Length ∊ iris
      • dplyr knows mean.
      • fast and memory efficient algorithm
  • 35. Hybrid evaluation
      myfun <- function(x) x+x
      ir <- group_by( iris, Species)
      summarise(ir, xxx = mean(Sepal.Length) + min(Sepal.Width) - myfun(Sepal.Length) )
      #1: fast evaluation of mean(Sepal.Length).
          5.006 + min(Sepal.Width) - myfun(Sepal.Length)
      #2: fast evaluation of min(Sepal.Width).
          5.006 + 3.428 - myfun(Sepal.Length)
      #3: fast evaluation of 5.006 + 3.428.
          8.434 - myfun(Sepal.Length)
      #4: R evaluation of 8.434 - myfun(Sepal.Length).
  • 36. Hybrid Evaluation
      • mean, min, max, sum, sd, var, n, +, -, /, *, <, >, <=, >=, &&, ||
      • packages can register their own hybrid evaluation handler.
      • See hybrid-evaluation vignette
  • 39. C++11 : lambda
      Lambda: function defined where used. Similar to apply functions in R.
      // [[Rcpp::export]]
      NumericVector foo( NumericVector v){
          NumericVector res = sapply( v, [](double x){ return x*x; } ) ;
          return res ;
      }
  • 40. C++11 : for each loop
      C++98, C++03:
      std::vector<double> v ;
      for( int i=0; i<v.size(); i++){
          double d = v[i] ;
          // do something with d
      }
      C++11:
      for( double d: v){
          // do stuff with d
      }
  • 41. C++11 : init list
      C++98, C++03:
      NumericVector x = NumericVector::create( 1, 2 ) ;
      C++11:
      NumericVector x = {1, 2} ;
  • 42. Other changes
      • Move semantics: used under the hood in Rcpp11, so less memory is used.
      • Less code bloat, thanks to variadic templates.
  • 43. Rcpp11 article
      • I’m writing an article about C++11
      • Explain the merits of C++11
      • What’s next: C++14, C++17
      • Goal is to make C++11 welcome on CRAN
      • https://github.com/romainfrancois/cpp11_article
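
The C++11 features on slides 39 and 40 (lambdas and the range-based for loop) can be tried without Rcpp. What follows is a minimal sketch in standard C++11, assuming `std::vector<double>` in place of `NumericVector`. The names `apply_each`, `squares`, and `mean_of_squares` are hypothetical and chosen for illustration; `apply_each` only imitates the role that `sapply` plays on the lambda slide and is not how Rcpp implements it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for sapply: applies f to each element of v
// and collects the results in a new vector.
template <typename F>
std::vector<double> apply_each(const std::vector<double>& v, F f) {
    std::vector<double> res;
    res.reserve(v.size());
    for (double d : v) {  // C++11 range-based for loop
        res.push_back(f(d));
    }
    return res;
}

// Squares every element, with the lambda defined where it is used.
std::vector<double> squares(const std::vector<double>& v) {
    return apply_each(v, [](double x) { return x * x; });
}

// Mean of the squared elements, the quantity the benchmark slides
// compute in R as mean(x^2). Requires a non-empty vector.
double mean_of_squares(const std::vector<double>& v) {
    assert(!v.empty());
    const std::size_t n = v.size();
    double res = 0.0;
    for (double d : squares(v)) {
        res += d / n;
    }
    return res;
}
```

A caller can build the input with a C++11 initializer list, for example `std::vector<double> x = {1.0, 2.0, 3.0};`, and then call `squares(x)` or `mean_of_squares(x)`.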