Renjin @ Purdue

  • Most benchmarks are from http://r.research.att.com/benchmarks/. See the SVN repo for the source of all benchmarks (@ r476). Run with 3 warmup runs (not timed) and 5 timed runs, using ATLAS BLAS libs on Ubuntu.
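The warmup-then-measure procedure in the note above can be sketched as a small Java harness. This is only an illustration, not the harness from the Renjin repository: the class name `BenchmarkHarness`, its methods, and the sorting workload in `main` are hypothetical. Only the run counts (3 untimed warmup runs, 5 timed runs) come from the note.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical harness: names are illustrative, not taken from Renjin.
final class BenchmarkHarness {
    static final int WARMUP_RUNS = 3; // not timed, lets the JIT settle
    static final int TIMED_RUNS = 5;

    /** Runs the task untimed WARMUP_RUNS times, then returns the elapsed nanoseconds of each timed run. */
    static long[] measure(Runnable task) {
        for (int i = 0; i < WARMUP_RUNS; i++) {
            task.run();
        }
        long[] elapsed = new long[TIMED_RUNS];
        for (int i = 0; i < TIMED_RUNS; i++) {
            long start = System.nanoTime();
            task.run();
            elapsed[i] = System.nanoTime() - start;
        }
        return elapsed;
    }

    /** Mean of the timed runs, converted to milliseconds. */
    static double meanMillis(long[] nanos) {
        return Arrays.stream(nanos).average().orElse(Double.NaN) / 1e6;
    }

    public static void main(String[] args) {
        // Illustrative workload: sort a fresh copy of the same random values on every run.
        double[] values = new Random(42).doubles(1_000_000).toArray();
        long[] elapsed = measure(() -> {
            double[] copy = values.clone();
            Arrays.sort(copy);
        });
        System.out.printf("mean of %d timed runs: %.2f ms%n", elapsed.length, meanMillis(elapsed));
    }
}
```

Discarding the first runs matters on the JVM in particular, because early iterations include class loading and interpretation before hot code is compiled.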

    1. Renjin
       Alexander Bertram, bedatadriven
       Purdue, 24.6.2012
    2. Agenda
       • About bedatadriven
       • Motivation for Renjin
       • Design
       • Progress to date
       • Future Directions
    3. About bedatadriven
       • Consulting and software company based in the Netherlands with clients primarily in emerging markets and conflict zones
       • We use R extensively in:
         ▫ Predictive analytics for telecoms in Africa and Asia
         ▫ Web-based tools for market researchers
         ▫ Highly-clustered public health, opinion surveys in Afghanistan
       • Biggest datasets on the order of 5-10 billion rows, small number of columns
       • Small datasets are also key when collection is expensive ($10-$20 / obs)
    4. Motivations for Developing Renjin
       • Deploy R programs to AppEngine & other PaaS
       • Seamless integration with our Java codebases
       • Thread-based parallelization of machine-learning tasks
       • Simplify deployment in complex (read: chaotic) production environments of clients
       • Fascinating project!
    5. Renjin Design: Goals
       • Run existing R packages without modification
       • No (required) native dependencies
       • Multithreaded
       • Security model that supports running unsafe code in webapp environments
    6. Renjin Design: Approach
       • Parser is ported directly (via Bison-Java)
       • Simple AST-based interpreter closely modeled after C-R
       • R-language portion of base library reused without modification
       • Leverage existing JVM libraries (commons-math, netlib, jtransforms) when possible
       • C/Fortran-language portions ported or rewritten in Java
    7. Renjin Design: Threading
       [Diagram: one JVM Process containing RenjinEngine 1 and RenjinEngine 2. Each engine has its own Base, Stats, and Global. A Dataframe box sits between the two engines.]
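A rough Java sketch of the threading picture on slide 7, under the assumption (suggested by the diagram layout, not stated on the slide) that each engine keeps a private global environment while data such as a data frame column is shared read-only. The `Engine` class and `parallelSum` method are hypothetical stand-ins and do not use Renjin's actual API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; not Renjin's API.
final class EnginePerThread {
    /** Stand-in for one interpreter instance: private globals, shared read-only data. */
    static final class Engine {
        private final Map<String, Object> global = new HashMap<>(); // never shared across threads
        private final List<Double> sharedColumn;                    // immutable, so safe to share

        Engine(List<Double> sharedColumn) {
            this.sharedColumn = sharedColumn;
        }

        double sumRange(int from, int to) {
            double total = 0;
            for (int i = from; i < to; i++) {
                total += sharedColumn.get(i);
            }
            global.put("total", total); // writes go only to this engine's own environment
            return total;
        }
    }

    /** Sums the column with two engines, each on its own thread, over the same shared data. */
    static double parallelSum(List<Double> column) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            int mid = column.size() / 2;
            Future<Double> left = pool.submit(() -> new Engine(column).sumRange(0, mid));
            Future<Double> right = pool.submit(() -> new Engine(column).sumRange(mid, column.size()));
            return left.get() + right.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parallelSum(List.of(1.0, 2.0, 3.0, 4.0)));
    }
}
```

Because no mutable state crosses engine boundaries in this sketch, the two tasks need no locking.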
    8. Renjin Design: Embedding
       [Diagram: a JVM Process containing RenjinEngine 1 (Base, Stats, Global), a Virtual File System, and a Security Manager. File system backends shown: Hadoop FS, AppEngine BlobStore, Local File System.]
    9. Renjin Design: Data
       All primitives written against (threadsafe) interfaces rather than arrays
       [Diagram: interface Vector, with implementations:]
       • SequenceVector (1:1e10)
       • ArrayVector
       • JDBCVector
       • BufferedSequentialVector
       • TempFileBackedVector
       • DecoratingVector (inner + 1)
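To make the idea on slide 9 concrete, here is a minimal Java sketch of primitives written against a vector interface rather than an array. The interface shape, method names, and constructors are assumptions for illustration; only the implementation names `SequenceVector`, `ArrayVector`, and `DecoratingVector` echo the slide, and Renjin's real classes may look quite different.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical, simplified sketch; not Renjin's actual Vector API.
final class VectorSketch {
    /** Read-only vector of doubles with zero-based access. */
    interface Vector {
        long length();
        double get(long index);
    }

    /** from, from + 1, ..., to: elements are computed on demand, nothing is stored. */
    static final class SequenceVector implements Vector {
        private final long from;
        private final long length;

        SequenceVector(long from, long to) {
            this.from = from;
            this.length = to - from + 1;
        }

        public long length() { return length; }
        public double get(long index) { return from + index; }
    }

    /** Conventional in-memory storage. */
    static final class ArrayVector implements Vector {
        private final double[] values;

        ArrayVector(double... values) { this.values = values.clone(); }

        public long length() { return values.length; }
        public double get(long index) { return values[(int) index]; }
    }

    /** Wraps another vector and applies a function lazily, e.g. inner + 1. */
    static final class DecoratingVector implements Vector {
        private final Vector inner;
        private final DoubleUnaryOperator op;

        DecoratingVector(Vector inner, DoubleUnaryOperator op) {
            this.inner = inner;
            this.op = op;
        }

        public long length() { return inner.length(); }
        public double get(long index) { return op.applyAsDouble(inner.get(index)); }
    }

    /** A "primitive" written once against the interface works for every backing store. */
    static double sum(Vector v) {
        double total = 0;
        for (long i = 0; i < v.length(); i++) {
            total += v.get(i);
        }
        return total;
    }

    public static void main(String[] args) {
        // 1:1e10 plus one, without allocating ten billion elements.
        Vector big = new DecoratingVector(new SequenceVector(1, 10_000_000_000L), x -> x + 1);
        System.out.println(big.length());
        System.out.println(sum(new ArrayVector(1, 2, 3)));
    }
}
```

The design choice this illustrates is that the caller of `sum` never learns whether the data lives in memory, is generated on the fly, or is a lazy view over another vector.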
    10. Progress to date
        • 68% of builtins/internals implemented
        • Runs complex packages out of the box (base, survey, aspect)
        • 700+ micro tests covering edge cases related to subscripts, promises, missing arguments, etc.
        • Missing pieces:
          ▫ S4 object system
          ▫ Base graphics incomplete
          ▫ Stats package port incomplete
          ▫ No good strategy yet for native dependencies
    11. Performance: Benchmark-25
        Runtime (% difference vs R 2.14.2); bar chart, axis from 1.00 to 3.50
        • Sorting of 7,000,000 random values: 1.17
        • 2800x2800 cross-product matrix (b = a * a): 1.01
        • Creation, transp., deformation of a 2500x2500 matrix: 1.35
        • 2400x2400 normal distributed random matrix ^1000: 1.45
        • Creation of a 3000x3000 Hilbert matrix (matrix calc): 1.33
        • 3,500,000 Fibonacci numbers calculation (vector calc): 2.38
        • Escoufier's method on a 45x45 matrix (mixed): 3.26
    12. Future Directions: Short-term
        • Testing: Complete test harness to systematically evaluate completeness against CRAN packages
        • Dependency management: Augment/replace package management system with Aether to simplify management of mixed R, Java, Scala artifacts
        • Command-line and Eclipse plugins for ad hoc analysis
    13. Future Directions: Longer-term
        • JIT compilation to JVM bytecode: for, lapply, while, etc.
        • Alternative backing stores for Vector interface
        • GCC-bridge to translate existing C/Fortran sources to JVM bytecode (Gimple -> Shimple -> JVM bytecode)
    14. Compilation: Rely on an IR

        R source:

            mean.online <- function(x) {
              xbar <- x[1]
              for(n in seq(2, length(x))) {
                xbar <- ((n - 1) * xbar + x[n]) / n
              }
              xbar
            }

        Intermediate representation:

                 0: xbar ← primitive<[>(x, 1.0)
                 1: τ₃ ← Δ length(x)
                 2: τ₄ ← Δ seq(2.0, τ₃)
                 3: Λ0 ← 0
            L0   4: τ₂ ← primitive<length>(τ₄)
                 5: if Λ0 >= τ₂ goto L3 else L1
            L1   6: n ← τ₄[Λ0]
                 7: τ₅ ← primitive<->(n, 1.0)
                 8: τ₆ ← primitive<*>(τ₅, xbar)
                 9: τ₇ ← primitive<[>(x, n)
                10: τ₈ ← primitive<+>(τ₆, τ₇)
                11: xbar ← primitive</>(τ₈, n)
            L2  12: Λ0 ← increment counter Λ0
                13: goto L0
            L3  14: return xbar
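As a reading aid for slide 14, the following Java method computes the same running mean with one statement per IR instruction, so the counter-driven loop and the temporaries are easy to follow. It is a hand-written illustration, not output of Renjin's compiler; the class and variable names are hypothetical, indices are shifted because Java arrays are zero-based, and the sketch assumes `x` has at least two elements.

```java
// Hand-written mirror of the slide's IR; names are illustrative.
final class MeanOnline {
    /** Running mean of x, structured like the lowered loop. Requires x.length >= 2. */
    static double meanOnline(double[] x) {
        if (x.length < 2) {
            throw new IllegalArgumentException("need at least two elements");
        }
        double xbar = x[0];                // 0: xbar <- x[1] (R is one-based)
        int length = x.length;             // 1: length(x)
        int[] seq = new int[length - 1];   // 2: seq(2, length(x))
        for (int i = 0; i < seq.length; i++) {
            seq[i] = i + 2;
        }
        int counter = 0;                   // 3: loop counter starts at 0
        while (counter < seq.length) {     // 4-5: leave the loop once counter >= length(seq)
            int n = seq[counter];          // 6: n <- seq[counter]
            double t5 = n - 1.0;           // 7
            double t6 = t5 * xbar;         // 8
            double t7 = x[n - 1];          // 9: x[n], shifted to zero-based indexing
            double t8 = t6 + t7;           // 10
            xbar = t8 / n;                 // 11
            counter++;                     // 12-13: increment and jump back to the test
        }
        return xbar;                       // 14
    }

    public static void main(String[] args) {
        System.out.println(meanOnline(new double[] {1, 2, 3, 4}));
    }
}
```

For the input 1, 2, 3, 4 the intermediate values of `xbar` are 1, 1.5, 2, and finally 2.5, which matches the ordinary arithmetic mean.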
    15. Contact & Thanks
        • http://code.google.com/p/renjin
        • alex@bedatadriven.com
        Thanks to Renjin contributors:
        • M. Hakan Satman of Istanbul University, Department of Econometrics
        • Jamie Kingsbery, Yodle