Introducing
Revolution R Open
The Enhanced R Distribution
November 12, 2014
In today’s webinar:
R Update
Revolution R Open
The Reproducible R Toolkit
MRAN
Other open-source projects
• DeployR Open
• ParallelR
• RHadoop
Revolution R Plus
Q&A
David Smith
Chief Community Officer
Revolution Analytics
@revodavid
david@revolutionanalytics.com
Editor, blog.revolutionanalytics.com
Co-author, “Introduction to R”
OUR COMPANY
The leading provider
of advanced analytics
software and services
based on open source R,
since 2007
OUR PRODUCT
REVOLUTION R: The
enterprise-grade predictive
analytics application platform
based on the R language
SOME KUDOS
Visionary
Gartner Magic Quadrant
for Advanced Analytics
Platforms, 2014
What is R?
 Most widely used data analysis software
• Used by 2M+ data scientists, statisticians and analysts
 Most powerful statistical programming language
• Flexible, extensible and comprehensive for productivity
 Create beautiful and unique data visualizations
• As seen in New York Times, Twitter and Flowing Data
 Thriving open-source community
• Leading edge of analytics research
 Fills the Data Science talent gap
• New graduates prefer R
www.revolutionanalytics.com/what-is-r
Poll #1
What software do you use for statistical analysis? (Select all that apply.)
 R
 SAS
 SPSS
 Python
 Other
R’s popularity is growing rapidly
More at blog.revolutionanalytics.com/popularity
R Usage Growth
Rexer Data Miner Survey, 2007-2013
• Rexer Data Miner Survey • IEEE Spectrum, July 2014
#9: R
Language Popularity
IEEE Spectrum Top Programming Languages
Revolution R Open is:
 Enhanced Open Source R distribution
 Compatible with all R-related software
 Multi-threaded for performance
 Focus on reproducibility
 Open source (GPLv2 license)
 Available for Windows, Mac OS X, Ubuntu,
Red Hat and OpenSUSE
 Download from
mran.revolutionanalytics.com
Multi-threaded performance
 Intel MKL replaces standard
BLAS/LAPACK algorithms
 Pipelined operations
– Optimized for Intel, works for all archs
 High-performance algorithms
 Sequential → Parallel
– Uses as many threads as there are
available cores
– Control with:
setMKLthreads(<value>)
 No need to change any R code
 Included in RRO binary distribution
More at Revolutions blog
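A minimal sketch of how the thread control might be exercised, assuming a script that should also run on plain R. setMKLthreads() is the function named on this slide and is guarded with exists(); the matrix size, the helper time_crossprod() and the thread counts are illustrative choices, and timings will differ by machine.

```r
set.seed(1)
n <- 500
a <- matrix(rnorm(n * n), n, n)

# Illustrative helper: elapsed seconds for one BLAS-heavy operation.
time_crossprod <- function(m) system.time(crossprod(m))[["elapsed"]]

# setMKLthreads() is only defined in RRO, so guard the call; on plain R
# the sketch just times the default BLAS.
if (exists("setMKLthreads", mode = "function")) {
  setMKLthreads(1)
  t_one <- time_crossprod(a)
  setMKLthreads(2)
  t_two <- time_crossprod(a)
  cat("1 thread:", t_one, "s; 2 threads:", t_two, "s\n")
} else {
  cat("crossprod took", time_crossprod(a), "s with the default BLAS\n")
}
```

The R code doing the work (crossprod) is unchanged in both branches, which is the point of the slide: only the thread setting varies.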
100% Compatibility
 Built on latest R engine
– Currently R 3.1.1, R 3.1.2 in testing
 100% compatible with
– R scripts
– R packages
– Applications with R connections
 Designed to work with RStudio
– No configuration required
 Replaces existing R application
– Side-by-side installations
Reproducibility – why do we care?
Academic / Research
 Verify results
 Advance Research
Business
 Production code
 Reliability
 Reusability
 Collaboration
 Regulation
www.nytimes.com/2011/07/08/health/research/08genes.html
http://arxiv.org/pdf/1010.1092.pdf
An R Reproducibility Problem
Adapted from http://xkcd.com/234/ CC BY-NC 2.5
Reproducible R Toolkit in RRO
 Static CRAN mirror
– CRAN packages fixed with each Revolution R Open update
 Daily CRAN snapshots
– Storing every package version since September 2014
– Binaries and sources
– At mran.revolutionanalytics.com/snapshot
 Easily write and share scripts synced to a specific snapshot
– “checkpoint” package installed with RRO
[Diagram: CRAN feeds the CRAN mirror (http://cran.revolutionanalytics.com/); at midnight UTC the checkpoint server stores a daily snapshot at http://mran.revolutionanalytics.com/snapshot/; the checkpoint package, called with library(checkpoint) and checkpoint("2014-09-17"), uses the snapshot for the chosen date.]
Using checkpoint
 Easy to use: add 2 lines to the top of each script
library(checkpoint)
checkpoint("2014-09-17")
 For the script author:
– Use package versions available on the chosen date
– Installs packages local to this project
• Allows different package versions to be used simultaneously
 For a script collaborator:
– Automatically installs required packages
• Detects required packages (no need to manually install!)
– Uses same package versions as script author to ensure reproducibility
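As a rough illustration of what pinning to a snapshot means, the sketch below builds a dated repository URL from the snapshot address given earlier. The helper snapshot_repo() is hypothetical, and the layout "base URL followed by the date" is an assumption, not something these slides state. checkpoint() itself does more: it detects the packages a script needs and installs them local to the project.

```r
# Hypothetical helper: build the repository URL for one snapshot date.
# Assumes snapshots are addressed as <base>/<YYYY-MM-DD>.
snapshot_repo <- function(date,
                          base = "http://mran.revolutionanalytics.com/snapshot/") {
  parsed <- as.Date(date, format = "%Y-%m-%d")
  if (is.na(parsed)) stop("date must be in YYYY-MM-DD form: ", date)
  paste0(base, format(parsed, "%Y-%m-%d"))
}

repo <- snapshot_repo("2014-09-17")
print(repo)

# A session could then be pointed at that snapshot by hand
# (left commented out because it needs network access):
# options(repos = c(CRAN = repo))
# install.packages("ggplot2")
```

Validating the date up front means a typo fails immediately instead of producing a URL that points nowhere.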
MRAN: The Managed R Archive Network
 Download Revolution R
Open
 Learn about R and RRO
 Daily CRAN snapshots
 Explore Packages
– and dependencies
 Explore Task Views
Revolution Analytics
Open Source Projects
More at projects.revolutionanalytics.com
DeployR Open
 Goal: embed results from R scripts into
existing applications, in real time
 Problem:
– Exposing arbitrary R functions is unwise
– Need to handle concurrent R sessions
 Solution: DeployR Open
– R, on a server, behind a firewall
– Repository Manager defines entry points
• Expose only authorized R functions
– Automatically creates Web Services APIs
– Manages and monitors pool of R sessions
– Separates roles for R and app developer
 DeployR Open: for prototyping integrations
– Revolution R Enterprise adds grid-scaling and
enterprise authentication
More at deployr.revolutionanalytics.com
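The "expose only authorized R functions" idea can be pictured with a toy dispatcher in plain R. This is not DeployR's API: the entry_points registry, the function names in it and call_entry_point() are all hypothetical, and a real server would add sessions, authentication and a web layer on top.

```r
# Hypothetical registry: only these functions can be invoked by name.
entry_points <- list(
  score_mean  = function(x) mean(x),
  score_range = function(x) diff(range(x))
)

# Dispatch a request to a registered entry point; refuse anything else.
call_entry_point <- function(name, args = list()) {
  ok <- is.character(name) && length(name) == 1 && name %in% names(entry_points)
  if (!ok) stop("not an authorized entry point: ", name)
  do.call(entry_points[[name]], args)
}

allowed <- call_entry_point("score_mean", list(x = c(1, 2, 3)))

# An arbitrary R function such as system() is rejected, not executed.
blocked <- tryCatch(
  call_entry_point("system", list("ls")),
  error = function(e) conditionMessage(e)
)
print(allowed)
print(blocked)
```

Looking names up in a fixed list, instead of evaluating whatever the caller sends, is what keeps arbitrary functions out of reach.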
DeployR : Integration
DeployR does not provide any application UI.
3 integration modes embed real-time R results into existing interfaces
Web app, mobile app, desktop app, BI tool, Excel, …
RBroker Framework (tutorial):
Simple, high-performance API for Java, .NET and Javascript apps
Supports transactional, on-demand analytics on a stateless R session
Client Libraries (tutorial):
Flexible control of R services from Java, .NET and Javascript apps
Also supports stateful R integrations (e.g. complex GUIs)
DeployR Web Services API:
Integrate R using almost any client language
Only available in Revolution R Enterprise DeployR
DeployR : Security / Scalability Layers
1. Anonymous execution
– Only authorized, user-defined R functions accessible
– No state preserved
2. Basic username / password authentication
– Managed in DeployR Administration Console
3. Enterprise Authentication
– Verifies identity with SSO / LDAP / Active Directory / PAM
4. Adaptive load-balancing grid
– Ensures service availability
DeployR Open demo
Fraud detection
RHadoop and ParallelR
 Toolkits for data scientists and numerical analysts to create custom
parallel and distributed algorithms
 ParallelR: parallel programming for multi-CPU servers and grids
 RHadoop: map-reduce programming in R language
 Mainly useful for “embarrassingly parallel” problems, where parallel
components work with small amounts of data
 Big Data Predictive Analytics mostly not embarrassingly parallel
 80+ pre-built “parallel external memory algorithms” included with
Revolution R Enterprise
RHadoop
 Collection of packages for interfacing R and Hadoop
 Client (desktop) R interface to Hadoop:
– rhdfs: Browse, read, write and modify files stored in HDFS
– rhbase: Browse, read, write and modify tables stored in HBASE
– ravro: Read, write and run map-reduce on Apache Avro files in HDFS
 R computations in Hadoop:
– rmr2: write map-reduce tasks in R to run in Hadoop
– plyrmr: R-based data manipulation computations on data in Hadoop
RHadoop Wiki: github.com/RevolutionAnalytics/RHadoop/wiki
Word count in RHadoop
 Map:
– Input: lines of text
– Output: each word as a key, with value 1
 Reduce:
– Input: each word with its collected values
– Output: words with counts
 Map-Reduce:
– Apply map to lines of text
– Gather like words together and count
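The three steps above can be sketched in plain R, with no Hadoop involved. This is an in-memory imitation of the pattern, not the rmr2 API: map_line(), reduce_word() and word_count() are names made up for the illustration.

```r
# Map: one line of text in, one (word, 1) pair per word out.
map_line <- function(line) {
  words <- strsplit(tolower(line), "[^a-z]+")[[1]]
  words <- words[nzchar(words)]
  data.frame(key = words, value = rep(1L, length(words)),
             stringsAsFactors = FALSE)
}

# Reduce: a word and all of its values in, the word's count out.
reduce_word <- function(key, values) sum(values)

word_count <- function(lines) {
  pairs   <- do.call(rbind, lapply(lines, map_line))  # map phase
  grouped <- split(pairs$value, pairs$key)            # gather like words
  vapply(names(grouped),
         function(k) reduce_word(k, grouped[[k]]),
         integer(1))                                  # reduce phase
}

counts <- word_count(c("the quick brown fox", "the lazy dog", "The fox"))
print(counts)
```

In a real cluster the map calls run on different nodes and the grouping step is the shuffle that Hadoop performs between map and reduce.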
Word count: execution
More: Video replay of “Using R with
Hadoop” by Jeffrey Breen
http://bit.ly/W35PLR
ParallelR
 foreach replaces for loops
– Minimal code change required
 Choice of parallel backends
– doParallel (base “parallel”)
– doMC (multi-core servers)
– doSNOW (grids)
 Iterations run in parallel
– Speedups depend on backend,
“granularity”
 All iterations run in-memory
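The idea that independent iterations can be handed to separate cores can be sketched with base R's parallel package alone, which the doParallel backend builds on. This is not the foreach API: sim_match() and the task sizes are illustrative, and because mclapply() forks processes the sketch falls back to one core on Windows.

```r
library(parallel)

# Illustrative task: estimate the chance that n people share a birthday.
sim_match <- function(n, m = 500) {
  hits <- replicate(m, anyDuplicated(sample(365, n, replace = TRUE)) > 0)
  mean(hits)
}

# Forking is unavailable on Windows, so use a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Each value of n is an independent task, so tasks can run on any core.
est <- unlist(mclapply(1:10, sim_match, m = 500, mc.cores = cores))
print(est)
```

Each task here is small, so the overhead of starting workers can outweigh the gain; that is the "granularity" effect mentioned above.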
birthday <- function(n) {
  m <- 10000                    # number of simulated groups of n people
  x <- numeric(m)
  for (i in 1:m) {
    b <- sample(1:365, n, replace = TRUE)
    x[i] <- ifelse(length(unique(b)) == n, 0, 1)
  }
  mean(x)  # est prob of at least 1 match
}

# Sequential loop. 2-core MacBook Air: 21.9s
for (j in 1:100) birthday(j)

# Parallel loop with foreach and doMC. 2-core MacBook Air: 12.0s
library("doMC")
registerDoMC(2)
x <- foreach(j = 1:100) %dopar% birthday(j)
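One way to sanity-check the simulation is to compare it with the closed-form probability. The function birthday_exact() below is an addition for illustration, not part of the slide; it assumes 365 equally likely birthdays, as the simulation does.

```r
# Exact probability that at least two of n people share a birthday:
# 1 - (365/365) * (364/365) * ... * ((365 - n + 1)/365)
birthday_exact <- function(n) {
  if (n > 365) return(1)  # more people than days: a match is certain
  1 - prod((365 - seq_len(n) + 1) / 365)
}

exact <- sapply(c(10, 23, 50), birthday_exact)
print(round(exact, 4))
```

For n = 23 the exact value is about 0.507, so a simulated estimate far from that would point to a bug in the loop.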
Introducing
Revolution R Plus
Revolution R Plus includes:
 AdviseR™ Technical Support for:
– Revolution R Open
• Including R, base and recommended packages
– Reproducible R Toolkit
– ParallelR: Parallel programming with R
– RHadoop: R integration with Hadoop
– DeployR Open: Secure deployment of R to applications
 Open Source Assurance for all supported components
– Provides legal indemnity for subscribers
 Workstation subscriptions: $1,800 per year
– Server and Hadoop subscriptions also available
AdviseR™ Technical Support
Technical support for R, from the R experts.
 10x5 email and phone support (in your local time zone)
 Full support for R, validated packages, and third-party software
connections
 Notifications of updates and bug fixes
 On-line case management and knowledgebase
 Access to technical resources, documentation and user forums
 Defined service-level agreements for rapid responses
Included with Revolution R Plus and Revolution R Enterprise.
Open Source Assurance
 Revolution Analytics will defend Revolution R Plus subscribers should a
third party make an intellectual property claim against covered open
source software with respect to:
– copyrights, patents, trademarks, trade secrets
 Covered software includes:
– Revolution R Open (incl. R base and recommended packages), Reproducible R
Toolkit, DeployR Open, ParallelR, RHadoop
 Revolution Analytics will defend open source software in court
– If necessary, Revolution Analytics will obtain rights, modify, or replace software
found to be infringing
– If a resolution can’t be found, fees paid in past 12 months will be refunded.
The Revolution R Product Suite
• Free and open source R distribution
• Enhanced and distributed by Revolution Analytics
Revolution R Open
• Open-source distribution of R, packages, and other components
• Enhanced, supported and indemnified by Revolution Analytics
Revolution R Plus
• Secure, Scalable and Supported Distribution of R
• With proprietary components created by Revolution Analytics
Revolution R Enterprise
Revolution R Enterprise (RRE)
The All-Inclusive Big Data Big Analytics Platform
DistributedR
DeployR DevelopR
ScaleR
ConnectR
High-performance open source R plus:
 Data source connectivity to big-data objects
 Big-data advanced analytics
 Multi-platform environment support
 In-Hadoop and in-Teradata predictive modeling
 Visual Studio IDE option
 Secure, Scalable R Deployment
 Technical support, training and services
– 24x7 support option
Contact Revolution Analytics for more info: www.revolutionanalytics.com/contact-us
Poll #2
Which Revolution Analytics projects do you plan to use (or already use)?
Select all that apply:
1. Revolution R Open (free distribution)
2. Revolution R Plus (paid subscription for support and indemnification)
3. Reproducible R Toolkit (checkpoint package)
4. DeployR Open
5. RHadoop / ParallelR
Wrapping up…
Revolution R Open is available now from
mran.revolutionanalytics.com/download
Explore Revolution Analytics open-source projects at
projects.revolutionanalytics.com
Technical support and open-source assurance with
Revolution R Plus
www.revolutionanalytics.com/plus
David Smith
Chief Community Officer
Revolution Analytics
@revodavid
david@revolutionanalytics.com
Thank you.
Next up:
Batter Up! Advanced Sports Analytics with R and Storm
December 11, 2014
revolutionanalytics.com/webinars
www.revolutionanalytics.com
1.855.GET.REVO
Twitter: @RevolutionR
