Introducing
Revolution R Open
The Enhanced R Distribution
November 12, 2014
In today’s webinar:
R Update
Revolution R Open
The Reproducible R Toolkit
MRAN
Other open-source projects
• DeployR Open
• ParallelR
• RHadoop
Revolution R Plus
Q&A
David Smith
Chief Community Officer
Revolution Analytics
@revodavid
david@revolutionanalytics.com
Editor, blog.revolutionanalytics.com
Co-author, “Introduction to R”
OUR COMPANY
The leading provider
of advanced analytics
software and services
based on open source R,
since 2007
OUR PRODUCT
REVOLUTION R: The
enterprise-grade predictive
analytics application platform
based on the R language
SOME KUDOS
Visionary
Gartner Magic Quadrant
for Advanced Analytics
Platforms, 2014
What is R?
 Most widely used data analysis software
• Used by 2M+ data scientists, statisticians and analysts
 Most powerful statistical programming language
• Flexible, extensible and comprehensive for productivity
 Create beautiful and unique data visualizations
• As seen in New York Times, Twitter and Flowing Data
 Thriving open-source community
• Leading edge of analytics research
 Fills the Data Science talent gap
• New graduates prefer R
www.revolutionanalytics.com/what-is-r
Poll #1
What software do you use for statistical analysis? (Select all that apply.)
 R
 SAS
 SPSS
 Python
 Other
R’s popularity is growing rapidly
More at blog.revolutionanalytics.com/popularity
R Usage Growth: Rexer Data Miner Survey, 2007-2013
Language Popularity: IEEE Spectrum Top Programming Languages, July 2014 (#9: R)
Revolution R Open is:
 Enhanced Open Source R distribution
 Compatible with all R-related software
 Multi-threaded for performance
 Focus on reproducibility
 Open source (GPLv2 license)
 Available for Windows, Mac OS X, Ubuntu,
Red Hat and OpenSUSE
 Download from
mran.revolutionanalytics.com
Multi-threaded performance
 Intel MKL replaces standard
BLAS/LAPACK algorithms
 Pipelined operations
– Optimized for Intel, works for all archs
 High-performance algorithms
 Sequential → Parallel
– Uses as many threads as there are
available cores
– Control with:
setMKLthreads(<value>)
 No need to change any R code
 Included in RRO binary distribution
More at Revolutions blog
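As a rough illustration of the thread control mentioned above, the sketch below times a BLAS-backed matrix operation. It assumes `setMKLthreads()` is only present in a Revolution R Open session, so the call is guarded and the script also runs on plain R; the matrix size and the thread count of 2 are arbitrary choices for the example, not recommendations.

```r
# Time a BLAS-backed operation; the R code is the same with or without MKL.
n <- 500
set.seed(1)
A <- matrix(rnorm(n * n), n, n)

# setMKLthreads() is assumed to exist only under Revolution R Open.
# On any other R build this line is skipped.
if (exists("setMKLthreads")) setMKLthreads(2)

# crossprod(A) computes t(A) %*% A through the BLAS library
elapsed <- system.time(B <- crossprod(A))[["elapsed"]]
cat("crossprod on a", n, "x", n, "matrix took", elapsed, "seconds\n")
```

Running the same script with different thread counts is one way to see how much of a given workload is spent in BLAS/LAPACK routines.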
100% Compatibility
 Built on latest R engine
– Currently R 3.1.1, R 3.1.2 in testing
 100% compatible with
– R scripts
– R packages
– Applications with R connections
 Designed to work with RStudio
– No configuration required
 Replaces existing R application
– Side-by-side installations
Reproducibility – why do we care?
Academic / Research
 Verify results
 Advance Research
Business
 Production code
 Reliability
 Reusability
 Collaboration
 Regulation
www.nytimes.com/2011/07/08/health/research/08genes.html
http://arxiv.org/pdf/1010.1092.pdf
An R Reproducibility Problem
Adapted from http://xkcd.com/234/ CC BY-NC 2.5
Reproducible R Toolkit in RRO
 Static CRAN mirror
– CRAN packages fixed with each Revolution R Open update
 Daily CRAN snapshots
– Storing every package version since September 2014
– Binaries and sources
– At mran.revolutionanalytics.com/snapshot
 Easily write and share scripts synced to a specific snapshot
– “checkpoint” package installed with RRO
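To make the snapshot idea concrete, here is a minimal sketch of pointing an R session at a dated repository by hand. The helper `snapshot_repo()` is hypothetical, and the URL layout (the date appended to the snapshot address) is an assumption for illustration; the checkpoint package does more than this, and the sketch does not contact the server.

```r
# Hypothetical helper: build a dated repository URL.
# Assumption: snapshots live at <base>/<YYYY-MM-DD>.
snapshot_repo <- function(date,
                          base = "http://mran.revolutionanalytics.com/snapshot/") {
  date <- as.Date(date)  # stops with an error on a malformed date
  paste0(base, format(date, "%Y-%m-%d"))
}

repo_url <- snapshot_repo("2014-09-17")

# Point the session at the snapshot, inspect it, then restore the old setting
old <- options(repos = c(CRAN = repo_url))
print(getOption("repos"))
options(old)
```

With the repository option set this way, `install.packages()` would resolve package versions against that date's snapshot, provided the server is reachable at that address.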
[Diagram: CRAN → CRAN mirror (http://cran.revolutionanalytics.com/) → checkpoint server takes daily snapshots at midnight UTC → http://mran.revolutionanalytics.com/snapshot/ → checkpoint package: library(checkpoint); checkpoint("2014-09-17")]
Using checkpoint
 Easy to use: add 2 lines to the top of each script
library(checkpoint)
checkpoint("2014-09-17")
 For the package author:
– Use package versions available on the chosen date
– Installs packages local to this project
• Allows different package versions to be used simultaneously
 For a script collaborator:
– Automatically installs required packages
• Detects required packages (no need to manually install!)
– Uses same package versions as script author to ensure reproducibility
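The "detects required packages" step can be pictured as a scan of the script for `library()` and `require()` calls. The sketch below is a simplified illustration of that idea, not the checkpoint package's actual implementation; `detect_packages()` is a hypothetical name, and it only finds the first call on each line.

```r
# Hypothetical, simplified scanner: find packages named in library()/require()
detect_packages <- function(script_lines) {
  pattern <- "(library|require)\\([[:space:]]*[\"']?([A-Za-z][A-Za-z0-9.]*)"
  hits <- regmatches(script_lines, regexec(pattern, script_lines))
  # Each hit is c(full match, function name, package name), or empty if no match
  pkgs <- vapply(hits,
                 function(m) if (length(m) == 3) m[3] else NA_character_,
                 character(1))
  unique(pkgs[!is.na(pkgs)])
}

script <- c('library(checkpoint)',
            'checkpoint("2014-09-17")',
            'library("ggplot2")',
            'x <- 1',
            'require(plyr)')

pkgs <- detect_packages(script)
print(pkgs)
```

A collaborator's session could then install whatever is in `pkgs` from the chosen snapshot before the rest of the script runs.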
MRAN: The Managed R Archive Network
 Download Revolution R Open
 Learn about R and RRO
 Daily CRAN snapshots
 Explore Packages
– and dependencies
 Explore Task Views
Revolution Analytics
Open Source Projects
More at projects.revolutionanalytics.com
DeployR Open
 Goal: embed results from R scripts into
existing applications, in real time
 Problem:
– Exposing arbitrary R functions is unwise
– Need to handle concurrent R sessions
 Solution: DeployR Open
– R, on a server, behind a firewall
– Repository Manager defines entry points
• Expose only authorized R functions
– Automatically creates Web Services APIs
– Manages and monitors pool of R sessions
– Separates roles for R and app developer
 DeployR Open: for prototyping integrations
– Revolution R Enterprise adds grid-scaling and
enterprise authentication
More at deployr.revolutionanalytics.com
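The "expose only authorized R functions" idea can be sketched in plain R as a lookup table of entry points. This is not the DeployR API: `entry_points`, `call_entry_point()` and the toy `score_transaction()` rule are all hypothetical names invented for the illustration.

```r
# Hypothetical registry: only functions listed here can be invoked by name
entry_points <- list(
  score_transaction = function(amount) {
    if (amount > 1000) "review" else "ok"  # toy rule, not a real fraud model
  }
)

# Dispatch a request by name, refusing anything not in the registry
call_entry_point <- function(name, ...) {
  fn <- entry_points[[name]]
  if (is.null(fn)) stop("not an authorized entry point: ", name)
  fn(...)
}

allowed <- call_entry_point("score_transaction", amount = 2500)

# A request for an arbitrary R function is rejected instead of executed
denied <- tryCatch(call_entry_point("system", "ls"),
                   error = function(e) conditionMessage(e))
print(allowed)
print(denied)
```

The application developer only ever sees the names in the registry, which is the separation of roles the slide describes.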
DeployR: Integration
DeployR does not provide any application UI.
3 integration modes embed real-time R results into existing interfaces
Web app, mobile app, desktop app, BI tool, Excel, …
RBroker Framework (tutorial):
Simple, high-performance API for Java, .NET and Javascript apps
Supports transactional, on-demand analytics on a stateless R session
Client Libraries (tutorial):
Flexible control of R services from Java, .NET and Javascript apps
Also supports stateful R integrations (e.g. complex GUIs)
DeployR Web Services API:
Integrate R using almost any client language
Only available in Revolution R Enterprise DeployR
DeployR: Security / Scalability Layers
1. Anonymous execution
– Only authorized, user-defined R functions accessible
– No state preserved
2. Basic username / password authentication
– Managed in DeployR Administration Console
3. Enterprise Authentication
– Verifies identity with SSO / LDAP / Active Directory / PAM
4. Adaptive load-balancing grid
– Ensures service availability
DeployR Open demo
Fraud detection
RHadoop and ParallelR
 Toolkits for data scientists and numerical analysts to create custom
parallel and distributed algorithms
 ParallelR: parallel programming for multi-CPU servers and grids
 RHadoop: map-reduce programming in R language
 Mainly useful for “embarrassingly parallel” problems, where parallel
components work with small amounts of data
 Big Data Predictive Analytics mostly not embarrassingly parallel
 80+ pre-built “parallel external memory algorithms” included with
Revolution R Enterprise
RHadoop
 Collection of packages for interfacing R and Hadoop
 Client (desktop) R interface to Hadoop:
– rhdfs: Browse, read, write and modify files stored in HDFS
– rhbase: Browse, read, write and modify tables stored in HBASE
– ravro: Read, write and run map-reduce on Apache Avro files in HDFS
 R computations in Hadoop:
– rmr2: write map-reduce tasks in R to run in Hadoop
– plyrmr: R-based data manipulation computations on data in Hadoop
RHadoop Wiki: github.com/RevolutionAnalytics/RHadoop/wiki
Word count in RHadoop
 Map:
– Input: lines of text
– Output: each word as a key, with value 1
 Reduce:
– Input: a word (key) with all of its values
– Output: words with counts
 Map-Reduce:
– Apply map to lines of text
– Gather like words together and count
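The map, gather and reduce steps above can be walked through in plain R without a Hadoop cluster. This sketch only mirrors the shape of the computation; it does not use rmr2, and `map_fn` and `reduce_fn` are names chosen for the illustration.

```r
lines <- c("the quick brown fox", "the lazy dog", "the fox")

# Map: one line of text in, a list of (word, 1) pairs out
map_fn <- function(line) {
  words <- strsplit(tolower(line), "[^a-z]+")[[1]]
  words <- words[nzchar(words)]
  lapply(words, function(w) list(key = w, value = 1L))
}

# Apply map to every line and flatten into a single list of pairs
pairs <- unlist(lapply(lines, map_fn), recursive = FALSE)

# Gather: group the values by key (Hadoop does this between map and reduce)
keys    <- vapply(pairs, function(p) p$key, character(1))
values  <- vapply(pairs, function(p) p$value, integer(1))
grouped <- split(values, keys)

# Reduce: all values for one word in, the word's count out
reduce_fn <- function(vals) sum(vals)
counts <- vapply(grouped, reduce_fn, integer(1))
print(counts)
```

In a real map-reduce job the grouping step is distributed across the cluster, so each reduce call only sees the values for its own keys.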
Word count: execution
More: Video replay of “Using R with
Hadoop” by Jeffrey Breen
http://bit.ly/W35PLR
ParallelR
 foreach replaces for loops
– Minimal code change required
 Choice of parallel backends
– doParallel (base “parallel”)
– doMC (multi-core servers)
– doSNOW (grids)
 Iterations run in parallel
– Speedups depend on backend,
“granularity”
 All iterations run in-memory
birthday <- function(n) {
  m <- 10000                                     # number of simulated groups
  x <- numeric(m)
  for (i in 1:m) {
    b <- sample(1:365, n, replace = TRUE)        # n random birthdays
    x[i] <- ifelse(length(unique(b)) == n, 0, 1) # 1 if any birthday repeats
  }
  mean(x) # est prob of at least 1 match
}

# Sequential (2-core MacBook Air: 21.9s)
for (j in 1:100) birthday(j)

# Parallel with foreach (2-core MacBook Air: 12.0s)
library("doMC")   # also attaches foreach
registerDoMC(2)
x <- foreach(j = 1:100) %dopar% birthday(j)
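As a sanity check on what the birthday simulation estimates, the probability of at least one shared birthday among n people has a closed form: 1 minus the product of (365 - k) / 365 for k = 0, ..., n - 1. The sketch below compares that value with a simulated estimate; `birthday_exact()` and `birthday_sim()` are names introduced here, and the seed and the choice n = 23 are arbitrary.

```r
# Exact probability of at least one shared birthday among n people
birthday_exact <- function(n) {
  1 - prod((365 - seq_len(n) + 1) / 365)
}

# Simulated estimate, same idea as the loop version but written with replicate()
birthday_sim <- function(n, m = 10000) {
  mean(replicate(m, anyDuplicated(sample(1:365, n, replace = TRUE)) > 0))
}

set.seed(42)
exact <- birthday_exact(23)
sim   <- birthday_sim(23)
cat("exact:", round(exact, 4), " simulated:", round(sim, 4), "\n")
```

With 10,000 replications the simulated value should land within a percentage point or two of the exact one, which is a quick check that parallel and sequential runs are estimating the same quantity.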
Introducing
Revolution R Plus
Revolution R Plus includes:
 AdviseR™ Technical Support for:
– Revolution R Open
• Including R, base and recommended packages
– Reproducible R Toolkit
– ParallelR: Parallel programming with R
– RHadoop: R integration with Hadoop
– DeployR Open: Secure deployment of R to applications
 Open Source Assurance for all supported components
– Provides legal indemnity for subscribers
 Workstation subscriptions: $1,800 per year
– Server and Hadoop subscriptions also available
AdviseR™ Technical Support
Technical support for R, from the R experts.
 10x5 email and phone support (in your local time zone)
 Full support for R, validated packages, and third-party software
connections
 Notifications of updates and bug fixes
 On-line case management and knowledgebase
 Access to technical resources, documentation and user forums
 Defined service-level agreements for rapid responses
Included with Revolution R Plus and Revolution R Enterprise.
Open Source Assurance
 Revolution Analytics will defend Revolution R Plus subscribers should a
third party make an intellectual property claim against covered open
source software with respect to:
– copyrights, patents, trademarks, trade secrets
 Covered software includes:
– Revolution R Open (incl. R base and recommended packages), Reproducible R
Toolkit, DeployR Open, ParallelR, RHadoop
 Revolution Analytics will defend open source software in court
– If necessary, Revolution Analytics will obtain rights, modify, or replace software
found to be infringing
– If a resolution can’t be found, fees paid in the past 12 months will be refunded.
The Revolution R Product Suite
Revolution R Open
• Free and open source R distribution
• Enhanced and distributed by Revolution Analytics
Revolution R Plus
• Open-source distribution of R, packages, and other components
• Enhanced, supported and indemnified by Revolution Analytics
Revolution R Enterprise
• Secure, Scalable and Supported Distribution of R
• With proprietary components created by Revolution Analytics
Revolution R Enterprise (RRE)
The All-Inclusive Big Data Big Analytics Platform
Components: DistributedR, DeployR, DevelopR, ScaleR, ConnectR
High-performance open source R plus:
 Data source connectivity to big-data objects
 Big-data advanced analytics
 Multi-platform environment support
 In-Hadoop and in-Teradata predictive modeling
 Visual Studio IDE option
 Secure, Scalable R Deployment
 Technical support, training and services
– 24x7 support option
Contact Revolution Analytics for more info: www.revolutionanalytics.com/contact-us
Poll #2
Which Revolution Analytics projects do you plan to use (or already use)?
Select all that apply:
1. Revolution R Open (free distribution)
2. Revolution R Plus (paid subscription for support and indemnification)
3. Reproducible R Toolkit (checkpoint package)
4. DeployR Open
5. RHadoop / ParallelR
Wrapping up…
Revolution R Open is available now from
mran.revolutionanalytics.com/download
Explore Revolution Analytics open-source projects at
projects.revolutionanalytics.com
Technical support and open-source assurance with
Revolution R Plus
www.revolutionanalytics.com/plus
David Smith
Chief Community Officer
Revolution Analytics
@revodavid
david@revolutionanalytics.com
Thank you.
Next up:
Batter Up! Advanced Sports Analytics with R and Storm
December 11, 2014
revolutionanalytics.com/webinars
www.revolutionanalytics.com
1.855.GET.REVO
Twitter: @RevolutionR

 
Migrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to AzureMigrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to Azure
 
R in Minecraft
R in Minecraft R in Minecraft
R in Minecraft
 
The case for R for AI developers
The case for R for AI developersThe case for R for AI developers
The case for R for AI developers
 
Speed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudSpeed up R with parallel programming in the Cloud
Speed up R with parallel programming in the Cloud
 
The R Ecosystem
The R EcosystemThe R Ecosystem
The R Ecosystem
 
R Then and Now
R Then and NowR Then and Now
R Then and Now
 
Predicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per SecondPredicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per Second
 
Reproducible Data Science with R
Reproducible Data Science with RReproducible Data Science with R
Reproducible Data Science with R
 
The R Ecosystem
The R EcosystemThe R Ecosystem
The R Ecosystem
 
R at Microsoft (useR! 2016)
R at Microsoft (useR! 2016)R at Microsoft (useR! 2016)
R at Microsoft (useR! 2016)
 
Building a scalable data science platform with R
Building a scalable data science platform with RBuilding a scalable data science platform with R
Building a scalable data science platform with R
 
The Network structure of R packages on CRAN & BioConductor
The Network structure of R packages on CRAN & BioConductorThe Network structure of R packages on CRAN & BioConductor
The Network structure of R packages on CRAN & BioConductor
 
Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15
Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15
Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15
 
Warranty Predictive Analytics solution
Warranty Predictive Analytics solutionWarranty Predictive Analytics solution
Warranty Predictive Analytics solution
 
R and Data Science
R and Data ScienceR and Data Science
R and Data Science
 

Revolution R Open Webinar: Introducing Enhanced R Distribution

  • 1. Introducing Revolution R Open The Enhanced R Distribution November 12, 2014
  • 2. In today’s webinar: R Update; Revolution R Open; The Reproducible R Toolkit; MRAN; Other open-source projects (DeployR Open, ParallelR, RHadoop); Revolution R Plus; Q&A. David Smith, Chief Community Officer, Revolution Analytics; @revodavid; david@revolutionanalytics.com; Editor, blog.revolutionanalytics.com; Co-author, “Introduction to R”
  • 3. OUR COMPANY: The leading provider of advanced analytics software and services based on open source R, since 2007. OUR PRODUCT: REVOLUTION R, the enterprise-grade predictive analytics application platform based on the R language. SOME KUDOS: Visionary, Gartner Magic Quadrant for Advanced Analytics Platforms, 2014.
  • 4. What is R? ▪ Most widely used data analysis software • Used by 2M+ data scientists, statisticians and analysts ▪ Most powerful statistical programming language • Flexible, extensible and comprehensive for productivity ▪ Create beautiful and unique data visualizations • As seen in New York Times, Twitter and Flowing Data ▪ Thriving open-source community • Leading edge of analytics research ▪ Fills the Data Science talent gap • New graduates prefer R. www.revolutionanalytics.com/what-is-r
  • 5. Poll #1: What software do you use for statistical analysis? (Select all that apply.) ▪ R ▪ SAS ▪ SPSS ▪ Python ▪ Other
  • 6. R’s popularity is growing rapidly. [Charts: R Usage Growth (Rexer Data Miner Survey, 2007-2013); Language Popularity (IEEE Spectrum Top Programming Languages, July 2014; R ranked #9).] More at blog.revolutionanalytics.com/popularity
  • 7. Revolution R Open is: ▪ Enhanced Open Source R distribution ▪ Compatible with all R-related software ▪ Multi-threaded for performance ▪ Focus on reproducibility ▪ Open source (GPLv2 license) ▪ Available for Windows, Mac OS X, Ubuntu, Red Hat and OpenSUSE ▪ Download from mran.revolutionanalytics.com
  • 8. Multi-threaded performance ▪ Intel MKL replaces standard BLAS/LAPACK algorithms ▪ Pipelined operations – Optimized for Intel, works for all archs ▪ High-performance algorithms ▪ Sequential ▪ Parallel – Uses as many threads as there are available cores – Control with: setMKLthreads(<value>) ▪ No need to change any R code ▪ Included in RRO binary distribution. More at Revolutions blog
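The threading behaviour described on slide 8 could be observed with a timing sketch like the one below. It is an illustration under assumptions, not a benchmark from the webinar: the matrix size and thread count are arbitrary, and `setMKLthreads()` is called only when it exists in the session (as the slide says it does in RRO), so the script also runs on standard R.

```r
# Time a dense matrix multiply, a BLAS-backed operation of the kind MKL accelerates.
set.seed(42)
n <- 500                                  # arbitrary size chosen for illustration
a <- matrix(rnorm(n * n), nrow = n, ncol = n)

# setMKLthreads() is the control named on the slide; guard the call so the
# script still runs on an R build that does not provide it.
if (exists("setMKLthreads")) {
  setMKLthreads(2)                        # assumed thread count, adjust to your cores
}

timing <- system.time(prod_aa <- a %*% a)
cat("Elapsed seconds for a", n, "x", n, "multiply:", timing[["elapsed"]], "\n")
```

Running the same script with different thread counts, or on R with and without MKL, gives a rough feel for the speedup on a particular machine.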
  • 9. 100% Compatibility ▪ Built on latest R engine – Currently R 3.1.1, R 3.1.2 in testing ▪ 100% compatible with – R scripts – R packages – Applications with R connections ▪ Designed to work with RStudio – No configuration required ▪ Replaces existing R application – Side-by-side installations
  • 10. Reproducibility – why do we care? Academic / Research: ▪ Verify results ▪ Advance research. Business: ▪ Production code ▪ Reliability ▪ Reusability ▪ Collaboration ▪ Regulation. www.nytimes.com/2011/07/08/health/research/08genes.html http://arxiv.org/pdf/1010.1092.pdf
  • 11. An R Reproducibility Problem. Adapted from http://xkcd.com/234/ (CC BY-NC 2.5)
  • 12. Reproducible R Toolkit in RRO ▪ Static CRAN mirror – CRAN packages fixed with each Revolution R Open update ▪ Daily CRAN snapshots – Storing every package version since September 2014 – Binaries and sources – At mran.revolutionanalytics.com/snapshot ▪ Easily write and share scripts synced to a specific snapshot – “checkpoint” package installed with RRO. [Diagram: CRAN; CRAN mirror at http://cran.revolutionanalytics.com/; checkpoint server taking daily snapshots at midnight UTC, stored at http://mran.revolutionanalytics.com/snapshot/; checkpoint package used as library(checkpoint); checkpoint("2014-09-17")]
  • 13. Using checkpoint ▪ Easy to use: add 2 lines to the top of each script: library(checkpoint); checkpoint("2014-09-17") ▪ For the package author: – Use package versions available on the chosen date – Installs packages local to this project • Allows different package versions to be used simultaneously ▪ For a script collaborator: – Automatically installs required packages • Detects required packages (no need to manually install!) – Uses same package versions as script author to ensure reproducibility
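The idea behind a dated snapshot can be sketched without installing anything. The helper below is hypothetical (`snapshot_repo` is not part of the checkpoint package), and it assumes that a snapshot is addressed by appending the date to the snapshot URL shown in the slides. It only builds the repository URL and points the session at it.

```r
# Hypothetical helper (not part of the checkpoint package): build the URL of
# the daily CRAN snapshot for a given date. The base address comes from the
# slides; appending the date to it is an assumption.
snapshot_repo <- function(date,
                          base = "http://mran.revolutionanalytics.com/snapshot") {
  parsed <- as.Date(date, format = "%Y-%m-%d")
  if (is.na(parsed)) {
    stop("date must be given as YYYY-MM-DD")
  }
  paste0(base, "/", format(parsed, "%Y-%m-%d"))
}

repo <- snapshot_repo("2014-09-17")
print(repo)

# Pointing the session's package repository at the snapshot means that
# install.packages() would ask for the versions available on that date.
# No packages are installed here.
old_repos <- getOption("repos")
options(repos = c(CRAN = repo))
print(getOption("repos"))
options(repos = old_repos)                # restore the previous setting
```

According to the slide, checkpoint itself goes further than this sketch: it detects the packages a script needs and installs them into a project-local library.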
  • 14. MRAN: The Managed R Archive Network ▪ Download Revolution R Open ▪ Learn about R and RRO ▪ Daily CRAN snapshots ▪ Explore Packages – and dependencies ▪ Explore Task Views
  • 15. Revolution Analytics Open Source Projects More at projects.revolutionanalytics.com
  • 16. DeployR Open ▪ Goal: embed results from R scripts into existing applications, in real time ▪ Problem: – Exposing arbitrary R functions is unwise – Need to handle concurrent R sessions ▪ Solution: DeployR Open – R, on a server, behind a firewall – Repository Manager defines entry points • Expose only authorized R functions – Automatically creates Web Services APIs – Manages and monitors pool of R sessions – Separates roles for R and app developer ▪ DeployR Open: for prototyping integrations – Revolution R Enterprise adds grid-scaling and enterprise authentication. More at deployr.revolutionanalytics.com
  • 17. DeployR: Integration. DeployR does not provide any application UI. 3 integration modes embed real-time R results into existing interfaces (web app, mobile app, desktop app, BI tool, Excel, …): ▪ RBroker Framework (tutorial): Simple, high-performance API for Java, .NET and JavaScript apps. Supports transactional, on-demand analytics on a stateless R session. ▪ Client Libraries (tutorial): Flexible control of R services from Java, .NET and JavaScript apps. Also supports stateful R integrations (e.g. complex GUIs). ▪ DeployR Web Services API: Integrate R using almost any client language.
  • 18. DeployR: Security / Scalability Layers 1. Anonymous execution – Only authorized, user-defined R functions accessible – No state preserved 2. Basic username / password authentication – Managed in DeployR Administration Console 3. Enterprise Authentication – Verifies identity with SSO / LDAP / Active Directory / PAM 4. Adaptive load-balancing grid – Ensures service availability. Slide callout: “Only available in Revolution R Enterprise DeployR”
  • 19. DeployR Open demo: Fraud detection
  • 20. RHadoop and ParallelR ▪ Toolkits for data scientists and numerical analysts to create custom parallel and distributed algorithms ▪ ParallelR: parallel programming for multi-CPU servers and grids ▪ RHadoop: map-reduce programming in R language ▪ Mainly useful for “embarrassingly parallel” problems, where parallel components work with small amounts of data ▪ Big Data Predictive Analytics mostly not embarrassingly parallel ▪ 80+ pre-built “parallel external memory algorithms” included with Revolution R Enterprise
  • 21. RHadoop ▪ Collection of packages for interfacing R and Hadoop ▪ Client (desktop) R interface to Hadoop: – rhdfs: Browse, read, write and modify files stored in HDFS – rhbase: Browse, read, write and modify tables stored in HBase – ravro: Read, write and run map-reduce on Apache Avro files in HDFS ▪ R computations in Hadoop: – rmr2: write map-reduce tasks in R to run in Hadoop – plyrmr: R-based data manipulation computations on data in Hadoop. RHadoop Wiki: github.com/RevolutionAnalytics/RHadoop/wiki
  • 22. Word count in RHadoop ▪ Map: – Input: lines of text – Output: each word as a key, with value 1 ▪ Reduce: – Input: a word (key) with all of its values – Output: words with counts ▪ Map-Reduce: – Apply map to lines of text – Gather like words together and count
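The map and reduce steps on slide 22 can be walked through locally in base R, with no Hadoop involved. This is a sketch of the idea only: it does not use rmr2 or any other RHadoop package, the function names `map_line` and `reduce_word` are made up for illustration, and the three input lines are invented sample text.

```r
# Map step: one line of text -> a list of (word, 1) pairs.
map_line <- function(line) {
  words <- strsplit(tolower(line), "[^a-z]+")[[1]]
  words <- words[nzchar(words)]           # drop empty strings from the split
  lapply(words, function(w) list(key = w, value = 1L))
}

# Reduce step: one word plus all of its values -> the word with its count.
reduce_word <- function(key, values) {
  list(key = key, value = sum(values))
}

lines <- c("the quick brown fox", "the lazy dog", "The fox")

# Apply map to every line and flatten the result into one list of pairs.
pairs <- unlist(lapply(lines, map_line), recursive = FALSE)

# Gather like words together (the "shuffle" that Hadoop would perform).
keys    <- vapply(pairs, function(p) p$key, character(1))
values  <- vapply(pairs, function(p) p$value, integer(1))
grouped <- split(values, keys)

# Apply reduce to each group and collect the counts in a named vector.
reduced     <- mapply(reduce_word, names(grouped), grouped, SIMPLIFY = FALSE)
word_counts <- vapply(reduced, function(r) r$value, integer(1))
print(word_counts)
```

In a real Hadoop job the map and reduce functions would run on different machines and the grouping step would be done by the framework; here everything happens in one R session.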
  • 23. Word count: execution. More: Video replay of “Using R with Hadoop” by Jeffrey Breen http://bit.ly/W35PLR
  • 24. ParallelR ▪ foreach replaces for loops – Minimal code change required ▪ Choice of parallel backends – doParallel (base “parallel”) – doMC (multi-core servers) – doSNOW (grids) ▪ Iterations run in parallel – Speedups depend on backend, “granularity” ▪ All iterations run in-memory
      birthday <- function(n) {
        m <- 10000
        x <- numeric(m)
        for (i in 1:m) {
          b <- sample(1:365, n, replace = TRUE)
          x[i] <- ifelse(length(unique(b)) == n, 0, 1)
        }
        mean(x)  # est prob of at least 1 match
      }
      # Sequential for loop (2-core MacBook Air: 21.9s)
      for (j in 1:100) birthday(j)
      # Parallel foreach with the doMC backend (2-core MacBook Air: 12.0s)
      library("doMC")
      registerDoMC(2)
      x <- foreach(j = 1:100) %dopar% birthday(j)
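For readers without foreach or doMC installed, the same birthday simulation can be spread over two workers using only the base “parallel” package that the slide names as the engine behind doParallel. This is a swapped-in technique, `parLapply()` rather than `foreach`, shown as a sketch: the worker count, the seed and the smaller number of trials per call (2000 instead of 10000) are assumptions chosen to keep the run short.

```r
library(parallel)

# Estimate the probability that at least two of n people share a birthday.
birthday <- function(n, m = 2000) {
  x <- numeric(m)
  for (i in seq_len(m)) {
    b <- sample(1:365, n, replace = TRUE)
    x[i] <- if (length(unique(b)) == n) 0 else 1
  }
  mean(x)
}

cl <- makeCluster(2)                      # two local worker processes
clusterSetRNGStream(cl, iseed = 123)      # independent random streams per worker
probs <- unlist(parLapply(cl, 1:100, birthday))
stopCluster(cl)

# Estimated probabilities for groups of 1, 23 and 50 people.
print(round(probs[c(1, 23, 50)], 2))
```

As on the slide, each iteration is independent of the others, which is what makes the loop easy to parallelize. The speedup actually obtained depends on the backend and on how much work each iteration does.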
  • 26. Revolution R Plus includes: ▪ AdviseR™ Technical Support for: – Revolution R Open • Including R, base and recommended packages – Reproducible R Toolkit – ParallelR: Parallel programming with R – RHadoop: R integration with Hadoop – DeployR Open: Secure deployment of R to applications ▪ Open Source Assurance for all supported components – Provides legal indemnity for subscribers ▪ Workstation subscriptions: $1,800 per year – Server and Hadoop subscriptions also available
  • 27. AdviseR™ Technical Support: Technical support for R, from the R experts. ▪ 10x5 email and phone support (in your local time zone) ▪ Full support for R, validated packages, and third-party software connections ▪ Notifications of updates and bug fixes ▪ On-line case management and knowledgebase ▪ Access to technical resources, documentation and user forums ▪ Defined service-level agreements for rapid responses. Included with Revolution R Plus and Revolution R Enterprise.
  • 28. Open Source Assurance ▪ Revolution Analytics will defend Revolution R Plus subscribers should a third party make an intellectual property claim against covered open source software with respect to: – copyrights, patents, trademarks, trade secrets ▪ Covered software includes: – Revolution R Open (incl. R base and recommended packages), Reproducible R Toolkit, DeployR Open, ParallelR, RHadoop ▪ Revolution Analytics will defend open source software in court – If necessary, Revolution Analytics will obtain rights, modify, or replace software found to be infringing – If a resolution can’t be found, fees paid in past 12 months will be refunded.
  • 29. The Revolution R Product Suite ▪ Revolution R Open • Free and open source R distribution • Enhanced and distributed by Revolution Analytics ▪ Revolution R Plus • Open-source distribution of R, packages, and other components • Enhanced, supported and indemnified by Revolution Analytics ▪ Revolution R Enterprise • Secure, Scalable and Supported Distribution of R • With proprietary components created by Revolution Analytics
  • 30. Revolution R Enterprise (RRE): The All-Inclusive Big Data Big Analytics Platform. Components: DistributedR, DeployR, DevelopR, ScaleR, ConnectR. High-performance open source R plus: ▪ Data source connectivity to big-data objects ▪ Big-data advanced analytics ▪ Multi-platform environment support ▪ In-Hadoop and in-Teradata predictive modeling ▪ Visual Studio IDE option ▪ Secure, Scalable R Deployment ▪ Technical support, training and services – 24x7 support option. Contact Revolution Analytics for more info: www.revolutionanalytics.com/contact-us
  • 31. Poll #2: Which Revolution Analytics projects do you plan to use (or already use)? Select all that apply: 1. Revolution R Open (free distribution) 2. Revolution R Plus (paid subscription for support and indemnification) 3. Reproducible R Toolkit (checkpoint package) 4. DeployR Open 5. RHadoop / ParallelR
  • 32. Wrapping up… Revolution R Open is available now from mran.revolutionanalytics.com/download. Explore Revolution Analytics open-source projects at projects.revolutionanalytics.com. Technical support and open-source assurance with Revolution R Plus: www.revolutionanalytics.com/plus. David Smith, Chief Community Officer, Revolution Analytics; @revodavid; david@revolutionanalytics.com
  • 33. Thank you. Next up: Batter Up! Advanced Sports Analytics with R and Storm December 11, 2014 revolutionanalytics.com/webinars www.revolutionanalytics.com 1.855.GET.REVO Twitter: @RevolutionR