Announcing: Release 7
Revolution R Enterprise
Tuesday, November 5

Michele Chambers, Chief Strategy Officer and VP Product Management
Thomas W. Dinsmore, Director of Product Management
Agenda
• Introduction
– Demystifying R
– Revolution Analytics at a Glance
– Revolution R Enterprise
– Revolution Analytics Partner Ecosystem
– Customer Testimonials
• What's New in RRE 7?
• More Information
• Questions
2
Demystifying R
• What is R & why is it so darn popular?
R is exploding in popularity & function
[Figure: four panels comparing R, SAS, SPSS, S-Plus, and Stata]
Internet Discussion: mean monthly traffic on email discussion list
Package Growth: number of R packages listed on CRAN (4,332 as of Feb 2013)
Web Site Popularity: number of links to main web site
Scholarly Activity: Google Scholar hits ('05-'09 CAGR); values shown include 46%, 10%, 0%, -11%, and -27% (SPSS)
4
Latest survey shows significant growth in R adoption
R Usage Growth
Rexer Data Miner Survey, 2007-2013
• 70% of data miners report using R
• 24% use R as primary tool
Source: www.rexeranalytics.com

"I've been astonished by the rate at which R has been adopted. Four years ago, everyone in my economics department [at the University of Chicago] was using Stata; now, as far as I can tell, R is the standard tool, and students learn it first."
- Deputy Editor for New Products at Forbes

"A key benefit of R is that it provides near-instant availability of new and experimental methods created by its user base – without waiting for the development/release cycle of commercial software. SAS recognizes the value of R to our customer base…"
- Product Marketing Manager, SAS Institute, Inc.

5
Revolution Analytics at a Glance
Who We Are
Only provider of commercial big data big analytics platform based on open source R statistical computing language

Customers
200+ Global 2000

Global Presence
North America / EMEA / APAC

Our Software Delivers
Scalable Performance: Distributed & parallelized analytics
Cross Platform: Write once, deploy anywhere
Productivity: Easily build & deploy with latest modern analytics

Our Services Deliver
Knowledge: Our experts enable you to be experts
Time-to-Value: Our Quickstart program gives you a jumpstart
Guidance: Our customer support team is here to help you

Global Industries Served
Financial Services, Digital Media, Government, Health & Life Sciences, High Tech, Manufacturing, Retail, Telco
6
Revolution R Enterprise is...
the only big data big analytics platform based on open source R, the de facto statistical computing language for modern analytics
• High Performance, Scalable Analytics
• Portable Across Enterprise Platforms
• Easier to Build & Deploy Analytics

7
R is open source and drives analytic innovation, but... has some limitations for Enterprises

Big Data | In-memory bound | Hybrid memory & disk scalability | Operates on bigger volumes & factors
Speed of Analysis | Single threaded | Parallel threading | Shrinks analysis time
Enterprise Readiness | Community support | Commercial support | Delivers full service production support
Analytic Breadth & Depth | 5000+ innovative analytic packages | Leverage open source packages plus Big Data ready packages | Supercharges R
Commercial Viability | Risk of deployment of open source | Commercial license | Eliminate risk with open source
8
Introducing Revolution R Enterprise (RRE)
The Big Data Big Analytics Platform
• Big Data Big Analytics Ready
– Enterprise readiness
– High performance analytics
– Multi-platform architecture
– Data source integration
– Development tools
– Deployment tools

Components: DevelopR, ConnectR, ScaleR, DistributedR, DeployR

9
The Platform Step by Step:
R Capabilities
R+CRAN
• Open source R interpreter
• Freely-available R algorithms
• Algorithms callable by RevoR
• Embeddable in R scripts
• 100% compatible with existing R scripts, functions and packages

RevoR
• Performance enhanced R interpreter
• Based on open source R
• Adds high-performance math

10
The Platform Step by Step:
Parallelization & Data Sourcing

ConnectR
• High-speed & direct connectors

ScaleR
• Ready-to-use high-performance big data big analytics
• Fully-parallelized analytics
• Data prep & data distillation
• Descriptive statistics & statistical tests
• Correlation & covariance matrices
• Predictive models – linear, logistic, GLM
• Machine learning
• Monte Carlo simulation
• Tools for distributing customized algorithms across nodes

DistributedR
• Distributed computing framework
• Delivers portability across platforms

11
The Platform Step by Step:
Tools & Deployment
DevelopR
• Integrated development environment for R
• Visual "step-into" debugger

DeployR
• Web services software development kit for integrating analytics via Java, JavaScript or .NET APIs
• Integrates R into application infrastructures

DeployR Capabilities:
• Invokes R scripts from web services calls
• RESTful interface for easy integration
• Works with web & mobile apps, leading BI & visualization tools and business rules engines

12
Write Once. Deploy Anywhere.
Hadoop: Hortonworks, Cloudera
EDW: IBM Netezza, Teradata
Clustered Systems: IBM Platform LSF, Microsoft HPC
Workstations & Servers: Desktop, Server
In the Cloud: Microsoft Azure Burst, Amazon AWS

DeployR, ConnectR, ScaleR, DistributedR

DESIGNED FOR SCALE, PORTABILITY & PERFORMANCE

13
The Power of Revolution R Enterprise
Performance & Scalability
Value:
ScaleR: Moves computation to data
ScaleR: Leverage CRAN
ScaleR: Labor saving power
DistributedR: Maximizes computation
DistributedR: Powerful divide & conquer
DistributedR: Effective memory utilization
RevoR: 3-50X faster
Open Source: Leverage latest innovation

14
Revolution R Enterprise
Powering Next Generation Analytics

COMBINE INTERMEDIATE RESULTS
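
The editor's note for this slide describes the processing pattern: the analytic reads the data a block at a time, worker threads turn each block into an intermediate result, and a master result is built from the intermediate results once all data has been processed. The Python sketch below illustrates that pattern for a mean and sample variance. It is not ScaleR code; the function names (`read_blocks`, `summarize_block`, `combine`, `chunked_summary`) and the choice of statistic are assumptions made for illustration, and the blocks are read on the calling thread rather than on a dedicated reader thread.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def read_blocks(rows, block_size):
    """Yield successive blocks (lists) from an iterable of numbers."""
    it = iter(rows)
    while True:
        block = list(islice(it, block_size))
        if not block:
            return
        yield block


def summarize_block(block):
    """Intermediate result for one block: (count, sum, sum of squares)."""
    return (len(block), sum(block), sum(x * x for x in block))


def combine(partials):
    """Merge the intermediate results into one master result."""
    n = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    total_sq = sum(p[2] for p in partials)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)  # sample variance
    return {"n": n, "mean": mean, "variance": variance}


def chunked_summary(rows, block_size=1000, workers=4):
    """Process blocks on a thread pool, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(summarize_block, read_blocks(rows, block_size)))
    return combine(partials)


# Hypothetical input: the integers 1..100, processed ten at a time.
result = chunked_summary(range(1, 101), block_size=10)
print(result)
```

Only the small per-block tuples are held until the end, which is why the pattern suits data that does not fit in memory. The sum-of-squares formula is used here for brevity; it can lose precision on large, badly scaled data.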

15
Revolution R Enterprise RevoR
Performance Enhanced R

Customers report 3-50x performance improvements compared to Open Source R, without changing any code

Computation (4-core laptop) | Open Source R | Revolution R | Speedup
Linear Algebra [1]
Matrix Multiply | 176 sec | 9.3 sec | 18x
Cholesky Factorization | 25.5 sec | 1.3 sec | 19x
Linear Discriminant Analysis | 189 sec | 74 sec | 3x
General R Benchmarks [2]
R Benchmarks (Matrix Functions) | 22 sec | 3.5 sec | 5x
R Benchmarks (Program Control) | 5.6 sec | 5.4 sec | Not appreciable

1. http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php
2. http://r.research.att.com/benchmarks/

16
RRE ScaleR outperforms SAS HPA – at a fraction of the cost
Logistic Regression:

Measure | SAS HPA* | Revolution R | Comparison
Rows of data | 1 billion | 1 billion |
Parameters | "just a few" | 7 | Double
Time | 80 seconds | 44 seconds | 45%
Data location | In memory | On disk |
Nodes | 32 | 5 | 1/6th
Cores | 384 | 20 | 5%
RAM | 1,536 GB | 80 GB | 5%

Revolution R is faster on the same amount of data, despite using approximately a 20th as many cores, a 20th as much RAM, a 6th as many nodes, and not pre-loading data into RAM.

Bottom Line: Revolution R Enterprise Performance = Greatly Reduced TCO
*As published by SAS in HPC Wire, April 21, 2011
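
The fractions quoted on this slide follow directly from the figures shown. As a quick check, the short Python sketch below recomputes them; the dictionary names `sas` and `rre` are labels chosen here, and the numbers are the ones on the slide.

```python
# Figures as shown on the slide (SAS HPA as published by SAS; Revolution R).
sas = {"nodes": 32, "cores": 384, "ram_gb": 1536, "seconds": 80}
rre = {"nodes": 5, "cores": 20, "ram_gb": 80, "seconds": 44}

# Revolution R's resource use and run time as a fraction of the SAS HPA run.
ratios = {key: rre[key] / sas[key] for key in sas}

for key, value in ratios.items():
    print(f"{key}: {value:.3f}")
```

Nodes come out at about 0.156 (roughly a 6th), cores and RAM at about 0.052 (roughly a 20th), and run time at 0.55, that is, 45% less time.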
17
R + Revolution R Enterprise
Unequaled Big Data Big Analytics
Deploy Analytics: Web, Mobile, Data Visualization, BI
Big Data Distributed Analytics
Performance Enhanced R

18
Revolution R Enterprise Ecosystem
Power of Integration
SI / Service

Deployment / Consumption

MSP / DSP

Advanced Analytics

ETL
Corios

Data / Infrastructure

19
Customers Revolutionize their Business

Power: 4X performance; 50M records scored daily
"We've combined Revolution R Enterprise and Hadoop to build and deploy customized exploratory data analysis and GAM survival models for our marketing performance management and attribution platform. Given that our data sets are already in the terabytes and are growing rapidly, we depend on Revolution R Enterprise's scalability and power – we saw about a 4x performance improvement on 50 million records. It works brilliantly."
- CEO, John Wallace, DataSong

Scalability: TBs of data from 200+ data sources; 10s of thousands of attributes; 100s of millions of scores daily
"We've been able to scale our solution to a problem that's so big that most companies could not address it. If we had to go with a different solution we wouldn't be as efficient as we are now."
- SVP Analytics, Kevin Lyons, eXelate

Performance: 2X data; 2X attributes; no impact on performance
"We need a high-performance analytics infrastructure because marketing optimization is a lot like a financial trading. By watching the market constantly for data or market condition updates, we can now identify opportunities for our clients that would otherwise be lost."
- Chief Analytics Officer, Leon Zemel, [x+1]

20
What's New in RRE 7
The Power of R
Most widely used analytics tool
Preferred by working analysts
More than 6,000 packages
Global footprint
New
• R 3.0.2

22
Scalable Statistical Modeling
Linear Regression
Stepwise Linear
Logistic Regression
Generalized Linear Models
New
• Stepwise Logistic
• Stepwise GLM
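
The editor's note for this slide lists forward, backward, and bidirectional stepwise methods with AIC, BIC, or Mallows' Cp as the selection criterion. The Python sketch below illustrates only the forward method with AIC on an ordinary least squares fit. It is not the RRE implementation; the helper names (`solve`, `aic`, `forward_stepwise`) and the synthetic data are assumptions made for illustration.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def aic(columns, y):
    """AIC (up to an additive constant) of OLS on an intercept plus columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(r[p] * r[q] for r in design) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * y[i] for i, r in enumerate(design)) for p in range(k)]
    beta = solve(xtx, xty)
    rss = sum((y[i] - sum(b * v for b, v in zip(beta, r))) ** 2
              for i, r in enumerate(design))
    return n * math.log(rss / n) + 2 * k


def forward_stepwise(predictors, y):
    """Add the predictor that lowers AIC most; stop when none improves it."""
    selected = []
    best = aic([], y)
    while True:
        scores = {name: aic([predictors[s] for s in selected + [name]], y)
                  for name in predictors if name not in selected}
        if not scores:
            break
        name = min(scores, key=scores.get)
        if scores[name] >= best:
            break
        selected.append(name)
        best = scores[name]
    return selected


# Synthetic, deterministic data: y depends on x1 and x2 but not on "noise".
x1 = [float(i) for i in range(20)]
x2 = [float((i * 7) % 5) for i in range(20)]
noise = [float((i * 3) % 4) for i in range(20)]
err = [((i * 11) % 7 - 3) * 0.1 for i in range(20)]
y = [2 * a + 3 * b + e for a, b, e in zip(x1, x2, err)]

selected = forward_stepwise({"x1": x1, "x2": x2, "noise": noise}, y)
print(selected)
```

Backward selection would start from the full model and drop terms, and swapping the `2 * k` penalty for `k * math.log(n)` would give BIC instead of AIC.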

23
Scalable Machine Learning
Decision Trees
New
• Decision Forests
• Tree Visualization

24
Data Source Integration
Fixed/delimited text
SAS, SPSS
ODBC
HDFS and HBase
Teradata
Tested
• HP Vertica
• Teradata Aster
25
New: Model Integration

26
BI Integration
Custom web reports
QlikView accelerator
New
• Excel Accelerator
• Tableau Integration

27
New: Business User Interface

28
Choice of Operating Systems

29
New: Inside-Hadoop Deployment

30
Multi-Node Package Manager
[Diagram: Hadoop cluster. HDFS: Name Node and five Data Nodes. MapReduce: Job Tracker and five Task Trackers, one per Data Node.]

31
ScaleR in Hadoop
[Diagram: Hadoop cluster. HDFS: Name Node and five Data Nodes. MapReduce: Job Tracker and five Task Trackers, one per Data Node.]

32
In-Database Deployment

33
Summary: What’s New in RRE 7.0
R 3.0.2

34
Summary: What’s New in RRE 7.0

Stepwise Logistic

Stepwise GLM
Decision Forests
Tree Visualizer

PMML Export

35
Summary: What’s New in RRE 7.0

36
Summary: What’s New in RRE 7.0

37
Summary: What’s New in RRE 7.0
DevelopR

DeployR

38
www.revolutionanalytics.com

39
40
www.revolutionanalytics.com/contact-us

41
42
43

05Nov13 Webinar: Introducing Revolution R Enterprise 7 - The Big Data Big Analytics Platform


Editor's Notes

  • #8 Enterprise readiness; Performance architecture; Big Data analytics; Data source integration; Development tools; Deployment tools
  • #9 RRE license is a combo of GPL v2 license (which guarantees commercial usage of R) plus a proprietary license to our proprietary components.
  • #10
    Enterprise readiness: Build assurance (continuous testing, custom validation); Implementation tools (validation utility); Technical support, documentation, training
    Performance architecture: Fast math libraries; Better memory management; Multi-core processing; Distributed computing architecture
    Big Data analytics: Descriptive Statistics; Cross Tabulation; Statistical Tests; Correlation, Covariance and SSCP Matrices; Linear Regression; Logistic Regression; Generalized Linear Models; Decision Trees; K-Means Clustering
    Data source integration: ODBC; Teradata (high speed); Text Files: Delimited & Fixed format; SAS; SPSS; Hadoop: HDFS & HBase
    Development tools: Visual Debugger; Script Editor; R Snippets; Object Browser; Solution Explorer; Customizable Workspace; Version Control Plug-In
    Deployment tools: R objects as JSON, XML; Supports Java, JavaScript, .NET; RESTful web services API; Security: LDAP, SSO; Built-In load balancing; Asynchronous scheduling; Management console; Accelerators: Jaspersoft, QlikView
  • #16 A Revolution R Enterprise ScaleR analytic is provided a data source as input. The analytic loops over data, reading a block at a time. Blocks of data are read by a separate worker thread (Thread 0). Worker threads (Threads 1..n) process the data block from the previous iteration of the data loop and update intermediate results objects in memory. When all of the data is processed, a master results object is created from the intermediate results objects.
  • #23 Most current stable release; ~150 new features; Support for long vectors; ~100 bug fixes and performance improvements; 83 miscellaneous enhancements (installation, utilities, internationalization, etc.)
  • #24 Semi-automatic modeling; Ideal for variable selection. Methods: Forward, Backward, Bidirectional. Selection criteria: AIC, BIC, Mallows' Cp
  • #25 "Random Forests™": Ensemble learning method; Classification; Regression; Trains many trees; Output is mode of classes; Variety of use cases
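
The decision forest note above says the ensemble trains many trees and that the output is the mode of the classes. The Python sketch below shows only that voting step, as a minimal illustration. The three "trees" are hand-written stand-ins with hypothetical thresholds and field names (`income`, `age`); a real forest would learn each tree from a bootstrap sample of the training data, and none of this is RRE code.

```python
from collections import Counter


def forest_predict(trees, x):
    """Classify x by majority vote (the mode) over the trees' predictions."""
    votes = [tree(x) for tree in trees]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical stand-ins for trained trees: each maps a record to a class label.
trees = [
    lambda x: "high" if x["income"] > 50 else "low",
    lambda x: "high" if x["age"] > 40 else "low",
    lambda x: "high" if x["income"] > 70 else "low",
]

# Votes for this record are high, high, low, so the forest answers "high".
prediction = forest_predict(trees, {"income": 60, "age": 45})
print(prediction)
```

For regression, the same structure applies with the vote replaced by the average of the trees' numeric predictions.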