Revolution R Enterprise
Portland R User Group
November 13, 2013

David Smith @revodavid
Michael Helbraun
BIG DATA
DATA SCIENCE
OPEN SOURCE R
Innovate with R
• Most widely used data analysis software
  – Used by 2M+ data scientists, statisticians and analysts
• Most powerful statistical programming language
  – Flexible, extensible and comprehensive for productivity
• Create beautiful and unique data visualizations
  – As seen in New York Times, Twitter and Flowing Data
• Thriving open-source community
  – Leading edge of analytics research
• Fills the talent gap
  – New graduates prefer R

R is Hot
bit.ly/r-is-hot
WHITE PAPER
R is exploding in popularity & functionality
R Usage Growth
Rexer Data Miner Survey, 2007-2013
70% of data miners report using R

“I’ve been astonished by the rate at which R has been adopted. Four years ago, everyone in my economics department [at the University of Chicago] was using Stata; now, as far as I can tell, R is the standard tool, and students learn it first.”
Deputy Editor for New Products at Forbes

R is the first choice of more data miners than any other software
Source: www.rexeranalytics.com

“A key benefit of R is that it provides near-instant availability of new and experimental methods created by its user base — without waiting for the development/release cycle of commercial software. SAS recognizes the value of R to our customer base…”
Product Marketing Manager, SAS Institute, Inc.
Revolution R Enterprise

• Power R for the Enterprise
• Supercharge R for Massive Data
• Empower Platform Independence
• Take Big Cost Out of Big Data


Revolution R Enterprise is the Big Data Big Analytics Platform
• Revolution R Enterprise includes all of the components you need for:
  – Enterprise readiness
  – High performance analytics
  – Multi-platform architecture support
  – Data source integration
  – Development tools
  – Deployment tools

The Platform Step by Step:
R Capabilities
R+CRAN
• Open source R interpreter
• UPDATED R 3.0.2
• Freely-available R algorithms
• Algorithms callable by RevoR
• Embeddable in R scripts
• 100% compatible with existing R scripts, functions and packages

RevoR
• Performance enhanced R interpreter
• Based on open source R
• Adds high-performance math

Available On:
• Platform™ LSF™ Linux®
• Microsoft® HPC Clusters
• Microsoft Azure Burst
• Windows® & Linux Servers
• Windows & Linux Workstations
• Teradata® Database
• IBM® Netezza®
• IBM BigInsights™
• Cloudera Hadoop®
• Hortonworks Hadoop
• Intel® Hadoop

The Platform Step by Step:
Parallelization & Data Sourcing

ConnectR
• High-speed data import/export

Available for:
• High-performance XDF
• SAS, SPSS, delimited & fixed format text data files
• Hadoop HDFS & HBase
• Teradata Database TPT
• ODBC (incl. Vertica, Oracle, Pivotal, Aster, SybaseIQ, DB2, MySQL)

ScaleR
• Ready-to-use high-performance big data big analytics
• Fully-parallelized analytics
• Data prep & data distillation
• Descriptive statistics & statistical tests
• Correlation & covariance matrices
• Predictive models – linear, logistic, GLM
• Machine learning
• Monte Carlo simulation
• NEW Tools for distributing customized algorithms across nodes

DistributedR
• Distributed computing framework
• Delivers portability across platforms

DistributedR available on:
• Windows Servers
• Red Hat and NEW SuSE Linux Servers
• IBM Platform LSF Linux Clusters
• Microsoft HPC Clusters
• Microsoft Azure Burst
• NEW Teradata Database
• NEW Cloudera Hadoop
• NEW Hortonworks Hadoop

Powering Next Generation
Analytics

COMBINE INTERMEDIATE RESULTS

Eliminates Performance and Capacity Limits of Open Source R and Legacy SAS
• Unique PEMAs: Parallel, external-memory algorithms
• High-performance, scalable replacements for R/SAS analytic functions
• Parallel/distributed processing eliminates CPU bottleneck
• Data streaming eliminates memory size limitations
• Scales linearly with data size and compute capacity
• Works with in-memory and disk-based architectures
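
The idea behind a parallel, external-memory algorithm (process the data in chunks, keep only a small intermediate result per chunk, then combine the intermediate results) can be sketched in base R. This is a minimal illustration under stated assumptions, not Revolution's implementation: the helper names `summarize_chunk`, `combine` and `finalize` are hypothetical, and the data is simulated in memory, whereas a real external-memory algorithm would stream each chunk from disk.

```r
# Summarize one chunk: count, sum and sum of squares are all that must be kept.
summarize_chunk <- function(x) {
  c(n = length(x), sum = sum(x), sumsq = sum(x^2))
}

# Combine intermediate results: the per-chunk summaries simply add.
combine <- function(a, b) a + b

# Turn the combined summary into the final statistics.
# (The sum-of-squares formula is fine for a sketch; it is not the most
# numerically stable way to compute a variance.)
finalize <- function(s) {
  n <- s[["n"]]
  m <- s[["sum"]] / n
  v <- (s[["sumsq"]] - n * m^2) / (n - 1)
  list(n = n, mean = m, variance = v)
}

set.seed(42)
x <- rnorm(100000, mean = 5, sd = 2)               # simulated data
chunks <- split(x, ceiling(seq_along(x) / 10000))  # ten chunks of 10,000 values

# Each chunk could be summarized on a different core or node.
partials <- lapply(chunks, summarize_chunk)
result <- finalize(Reduce(combine, partials))
```

Because only three numbers per chunk are retained, memory use does not grow with the number of rows, and the chunks can be processed in any order.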
DEMO
USING BIG DATA PLATFORMS
Bringing R to Big Data Architectures

Servers & Clusters
Hadoop
Data Warehouses
• Includes support for full suite of ScaleR algorithms on platform
• Write Once, Deploy Anywhere
Write Once, Deploy Anywhere
Teradata Database Version 14.10
Cloudera & Hortonworks Hadoop
Microsoft & Linux Servers
Workstations
Server Clusters

Write Once, Deploy Anywhere
Set the desired compute context for code execution…

# Local system (the default)
rxSetComputeContext("local")

# Other compute contexts
rxSetComputeContext(RxLsfCluster(<data, server environment arguments>))
rxSetComputeContext(RxHpcServer(<data, server environment arguments>))
rxSetComputeContext(RxAzureBurst(<data, server environment arguments>))
rxSetComputeContext(RxHadoopMR(<data, server environment arguments>))
rxSetComputeContext(RxTeradata(<data, server environment arguments>))

The same code then runs anywhere…

# Summarize and calculate descriptive statistics from the airDS data set
adsSummary <- rxSummary(~ArrDelay+CRSDepTime+DayOfWeek, data = airDS)
# Fit a linear model and print its summary
arrDelayLm1 <- rxLinMod(ArrDelay ~ DayOfWeek, data = airDS)
summary(arrDelayLm1)
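
For readers without Revolution R Enterprise installed, the compute-context pattern can be imitated in base R. The sketch below is a toy, not the RevoScaleR API: `set_compute_context`, `.context` and `mean_arr_delay` are hypothetical names, the "multicore" context uses the `parallel` package that ships with R, and the delay data is simulated. The point it illustrates is that the analysis function is written once and never changes when the execution backend is swapped.

```r
# Hypothetical registry holding the current execution backend.
.context <- new.env()
.context$apply_chunks <- function(chunks, f) lapply(chunks, f)

# Swap the backend; the analysis code below never changes.
set_compute_context <- function(name) {
  # Forking is unavailable on Windows, so fall back to a single core there.
  cores <- if (.Platform$OS.type == "windows") 1L else 2L
  .context$apply_chunks <- switch(
    name,
    local = function(chunks, f) lapply(chunks, f),
    multicore = function(chunks, f) parallel::mclapply(chunks, f, mc.cores = cores),
    stop("unknown compute context: ", name)
  )
  invisible(name)
}

# Analysis code, written once: mean delay from per-chunk counts and sums.
mean_arr_delay <- function(delays) {
  chunks <- split(delays, ceiling(seq_along(delays) / 250))
  partial <- .context$apply_chunks(chunks, function(x) c(n = length(x), sum = sum(x)))
  total <- Reduce(`+`, partial)
  total[["sum"]] / total[["n"]]
}

set.seed(1)
delays <- rexp(1000, rate = 1 / 12)  # simulated arrival delays, in minutes

set_compute_context("local")
local_result <- mean_arr_delay(delays)

set_compute_context("multicore")
multicore_result <- mean_arr_delay(delays)
```

Both calls return the same value; only the place where the per-chunk work runs differs.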
A Simple Goal: Hadoop As An R Engine.
• Run Revolution R Enterprise code in Hadoop without change
• Provide RRE ScaleR pre-parallelized algorithms
• Eliminate:
  – The need to “Think in MapReduce”
  – The need for a separate compute cluster
  – Data movement

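
To make concrete what "thinking in MapReduce" would otherwise involve, here is a hand-written map/reduce formulation of a linear regression in base R. It is a sketch under stated assumptions and does not show how ScaleR works internally: the data is simulated, the three "blocks" merely stand in for data held on different nodes, and `map_block` and `reduce_blocks` are hypothetical names. Each block is reduced to small cross-product matrices, the matrices are added together, and the coefficients are solved from the totals.

```r
set.seed(7)
n <- 6000
day <- factor(sample(c("Mon", "Tue", "Wed"), n, replace = TRUE))
delay <- 10 + 3 * (day == "Tue") - 2 * (day == "Wed") + rnorm(n)
flights <- data.frame(delay = delay, day = day)  # simulated flight delays

# Pretend each block lives on a different data node.
blocks <- split(flights, rep(1:3, length.out = n))

# Map: each node reduces its block to small cross-product matrices.
map_block <- function(block) {
  X <- model.matrix(~ day, data = block)
  y <- block$delay
  list(xtx = crossprod(X), xty = crossprod(X, y))
}

# Reduce: cross-products from different blocks simply add.
reduce_blocks <- function(a, b) {
  list(xtx = a$xtx + b$xtx, xty = a$xty + b$xty)
}

total <- Reduce(reduce_blocks, lapply(blocks, map_block))

# Solve the normal equations once, on the combined totals.
beta <- drop(solve(total$xtx, total$xty))
```

The resulting `beta` agrees with `coef(lm(delay ~ day, data = flights))`; a pre-parallelized algorithm spares the analyst from writing the map and reduce steps by hand.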
RRE in Hadoop
[Diagram: HDFS with a Name Node and five Data Nodes; MapReduce with a Job Tracker and five Task Trackers]

DEMO
DEPLOYMENT

THE LAST MILE PROBLEM
The Platform Step by Step:
Tools & Deployment
DevelopR
• Integrated development environment for R
• Visual 'step-into' debugger

Available on:
• Windows
Or use:

DeployR
• Web services software development kit for integrating analytics via Java, JavaScript or .NET APIs
• Integrates R into application infrastructures

Capabilities:
• Invokes R scripts from web services calls
• RESTful interface for easy integration
• Works with web & mobile apps, leading BI & visualization tools and business rules engines

Custom Integration with Web Services API
RRE DeployR makes R accessible
• Seamless – Bring the power of R to any web enabled application
• Simple – Web Services API leverages application development frameworks including JS, Java, .NET
• Scalable – Robustly scale user and compute workloads
• Secure – Manage enterprise security with LDAP & SSO

[Diagram labels: R / Statistical Modeling Expert; Application Developer; RRE DeployR; Data Analysis; Business Intelligence; Mobile Web Apps; Cloud / SaaS; App Integration]

Business Analysts: Alteryx

DEMO
With Thanks
• The R Core Team
• R developers (5000 packages on CRAN!)
• The R community
• You!

David Smith @revodavid
david@revolutionanalytics.com
Michael Helbraun
michael.helbraun@revolutionanalytics.com

www.revolutionanalytics.com
1.855.GET.REVO
Twitter: @RevolutionR
Thank you.
High Performance Big Data Analytics with Revolution R Enterprise ScaleR
• R Data Step
• Descriptive Statistics
• Statistical Tests
• Sampling
• Predictive Modeling
• Data Visualization
• Machine Learning
• Simulation
Revolution R Enterprise ScaleR: High Performance Big Data Analytics
Data Prep, Distillation & Descriptive Analytics

R Data Step
• Data import – Delimited, Fixed, SAS, SPSS, ODBC
• Variable creation & transformation
• Recode variables
• Factor variables
• Missing value handling
• Sort
• Merge
• Split
• Aggregate by category (means, sums)

Descriptive Statistics
• Min / Max
• Mean
• Median (approx.)
• Quantiles (approx.)
• Standard Deviation
• Variance
• Correlation
• Covariance
• Sum of Squares (cross product matrix for set variables)
• Pairwise Cross tabs
• Risk Ratio & Odds Ratio
• Cross-Tabulation of Data (standard tables & long form)
• Marginal Summaries of Cross Tabulations
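
The "(approx.)" next to Median and Quantiles reflects a general constraint: an exact quantile needs the sorted data, which is awkward when the data is processed in chunks. One common workaround, shown here in base R, is to accumulate bin counts across chunks and read the quantile off the cumulative counts. The slides do not say how ScaleR computes its approximations, so treat this as an illustrative assumption; `approx_quantile` is a hypothetical helper and the data is simulated.

```r
# Hypothetical helper: approximate quantiles from fixed-width bin counts.
# Assumes the data range [lower, upper] is already known (e.g. from a Min / Max pass).
approx_quantile <- function(chunks, probs, lower, upper, n_bins = 1000) {
  breaks <- seq(lower, upper, length.out = n_bins + 1)
  counts <- numeric(n_bins)
  for (chunk in chunks) {
    # Assign each value to a bin and add this chunk's counts to the running total.
    bin <- findInterval(chunk, breaks, all.inside = TRUE)
    counts <- counts + tabulate(bin, nbins = n_bins)
  }
  cum_share <- cumsum(counts) / sum(counts)
  mids <- (breaks[-1] + breaks[-length(breaks)]) / 2
  # Report the midpoint of the first bin whose cumulative share reaches each probability.
  vapply(probs, function(p) mids[which(cum_share >= p)[1]], numeric(1))
}

set.seed(123)
x <- runif(100000)                                 # simulated data on [0, 1]
chunks <- split(x, ceiling(seq_along(x) / 10000))  # ten chunks of 10,000 values
approx_median <- approx_quantile(chunks, probs = 0.5, lower = 0, upper = 1)
```

The error is bounded by roughly one bin width, so more bins buy more precision at the cost of a larger (but still fixed-size) count vector.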

Statistical Tests
• Chi Square Test
• Kendall Rank Correlation
• Fisher's Exact Test
• Student's t-Test

Sampling
• Subsample (observations & variables)
• Random Sampling
Revolution R Enterprise ScaleR (continued)
Statistical Modeling
Predictive Models
• Sum of Squares (cross product matrix for set variables)
• Multiple Linear Regression
• Generalized Linear Models (GLM)
  – All exponential family distributions: binomial, Gaussian, inverse Gaussian, Poisson, Tweedie. Standard link functions including: cauchit, identity, log, logit, probit.
  – User defined distributions & link functions.
• Covariance Matrix
• Correlation Matrix
• Logistic Regression
• Classification & Regression Trees
• Residuals for all models

Data Visualization
• Histogram
• Line Plot
• Scatter Plot
• Lorenz Curve
• ROC Curves (actual data and predicted values)
• NEW Tree Visualization

Machine Learning

Variable Selection
• Stepwise Regression
  – Linear
  – NEW logistic
  – NEW GLM

Simulation
• Monte Carlo

Cluster Analysis
• K-Means

Classification
• Decision Trees
• NEW Decision Forests

Deployment
• Prediction (scoring)
• NEW PMML Export

More Related Content

What's hot

Quick and Dirty: Scaling Out Predictive Models Using Revolution Analytics on ...
Quick and Dirty: Scaling Out Predictive Models Using Revolution Analytics on ...Quick and Dirty: Scaling Out Predictive Models Using Revolution Analytics on ...
Quick and Dirty: Scaling Out Predictive Models Using Revolution Analytics on ...
Revolution Analytics
 
In-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and RevolutionIn-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and Revolution
Revolution Analytics
 
The R Ecosystem
The R EcosystemThe R Ecosystem
The R Ecosystem
Revolution Analytics
 
R and Data Science
R and Data ScienceR and Data Science
R and Data Science
Revolution Analytics
 
Intro to R for SAS and SPSS User Webinar
Intro to R for SAS and SPSS User WebinarIntro to R for SAS and SPSS User Webinar
Intro to R for SAS and SPSS User Webinar
Revolution Analytics
 
Introduction to Microsoft R Services
Introduction to Microsoft R ServicesIntroduction to Microsoft R Services
Introduction to Microsoft R Services
Gregg Barrett
 
Batter Up! Advanced Sports Analytics with R and Storm
Batter Up! Advanced Sports Analytics with R and StormBatter Up! Advanced Sports Analytics with R and Storm
Batter Up! Advanced Sports Analytics with R and Storm
Revolution Analytics
 
R at Microsoft (useR! 2016)
R at Microsoft (useR! 2016)R at Microsoft (useR! 2016)
R at Microsoft (useR! 2016)
Revolution Analytics
 
R Then and Now
R Then and NowR Then and Now
R Then and Now
Revolution Analytics
 
Basics of Digital Design and Verilog
Basics of Digital Design and VerilogBasics of Digital Design and Verilog
Basics of Digital Design and Verilog
Ganesan Narayanasamy
 
A Step Towards Reproducibility in R
A Step Towards Reproducibility in RA Step Towards Reproducibility in R
A Step Towards Reproducibility in R
Revolution Analytics
 
Microsoft R Server for Data Sciencea
Microsoft R Server for Data ScienceaMicrosoft R Server for Data Sciencea
Microsoft R Server for Data Sciencea
Data Science Thailand
 
Predicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per SecondPredicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per Second
Revolution Analytics
 
Reproducibility with Checkpoint & RRO - NYC R Conference
Reproducibility with Checkpoint & RRO - NYC R ConferenceReproducibility with Checkpoint & RRO - NYC R Conference
Reproducibility with Checkpoint & RRO - NYC R Conference
Revolution Analytics
 
Simple Reproducibility with the checkpoint package
Simple Reproducibilitywith the checkpoint packageSimple Reproducibilitywith the checkpoint package
Simple Reproducibility with the checkpoint package
Revolution Analytics
 
Data Science At Zillow
Data Science At ZillowData Science At Zillow
Data Science At Zillow
Nicholas McClure
 
New Advances in High Performance Analytics with R: 'Big Data' Decision Trees ...
New Advances in High Performance Analytics with R: 'Big Data' Decision Trees ...New Advances in High Performance Analytics with R: 'Big Data' Decision Trees ...
New Advances in High Performance Analytics with R: 'Big Data' Decision Trees ...
Revolution Analytics
 
The network structure of cran 2015 07-02 final
The network structure of cran 2015 07-02 finalThe network structure of cran 2015 07-02 final
The network structure of cran 2015 07-02 finalRevolution Analytics
 
Data Analytics with R and SQL Server
Data Analytics with R and SQL ServerData Analytics with R and SQL Server
Data Analytics with R and SQL Server
Stéphane Fréchette
 
R at Microsoft
R at MicrosoftR at Microsoft
R at Microsoft
Revolution Analytics
 

What's hot (20)

Quick and Dirty: Scaling Out Predictive Models Using Revolution Analytics on ...
Quick and Dirty: Scaling Out Predictive Models Using Revolution Analytics on ...Quick and Dirty: Scaling Out Predictive Models Using Revolution Analytics on ...
Quick and Dirty: Scaling Out Predictive Models Using Revolution Analytics on ...
 
In-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and RevolutionIn-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and Revolution
 
The R Ecosystem
The R EcosystemThe R Ecosystem
The R Ecosystem
 
R and Data Science
R and Data ScienceR and Data Science
R and Data Science
 
Intro to R for SAS and SPSS User Webinar
Intro to R for SAS and SPSS User WebinarIntro to R for SAS and SPSS User Webinar
Intro to R for SAS and SPSS User Webinar
 
Introduction to Microsoft R Services
Introduction to Microsoft R ServicesIntroduction to Microsoft R Services
Introduction to Microsoft R Services
 
Batter Up! Advanced Sports Analytics with R and Storm
Batter Up! Advanced Sports Analytics with R and StormBatter Up! Advanced Sports Analytics with R and Storm
Batter Up! Advanced Sports Analytics with R and Storm
 
R at Microsoft (useR! 2016)
R at Microsoft (useR! 2016)R at Microsoft (useR! 2016)
R at Microsoft (useR! 2016)
 
R Then and Now
R Then and NowR Then and Now
R Then and Now
 
Basics of Digital Design and Verilog
Basics of Digital Design and VerilogBasics of Digital Design and Verilog
Basics of Digital Design and Verilog
 
A Step Towards Reproducibility in R
A Step Towards Reproducibility in RA Step Towards Reproducibility in R
A Step Towards Reproducibility in R
 
Microsoft R Server for Data Sciencea
Microsoft R Server for Data ScienceaMicrosoft R Server for Data Sciencea
Microsoft R Server for Data Sciencea
 
Predicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per SecondPredicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per Second
 
Reproducibility with Checkpoint & RRO - NYC R Conference
Reproducibility with Checkpoint & RRO - NYC R ConferenceReproducibility with Checkpoint & RRO - NYC R Conference
Reproducibility with Checkpoint & RRO - NYC R Conference
 
Simple Reproducibility with the checkpoint package
Simple Reproducibilitywith the checkpoint packageSimple Reproducibilitywith the checkpoint package
Simple Reproducibility with the checkpoint package
 
Data Science At Zillow
Data Science At ZillowData Science At Zillow
Data Science At Zillow
 
New Advances in High Performance Analytics with R: 'Big Data' Decision Trees ...
New Advances in High Performance Analytics with R: 'Big Data' Decision Trees ...New Advances in High Performance Analytics with R: 'Big Data' Decision Trees ...
New Advances in High Performance Analytics with R: 'Big Data' Decision Trees ...
 
The network structure of cran 2015 07-02 final
The network structure of cran 2015 07-02 finalThe network structure of cran 2015 07-02 final
The network structure of cran 2015 07-02 final
 
Data Analytics with R and SQL Server
Data Analytics with R and SQL ServerData Analytics with R and SQL Server
Data Analytics with R and SQL Server
 
R at Microsoft
R at MicrosoftR at Microsoft
R at Microsoft
 

Similar to Revolution R Enterprise - Portland R User Group, November 2013

Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
Revolution Analytics
 
microsoft r server for distributed computing
microsoft r server for distributed computingmicrosoft r server for distributed computing
microsoft r server for distributed computing
BAINIDA
 
Michal Marušan: Scalable R
Michal Marušan: Scalable RMichal Marušan: Scalable R
Michal Marušan: Scalable R
GapData Institute
 
Advanced analytics with R and SQL
Advanced analytics with R and SQLAdvanced analytics with R and SQL
Advanced analytics with R and SQL
MSDEVMTL
 
Decision trees in hadoop
Decision trees in hadoopDecision trees in hadoop
Decision trees in hadoop
Revolution Analytics
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and Hadoop
Revolution Analytics
 
What's New in Revolution R Enterprise 6.2
What's New in Revolution R Enterprise 6.2What's New in Revolution R Enterprise 6.2
What's New in Revolution R Enterprise 6.2Revolution Analytics
 
Big Data Analytics with R
Big Data Analytics with RBig Data Analytics with R
Big Data Analytics with R
Great Wide Open
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
Debraj GuhaThakurta
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
Debraj GuhaThakurta
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and Hadoop
DataWorks Summit
 
Building a Scalable Data Science Platform with R
Building a Scalable Data Science Platform with RBuilding a Scalable Data Science Platform with R
Building a Scalable Data Science Platform with R
DataWorks Summit/Hadoop Summit
 
Analytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using RAnalytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using R
Alex Palamides
 
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
Jürgen Ambrosi
 
20160317 - PAZUR - PowerBI & R
20160317  - PAZUR - PowerBI & R20160317  - PAZUR - PowerBI & R
20160317 - PAZUR - PowerBI & R
Łukasz Grala
 
Red Hat Summit 2017 - Intro to SQL Server on RHEL and Open Shift
Red Hat Summit 2017 - Intro to SQL Server on RHEL and Open ShiftRed Hat Summit 2017 - Intro to SQL Server on RHEL and Open Shift
Red Hat Summit 2017 - Intro to SQL Server on RHEL and Open Shift
Travis Wright
 
Performance and Scale Options for R with Hadoop: A comparison of potential ar...
Performance and Scale Options for R with Hadoop: A comparison of potential ar...Performance and Scale Options for R with Hadoop: A comparison of potential ar...
Performance and Scale Options for R with Hadoop: A comparison of potential ar...
Revolution Analytics
 
Big data analytics on teradata with revolution r enterprise bill jacobs
Big data analytics on teradata with revolution r enterprise   bill jacobsBig data analytics on teradata with revolution r enterprise   bill jacobs
Big data analytics on teradata with revolution r enterprise bill jacobs
Bill Jacobs
 
DataMass Summit - Machine Learning for Big Data in SQL Server
DataMass Summit - Machine Learning for Big Data  in SQL ServerDataMass Summit - Machine Learning for Big Data  in SQL Server
DataMass Summit - Machine Learning for Big Data in SQL Server
Łukasz Grala
 
Ml2
Ml2Ml2

Similar to Revolution R Enterprise - Portland R User Group, November 2013 (20)

Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
 
microsoft r server for distributed computing
microsoft r server for distributed computingmicrosoft r server for distributed computing
microsoft r server for distributed computing
 
Michal Marušan: Scalable R
Michal Marušan: Scalable RMichal Marušan: Scalable R
Michal Marušan: Scalable R
 
Advanced analytics with R and SQL
Advanced analytics with R and SQLAdvanced analytics with R and SQL
Advanced analytics with R and SQL
 
Decision trees in hadoop
Decision trees in hadoopDecision trees in hadoop
Decision trees in hadoop
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and Hadoop
 
What's New in Revolution R Enterprise 6.2
What's New in Revolution R Enterprise 6.2What's New in Revolution R Enterprise 6.2
What's New in Revolution R Enterprise 6.2
 
Big Data Analytics with R
Big Data Analytics with RBig Data Analytics with R
Big Data Analytics with R
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and Hadoop
 
Building a Scalable Data Science Platform with R
Building a Scalable Data Science Platform with RBuilding a Scalable Data Science Platform with R
Building a Scalable Data Science Platform with R
 
Analytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using RAnalytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using R
 
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
 
20160317 - PAZUR - PowerBI & R
20160317  - PAZUR - PowerBI & R20160317  - PAZUR - PowerBI & R
20160317 - PAZUR - PowerBI & R
 
Red Hat Summit 2017 - Intro to SQL Server on RHEL and Open Shift
Red Hat Summit 2017 - Intro to SQL Server on RHEL and Open ShiftRed Hat Summit 2017 - Intro to SQL Server on RHEL and Open Shift
Red Hat Summit 2017 - Intro to SQL Server on RHEL and Open Shift
 
Performance and Scale Options for R with Hadoop: A comparison of potential ar...
Performance and Scale Options for R with Hadoop: A comparison of potential ar...Performance and Scale Options for R with Hadoop: A comparison of potential ar...
Performance and Scale Options for R with Hadoop: A comparison of potential ar...
 
Big data analytics on teradata with revolution r enterprise bill jacobs
Big data analytics on teradata with revolution r enterprise   bill jacobsBig data analytics on teradata with revolution r enterprise   bill jacobs
Big data analytics on teradata with revolution r enterprise bill jacobs
 
DataMass Summit - Machine Learning for Big Data in SQL Server
DataMass Summit - Machine Learning for Big Data  in SQL ServerDataMass Summit - Machine Learning for Big Data  in SQL Server
DataMass Summit - Machine Learning for Big Data in SQL Server
 
Ml2
Ml2Ml2
Ml2
 

More from Revolution Analytics

Speeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the CloudSpeeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the Cloud
Revolution Analytics
 
Migrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to AzureMigrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to Azure
Revolution Analytics
 
R in Minecraft
R in Minecraft R in Minecraft
R in Minecraft
Revolution Analytics
 
The case for R for AI developers
The case for R for AI developersThe case for R for AI developers
The case for R for AI developers
Revolution Analytics
 
Speed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudSpeed up R with parallel programming in the Cloud
Speed up R with parallel programming in the Cloud
Revolution Analytics
 
The R Ecosystem
The R EcosystemThe R Ecosystem
The R Ecosystem
Revolution Analytics
 
Reproducible Data Science with R
Reproducible Data Science with RReproducible Data Science with R
Reproducible Data Science with R
Revolution Analytics
 
The Value of Open Source Communities
The Value of Open Source CommunitiesThe Value of Open Source Communities
The Value of Open Source Communities
Revolution Analytics
 
The Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data ScienceThe Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data Science
Revolution Analytics
 
The Network structure of R packages on CRAN & BioConductor
The Network structure of R packages on CRAN & BioConductorThe Network structure of R packages on CRAN & BioConductor
The Network structure of R packages on CRAN & BioConductor
Revolution Analytics
 
Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15
Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15
Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15
Revolution Analytics
 
Warranty Predictive Analytics solution
Warranty Predictive Analytics solutionWarranty Predictive Analytics solution
Warranty Predictive Analytics solution
Revolution Analytics
 
Reproducibility with Revolution R Open and the Checkpoint Package
Reproducibility with Revolution R Open and the Checkpoint PackageReproducibility with Revolution R Open and the Checkpoint Package
Reproducibility with Revolution R Open and the Checkpoint PackageRevolution Analytics
 
Reproducibility with Revolution R Open
Reproducibility with Revolution R OpenReproducibility with Revolution R Open
Reproducibility with Revolution R Open
Revolution Analytics
 
Introducing Revolution R Open: Enhanced, Open Source R distribution from Revo...
Introducing Revolution R Open: Enhanced, Open Source R distribution from Revo...Introducing Revolution R Open: Enhanced, Open Source R distribution from Revo...
Introducing Revolution R Open: Enhanced, Open Source R distribution from Revo...Revolution Analytics
 

More from Revolution Analytics (15)

Speeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the CloudSpeeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the Cloud
 
Migrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to AzureMigrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to Azure
 
R in Minecraft
R in Minecraft R in Minecraft
R in Minecraft
 
The case for R for AI developers
The case for R for AI developersThe case for R for AI developers
The case for R for AI developers
 
Speed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudSpeed up R with parallel programming in the Cloud
Speed up R with parallel programming in the Cloud
 
The R Ecosystem
The R EcosystemThe R Ecosystem
The R Ecosystem
 
Reproducible Data Science with R
Reproducible Data Science with RReproducible Data Science with R
Reproducible Data Science with R
 
The Value of Open Source Communities
The Value of Open Source CommunitiesThe Value of Open Source Communities
The Value of Open Source Communities
 
The Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data ScienceThe Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data Science
 
The Network structure of R packages on CRAN & BioConductor
The Network structure of R packages on CRAN & BioConductorThe Network structure of R packages on CRAN & BioConductor
The Network structure of R packages on CRAN & BioConductor
 
Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15
Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15
Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15
 
Warranty Predictive Analytics solution
Warranty Predictive Analytics solutionWarranty Predictive Analytics solution
Warranty Predictive Analytics solution
 
Reproducibility with Revolution R Open and the Checkpoint Package
Reproducibility with Revolution R Open and the Checkpoint PackageReproducibility with Revolution R Open and the Checkpoint Package
Reproducibility with Revolution R Open and the Checkpoint Package
 
Reproducibility with Revolution R Open
Reproducibility with Revolution R OpenReproducibility with Revolution R Open
Reproducibility with Revolution R Open
 
Introducing Revolution R Open: Enhanced, Open Source R distribution from Revo...
Introducing Revolution R Open: Enhanced, Open Source R distribution from Revo...Introducing Revolution R Open: Enhanced, Open Source R distribution from Revo...
Introducing Revolution R Open: Enhanced, Open Source R distribution from Revo...
 


Revolution R Enterprise - Portland R User Group, November 2013

  • 1. Revolution R Enterprise. Portland R User Group, November 13, 2013. David Smith (@revodavid), Michael Helbraun
  • 5. Innovate with R. Most widely used data analysis software: used by 2M+ data scientists, statisticians and analysts. Most powerful statistical programming language: flexible, extensible and comprehensive for productivity. Create beautiful and unique data visualizations: as seen in New York Times, Twitter and Flowing Data. Thriving open-source community: leading edge of analytics research. Fills the talent gap: new graduates prefer R. White paper: "R is Hot", bit.ly/r-is-hot
  • 6. R is exploding in popularity & functionality. R usage growth (Rexer Data Miner Survey, 2007-2013): 70% of data miners report using R; R is the first choice of more data miners than any other software (source: www.rexeranalytics.com). "I've been astonished by the rate at which R has been adopted. Four years ago, everyone in my economics department [at the University of Chicago] was using Stata; now, as far as I can tell, R is the standard tool, and students learn it first." (Deputy Editor for New Products at Forbes) "A key benefit of R is that it provides near-instant availability of new and experimental methods created by its user base, without waiting for the development/release cycle of commercial software. SAS recognizes the value of R to our customer base…" (Product Marketing Manager, SAS Institute, Inc.)
  • 7. Revolution R Enterprise: Power R for the Enterprise; Supercharge R for Massive Data; Empower Platform Independence; Take Big Cost Out of Big Data
  • 8. [Revolution R Enterprise] is the Big Data Big Analytics Platform. Revolution R Enterprise includes all of the components you need for: enterprise readiness; high performance analytics; multi-platform architecture support; data source integration; development tools; deployment tools.
  • 9. The Platform Step by Step: R Capabilities. R+CRAN: open source R interpreter; UPDATED R 3.0.2; freely-available R algorithms; algorithms callable by RevoR; embeddable in R scripts; 100% compatible with existing R scripts, functions and packages. RevoR: performance enhanced R interpreter; based on open source R; adds high-performance math. Available on: Platform LSF Linux; Microsoft HPC Clusters; Microsoft Azure Burst; Windows & Linux Servers; Windows & Linux Workstations; Teradata Database; IBM Netezza; IBM BigInsights; Cloudera Hadoop; Hortonworks Hadoop; Intel Hadoop.
  • 10. The Platform Step by Step: Parallelization & Data Sourcing. ConnectR: high-speed data import/export, available for high-performance XDF; SAS, SPSS, delimited & fixed format text data files; Hadoop HDFS & HBase; Teradata Database TPT; ODBC (incl. Vertica, Oracle, Pivotal, Aster, SybaseIQ, DB2, MySQL). ScaleR: ready-to-use high-performance big data big analytics; fully-parallelized analytics; data prep & data distillation; descriptive statistics & statistical tests; correlation & covariance matrices; predictive models (linear, logistic, GLM); machine learning; Monte Carlo simulation; NEW tools for distributing customized algorithms across nodes. DistributedR: distributed computing framework; delivers portability across platforms. DistributedR available on: Windows Servers; Red Hat and NEW SuSE Linux Servers; IBM Platform LSF Linux Clusters; Microsoft HPC Clusters; Microsoft Azure Burst; NEW Teradata Database; NEW Cloudera Hadoop; NEW Hortonworks Hadoop.
  • 12. Eliminates Performance and Capacity Limits of Open Source R and Legacy SAS. Unique PEMAs (parallel, external-memory algorithms); high-performance, scalable replacements for R/SAS analytic functions; parallel/distributed processing eliminates CPU bottleneck; data streaming eliminates memory size limitations; scales linearly with data size and compute capacity; works with in-memory and disk-based architectures.
  • 13. DEMO
  • 15. Bringing R to Big Data Architectures: Servers & Clusters; Hadoop; Data Warehouses. Includes support for full suite of ScaleR algorithms on platform.
  • 16. Write Once, Deploy Anywhere: Teradata Database Version 14.10; Cloudera & Hortonworks Hadoop; Microsoft & Linux Servers; Workstations; Server Clusters.
  • 17. Write Once, Deploy Anywhere. Set the desired compute context for code execution:
      rxSetComputeContext("local")  # default: local system
      rxSetComputeContext(RxLsfCluster(<data, server environment arguments>))
      rxSetComputeContext(RxHpcServer(<data, server environment arguments>))
      rxSetComputeContext(RxAzureBurst(<data, server environment arguments>))
      rxSetComputeContext(RxHadoopMR(<data, server environment arguments>))
      rxSetComputeContext(RxTeradata(<data, server environment arguments>))
    The same code can then be run anywhere:
      # Summarize and calculate descriptive statistics from the airDS data set
      adsSummary <- rxSummary(~ArrDelay+CRSDepTime+DayOfWeek, data = airDS)
      # Fit linear model
      arrDelayLm1 <- rxLinMod(ArrDelay ~ DayOfWeek, data = airDS)
      summary(arrDelayLm1)
  • 18. A Simple Goal: Hadoop As An R Engine. Run Revolution R Enterprise code in Hadoop without change. Provide RRE ScaleR pre-parallelized algorithms. Eliminate: the need to "think in MapReduce"; the need for a separate compute cluster; data movement.
  • 19. RRE in Hadoop (diagram: HDFS with a Name Node and five Data Nodes; MapReduce with a Job Tracker and five Task Trackers)
  • 20. RRE in Hadoop (diagram: HDFS with a Name Node and five Data Nodes; MapReduce with a Job Tracker and five Task Trackers)
  • 21. DEMO
  • 23. The Platform Step by Step: Tools & Deployment. DevelopR: integrated development environment for R; visual 'step-into' debugger. Available on: Windows. Or use: [alternatives not captured in the transcript]. DeployR: web services software development kit for integrating analytics via Java, JavaScript or .NET APIs; integrates R into application infrastructures. DeployR capabilities: invokes R scripts from web services calls; RESTful interface for easy integration; works with web & mobile apps, leading BI & visualization tools and business rules engines.
  • 24. Custom Integration with Web Services API. RRE DeployR makes R accessible. Seamless: bring the power of R to any web enabled application. Simple: Web Services API leverages application development frameworks including JS, Java, .NET. Scalable: robustly scale user and compute workloads. Secure: manage enterprise security with LDAP & SSO. Diagram labels: R / Statistical Modeling Expert; Application Developer; RRE DeployR; Data Analysis; Business Intelligence; Mobile; Web Apps; Cloud / SaaS.
  • 27. DEMO
  • 28. With Thanks: The R Core Team; R developers (5000 packages on CRAN!); The R community; You! David Smith, @revodavid, david@revolutionanalytics.com. Michael Helbraun, michael.helbraun@revolutionanalytics.com. www.revolutionanalytics.com, 1.855.GET.REVO, Twitter: @RevolutionR
  • 30. High Performance Big Data Analytics with Revolution R Enterprise ScaleR: R Data Step; Descriptive Statistics; Statistical Tests; Sampling; Predictive Modeling; Data Visualization; Machine Learning; Simulation.
  • 31. Revolution R Enterprise ScaleR: High Performance Big Data Analytics. Data Prep, Distillation & Descriptive Analytics. R Data Step: data import (delimited, fixed, SAS, SPSS, ODBC); variable creation & transformation; recode variables; factor variables; missing value handling; sort; merge; split; aggregate by category (means, sums). Descriptive Statistics: min / max; mean; median (approx.); quantiles (approx.); standard deviation; variance; correlation; covariance; sum of squares (cross product matrix for set variables); pairwise cross tabs; risk ratio & odds ratio; cross-tabulation of data (standard tables & long form); marginal summaries of cross tabulations. Statistical Tests: chi square test; Kendall rank correlation; Fisher's exact test; Student's t-test. Sampling: subsample (observations & variables); random sampling.
  • 32. Revolution R Enterprise ScaleR (continued). Statistical Modeling. Predictive Models: sum of squares (cross product matrix for set variables); multiple linear regression; generalized linear models (GLM), with all exponential family distributions (binomial, Gaussian, inverse Gaussian, Poisson, Tweedie), standard link functions (including cauchit, identity, log, logit, probit) and user defined distributions & link functions; covariance matrix; correlation matrix; logistic regression; classification & regression trees; residuals for all models. Data Visualization: histogram; line plot; scatter plot; Lorenz curve; ROC curves (actual data and predicted values); NEW tree visualization. Variable Selection: stepwise regression (linear, NEW logistic, NEW GLM). Simulation: Monte Carlo. Machine Learning: cluster analysis (K-Means); classification (decision trees, NEW decision forests). Deployment: prediction (scoring); NEW PMML export.

Editor's Notes

  1. Enterprise readiness: build assurance (continuous testing, custom validation); implementation tools (validation utility); technical support, documentation, training. Performance architecture: fast math libraries; better memory management; multi-core processing; distributed computing architecture. Big Data analytics: descriptive statistics; cross tabulation; statistical tests; correlation, covariance and SSCP matrices; linear regression; logistic regression; generalized linear models; decision trees; K-Means clustering. Data source integration: ODBC; Teradata (high speed); text files (delimited & fixed format); SAS; SPSS; Hadoop (HDFS & HBase). Development tools: visual debugger; script editor; R snippets; object browser; solution explorer; customizable workspace; version control plug-in. Deployment tools: R objects as JSON, XML; supports Java, JavaScript, .NET; RESTful web services API; security (LDAP, SSO); built-in load balancing; asynchronous scheduling; management console; accelerators (Jaspersoft, Qlikview).
  2. A Revolution R Enterprise ScaleR analytic is provided a data source as input. The analytic loops over data, reading a block at a time. Blocks of data are read by a separate worker thread (Thread 0). Worker threads (Threads 1..n) process the data block from the previous iteration of the data loop and update intermediate results objects in memory. When all of the data is processed, a master results object is created from the intermediate results objects.
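
The block-at-a-time processing described in note 2 can be sketched in base R. This is only an illustration under stated assumptions, not the ScaleR implementation: the function names (summarize_block, combine_results, chunked_summary) are hypothetical, the blocks are taken from an in-memory vector rather than read from disk, and they are processed sequentially rather than by worker threads. Each block yields a small intermediate results object, and the master result is built from those alone.

```r
# Hypothetical sketch of a block-wise summary; not the ScaleR API.

# Process one block: return a small intermediate results object.
summarize_block <- function(block) {
  c(n = length(block), sum = sum(block), sumsq = sum(block^2))
}

# Combine the intermediate results into a master results object.
combine_results <- function(parts) {
  total <- Reduce(function(a, b) a + b, parts)
  n <- total[["n"]]
  mean_x <- total[["sum"]] / n
  # Sum-of-squares formula: simple, but less numerically stable
  # than a streaming (Welford-style) update.
  var_x <- (total[["sumsq"]] - n * mean_x^2) / (n - 1)
  list(n = n, mean = mean_x, var = var_x)
}

# Loop over the data one block at a time (sequential stand-in for
# the reader thread plus worker threads in the note above).
chunked_summary <- function(x, block_size = 1000L) {
  starts <- seq(1L, length(x), by = block_size)
  parts <- lapply(starts, function(s) {
    block <- x[s:min(s + block_size - 1L, length(x))]
    summarize_block(block)
  })
  combine_results(parts)
}

set.seed(42)
x <- rnorm(10500, mean = 5, sd = 2)
res <- chunked_summary(x, block_size = 1000L)
print(res)
```

Because only the intermediate objects are kept, memory use depends on the block size rather than on the total data size. On Unix-alikes, swapping lapply for parallel::mclapply would be one way to process blocks concurrently.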