R and Hadoop:
Architectural Options
Bill Jacobs
VP Product Marketing & Field CTO, Revolution Analytics
@bill_jacobs
Polling Question #1:
 Who Are You? (choose one)
– Statistician or modeler who uses R
– Other R developer
– Hadoop Expert
– Application builder
– Data guru
– Business user
– Systems vendor or reseller
– Something else…
Agenda
• Challenges
• Options
• Considerations
• How to Choose
Boundless Opportunities
 Marketing: Clickstream & Campaign Analyses
 Digital Media: Recommendation Engines
 Retail: Social Sentiment Analysis
 Insurance: Fraud Waste and Abuse
 Healthcare Delivery: Outcome Prediction
 Manufacturing: Quality Optimization
 P&C Insurance: Risk Analysis
 Consumer Products: Warranty Optimization
 Operations: Supply Chain Optimization
 Econometrics: Market Prediction
 Marketing: Mix and Price Optimization
 Life Sciences: Pharmacogenetics
 Transportation: Asset Utilization
Polling Question #2:
 What Industry Do You Represent?
– Financial Services
– Insurance
– Healthcare, Life Sciences or Pharma
– Manufacturing
– Energy
– Retail
– Logistics and Transportation
– Education
– Government
– Marketing & Advertising
– Technology
– Other
In A Perfect World…
Analytical Capability
Compute
Data Scale
Users
Price
Ease
Security
Hadoop Analytics - Many Alternatives
 R Based Alternatives
 Legacy tools updated – SAS HPA, etc.
 Big Data Databases
 Other Languages – Scala, Java, Julia, various GUIs
Today’s Topic:
 R-Based Alternatives
– “Beside Architectures”
– “Inside Architectures”
– Open Source and Commercial
Reality: Tradeoffs.
Memory Limits
In-Memory vs. Shared Infrastructure
CRAN vs. Parallelization
Desktop vs. Remote
Explicit vs. Automatic Distribution
Locality vs. Movement
Real-Time vs. MapReduce
Traditional Statistics vs. Machine Learning
No Magic Bullet.
Corporate Overview & Quick Facts
Founded: 2008 (as REvolution Computing)
Office Locations: Palo Alto (HQ), Seattle (Engineering), Singapore, London
CEO: David Rich
Number of customers: 200+
Investors: Northbridge Venture Partners, Intel Capital, Platform Vendor
Web site: www.revolutionanalytics.com
Revolution R Enterprise is the leading commercial analytics platform based on the open source R statistical computing language
Revolution Analytics
Our Vision:
R becomes the de facto standard for enterprise predictive analytics
Our Mission:
Drive enterprise adoption of R by providing enhanced R products tailored to meet enterprise challenges
Revolution Analytics Builds & Delivers:
 Software Products:
 Stable Distributions
 Broad Platform Support
 Big Data Analytics in R
 Application Integration
 Deployment Platforms
 Agile Development Tooling
 Future Platform Support
 Support & Services
 Commercial Support Programs
 Training Programs
 Professional Services
 Community Programs
 Academic Support Programs
 Contributions to Open Source R
 Open Source Extensions
 Sponsorship of R User Groups
Revolution Analytics Technical Innovations
 R Options from Open Source to Enterprise
 Parallelized Analytical Computation
 In-Database & In-Hadoop Analytics
 Big Data Scalability
 Remote Execution
 Production Deployment Support
 Multi-Platform Deployment
 Legacy Data Format Support
 Multiple IDE Options
 PMML Model Export
The Revolution R Product Suite
Revolution R Open
• Free and open source R distribution
• Enhanced and distributed by Revolution Analytics
Revolution R Plus
• Open-source distribution of R, packages, and other components
• Enhanced, supported and indemnified by Revolution Analytics
Revolution R Enterprise
• Secure, Scalable and Supported Distribution of R
• With proprietary components created by Revolution Analytics
Polling Question #3:
 State of Play: In your company you are…
– Building Our “Data Lake”
– Running R + Hadoop Data Today
– Running R inside Hadoop using Open source
– Running RRE inside Hadoop
– Deploying Business Apps. Using Analytics from Hadoop Data
– Looking at Next Steps e.g. Spark, etc.
Revolution Analytics:
Eight Alternatives for Integrating R & Hadoop
Open Source
1. Open Source R
2. Revolution R Open
3. Open Source Parallelization on Workstations & Servers
4. rHadoop: Open Source Parallelization with rHadoop
Commercial
5. Revolution R Enterprise on Servers & Workstations
6. Revolution R Enterprise on Edge Nodes
7. Revolution R Enterprise Inside Hadoop
8. Combined Edge Node & Inside Hadoop
1. Open Source R Integrated With Hadoop
• Traditional Open Source
• Memory-Limited
• Data Moves
Traditional Open Source R “Beside” Architecture:
CRAN
Algorithms
rHDFS
rHbase
rHive
rODBC
2. Revolution R Open On Workstations & Servers
Replace Open Source R “Beside” Architecture with Revolution R Open
As with Open Source R:
• Still Free.
• Still Memory Based.
• Data Still Moves.
Improvements:
• Accelerates Math with Intel MKL
• Improves R-based packages
Limitations
• No Effect for non-R Code
CRAN
Algorithms
rHDFS
rHbase
rHive
rODBC
Accelerate R Math with Intel Math Kernel Libraries.
Source: http://blog.revolutionanalytics.com/2014/10/revolution-r-open-mkl.html
3. Write Parallel Algorithms for PC, Server or Clusters
Write R Code to Explicitly Parallelize – Deploy Across Several Systems
Can Include CRAN Algorithms “Carefully”
foreach & iterators
• doParallel (PC, server)
• doMPI (cluster)
• RRE rxExec
Example Uses:
• Bootstrapping
• Simulation
• HPC
rHDFS
rHbase
rHive
rODBC
As with Previous:
• Still Free.
• Still Memory Based.
• Data Still Moves.
• Intel MKL with RRO
Improvements:
• Parallelized Execution
Limitations:
• Parallelization Difficulty
• Data Movement
• Platform Specific
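Bootstrapping, listed above as an example use, suits explicit parallelization because every resample is independent. The deck's own tools are R packages (foreach with doParallel or doMPI). The sketch below only illustrates the same split-and-combine pattern in Python. The function names, worker count and sample data are assumptions made for this illustration.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def bootstrap_means(data, n_resamples, seed):
    """Draw n_resamples bootstrap resamples and return the mean of each."""
    rng = random.Random(seed)  # a separate generator per task
    n = len(data)
    return [
        statistics.fmean(rng.choices(data, k=n))
        for _ in range(n_resamples)
    ]


def parallel_bootstrap(data, total_resamples=400, workers=4):
    """Split the resamples across workers, then combine the partial results."""
    per_worker = total_resamples // workers
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each task gets its own seed so workers do not repeat the same draws.
        futures = [
            pool.submit(bootstrap_means, data, per_worker, seed)
            for seed in range(workers)
        ]
        means = [m for f in futures for m in f.result()]
    return means


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 10.0, 12.0]  # hypothetical sample
means = parallel_bootstrap(data)
std_error = statistics.stdev(means)  # bootstrap estimate of the standard error
```

Threads keep the sketch self-contained. For CPU-bound work in CPython a process pool would normally be the better fit, which is part of the "parallelization difficulty" the slide lists.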
4. rHadoop: Custom Parallel Execution for Hadoop
Remote
Desktop
R Code
Execute R Code & CRAN Algorithms Inside Hadoop
Example Uses:
• Scoring
• Transformation
• Easily Parallelized Algorithms
Hadoop Streaming
Can Include CRAN Algorithms
As With Previous:
• Still Free.
• Optional Intel MKL in RRO
Improvements:
• Runs R in MapReduce
• No Data Movement
Limitations:
• Manual Parallelization
• Hadoop Specific
rHbase
rHDFS
rMapReduce
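Running code in MapReduce means expressing the work as a map step and a reduce step, which is the manual parallelization listed above as a limitation. The sketch below imitates that flow in plain Python, in memory, to show the shape such code takes. It does not call Hadoop, Hadoop Streaming or any rHadoop package. The record layout and function names are hypothetical.

```python
from collections import defaultdict


def mapper(record):
    """Emit (key, (sum, count)) for one input record."""
    region, amount = record
    yield region, (amount, 1)


def reducer(key, values):
    """Combine all partial (sum, count) pairs for a key into a mean."""
    total = sum(v[0] for v in values)
    count = sum(v[1] for v in values)
    return key, total / count


def run_job(records):
    """Map every record, group by key (the 'shuffle'), then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(k, vs) for k, vs in groups.items())


# Hypothetical input: (region, amount) pairs.
records = [("east", 10.0), ("west", 4.0), ("east", 20.0), ("west", 6.0)]
result = run_job(records)
```

Emitting (sum, count) pairs instead of raw values keeps the reduce step correct even if partial results are pre-combined on each node.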
5. Revolution R Enterprise (RRE) on Servers & Workstations
Traditional “Beside” Architecture with Optimized Algorithms
Available for Windows, Linux
As With Previous:
• Includes Intel MKL in RRO
Advantages
• Speed: PEMAs Parallelize Across Threads, Cores & Sockets
• Scale: PEMAs “Chunk” - no Memory Limits
• All of CRAN Available
• Portability
• Fully Supported
Limitations:
• Data Movement
• Single Machine
Revolution R Enterprise:
• ScaleR PEMA Algorithms
plus
• All of CRAN (subject to memory limits)
rHDFS
rHbase
rHive
rODBC
Revolution R Enterprise is… the only big data big analytics platform based on open source R
• High Performance, Scalable Analytics
• Portable Across Enterprise Platforms
• Easier to Build & Deploy Analytics
ScaleR
Refactor Algorithms for Dramatic Performance and Capacity Improvement
ScaleR
High Performance Algorithms for the Most Common Uses
Data Step
• Data import – Delimited, Fixed, SAS, SPSS, ODBC
• Variable creation & transformation
• Recode variables
• Factor variables
• Missing value handling
• Sort, Merge, Split
• Aggregate by category (means, sums)
Descriptive Statistics
• Min / Max, Mean, Median (approx.)
• Quantiles (approx.)
• Standard Deviation
• Variance
• Correlation
• Covariance
• Sum of Squares (cross product matrix for set variables)
• Pairwise Cross tabs
• Risk Ratio & Odds Ratio
• Cross-Tabulation of Data (standard tables & long form)
• Marginal Summaries of Cross Tabulations
Statistical Tests
• Chi Square Test
• Kendall Rank Correlation
• Fisher’s Exact Test
• Student’s t-Test
Sampling
• Subsample (observations & variables)
• Random Sampling
Predictive Models
• Sum of Squares (cross product matrix for set variables)
• Multiple Linear Regression
• Generalized Linear Models (GLM) exponential family distributions: binomial, Gaussian, inverse Gaussian, Poisson, Tweedie. Standard link functions: cauchit, identity, log, logit, probit. User defined distributions & link functions.
• Covariance & Correlation Matrices
• Logistic Regression
• Classification & Regression Trees
• Predictions/scoring for models
• Residuals for all models
Cluster Analysis
• K-Means
Classification
• Decision Trees
• Decision Forests
• Gradient Boosted Decision Trees
Variable Selection
• Stepwise Regression
Simulation
• Simulation (e.g. Monte Carlo)
• Parallel Random Number Generation
Combination
New in 7.3
• PEMA-R API
• rxDataStep
• rxExec
ScaleR PEMA
What’s a PEMA?
Parallel External Memory Algorithms
Master
Algorithm
Process
Data
Analyze Each Block
• Not Limited to Available Memory
• Unlimited Data Scale
• Ingests Data One Chunk At A Time.
• Adjustable Memory Footprint
• Multi-Thread Execution
Performance
• Highly-Optimized Algorithms
• Algorithm Math Fully Refactored for Parallelism
• Delivered as ScaleR Library in Revolution R Enterprise
Load Block At A Time
Combine Individual Results
Script Calls ScaleR Algorithm
Scripts can call CRAN Open Source Algorithms
Start & Manage Processing
rHDFS
rHbase
rHive
rODBC
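The PEMA labels above describe a loop: load one block, analyze it, and combine the per-block results. A minimal sketch of that idea, computing a mean and variance without holding all the data in memory, might look like the following. It is not ScaleR code. The chunk size, the function names and the use of the standard pairwise merge formula for variance are choices made for this illustration.

```python
from itertools import islice


def read_chunks(values, chunk_size):
    """Yield successive blocks from any iterable, one block in memory at a time."""
    it = iter(values)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def analyze_chunk(chunk):
    """Per-block summary: count, mean, and sum of squared deviations (M2)."""
    n = len(chunk)
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return n, mean, m2


def combine(a, b):
    """Merge two block summaries into one summary for the combined data."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2


def chunked_mean_variance(values, chunk_size=1000):
    """Stream the data block by block and return (mean, sample variance)."""
    total = None
    for chunk in read_chunks(values, chunk_size):
        summary = analyze_chunk(chunk)
        total = summary if total is None else combine(total, summary)
    n, mean, m2 = total
    return mean, m2 / (n - 1)


mean, variance = chunked_mean_variance(range(1, 10001), chunk_size=1000)
```

Because `combine` only needs two small summaries, the per-block work could also be handed to separate threads or nodes and merged afterwards, which is the sense in which such an algorithm is both "external memory" and "parallel".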
6. Run Revolution R Enterprise on Hadoop Edge Node(s)
Local File System (opt.)
ScaleR + CRAN Algorithms
Fast Single-Server Alternative for Modest Data Scale
Edge Node
Thin Client or Remote Desktop
As With Previous:
• Single Machine Execution
• PEMA Scale & Speed (Single Machine)
• Use ScaleR + CRAN
• Accelerate R with Intel MKL
Improvements:
• Easily Shared via
• No Data Movement
• Develop on Desktop, Run on Edge Node
Limitations:
• “Shorter Trip” for Data
7. Fast, Transparent Parallel Computation
Inside Hadoop YARN/MapReduce
jobtracker
ScaleR Algorithms
DeployR
Fast Parallelized Analytics on Large Data Sets In Hadoop
As With Previous:
• Speed and Scale of ScaleR PEMA Algorithms
• Use CRAN Where Appropriate
• Accelerate R Math with MKL
• Custom Parallelized Algorithms
Advantages
• Parallel Computation
• No Data Movement
• ScaleR PEMA Parallelization
• Can Parallelize CRAN “Carefully”
• Portable Coding
Limitations:
• Hadoop Workload Profiles
Web Services
Web Services
Remote Execution
Desktop & Server Tools and Applications
One Client’s Experience with RRE on Hadoop
Test Cluster - 9 Nodes
Task: Processing Time
Importing and Filtering Datasets from HDFS
– 14 Million Observations: 82 sec.
– 227 Million Observations: 310 sec.
Modeling and Estimation
– 1.2 M Correlations: 2771 sec.
– Simple Linear Regression, 227 M Observations: 61 sec.
– Multiple Linear Regression, Three Variables, 227 M Observations: 58 sec.
– Multiple Linear Regression, Four Variables, 227 M Observations: 58 sec.
– Random Forest, 10 Predictor Variables, 227 M Observations, 10 Trees with Max Depth of 10 Splits: 2 hr. 3 min.
64GB, 24 cores each
9 Task Nodes
2 Admin Nodes
1 Edge Node
128GB, 24 cores each
128GB, 24 cores each
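As a reading aid for the import timings above, the two rows can be turned into throughput figures with simple arithmetic. The short sketch below uses only the numbers from the table. The variable names are illustrative, and no claim is made about why the two rates differ.

```python
# Import timings copied from the table: (observations, seconds).
import_runs = {
    "14 M observations": (14_000_000, 82),
    "227 M observations": (227_000_000, 310),
}

# Observations processed per second for each run.
throughput = {
    label: observations / seconds
    for label, (observations, seconds) in import_runs.items()
}

# How much more data, and how much more time, the larger run took.
data_ratio = 227_000_000 / 14_000_000
time_ratio = 310 / 82
```

On these figures the larger import handled roughly 16 times the data in a little under 4 times the elapsed time.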
8. Combined Edge Node & In-Hadoop
ScaleR Algorithms
DeployR
Maximized Flexibility, Performance & Workload Handling
As With Previous:
• Speed and Scale of ScaleR PEMA Algorithms
• Use CRAN Where Appropriate
• Accelerate R Math with MKL
• Custom Parallelized Algorithms
Advantages
• Flexibility for Blended Workloads
• Little or No Data Movement
• Maximize CRAN Capabilities by Sharing Large RAM Edge Nodes
Web Services
Thin Client
Development
Remote Execution
Desktop & Server Tools and Applications
RStudio
Occasionally Conflicting Criteria
Infrastructure Criteria:
 Big Data Platform
 Vendor Choice
 Data Ingest
 Data Security
 Data Governance
Data Science Criteria:
 Performance
 Self Service
 Flexibility
 Collaboration
 Sharing
 Capability
Key Questions:
 Where are the bulk of your skills? SAS? R? Java? Python? SQL?
 Where do you build models today?
 Do you have the skills to parallelize algorithms?
 Can models be built on a big shared server?
 How will you run models?
 Do you have the budget to purchase commercial solutions?
 How will your needs change over time?
 What is your future architecture plan?
 How risk averse is your management team regarding new platforms and open source?
Key Questions (cont.)
 What Workloads Do You Anticipate?
– How Many Users?
– What Workloads?
 Workload Realities:
– Many small tasks do not run well in MapReduce
– Large data movements / duplications are costly
 What Use Cases Will You Encounter?
– Traditional statistical exploration, modeling?
– Behavior Prediction?
– Outlier Detection?
– Simulation and HPC?
– Massively wide data?
– Real-Time scoring?
– Internet of Things?
Eight Steps to Fast, Scalable R Analytics with Hadoop
Open Source Options
1. Open Source R
2. Revolution R Open
3. Open Source Parallelization…
4. rHadoop…
Commercial Options
5. RRE on Servers & Workstations
6. RRE on Edge Nodes
7. RRE Inside Hadoop
8. RRE on Edge Node & Inside Hadoop
No Clear Winner:
 Budget & use case determine optimal path
 Compelling options in both open source & commercial source
 RRE ScaleR uniquely provides automatic parallelization
 Current Hadoop platforms are fast for large scale analytics.
 Combined in-server & in-Hadoop fits majority of cases
2015 Challenges & Opportunities
• Evolving Hadoop Architectures
• In-Memory Analytics – Spark, YARN Containers, Caching
• Additional Algorithm Parallelization
• Cluster Management
• Cloud and Hybrid Cloud Clusters
• SQL on Hadoop “Battle-Royale”
• Addressing the Resource Reality
• Integration, Deployment Both Drain on Expensive Resources
• Leverage other skills
• Design efficient collaboration
• “Analytics for the Rest of Us”
• New Consumption Targets – Mobile
• New Participants in Design – Business Users
Recommended Resources
 Revolution Analytics Products
– http://www.revolutionanalytics.com/products
– http://www.revolutionanalytics.com/big-analytics-hadoop-and-edws
 Whitepaper: “Delivering Value from Big Data with Revolution R Enterprise and Hadoop”
– http://www.revolutionanalytics.com/whitepaper/delivering-value-big-data-revolution-r-enterprise-and-hadoop
 Revolution Analytics on Social Media:
– http://blog.revolutionanalytics.com/
– @revolutionr on Twitter
– @bill_jacobs on Twitter
Thank you.
www.revolutionanalytics.com
1.855.GET.REVO
Twitter: @RevolutionR

More Related Content

What's hot

High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopRevolution Analytics
 
Model Building with RevoScaleR: Using R and Hadoop for Statistical Computation
Model Building with RevoScaleR: Using R and Hadoop for Statistical ComputationModel Building with RevoScaleR: Using R and Hadoop for Statistical Computation
Model Building with RevoScaleR: Using R and Hadoop for Statistical ComputationRevolution Analytics
 
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...Revolution Analytics
 
Moving From SAS to R Webinar Presentation - 07Aug14
Moving From SAS to R Webinar Presentation - 07Aug14Moving From SAS to R Webinar Presentation - 07Aug14
Moving From SAS to R Webinar Presentation - 07Aug14Revolution Analytics
 
Accelerating R analytics with Spark and Microsoft R Server for Hadoop
Accelerating R analytics with Spark and  Microsoft R Server  for HadoopAccelerating R analytics with Spark and  Microsoft R Server  for Hadoop
Accelerating R analytics with Spark and Microsoft R Server for HadoopWilly Marroquin (WillyDevNET)
 
In-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and RevolutionIn-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and RevolutionRevolution Analytics
 
Big data analytics on teradata with revolution r enterprise bill jacobs
Big data analytics on teradata with revolution r enterprise   bill jacobsBig data analytics on teradata with revolution r enterprise   bill jacobs
Big data analytics on teradata with revolution r enterprise bill jacobsBill Jacobs
 
Batter Up! Advanced Sports Analytics with R and Storm
Batter Up! Advanced Sports Analytics with R and StormBatter Up! Advanced Sports Analytics with R and Storm
Batter Up! Advanced Sports Analytics with R and StormRevolution Analytics
 
Is Revolution R Enterprise Faster than SAS? Benchmarking Results Revealed
Is Revolution R Enterprise Faster than SAS? Benchmarking Results RevealedIs Revolution R Enterprise Faster than SAS? Benchmarking Results Revealed
Is Revolution R Enterprise Faster than SAS? Benchmarking Results RevealedRevolution Analytics
 
R for SAS Users Complement or Replace Two Strategies
R for SAS Users Complement or Replace Two StrategiesR for SAS Users Complement or Replace Two Strategies
R for SAS Users Complement or Replace Two StrategiesRevolution Analytics
 
Intro to R for SAS and SPSS User Webinar
Intro to R for SAS and SPSS User WebinarIntro to R for SAS and SPSS User Webinar
Intro to R for SAS and SPSS User WebinarRevolution Analytics
 
American Century (Revolution Analytics Customer Day)
American Century (Revolution Analytics Customer Day)American Century (Revolution Analytics Customer Day)
American Century (Revolution Analytics Customer Day)Revolution Analytics
 
Introducing Revolution R Open: Enhanced, Open Source R distribution from Revo...
Introducing Revolution R Open: Enhanced, Open Source R distribution from Revo...Introducing Revolution R Open: Enhanced, Open Source R distribution from Revo...
Introducing Revolution R Open: Enhanced, Open Source R distribution from Revo...Revolution Analytics
 
Big Data - Analytics with R
Big Data - Analytics with RBig Data - Analytics with R
Big Data - Analytics with RTechsparks
 
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...MapR Technologies
 

What's hot (20)

High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and Hadoop
 
Model Building with RevoScaleR: Using R and Hadoop for Statistical Computation
Model Building with RevoScaleR: Using R and Hadoop for Statistical ComputationModel Building with RevoScaleR: Using R and Hadoop for Statistical Computation
Model Building with RevoScaleR: Using R and Hadoop for Statistical Computation
 
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
 
Big Data Analysis Starts with R
Big Data Analysis Starts with RBig Data Analysis Starts with R
Big Data Analysis Starts with R
 
Moving From SAS to R Webinar Presentation - 07Aug14
Moving From SAS to R Webinar Presentation - 07Aug14Moving From SAS to R Webinar Presentation - 07Aug14
Moving From SAS to R Webinar Presentation - 07Aug14
 
Accelerating R analytics with Spark and Microsoft R Server for Hadoop
Accelerating R analytics with Spark and  Microsoft R Server  for HadoopAccelerating R analytics with Spark and  Microsoft R Server  for Hadoop
Accelerating R analytics with Spark and Microsoft R Server for Hadoop
 
In-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and RevolutionIn-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and Revolution
 
Big data analytics on teradata with revolution r enterprise bill jacobs
Big data analytics on teradata with revolution r enterprise   bill jacobsBig data analytics on teradata with revolution r enterprise   bill jacobs
Big data analytics on teradata with revolution r enterprise bill jacobs
 
Revolution R - 100% R and More
Revolution R - 100% R and MoreRevolution R - 100% R and More
Revolution R - 100% R and More
 
Batter Up! Advanced Sports Analytics with R and Storm
Batter Up! Advanced Sports Analytics with R and StormBatter Up! Advanced Sports Analytics with R and Storm
Batter Up! Advanced Sports Analytics with R and Storm
 
Is Revolution R Enterprise Faster than SAS? Benchmarking Results Revealed
Is Revolution R Enterprise Faster than SAS? Benchmarking Results RevealedIs Revolution R Enterprise Faster than SAS? Benchmarking Results Revealed
Is Revolution R Enterprise Faster than SAS? Benchmarking Results Revealed
 
R for SAS Users Complement or Replace Two Strategies
R for SAS Users Complement or Replace Two StrategiesR for SAS Users Complement or Replace Two Strategies
R for SAS Users Complement or Replace Two Strategies
 
Intro to R for SAS and SPSS User Webinar
Intro to R for SAS and SPSS User WebinarIntro to R for SAS and SPSS User Webinar
Intro to R for SAS and SPSS User Webinar
 
American Century (Revolution Analytics Customer Day)
American Century (Revolution Analytics Customer Day)American Century (Revolution Analytics Customer Day)
American Century (Revolution Analytics Customer Day)
 
Introducing Revolution R Open: Enhanced, Open Source R distribution from Revo...
Introducing Revolution R Open: Enhanced, Open Source R distribution from Revo...Introducing Revolution R Open: Enhanced, Open Source R distribution from Revo...
Introducing Revolution R Open: Enhanced, Open Source R distribution from Revo...
 
Managing a Multi-Tenant Data Lake
Managing a Multi-Tenant Data LakeManaging a Multi-Tenant Data Lake
Managing a Multi-Tenant Data Lake
 
Big Data - Analytics with R
Big Data - Analytics with RBig Data - Analytics with R
Big Data - Analytics with R
 
R and Data Science
R and Data ScienceR and Data Science
R and Data Science
 
Building a Scalable Data Science Platform with R
Building a Scalable Data Science Platform with RBuilding a Scalable Data Science Platform with R
Building a Scalable Data Science Platform with R
 
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...
 

Viewers also liked

Hadoop, MapReduce and R = RHadoop
Hadoop, MapReduce and R = RHadoopHadoop, MapReduce and R = RHadoop
Hadoop, MapReduce and R = RHadoopVictoria López
 
Simple Reproducibility with the checkpoint package
Simple Reproducibilitywith the checkpoint packageSimple Reproducibilitywith the checkpoint package
Simple Reproducibility with the checkpoint packageRevolution Analytics
 
Deploying R in BI and Real time Applications
Deploying R in BI and Real time ApplicationsDeploying R in BI and Real time Applications
Deploying R in BI and Real time ApplicationsLou Bajuk
 
The Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data ScienceThe Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data ScienceRevolution Analytics
 
The Value of Open Source Communities
The Value of Open Source CommunitiesThe Value of Open Source Communities
The Value of Open Source CommunitiesRevolution Analytics
 
R server and spark
R server and sparkR server and spark
R server and sparkBAINIDA
 
microsoft r server for distributed computing
microsoft r server for distributed computingmicrosoft r server for distributed computing
microsoft r server for distributed computingBAINIDA
 
Integrate Hive and R
Integrate Hive and RIntegrate Hive and R
Integrate Hive and RJunHo Cho
 
R hive tutorial supplement 3 - Rstudio-server setup for rhive
R hive tutorial supplement 3 - Rstudio-server setup for rhiveR hive tutorial supplement 3 - Rstudio-server setup for rhive
R hive tutorial supplement 3 - Rstudio-server setup for rhiveAiden Seonghak Hong
 
Big Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source ToolkitsBig Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source ToolkitsDataWorks Summit
 
The Network structure of R packages on CRAN & BioConductor
The Network structure of R packages on CRAN & BioConductorThe Network structure of R packages on CRAN & BioConductor
The Network structure of R packages on CRAN & BioConductorRevolution Analytics
 
Reproducibility with Revolution R Open
Reproducibility with Revolution R OpenReproducibility with Revolution R Open
Reproducibility with Revolution R OpenRevolution Analytics
 
Reproducibility with Checkpoint & RRO - NYC R Conference
Reproducibility with Checkpoint & RRO - NYC R ConferenceReproducibility with Checkpoint & RRO - NYC R Conference
Reproducibility with Checkpoint & RRO - NYC R ConferenceRevolution Analytics
 
Distributed Computing Patterns in R
Distributed Computing Patterns in RDistributed Computing Patterns in R
Distributed Computing Patterns in Rarmstrtw
 

Viewers also liked (20)

Hadoop, MapReduce and R = RHadoop
Hadoop, MapReduce and R = RHadoopHadoop, MapReduce and R = RHadoop
Hadoop, MapReduce and R = RHadoop
 
Simple Reproducibility with the checkpoint package
Simple Reproducibilitywith the checkpoint packageSimple Reproducibilitywith the checkpoint package
Simple Reproducibility with the checkpoint package
 
R at Microsoft
R at MicrosoftR at Microsoft
R at Microsoft
 
Deploying R in BI and Real time Applications
Deploying R in BI and Real time ApplicationsDeploying R in BI and Real time Applications
Deploying R in BI and Real time Applications
 
The Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data ScienceThe Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data Science
 
The Value of Open Source Communities
The Value of Open Source CommunitiesThe Value of Open Source Communities
The Value of Open Source Communities
 
RHadoop, R meets Hadoop
RHadoop, R meets HadoopRHadoop, R meets Hadoop
RHadoop, R meets Hadoop
 
R server and spark
R server and sparkR server and spark
R server and spark
 
microsoft r server for distributed computing
microsoft r server for distributed computingmicrosoft r server for distributed computing
microsoft r server for distributed computing
 
Integrate Hive and R
Integrate Hive and RIntegrate Hive and R
Integrate Hive and R
 
Enabling R on Hadoop
Enabling R on HadoopEnabling R on Hadoop
Enabling R on Hadoop
 
RHive tutorial - HDFS functions
RHive tutorial - HDFS functionsRHive tutorial - HDFS functions
RHive tutorial - HDFS functions
 
R hive tutorial supplement 3 - Rstudio-server setup for rhive
R hive tutorial supplement 3 - Rstudio-server setup for rhiveR hive tutorial supplement 3 - Rstudio-server setup for rhive
R hive tutorial supplement 3 - Rstudio-server setup for rhive
 
Big Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source ToolkitsBig Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source Toolkits
 
The Network structure of R packages on CRAN & BioConductor
The Network structure of R packages on CRAN & BioConductorThe Network structure of R packages on CRAN & BioConductor
The Network structure of R packages on CRAN & BioConductor
 
Reproducibility with Revolution R Open
Reproducibility with Revolution R OpenReproducibility with Revolution R Open
Reproducibility with Revolution R Open
 
Running R on Hadoop - CHUG - 20120815
Running R on Hadoop - CHUG - 20120815Running R on Hadoop - CHUG - 20120815
Running R on Hadoop - CHUG - 20120815
 
Reproducibility with Checkpoint & RRO - NYC R Conference
Reproducibility with Checkpoint & RRO - NYC R ConferenceReproducibility with Checkpoint & RRO - NYC R Conference
Reproducibility with Checkpoint & RRO - NYC R Conference
 
RHive tutorial - Installation
RHive tutorial - InstallationRHive tutorial - Installation
RHive tutorial - Installation
 
Distributed Computing Patterns in R
Distributed Computing Patterns in RDistributed Computing Patterns in R
Distributed Computing Patterns in R
 

Similar to Performance and Scale Options for R with Hadoop: A comparison of potential architectures

Revolution R Enterprise - Portland R User Group, November 2013
Revolution R Enterprise - Portland R User Group, November 2013Revolution R Enterprise - Portland R User Group, November 2013
Revolution R Enterprise - Portland R User Group, November 2013Revolution Analytics
 
Analytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using RAnalytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using RAlex Palamides
 
What's New in Revolution R Enterprise 6.2
What's New in Revolution R Enterprise 6.2What's New in Revolution R Enterprise 6.2
What's New in Revolution R Enterprise 6.2Revolution Analytics
 
Big Data Analytics with R
Big Data Analytics with RBig Data Analytics with R
Big Data Analytics with RGreat Wide Open
 
Open source analytics
Open source analyticsOpen source analytics
Open source analyticsAjay Ohri
 
20160317 - PAZUR - PowerBI & R
20160317  - PAZUR - PowerBI & R20160317  - PAZUR - PowerBI & R
20160317 - PAZUR - PowerBI & RŁukasz Grala
 
Creating Value That Scales with Revolution Analytics & Alteryx
Creating Value That Scales with Revolution Analytics & AlteryxCreating Value That Scales with Revolution Analytics & Alteryx
Creating Value That Scales with Revolution Analytics & AlteryxRevolution Analytics
 
DataMass Summit - Machine Learning for Big Data in SQL Server
DataMass Summit - Machine Learning for Big Data  in SQL ServerDataMass Summit - Machine Learning for Big Data  in SQL Server
DataMass Summit - Machine Learning for Big Data in SQL ServerŁukasz Grala
 
Revolution Analytics - Presentation at Hortonworks Booth - Strata 2014
Revolution Analytics - Presentation at Hortonworks Booth - Strata 2014Revolution Analytics - Presentation at Hortonworks Booth - Strata 2014
Revolution Analytics - Presentation at Hortonworks Booth - Strata 2014Hortonworks
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...Debraj GuhaThakurta
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...Debraj GuhaThakurta
 
Introduction to Microsoft R Services
Introduction to Microsoft R ServicesIntroduction to Microsoft R Services
Introduction to Microsoft R ServicesGregg Barrett
 
18Mar14 Find the Hidden Signal in Market Data Noise Webinar
18Mar14 Find the Hidden Signal in Market Data Noise Webinar 18Mar14 Find the Hidden Signal in Market Data Noise Webinar
18Mar14 Find the Hidden Signal in Market Data Noise Webinar Revolution Analytics
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Herman Wu
 
An introduction to R is a document useful
An introduction to R is a document usefulAn introduction to R is a document useful
An introduction to R is a document usefulssuser3c3f88
 






Performance and Scale Options for R with Hadoop: A comparison of potential architectures

  • 1. R and Hadoop: Architectural Options Bill Jacobs VP Product Marketing & Field CTO, Revolution Analytics @bill_jacobs
  • 2. Polling Question #1: • Who Are You? (choose one) – Statistician or modeler who uses R – Other R developer – Hadoop Expert – Application builder – Data guru – Business user – Systems vendor or reseller – Something else…
  • 3. Agenda • Challenges • Options • Considerations • How to Choose
  • 4. Boundless Opportunities • Marketing: Clickstream & Campaign Analyses • Digital Media: Recommendation Engines • Retail: Social Sentiment Analysis • Insurance: Fraud Waste and Abuse • Healthcare Delivery: Outcome Prediction • Manufacturing: Quality Optimization • P&C Insurance: Risk Analysis • Consumer Products: Warranty Optimization • Operations: Supply Chain Optimization • Econometrics: Market Prediction • Marketing: Mix and Price Optimization • Life Sciences: Pharmacogenetics • Transportation: Asset Utilization
  • 5. Polling Question #2: • What Industry Do You Represent? – Financial Services – Insurance – Healthcare, Life Sciences or Pharma – Manufacturing – Energy – Retail – Logistics and Transportation – Education – Government – Marketing & Advertising – Technology – Other
  • 6. In A Perfect World… Analytical Capability Compute Data Scale Users Price Ease Security
  • 7. Hadoop Analytics - Many Alternatives • R Based Alternatives • Legacy tools updated – SAS HPA, etc. • Big Data Databases • Other Languages – Scala, Java, Julia, various GUIs Today’s Topic: • R-Based Alternatives – “Beside Architectures” – “Inside Architectures” – Open Source and Commercial
  • 8. Reality: Tradeoffs. Memory Limits In-Memory vs. Shared Infrastructure CRAN vs. Parallelization Desktop vs. Remote Explicit vs. Automatic Distribution Locality vs. Movement Real-Time vs. MapReduce Traditional Statistics vs. Machine Learning
  • 10. Corporate Overview & Quick Facts Founded 2008 (as REvolution Computing) Office Locations Palo Alto (HQ), Seattle (Engineering) Singapore London CEO David Rich Number of customers 200+ Investors • Northbridge Venture Partners • Intel Capital • Platform Vendor Web site: • www.revolutionanalytics.com Revolution R Enterprise is the leading commercial analytics platform based on the open source R statistical computing language
  • 11. Revolution Analytics Our Vision: R becomes the de facto standard for enterprise predictive analytics Our Mission: Drive enterprise adoption of R by providing enhanced R products tailored to meet enterprise challenges
  • 12. Revolution Analytics Builds & Delivers: • Software Products: • Stable Distributions • Broad Platform Support • Big Data Analytics in R • Application Integration • Deployment Platforms • Agile Development Tooling • Future Platform Support • Support & Services • Commercial Support Programs • Training Programs • Professional Services • Community Programs • Academic Support Programs • Contributions to Open Source R • Open Source Extensions • Sponsorship of R User Groups
  • 13. Revolution Analytics Technical Innovations • R Options from Open Source to Enterprise • Parallelized Analytical Computation • In-Database & In-Hadoop Analytics • Big Data Scalability • Remote Execution • Production Deployment Support • Multi-Platform Deployment • Legacy Data Format Support • Multiple IDE Options • PMML Model Export
  • 14. The Revolution R Product Suite Revolution R Open: • Free and open source R distribution • Enhanced and distributed by Revolution Analytics Revolution R Plus: • Open-source distribution of R, packages, and other components • Enhanced, supported and indemnified by Revolution Analytics Revolution R Enterprise: • Secure, Scalable and Supported Distribution of R • With proprietary components created by Revolution Analytics
  • 15. Polling Question #3: • State of Play: In your company you are… – Building Our “Data Lake” – Running R + Hadoop Data Today – Running R inside Hadoop using Open source – Running RRE inside Hadoop – Deploying Business Apps. Using Analytics from Hadoop Data – Looking at Next Steps e.g. Spark, etc.
  • 16. Revolution Analytics: Eight Alternatives for Integrating R & Hadoop Open Source 1. Open Source R 2. Revolution R Open 3. Open Source Parallelization on Workstations & Servers 4. rHadoop: Open Source Parallelization with rHadoop Commercial 5. Revolution R Enterprise on Servers & Workstations 6. Revolution R Enterprise on Edge Nodes 7. Revolution R Enterprise Inside Hadoop 8. Combined Edge Node & Inside Hadoop
  • 17. 1. Open Source R Integrated With Hadoop • Traditional Open Source • Memory-Limited • Data Moves Traditional Open Source R “Beside” Architecture: CRAN Algorithms rHDFS rHbase rHive rODBC
  • 18. 2. Revolution R Open On Workstations & Servers Replace Open Source R “Beside” Architecture with Revolution R Open As with Open Source R: • Still Free. • Still Memory Based. • Data Still Moves. Improvements: • Accelerates Math with Intel MKL • Improves R-based packages Limitations: • No Effect for non-R Code CRAN Algorithms rHDFS rHbase rHive rODBC
  • 19. Accelerate R Math with Intel Math Kernel Lib’s. Source: http://blog.revolutionanalytics.com/2014/10/revolution-r-open-mkl.html
  • 20. 3. Write Parallel Algorithms PC, Server or Clusters Write R Code to Explicitly Parallelize – Deploy Across Several Systems Can Include CRAN Algorithms “Carefully” ForEach & Iterator • DoParallel (PC, server) • DoMPI (cluster) • RRE RxEXEC Example Uses: • Bootstrapping • Simulation • HPC rHDFS rHbase rHive rODBC As with Previous: • Still Free. • Still Memory Based. • Data Still Moves. • Intel MKL with RRO Improvements: • Parallelized Execution Limitations: • Parallelization Difficulty • Data Movement • Platform Specific
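The explicit parallelization described on slide 20, applied to its bootstrapping use case, might look roughly like the following Python sketch. It is not the foreach/doParallel API: the function names (`bootstrap_chunk`, `parallel_bootstrap`), the toy data, and the worker count are all illustrative assumptions. The point it shows is that the programmer decides how to split the replicates, gives each worker its own random seed, and combines the partial results by hand.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def bootstrap_chunk(data, n_reps, seed):
    """Run n_reps bootstrap resamples of the mean with a private RNG."""
    rng = random.Random(seed)
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_reps)]


def parallel_bootstrap(data, total_reps=1000, n_workers=4, seed=0):
    """Explicitly split the replicates across workers, then combine."""
    # Divide the replicates as evenly as possible among the workers.
    base, extra = divmod(total_reps, n_workers)
    sizes = [base + (1 if i < extra else 0) for i in range(n_workers)]
    # A thread pool keeps the sketch self-contained (an assumption);
    # CPU-bound work would normally use processes or a cluster backend.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            pool.submit(bootstrap_chunk, data, size, seed + i)
            for i, size in enumerate(sizes)
        ]
        # Combine step: concatenate results in submission order.
        return [m for f in futures for m in f.result()]


# Hypothetical toy data, small enough to fit in memory on one machine.
data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 11.0]
means = sorted(parallel_bootstrap(data, total_reps=1000, n_workers=4))
# Approximate 95% percentile interval for the mean.
lo = means[int(0.025 * len(means))]
hi = means[int(0.975 * len(means))]
```

Swapping the thread pool for a process pool or a cluster scheduler changes only the executor line, which is one way to read the "Platform Specific" limitation listed on the slide: the splitting and combining logic stays in the user's hands either way.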
  • 21. 4. rHadoop: Custom Parallel Execution for Hadoop Remote Desktop R Code Execute R Code & CRAN Algorithms Inside Hadoop Example Uses: • Scoring • Transformation • Easily Parallelized Algorithms Hadoop Streaming Can Include CRAN Algorithms As With Previous: • Still Free. • Optional Intel MKL in RRO Improvements: • Runs R in MapReduce • No Data Movement Limitations: • Manual Parallelization • Hadoop Specific rHbase rHDFS rMapReduce
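The "Manual Parallelization" limitation on slide 21 means the analyst has to express the computation as map and reduce steps. The Python sketch below simulates that structure in a single process for a per-group mean. It is not the rHadoop API, and `mapper`, `reducer`, `run_mapreduce`, and the sample records are hypothetical names and data chosen for illustration; on a cluster, Hadoop would distribute the map calls and perform the shuffle.

```python
from collections import defaultdict


def mapper(record):
    """Emit (key, (partial_sum, count)) for one input record."""
    group, value = record
    yield group, (value, 1)


def reducer(key, values):
    """Combine the partial (sum, count) pairs for one key into a mean."""
    total = sum(s for s, _ in values)
    count = sum(c for _, c in values)
    return key, total / count


def run_mapreduce(records, mapper, reducer):
    """In-process stand-in for the map, shuffle, and reduce phases."""
    # Map and shuffle: group every emitted value under its key.
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    # Reduce: one call per key.
    return dict(reducer(key, values) for key, values in grouped.items())


# Hypothetical (group, value) records.
records = [("retail", 10.0), ("media", 4.0), ("retail", 20.0), ("media", 6.0)]
result = run_mapreduce(records, mapper, reducer)
```

Emitting (sum, count) pairs rather than raw means keeps the reduce step correct when partial results are merged, which is the kind of reformulation the analyst has to work out manually for each algorithm.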
  • 22. 5. Revolution R Enterprise (RRE) PEMAs inside Hadoop Traditional “Beside” Architecture with Optimized Algorithms Available for Windows, Linux As With Previous: • Includes Intel MKL in RRO Advantages • Speed: PEMAs Parallelize Across Threads, Cores & Sockets • Scale: PEMAs “Chunk” - no Memory Limits • All of CRAN Available • Portability • Fully Supported Limitations: • Data Movement • Single Machine Revolution R Enterprise: • ScaleR PEMA Algorithms plus • All of CRAN (subject to memory limits) rHDFS rHbase rHive rODBC
  • 23. Revolution R Enterprise is… the only big data big analytics platform based on open source R • High Performance, Scalable Analytics • Portable Across Enterprise Platforms • Easier to Build & Deploy Analytics
  • 24. ScaleR Refactor Algorithms for Dramatic Performance and Capacity Improvement
  • 25. ScaleR High Performance Algorithms for the Most Common Uses Data Step: • Data import – Delimited, Fixed, SAS, SPSS, ODBC • Variable creation & transformation • Recode variables • Factor variables • Missing value handling • Sort, Merge, Split • Aggregate by category (means, sums) Descriptive Statistics: • Min / Max, Mean, Median (approx.) • Quantiles (approx.) • Standard Deviation • Variance • Correlation • Covariance • Sum of Squares (cross product matrix for set variables) • Pairwise Cross tabs • Risk Ratio & Odds Ratio • Cross-Tabulation of Data (standard tables & long form) • Marginal Summaries of Cross Tabulations Statistical Tests: • Chi Square Test • Kendall Rank Correlation • Fisher’s Exact Test • Student’s t-Test Sampling: • Subsample (observations & variables) • Random Sampling Predictive Models: • Sum of Squares (cross product matrix for set variables) • Multiple Linear Regression • Generalized Linear Models (GLM) exponential family distributions: binomial, Gaussian, inverse Gaussian, Poisson, Tweedie. Standard link functions: cauchit, identity, log, logit, probit. User defined distributions & link functions. • Covariance & Correlation Matrices • Logistic Regression • Classification & Regression Trees • Predictions/scoring for models • Residuals for all models Cluster Analysis: • K-Means Classification: • Decision Trees • Decision Forests • Gradient Boosted Decision Trees Variable Selection: • Stepwise Regression Simulation: • Simulation (e.g. Monte Carlo) • Parallel Random Number Generation Combination: New in 7.3 • PEMA-R API • rxDataStep • rxExec
  • 26. ScaleR PEMA What’s a PEMA? Parallel External Memory Algorithms Master Algorithm Process Data Analyze Each Block • Not Limited to Available Memory • Unlimited Data Scale • Ingests Data One Chunk At A Time. • Adjustable Memory Footprint • Multi-Thread Execution Performance • Highly-Optimized Algorithms • Algorithm Math Fully Refactored for Parallelism • Delivered as ScaleR Library in Revolution R Enterprise Load Block At A Time Combine Individual Results Script Calls ScaleR Algorithm Scripts can call CRAN Open Source Algorithms Start & Manage Processing
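The PEMA flow on slide 26 (load one block at a time, analyze each block, combine the individual results) can be illustrated with a simple linear regression computed from per-block sufficient statistics. The Python sketch below is an assumption-laden illustration, not ScaleR code: `read_chunks`, `analyze_chunk`, `combine`, and `fit_line` are hypothetical names, and the in-memory `rows` list stands in for data that would normally be read block by block from disk or HDFS.

```python
from itertools import islice


def read_chunks(rows, chunk_size):
    """Yield successive blocks so only one block is held at a time."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def analyze_chunk(chunk):
    """Per-block step: sufficient statistics for y = a + b * x."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return (n, sx, sy, sxx, sxy)


def combine(a, b):
    """Combine step: partial results add element-wise."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(rows, chunk_size=3):
    """Master step: stream the blocks, then solve from the totals."""
    total = (0, 0.0, 0.0, 0.0, 0.0)
    for chunk in read_chunks(rows, chunk_size):
        total = combine(total, analyze_chunk(chunk))
    n, sx, sy, sxx, sxy = total
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Hypothetical data lying exactly on y = 2x + 1.
rows = [(float(x), 2.0 * x + 1.0) for x in range(10)]
intercept, slope = fit_line(rows, chunk_size=3)
```

Because the combine step is a plain element-wise sum, the per-block work could be handed to separate threads or nodes in any order, and the memory footprint is set by `chunk_size` rather than by the size of the data set.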
  • 27. 6. Run Revolution R Enterprise on Hadoop Edge Node(s) rHDFS rHbase rHive rODBC Local File System (opt.) ScaleR + CRAN Algorithms Fast Single-Server Alternative for Modest Data Scale Edge Node Thin Client or Remote Desktop As With Previous: • Single Machine Execution • PEMA Scale & Speed (Single Machine) • Use ScaleR + CRAN • Accelerate R with Intel MKL Improvements: • Easily Shared via • No Data Movement • Develop on Desktop Run on Edge Node Limitations: • “Shorter Trip” for Data
  • 28. 7. Fast, Transparent Parallel Computation Inside Hadoop YARN/MapReduce jobtracker ScaleR Algorithms DeployR Fast Parallelized Analytics on Large Data Sets In Hadoop As With Previous: • Speed and Scale of ScaleR PEMA Algorithms • Use CRAN Where Appropriate • Accelerate R Math with MKL • Custom Parallelized Algo’s Advantages • Parallel Computation • No Data Movement • ScaleR PEMA Parallelization • Can Parallelize CRAN “Carefully” • Portable Coding Limitations: • Hadoop Workload Profiles Web Services Remote Execution Desktop & Server Tools and Applications
  • 29. One Client’s Experience with RRE on Hadoop Task: Processing Time. Importing and Filtering Datasets from HDFS: 14 Million Observations: 82 sec.; 227 Million Observations: 310 sec. Modeling and Estimation: 1.2 M Correlations: 2771 sec.; Simple Linear Regression, 227 M Observations: 61 sec.; Multiple Linear Regression, Three Variables, 227 M Observations: 58 sec.; Multiple Linear Regression, Four Variables, 227 M Observations: 58 sec.; Random Forest, 10 Predictor Variables, 227 M Observations, 10 Trees with Max Depth of 10 Splits: 2 hr. 3 min. Test Cluster - 9 Nodes: 64GB 24 cores each 9 Task Nodes 2 Admin Nodes 1 Edge Node 128GB 24 cores each 128GB 24 cores each
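To compare the timings on slide 29 across data sizes, it can help to convert them to observations per second. The short Python calculation below uses only the figures reported on the slide; the task labels and the `throughput` helper are illustrative, and the rates say nothing about fixed job start-up cost or how the results would transfer to another cluster.

```python
# (task label, observations, seconds) as reported on the slide.
reported = [
    ("Import and filter, 14 M observations", 14_000_000, 82),
    ("Import and filter, 227 M observations", 227_000_000, 310),
    ("Simple linear regression, 227 M observations", 227_000_000, 61),
    ("Multiple linear regression, 3 variables, 227 M observations", 227_000_000, 58),
]


def throughput(observations, seconds):
    """Observations processed per second of wall-clock time."""
    return observations / seconds


rates = {task: throughput(n, secs) for task, n, secs in reported}

for task, rate in rates.items():
    print(f"{task}: {rate:,.0f} observations/sec")
```

On these figures the larger import runs at roughly four times the per-second rate of the smaller one, which is consistent with a fixed per-job overhead being spread over more data, though the slide itself does not break the time down.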
  • 30. 8. Combined Edge Node & In-Hadoop ScaleR Algorithms DeployR Maximized Flexibility, Performance & Workload Handling As With Previous: • Speed and Scale of ScaleR PEMA Algorithms • Use CRAN Where Appropriate • Accelerate R Math with MKL • Custom Parallelized Algo’s Advantages • Flexibility for Blended Workloads • Little or No Data Movement • Maximize CRAN Capabilities by Sharing Large RAM Edge Nodes Web Services Thin Client Development Remote Execution Desktop & Server Tools and Applications RStudio
  • 31. Occasionally Conflicting Criteria Infrastructure Criteria: • Big Data Platform • Vendor Choice • Data Ingest • Data Security • Data Governance Data Science Criteria: • Performance • Self Service • Flexibility • Collaboration • Sharing • Capability
  • 32. Key Questions: • Where are the bulk of your skills? SAS? R? Java? Python? SQL? • Where do you build models today? • Do you have the skills to parallelize algorithms? • Can models be built on a big shared server? • How will you run models? • Do you have the budget to purchase commercial solutions? • How will your needs change over time? • What is your future architecture plan? • How risk averse is your management team regarding new platforms and open source?
  • 33. Key Questions (cont.) • What Workloads Do You Anticipate? – How Many Users? – What Workloads? • Workload Realities: – Many small tasks do not run well in MapReduce – Large data movements / duplications are costly • What Use Cases Will You Encounter? – Traditional statistical exploration, modeling? – Behavior Prediction? – Outlier Detection? – Simulation and HPC? – Massively wide data? – Real-Time scoring? – Internet of Things?
  • 34. Eight Steps to Fast, Scalable R Analytics with Hadoop Open Source Options 1. Open Source R 2. Revolution R Open 3. Open Source Parallelization… 4. rHadoop… Commercial Options 5. RRE on Servers & Workstations 6. RRE on Edge Nodes 7. RRE Inside Hadoop 8. RRE on Edge Node & Inside Hadoop No Clear Winner: • Budget & use case determine optimal path • Compelling options in both open source & commercial source • RRE ScaleR uniquely provides automatic parallelization • Current Hadoop platforms are fast for large scale analytics. • Combined in-server & in-Hadoop fits majority of cases
  • 35. 2015 Challenges & Opportunities • Evolving Hadoop Architectures • In-Memory Analytics – Spark, YARN Containers, Caching • Additional Algorithm Parallelization • Cluster Management • Cloud and Hybrid Cloud Clusters • SQL on Hadoop “Battle-Royale” • Addressing the Resource Reality • Integration, Deployment Both Drain on Expensive Resources • Leverage other skills • Design efficient collaboration • “Analytics for the Rest of Us” • New Consumption Targets – Mobile • New Participants in Design – Business Users
  • 36.
  • 37. Recommended Resources • Revolution Analytics Products – http://www.revolutionanalytics.com/products – http://www.revolutionanalytics.com/big-analytics-hadoop-and-edws • Whitepaper: “Delivering Value from Big Data with Revolution R Enterprise and Hadoop” – http://www.revolutionanalytics.com/whitepaper/delivering-value-big-data-revolution-r-enterprise-and-hadoop • Revolution Analytics on Social Media: – http://blog.revolutionanalytics.com/ – @revolutionr on Twitter – @bill_jacobs on Twitter