Revolution Confidential
Are You Ready for Big Data Big Analytics?
September, 2013
Bill Jacobs
Director, Product Marketing
Revolution Analytics
@bill_jacobs
Revolution Analytics
@RevolutionR
Key Big Data Challenge: The Analytics Talent Pool
The Analytics Talent Pool with R
2 Million R Users
What Language is Most Popular for Data Mining and Data Science?
Survey Question:
“What programming/statistics languages you used for an analytics /
data mining / data science work in 2013?”
Results:
R – 61%
Python – 39%
SQL – 37%
How does this compare to 2012?
“Highest growth was for Pig/Hive/Hadoop-based languages, R, and
SQL, while Perl, C/C++, and Unix tools declined…”
Source: 2013 KDnuggets survey of 700 voters.
The R Language: What Is It?
• A Language Platform…
  • A Procedural Language optimized for Statistics and Data Science
  • A Data Visualization Framework
  • Provided as Open Source
• A Community…
  • 2M Statistical Analysis and Machine Learning Users
  • Taught in Most University Statistics Programs
  • Active User Groups Across the World
• An Ecosystem
  • CRAN: 4500+ Freely Available Algorithms, Test Data and Evaluations
  • Many Applicable to Big Data If Scaled
Revolution Analytics - Overview
We are the only provider of a commercial analytics platform based on
the open source R statistical computing language.
Power, Productivity, Enterprise Readiness
• Stable, scalable, multi-platform, world-wide support
• Easier to build and deploy analytic applications
• Professional services enablement
• Distributed, high performance analytics algorithms
World Wide Support Teams
• Standard and Premium Programs
• Technical Account Managers
• Customer Success Managers
Professional Services
• Architecture planning
• Systems Integration
• Advanced analytic applications
• Full life cycle projects
200+ Customer Stories
• Digital Media & Retail
• Finance & Insurance
• Healthcare & Life Sciences
• Manufacturing & High Tech
• Academic & Gov’t
Revolution R Enterprise
Revolution R Enterprise
is the only commercial big data analytics platform
that provides Big Data Big Analytics based on R.
Portable Across Enterprise Platforms
High Performance, Scalable Analytics
Easier to Build & Deploy
Additional Technology Challenges Accompanying Big Data Analytics Efforts
Big Data
• New Data Sources
• Data Variety & Velocity
• Fine Grain Control
• Data Movement, Memory Limits
Complex Computation
• Experimentation
• Many Small Models
• Ensemble Models
• Simulation
Enterprise Readiness
• Heterogeneous Landscape
• Write Once, Deploy Anywhere
• Skill Shortage
• Production Support
Production Efficiency
• Shorter Model Shelf Life
• Volume of Models
• Long End-to-End Cycle Time
• Pace of Decision Accelerated
Open Source R Drives Analytical Innovation
… but has some limitations for Enterprise Deployment
Open Source R limitation → Revolution R Enterprise capability → Result
• Memory Bound → Large Data & Cluster-Based Storage Management → Operate on bigger data sizes
• Single Threaded → Scalable, multi-threaded, parallel processing → Increased speed of analysis
• Community Support → Commercial production support and professional services teams → Holistic production support
• Innovative – 5000+ packages, exponential growth → Ability to combine with open source R packages where needed → A key combination of innovation and scale
Big Data Speed @ Scale with Revolution R Enterprise (RRE)
Fast Math Libraries
Parallelized Algorithms
In-Database Execution
Multi-Threaded Execution
Multi-Core Processing
In-Hadoop Execution
Memory Management
Parallelized User Code
First, we enhance and accelerate the Open Source R interpreter.
Open Source R performance: Multi-threaded Math
Computation (4-core laptop)          Open Source R   Revolution R   Speedup
Linear Algebra [1]
  Matrix Multiply                    176 sec         9.3 sec        18x
  Cholesky Factorization             25.5 sec        1.3 sec        19x
  Linear Discriminant Analysis       189 sec         74 sec         3x
General R Benchmarks [2]
  R Benchmarks (Matrix Functions)    22 sec          3.5 sec        5x
  R Benchmarks (Program Control)     5.6 sec         5.4 sec        Not appreciable
1. http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php
2. http://r.research.att.com/benchmarks/
Customers report 5-50x
performance improvements
compared to Open Source R —
without changing any code
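The speedup column in the benchmark table is just the ratio of the two timings. As a minimal sketch (the dictionary and function names here are hypothetical, not part of any Revolution Analytics tooling), the ratios can be recomputed from the published timings. Because the timings on the slide are rounded, the recomputed ratios need not match the slide's rounded speedup labels exactly.

```python
from typing import Dict, Tuple

# Timings in seconds copied from the benchmark table:
# (Open Source R, Revolution R). The structure itself is illustrative.
TIMINGS: Dict[str, Tuple[float, float]] = {
    "Matrix Multiply": (176.0, 9.3),
    "Cholesky Factorization": (25.5, 1.3),
    "Linear Discriminant Analysis": (189.0, 74.0),
    "R Benchmarks (Matrix Functions)": (22.0, 3.5),
    "R Benchmarks (Program Control)": (5.6, 5.4),
}


def speedup(baseline_sec: float, accelerated_sec: float) -> float:
    """Return how many times faster the accelerated run is than the baseline."""
    if baseline_sec <= 0 or accelerated_sec <= 0:
        raise ValueError("timings must be positive")
    return baseline_sec / accelerated_sec


def speedup_report(timings: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Compute the speedup for every benchmark, rounded to one decimal place."""
    return {
        name: round(speedup(baseline, accelerated), 1)
        for name, (baseline, accelerated) in timings.items()
    }


if __name__ == "__main__":
    for name, ratio in speedup_report(TIMINGS).items():
        print(f"{name}: {ratio}x")
```

A ratio close to 1, as in the Program Control row, indicates work that multi-threaded math libraries do not accelerate, which is consistent with the "Not appreciable" entry in the table.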
Big Data Speed @ Scale with Revolution R Enterprise (RRE)
Fast Math Libraries
Parallelized Algorithms
In-Database Execution
Multi-Threaded Execution
Multi-Core Processing
In-Hadoop Execution
Memory Management
Parallelized User Code
Second, we built a platform for hosting R with Big Data on a variety of massively parallel platforms.
Unparalleled Big Data Big Analytics
Scale, Performance & Innovation
1 + 1 = 1000’s
[Diagram: value plotted against performance. Performance Enhanced R (R Language, Open Source R Analytic Packages) + Big Data Distributed & Parallel Processing & Analytic Package = Revolution R Enterprise]
Analytic Personas and their Tools
• Analytic Consumer
• Business Analyst
• Power Analyst
• Data Scientist
• Information Technologist
Right Tool, Right Problem
Create Custom, On-Demand Analytical Apps
Some Examples:
• On-demand sales forecasting
• Real-time social media sentiment analysis
• Leveraging the power of R from Microsoft tools
Predicting Predictive Analytics
• What Are Your Use Cases?
• How Will Your Use Cases Evolve?
• What Platform Will Best Support Each?
• Whose Platform Will Excel Tomorrow?
Portability and Investment Assurance:
Write Once – Deploy Anywhere
• Workstations
• Servers
• Server Clusters
• EDWs and Analytical DBMSs
• Hadoop (coming soon!)
Write it Once. Deploy it Anywhere.
Summary.
• R is Hot.
• Revolution R Enterprise:
  • Scales R to Big Data.
  • Scales Performance on Big Data Platforms
  • Is Commercially Supported
  • Is Broadly Deployable
  • Allows you to WODA (Write Once, Deploy Anywhere)!
• Revolution Analytics Maximizes Results, While Minimizing Near-Term and Long-Term Risks
www.revolutionanalytics.com 650.646.9545 Twitter: @RevolutionR
The leading commercial provider of software and support for the popular
open source R statistics language.
Next steps?
Thank You.

Revolution Analytics Podcast


Editor's Notes

  • #7 Remember that CRAN is a new term to IT professionals and to anyone who hasn’t learned much about R. Spend some time on it. CRAN = Comprehensive R Archive Network, a single repository of R algorithms, test data, and evaluations. Used by nearly all R programmers.
  • #8 Who is Revolution?
  • #9 To understand how a typical customer might use RRE, it’s important to understand who a typical customer might be. Users comprise statisticians, data scientists, IT staff, and academics across a wide variety of fields and industries. Also point out the flexibility of the R solution across industries; CRAN offers incredible capabilities. The same goes for scalability: some customers use it for desktop analysis, and the exact same program is used on production servers elsewhere with no change to the code.
  • #12 Despite the growth, there are limitations with open source R, and these become more impactful as either the scale of the data or the number of users within an organization grows. Revo addresses these points to offer a more complete solution. Compare and contrast.
  • #13 This slide presents a way to distinguish ourselves from the open source versions of R, particularly those “supported” by platform vendors who bundle it. Explain that this slide illustrates orders-of-magnitude performance improvement overall. Key advances are: (1) multi-threading and multi-core execution, which allow parallel processors in a server to work together; (2) memory management that enables algorithms to use a combination of memory and disk, alleviating a long-standing problem with R, that of being limited by the amount of physical memory; (3) parallelization in all its forms, most importantly the PEMAs (parallel external memory algorithms) in ScaleR, which work across clusters of servers, both in Hadoop and in cluster operating systems, to fully parallelize key statistics algorithms.
  • #18 DeployR Examples at: http://50.57.191.94/revolution/docs/examples/ User: testuser Password: secret
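
The notes on the Speed @ Scale slide describe algorithms that combine memory and disk and that parallelize across servers. The general idea behind such external memory algorithms can be sketched as follows: process the data one chunk at a time, keep only a small summary per chunk, and merge the summaries. This is an illustrative sketch in Python, not ScaleR's implementation; every name in it (Summary, summarize_chunk, merge, read_chunks, chunked_summary) is hypothetical, and the merge step uses the standard pairwise update for mean and variance.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, Iterator, List, Sequence


@dataclass(frozen=True)
class Summary:
    """Small, mergeable summary of a set of values."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean

    @property
    def variance(self) -> float:
        """Sample variance; undefined (NaN) for fewer than two values."""
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


def summarize_chunk(chunk: Sequence[float]) -> Summary:
    """Summarize one chunk that fits in memory."""
    n = len(chunk)
    if n == 0:
        return Summary()
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return Summary(n, mean, m2)


def merge(a: Summary, b: Summary) -> Summary:
    """Combine two summaries without revisiting the underlying data."""
    n = a.n + b.n
    if n == 0:
        return Summary()
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Summary(n, mean, m2)


def read_chunks(values: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield fixed-size chunks so only one chunk is held in memory at a time."""
    chunk: List[float] = []
    for value in values:
        chunk.append(value)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def chunked_summary(values: Iterable[float], chunk_size: int = 1000) -> Summary:
    """Compute count, mean, and variance over data processed chunk by chunk."""
    partials = (summarize_chunk(c) for c in read_chunks(values, chunk_size))
    return reduce(merge, partials, Summary())
```

Because merge only needs the per-chunk summaries, the chunks could in principle be summarized on different cores or servers and combined afterwards, which is the property that makes this style of algorithm parallelizable.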