4. Revolution Confidential
Agenda
• Quick introduction to Revolution Analytics
• Where do SAS and R fit in the analytical landscape?
• Introduction to R
• Typical challenges facing analytical organisations
• Differences between SAS and Revolution R
• Big Data
• Complex Computation
• Enterprise Readiness
• Production Efficiency
• Access to Talent
• Conclusions
5. Corporate Overview & Quick Facts
Founded: 2008 (as REvolution Computing)
Office locations: Palo Alto (HQ), Seattle (Engineering), Singapore, London
CEO: David Rich
Number of customers: 200+
Investors: Northbridge Venture Partners, Intel Capital, Platform Vendor
Web site: www.revolutionanalytics.com
Revolution: "Contender" in The Forrester Wave™: Big Data Predictive Analytics Solutions, Q1 2013
In the big data analytics context, speed and scale are critical drivers of success, and Revolution R delivers on both
Revolution R Enterprise is the leading commercial analytics platform based on the open source R statistical computing language
7. Where does R fit in the analytical lifecycle?
Lifecycle stages: ETL, analytical data preparation, analytical data exploration, model development, model deployment, BI / operations
Open source R competencies: analytical data preparation, analytical data exploration, model development, model deployment
Open source R is not
- An ETL tool
- A business reporting tool
- An end-to-end solution such as SAS Marketing Automation or SAS Fraud Framework
8. R is:
• The way to do statistical computing
• A full-blown programming language
• The home of every data mining algorithm known to data science
• A vibrant worldwide community
R was written in the early 1990s by Robert Gentleman and Ross Ihaka
Since 1997 a core group of ~20 developers has guided the evolution of the language
9. Top companies are using R around the world
• The NHS uses R to advance patient care and diagnosis.
• The New York Times routinely uses R for interactive and print data visualization.
• Ogilvy Europe uses R to analyse digital media campaigns for major brands.
• Google has more than 500 R users.
• The FDA supports the use of R for clinical trials of new drugs.
• The National Weather Service uses R to predict the extent of events.
• Facebook uses R to model user behaviour.
• The Consumer Financial Protection Bureau uses R and other open source tools.
• Twitter uses R for data science applications on the Twitter database.
• John Deere uses R to forecast crop yields and optimize tractor manufacturing.
Companies are recognising the additional benefits of R
10. Incredible Graphics and Data Visualization lead the way vs SAS
• Functions for standard graphs
• Scatterplot, time series, histogram, smoothing, …
• Bar plot, pie chart, dot chart, …
• Image plot, 3-D surface, map, …
• Customize without limits
• Combine graph types
• Create entirely new graphics
11. R is open source and drives analytic innovation but has some limitations for Enterprises
Area | Open source R | Revolution R
Bigger data sizes | Memory bound | Big data
Speed of analysis | Single threaded | Scale out, parallel processing, high speed
Production support | Community support | Commercial production support
Innovation and scale | Innovative: 4500+ packages, exponential growth | Combines with open source R packages where needed
12. Typical Challenges Facing Analytical Organisations
Big Data
• New Data Sources
• Data Variety & Velocity
• Fine Grain Control
• Data Movement, Memory Limits
Complex Computation
• Innovative Models
• Experimentation
• Many Small Models
• Ensemble Models
• Simulation
Enterprise Readiness
• Heterogeneous Landscape
• Write Once, Deploy Anywhere
• Production Support
• How to put analytics in the hands of business users
Speed & Production Efficiency
• Shorter Model Shelf Life
• Volume of Models
• Long End-to-End Cycle Time
• Pace of Decision Accelerated
• Hardware Required
Talent
• Finding data scientists
• Ensuring workforce is continually trained
• Creating an Analytical culture
13. Let's talk BIG DATA
14. How do SAS and Revolution R stack up for Big Data?
• Both handle large data sets well (big speed differences…)
• Both have high-speed database connectors to handle variety / velocity
• The object-oriented nature of R handles data manipulation and visualisation in a superior way
• Parallel data step functions (such as merge/sort/cleansing) are available in Revolution R; in SAS they are available only in SAS HPA environments
• The RHadoop project (rhbase, rhdfs, rmr) runs inside Hadoop
15. Let's talk Complex Computation
16. How do SAS and Revolution R stack up for Complex Computation?
• Innovative Models: More functions available in R
[Bar chart: R 2.15.2 packages: 4,500; SAS 9.3 statements, procedures, functions and call routines: 1,192]
Source: http://r4stats.com/2013/03/19/r-2012-growth-exceeds-sas-all-time-total/
17. How do SAS and Revolution R stack up for Complex Computation?
• Revolution R runs in parallel across multiple nodes and cores
• SAS Grid runs multiple jobs in parallel, but each job is still single threaded
• SAS can run in parallel in SAS HPA
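The "Many Small Models" item on this slide suits parallel execution because each model is fitted independently of the others. A minimal sketch of that pattern follows, written in Python purely for illustration. It is not Revolution R or SAS code, and the names `fit_line`, `fit_many` and the sample segments are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x on one small data set."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def fit_many(segments, executor_cls=ThreadPoolExecutor, max_workers=4):
    """Fit one independent model per segment; the fits share no state."""
    names = list(segments)
    with executor_cls(max_workers=max_workers) as pool:
        fits = list(pool.map(fit_line, (segments[name] for name in names)))
    return dict(zip(names, fits))


# Hypothetical segments: (x, y) observations for each customer group.
segments = {
    "north": [(1, 3.0), (2, 5.0), (3, 7.0)],
    "south": [(1, 2.0), (2, 2.5), (3, 3.0)],
}
models = fit_many(segments)
for name, (intercept, slope) in models.items():
    print(name, intercept, slope)
```

Threads are used here only to keep the sketch portable; for CPU-bound fitting, passing `concurrent.futures.ProcessPoolExecutor` as `executor_cls` would spread the work across cores (on some platforms that requires running the calls under an `if __name__ == "__main__":` guard). The structure stays the same either way: a map of one fitting function over independent data sets.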
Complex Computation at Speed
• Innovative Models
• Experimentation
• Precision
• Many Small Models
• Ensemble Models
• Simulation
18. Let's talk Enterprise Readiness
19. How do SAS and Revolution R stack up for Enterprise Readiness?
• Both handle heterogeneous landscapes
• SAS runs on anything, but is mostly single threaded apart from Teradata and Greenplum (no cloud except through its own managed services)
• Revolution runs across Windows/Linux clusters, cores, Hadoop, Amazon Web Services, Microsoft Azure, Netezza and Teradata
• SAS programmers must write code for the required environment, whilst Revolution R code is device independent
• Both offer good production support
• Both SAS and Revolution integrate with pretty much all common BI reporting tools
20. Let's talk Production Efficiency
21. How do SAS and Revolution R stack up for Speed & Production Efficiency?
*As published by SAS in HPC Wire, April 21, 2011
http://www.hpcwire.com/hpcwire/2011-04-19/sas_brings_high_performance_analytics_to_database_appliances.html
22. Options for handling Speed
SAS
- Normal SAS
- Single threaded
SAS Grid
- Platform LSF
- Single threaded
SAS In-Database Scoring
- Teradata Accelerator
- Greenplum Accelerator
SAS High Performance Computing
- Visual Analytics
- HPA on Teradata / Greenplum
Revolution R
- DistributedR parallel compute contexts: Windows, Linux, Amazon Web Services, Microsoft Azure, Hadoop, Netezza
- …but multi-threaded
- …all databases that support PMML
- …commodity hardware, Hadoop, Netezza (Teradata October)
24. Let's talk Talent
25. Talent gap emerging
• Will finding SAS talent become more difficult?
• The programming community wants to keep up to date and work with modern object-oriented languages
• Many universities have adopted R as the de facto analytics standard for statistics
• Since 2012, USA job descriptions that included "SAS" declined by 7.3% whilst jobs for "R" increased by 42% (number of jobs on indeed.com)
Search phrase: "Statistics Programming"
Sorted by popularity (May 29, 2013)
7 out of 10 books based on R
0 out of 10 books based on SAS or SPSS
26. www.revolutionanalytics.com - Page Views
[Chart: Page Views - Top 10 Countries, 01/04/2013 to 25/05/2013]
[Chart: Page Views by Geo, 01/04/2013 to 25/05/2013 (Europe, North America, APJ, South America, Africa, Middle East, NA, Caribbean, Central America)]
[Chart: EMEA Page Views by Organisation Type (Academic, Commercial)]
27. How do the modules break down?
Functionality | SAS Software | Revolution R
Foundation | Base SAS
Statistics | SAS/STAT
Graphics | SAS/GRAPH
Matrix Operations | SAS/IML
Optimization | SAS/OR
Time Series | SAS/ETS
Quality Control | SAS/QC
Database Access | SAS/ACCESS
Deploy in Excel, Deploy in BI | SAS Business Intelligence
Distributed Algorithms | SAS HPA Server
Parallel small compute | SAS Grid
In Database Scoring | SAS DB Accelerators
29. Conclusions
Complement SAS when…
• End-to-end industry-based solutions from SAS are a good fit for a particular business problem (e.g. SAS Fraud Framework for Insurance, Marketing Automation for Retail)
• Complement when innovative models are needed, or when visualisation or big data/complex model support is required
• Choose SAS when users are not coders and need a point-and-click interface (SAS Enterprise Guide, SAS Enterprise Miner)
• Replacing the existing SAS landscape would require significant re-training
30. Conclusions
Replace SAS when…
• You need cost savings, need to do things faster, or need to deal with bigger data
• Big data and complex processing are required
• Innovative models would give a competitive advantage
• Access to talent matters, today and in the future
• Flexible compute environments are required