Revolution R is a commercial product that adds functionality to the open source R programming language. It provides an integrated development environment, improved performance through multi-threaded math, and capabilities for handling big data through interfaces with Hadoop and Netezza. Revolution R also includes tools for interactive debugging, organizing code and data, and performing distributed analytics on large datasets using algorithms in RevoScaleR.
In-Database Analytics Deep Dive with Teradata and Revolution (Revolution Analytics)
Teradata and Revolution Analytics worked together to develop in-database analytical capabilities for Teradata Database. Teradata v14.10 provides a foundation for in-database analytics in Teradata. Revolution Analytics has ported its Revolution R Enterprise (RRE) Version 7.1 to use the in-database capabilities of version 14.10. With RRE inside Teradata, users can run fully parallelized algorithms in each node of the Teradata appliance to achieve performance and data scale heretofore unavailable. We'll get past the marketecture quickly and dive into a “how it really works” presentation, review implications for system configuration and administration, and then take questions from Teradata users who will be charged with deploying and administering Teradata systems as platforms for big data analytics inside the database engine.
Presented by Joseph Rickert at the NYC R Conference, April 25, 2015.
Good data analysis is reproducible. If someone else can’t independently replicate your results from your data, the consequences can be severe. With R, a major challenge for reproducibility is the ever-changing package ecosystem: it's all too easy to develop an R script using packages, only to find that collaborators download later versions of those packages when they attempt to reproduce your results, with unpredictable outcomes!
In this talk I'll introduce the Reproducible R Toolkit and the "checkpoint" package, included with Revolution R Open, and describe some best practices for writing reliable, reproducible R code with packages.
Big Data Analytics on Teradata with Revolution R Enterprise (Bill Jacobs)
Revolution Analytics brings big data analytics to Teradata Database. Presentation from Teradata Partners, October 2013, giving an overview of Revolution R Enterprise for Teradata by Bill Jacobs, Director of Product Marketing, Revolution Analytics.
Performance and Scale Options for R with Hadoop: A comparison of potential ar... (Revolution Analytics)
R and Hadoop go together. In fact, they go together so well that the number of options available can be confusing to IT and data science teams seeking solutions under varying performance and operational requirements.
Which configuration is faster for big files? Which is faster for sharing data and servers among groups? Which eliminates data movement? Which is easiest to manage? Which works best with iterative and multistep algorithms? What are the hardware requirements of each alternative?
This webinar is intended to help new users of R with Hadoop select the best architecture for integrating Hadoop and R, by explaining several popular configurations and their performance potential, workload handling, programming model, and administrative characteristics.
Presenters from Revolution Analytics will describe the options for using Revolution R Open and Revolution R Enterprise with Hadoop, including servers, edge nodes, rHadoop and ScaleR. We’ll then compare each configuration not only on performance but also on programming model, administration, data movement, ease of scaling, mixed-workload handling, and performance for large individual analyses versus mixed workloads.
Analysts predict that the Hadoop market will reach $50.2 billion by 2020.[1] Applications driving these large expenditures are some of the most important workloads for businesses today, including:
• Analyzing clickstream data, including site-side clicks and web media tags.
• Measuring sentiment by scanning product feedback, blog feeds, social media comments, and Twitter streams.
• Analyzing behavior and risk by capturing vehicle telematics.
• Optimizing product performance and utilization by gathering data from built-in sensors.
• Tracking and analyzing people and material movement with location-aware systems.
• Identifying system performance issues and intrusion attempts by analyzing server and network logs.
• Enabling automatic document and speech categorization.
• Extracting learning from digitized images, voice, video, and other media types.
Predictive analytics on large data sets provides organizations with a key opportunity to improve a broad variety of business outcomes, and many have embraced Apache Hadoop as the platform of choice.
In the last few years, large businesses have adopted Apache Hadoop as a next-generation data platform, one capable of managing large data assets in a way that is flexible, scalable, and relatively low cost. However, to realize predictive benefits of big data, organizations must be able to develop or hire individuals with the requisite statistics skills, then provide them with a platform for analyzing massive data assets collected in Hadoop “data lakes.”
As users adopted Hadoop, many discovered that performance limitations and complexity restricted Hadoop’s usefulness for broad predictive analytics. In response, the Hadoop community has focused on the Apache Spark platform to give Hadoop significant performance improvements. With Spark atop Hadoop, users can leverage Hadoop’s big-data management capabilities while achieving new levels of performance by running analytics in Apache Spark.
What remains is a challenge—conquering the complexity of Hadoop when developing predictive analytics applications.
In this white paper, we’ll describe how Microsoft R Server helps data scientists, actuaries, risk analysts, quantitative analysts, product planners, and other R users to capture the benefits of Apache Spark on Hadoop by providing a straightforward platform that eliminates much of the complexity of using Spark and Hadoop to conduct analyses on large data assets.
Presentation given by US Chief Scientist, Mario Inchiosa, at the June 2013 Hadoop Summit in San Jose, CA.
ABSTRACT: Hadoop is rapidly being adopted as a major platform for storing and managing massive amounts of data, and for computing descriptive and query types of analytics on that data. However, it has a reputation for not being a suitable environment for high-performance, complex iterative algorithms such as logistic regression, generalized linear models, and decision trees. At Revolution Analytics we think that reputation is unjustified, and in this talk I discuss the approach we have taken to porting our suite of High Performance Analytics algorithms to run natively and efficiently in Hadoop. Our algorithms are written in C++ and R, and are based on a platform that automatically and efficiently parallelizes a broad class of algorithms called Parallel External Memory Algorithms (PEMAs). This platform abstracts both the inter-process communication layer and the data source layer, so that the algorithms can work in almost any environment in which messages can be passed among processes and with almost any data source. MPI and RPC are two traditional ways to send messages, but messages can also be passed using files, as in Hadoop. I describe how we use the file-based communication choreographed by MapReduce and how we efficiently access data stored in HDFS.
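The PEMA idea is easy to illustrate in plain R: each process reduces its own chunk of data to small sufficient statistics, and a combiner merges them, so no process ever needs the full dataset in memory. Below is a minimal sketch for the mean (our own illustration, not Revolution's implementation):

```r
# Illustrative sketch of a Parallel External Memory Algorithm (PEMA)
# for the mean: each "node" reduces its chunk to sufficient statistics
# (sum, count), and a final step merges them. No step touches all the data.

chunk_stats <- function(chunk) {
  list(sum = sum(chunk), n = length(chunk))
}

merge_stats <- function(a, b) {
  list(sum = a$sum + b$sum, n = a$n + b$n)
}

pema_mean <- function(chunks) {
  combined <- Reduce(merge_stats, lapply(chunks, chunk_stats))
  combined$sum / combined$n
}

# Simulated data split into chunks, as HDFS blocks would be
x <- 1:1000
chunks <- split(x, ceiling(seq_along(x) / 100))
pema_mean(chunks)  # identical to mean(x)
```

In a real deployment the `lapply` step runs as mappers on separate nodes and the merge happens in a reducer; the same split/merge pattern generalizes to regression and other iterative algorithms.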
This session will demonstrate how the all-star line-up featuring R and Storm enables real-time processing on massive data sets; a real home run! The presenters will use actual baseball data and a real-world use case to compose an implementation of the use case as Storm components (spouts, bolts, etc.) and highlight how R can be an effective tool in prototyping a solution. Attendees will leave the session with information that could easily be applied for other use cases such as video game analytics, fraud detection, intrusion detection, and consumer propensity to buy calculations.
The business need for real-time analytics at large scale has focused attention on the use of Apache Storm, but an approach that is sometimes overlooked is the use of Storm and R together. This novel combination of real-time processing with Storm and the practical but powerful statistical analysis offered by R substantially extends the usefulness of Storm as a solution to a variety of business critical problems. By architecting R into the Storm application development process, Storm developers can be much more effective. The aim of this design is not necessarily to deploy faster code but rather to deploy code faster. Just a few lines of R code can be used in place of lengthy Storm code for the purpose of early exploration – you can easily evaluate alternative approaches and quickly make a working prototype.
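As a hypothetical illustration of that prototyping style, the core logic of a Storm counting bolt (tallying events per key as tuples arrive) takes only a few lines of R; all names here are our own:

```r
# Prototype of a Storm "counting bolt" in plain R: maintain running
# counts per key as tuples arrive, one at a time.

new_counter <- function() new.env(parent = emptyenv())

execute <- function(counter, key) {
  # the work a bolt's execute() would perform for each incoming tuple
  prev <- if (exists(key, envir = counter, inherits = FALSE)) {
    get(key, envir = counter)
  } else {
    0
  }
  assign(key, prev + 1, envir = counter)
}

counts <- new_counter()
for (batter in c("ruth", "gehrig", "ruth", "ruth")) execute(counts, batter)
get("ruth", envir = counts)    # 3
get("gehrig", envir = counts)  # 1
```

Once the logic is validated on sample data like this, it can be translated into a Storm bolt, or kept in R and invoked from the topology.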
27 Aug 2013 webinar, High Performance Predictive Analytics in Hadoop and R, presented by Mario E. Inchiosa, PhD, US Data Scientist, and Kathleen Rohrecker, Director of Product Marketing.
R is free software for data analysis and graphics that is similar to SAS and SPSS. Two million people are part of the R Open Source Community. Its use is growing very rapidly and Revolution Analytics distributes a commercial version of R that adds capabilities that are not available in the Open Source version. This 60-minute webinar is for people who are familiar with SAS or SPSS who want to know how R can strengthen their analytics strategy.
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...) (Revolution Analytics)
Presented by David Smith, Chief Community Officer, Revolution Analytics, at the Gartner Business Intelligence and Analytics Summit, April 2014.
In this presentation, I'll introduce the open source R language — the modern standard for Data Science — and the enhanced performance, scalability and ease-of-use capabilities of Revolution R Enterprise. Customer case studies will illustrate Revolution R Enterprise as a component of the real-time analytics deployment process, via integration with Hadoop, data warehousing systems and cloud platforms, to implement data-driven end-user applications.
There is one consistent message we hear from customers across industries and around the world: "We would like to reduce our reliance on SAS." In this webinar, we review the top reasons customers cite for moving from SAS to R; the benefits of open source analytics; the challenges of switching; and the tools you will need to build your own roadmap. We review the key differences between SAS and R from the user's perspective, and provide you with the tools to move forward.
High Performance Predictive Analytics in R and Hadoop (DataWorks Summit)
Big Data refers to large volumes of data, both structured and unstructured. Managing and analyzing data at this scale calls for technologies like Hadoop and languages like R.
http://www.techsparks.co.in/thesis-in-big-data-with-r/
An introduction to Microsoft R Services, Microsoft R Open and Microsoft R Server.
This presentation will briefly cover the following:
- Why consider MRO and R Server
- R Server
- MRO
- Microsoft R Services/R Server Platform
- DistributedR
- RevoScaleR/ScaleR
- ConnectR
- DevelopR
- DeployR
- Resources
- References
Learn Business Analytics with R at edureka! (Edureka!)
This is a 6-week course for professionals who aspire to learn the R language for analytics. A practical, hands-on approach to learning is followed to provide real-world experience and make you think like an analyst. The course covers not only the basic concepts but also advanced topics such as data visualization, data mining, model building in R, and web analytics.
Revolution R Enterprise - 100% R and More Webinar Presentation (Revolution Analytics)
R users already know why the R language is the lingua franca of statisticians today: because it's the most powerful statistical language in the world. Revolution Analytics builds on the power of open source R, and adds performance, productivity and integration features to create Revolution R Enterprise. In this presentation, author and blogger David Smith will introduce the additional capabilities of Revolution R Enterprise.
100% R and More: Plus What's New in Revolution R Enterprise 6.0 (Revolution Analytics)
R users already know why the R language is the lingua franca of statisticians today: because it's the most powerful statistical language in the world. Revolution Analytics builds on the power of open source R, and adds performance, productivity and integration features to create Revolution R Enterprise. In this webinar, author and blogger David Smith will introduce the additional capabilities of Revolution R Enterprise.
VP of Product Development Dr. Sue Ranney will also provide an overview of the features introduced in Revolution R Enterprise 6.0, including:
1. Big Data Generalized Linear Model, the new RevoScaleR function that provides a fast, scalable, distributable implementation of generalized linear models, offering impressive speed-ups relative to glm on in-memory data frames
2. Platform LSF Cluster Support, which allows you to create a distributed compute context for the Platform LSF workload manager
3. Azure Burst support added to RxHpcServer
4. Updated R engine (R 2.14.2)
5. Ability to use RevoScaleR analysis functions with non-xdf data sources such as SAS, SPSS or text
6. New methods for RxXdfData data sources including head, tail, names, dim, colnames, length, str, and formula
7. New function rxRoc for generating ROC curves
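For readers new to RevoScaleR, the in-memory open-source analogue of the new big-data GLM is stats::glm(); rxGlm fits the same class of models chunk-wise over data too large for memory. A minimal in-memory sketch of the kind of model involved, on simulated data (our own example, not from the release notes):

```r
# In-memory analogue of RevoScaleR's big-data GLM: a logistic
# regression fit with stats::glm(). rxGlm fits the same model family
# chunk-wise over out-of-memory .xdf data.

set.seed(42)
d <- data.frame(x = rnorm(200))
d$y <- rbinom(200, 1, plogis(0.5 + 2 * d$x))  # true intercept 0.5, slope 2

fit <- glm(y ~ x, data = d, family = binomial())
coef(fit)  # estimates should land near the true values
```

The point of the scalable version is that the same formula interface applies when `data` is an .xdf file of millions of rows distributed across a cluster.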
In this presentation from Revolution Analytics, Bill Jacobs presents: Are You Ready for Big Data Analytics?
"Revolution Analytics delivers advanced analytics software at half the cost of existing solutions. By building on open source R—the world's most powerful statistics software—with innovations in big data analysis, integration and user experience, Revolution Analytics meets the demands and requirements of modern data-driven businesses."
Learn more: http://www.revolutionanalytics.com
Watch the presentation video: http://wp.me/p3RLEV-12S
Turbo-Charge Your Analytics with IBM Netezza and Revolution R Enterprise: A S... (Revolution Analytics)
Everyone involved in high-stakes analytics wants power, speed and flexibility regardless of the size of the data set and complexity of the analysis. Trailblazing organizations that have deployed IBM Netezza Analytics with their IBM Netezza data warehouse appliances (TwinFin) with Revolution R Enterprise are getting all three.
Applications in R - Success and Lessons Learned from the Marketplace (Revolution Analytics)
Adoption of the R language has grown rapidly in the last few years, and is ranked as the number-one data science language in several surveys. This accelerating R adoption curve has been driven by the Big Data revolution, and the fact that so many data scientists — having learned R at university — are actively unlocking the secrets hidden in these new, vast data troves.
In this webinar David Smith, Chief Community Officer, will take a look at the growth of R and the innovative uses of R in business, government and non-profit sectors. Then Neera Talbert, Vice President, Professional Services will take you into the trenches of recent customer deployments and share best practices and pitfalls to avoid in deploying or expanding your own R applications.
Microsoft and Revolution Analytics -- what's the add-value? 20150629 (Mark Tabladillo)
Microsoft has been a leader in the enterprise analytics space for years. In 2014, Microsoft had already created R language functionality within Azure Machine Learning. On April 6, 2015, Microsoft closed a deal to acquire Revolution Analytics, a company focused on scalable processing solutions built on the well-known R language. Many data science projects and initial demos do not need high-volume solutions; however, having a high-volume answer for the R language allows for planning or working toward the largest data science solutions.
This presentation describes the add-value for the Revolution Analytics acquisition. The talk covers 1) an overview of current data science technologies from Microsoft; 2) a description of the R language; 3) a brief review of the add-value for R with Azure Machine Learning, and 4) a description of the performance architecture and demo of the language constructs developed by Revolution Analytics. Most of the presentation will be focused on sections two and four. It is anticipated that these technologies will be partially if not fully integrated into SQL Server 2016.
As the Big Data market has evolved, the focus has shifted from data operations (storage, access and processing of data) to data science (understanding, analyzing and forecasting from data). And as new models are developed, organizations need a process for deploying analytics from research into the production environment. In this talk, we'll describe the five stages of real-time analytics deployment:
Data distillation
Model development
Model validation and deployment
Model refresh
Real-time model scoring
We'll review the technologies supporting each stage, and how Revolution Analytics software works with the entire analytics stack to bring Big Data analytics to real-time production environments.
Robert Luong: Predictive Analytics in Excel (MSDEVMTL)
March 15, 2017
Excel and Power BI Group
Topic: Predictive analytics in Excel
Speaker: Robert Luong
Next, we will welcome Robert Luong, who will talk to us about predictive analytics with Azure ML and its integration with Excel. Azure ML is a cloud-based predictive analytics service that lets you quickly create and deploy predictive models as analytics solutions. Azure ML not only provides tools for modeling predictive analyses, but also a fully managed service that you can use to deploy your predictive models as web services.
We at Revolution Analytics are often asked “What is the best way to learn R?” While acknowledging that there may be as many effective learning styles as there are people, we have identified three factors that greatly facilitate learning R. For a quick start:
- Find a way of orienting yourself in the open source R world
- Have a definite application area in mind
- Set an initial goal of doing something useful and then build on it
In this webinar, we focus on data mining as the application area and show how anyone with just a basic knowledge of elementary data mining techniques can become immediately productive in R. We will:
- Provide an orientation to R’s data mining resources
- Show how to use the "point and click" open source data mining GUI, rattle, to perform the basic data mining functions of exploring and visualizing data, building classification models on training data sets, and using these models to classify new data.
- Show the simple R commands to accomplish these same tasks without the GUI
- Demonstrate how to build on these fundamental skills to gain further competence in R
- Move beyond small test data sets and show how, with the same level of skill, one can analyze some fairly large data sets with RevoScaleR
Data scientists and analysts using other statistical software as well as students who are new to data mining should come away with a plan for getting started with R.
The use of R statistical package in controlled infrastructure. The case of Cl... (Adrian Olszewski)
Facts and myths about the use of the R statistical package in controlled, validated environments, by the example of Clinical Research in the pharmaceutical industry. This is the first part, constituting the introduction; technical details will be presented in part II.
This document was presented at a conference organized by Polish National Group of the International Society for Clinical Biostatistics.
Presented to eRum (Budapest), May 2018
There are many common workloads in R that are "embarrassingly parallel": group-by analyses, simulations, and cross-validation of models are just a few examples. In this talk I'll describe the doAzureParallel package, a backend to the "foreach" package that automates the process of spawning a cluster of virtual machines in the Azure cloud to process iterations in parallel. This will include an example of optimizing hyperparameters for a predictive model using the "caret" package.
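The doAzureParallel workflow described above can be sketched in a few lines. This is a minimal sketch under assumptions, not the talk's actual demo: the credential and cluster configuration file names are placeholders, and the loop body is a stand-in for a real simulation or hyperparameter-tuning iteration.

```r
# Sketch of the foreach + doAzureParallel pattern (file names are hypothetical)
library(doAzureParallel)
library(foreach)

setCredentials("credentials.json")      # Azure Batch / storage credentials
cluster <- makeCluster("cluster.json")  # spawn a cluster of VMs from a config file
registerDoAzureParallel(cluster)        # register the cluster as the foreach backend

# Each iteration runs on a VM in the Azure cloud; results return as a list
results <- foreach(i = 1:100) %dopar% {
  mean(rnorm(1000))                     # stand-in for one simulation iteration
}

stopCluster(cluster)
```

The key design point is that only the backend registration changes: the same `foreach`/`%dopar%` loop runs locally with doParallel or in the cloud with doAzureParallel.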
By David Smith. Presented at Microsoft Build (Seattle), May 7 2018.
Your data scientists have created predictive models using open-source tools, proprietary software, or some combination of both, and now you are interested in lifting and shifting those models to the cloud. In this talk, I'll describe how data scientists can transition their existing workflows — while using mostly the same tools and processes — to train and deploy machine learning models based on open source frameworks to Azure. I'll provide guidance on keeping connections to data sources up-to-date, evaluating and monitoring models, and deploying applications that make use of those models.
Presentation delivered by David Smith to NY R Conference https://www.rstats.nyc/, April 2018:
Minecraft is an open-world creativity game, and a hit with kids. To get kids interested in learning to program with R, we created the "miner" package. This package is a collection of simple functions that allow you to connect with a Minecraft instance, manipulate the world within by creating blocks and controlling the player, and to detect events within the world and react accordingly.
The miner package is intended mainly for kids, to inspire them to learn R while playing Minecraft. But the development of the package also provides some useful insights into how to build an R package to interface with a persistent API, and how to instruct others on its use. In this talk I'll describe how to set up your own Minecraft server, and how to use and extend the package. I'll also provide a few examples of the package in action in a live Minecraft session.
While Python is a widely-used tool for AI development, in this talk I'll make the case for considering R as a platform for developing models for intelligent applications. Firstly, R provides a first-class experience for working with deep learning frameworks through its keras integration. Equally importantly, it provides the most comprehensive suite of statistical data analysis tools, which are extremely useful for many intelligent applications such as transfer learning. I'll give a few high-level examples in this talk, and we'll go into further detail in the accompanying interactive code lab.
There are many common workloads in R that are "embarrassingly parallel": group-by analyses, simulations, and cross-validation of models are just a few examples. In this talk I'll describe several techniques available in R to speed up workloads like these, by running multiple iterations simultaneously, in parallel.
Many of these techniques require the use of a cluster of machines running R, and I'll provide examples of using cloud-based services to provision clusters for parallel computations. In particular, I will describe how you can use the sparklyr package to distribute data manipulations using the dplyr syntax, on a cluster of servers provisioned in the Azure cloud.
Presented by David Smith at Data Day Texas in Austin, January 27 2018.
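The sparklyr pattern mentioned above can be sketched as follows. This is a minimal local-mode sketch, assuming the nycflights13 sample data; on Azure, the master URL would instead point at a provisioned Spark cluster.

```r
# Sketch: distributing a dplyr group-by with sparklyr
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")   # or a cluster's Spark master URL
flights_tbl <- copy_to(sc, nycflights13::flights, "flights")

# dplyr verbs are translated to Spark SQL and executed on the cluster
delays <- flights_tbl %>%
  group_by(carrier) %>%
  summarise(mean_delay = mean(dep_delay, na.rm = TRUE)) %>%
  collect()                             # bring the (small) result back into R

spark_disconnect(sc)
```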
A look at the changing perceptions of R, from the early days of the R project to today. Microsoft sponsor talk, presented by David Smith to the useR!2017 conference in Brussels, July 5 2017.
Predicting Loan Delinquency at One Million Transactions per Second (Revolution Analytics)
Real-time applications of predictive models must be able to generate predictions at the rate that transactions are generated. Previously, such applications of models trained using R needed to be converted to other languages like C++ or Java to achieve the required throughput. In this talk, I’ll describe how to use the in-database R processing capabilities of Microsoft R Server to detect fraud in a SQL Server database of loan records at a rate exceeding one million transactions per second. I will also show the process of training the underlying gradient-boosted tree model on a large training set using the out-of-memory algorithms of Microsoft R.
Presented by David Smith at The Data Science Summit, Chicago, April 20 2017.
The ability to independently reproduce results is a critical issue within the scientific community today, and is equally important for collaboration and compliance in business. In this talk, I'll introduce several features available in R that help you make reproducibility a standard part of your data science workflow. The talk will include tips on working with data and files, combining code and output, and managing R's changing package ecosystem.
Presented by David Smith, R Community Lead (Microsoft), at Monktoberfest October 2016.
The value of open source isn’t just in the software itself. The communities that form around open source software provide just as much value and sometimes even more: in ongoing development, in documentation, in support, in marketing, and as a supply of ready-trained employees. Companies who build on open source tend to focus on the software, but neglect communities at their peril.
In this talk, I share some of my experiences in building community for an open-source software company, Revolution Analytics, and perspectives since the acquisition by Microsoft in 2015.
R is more than just a language. Many of the reasons why R has become such a popular tool for data science come from the ecosystem surrounding the R project. R users benefit from the many resources and packages created by the community, while commercial companies (including Microsoft) provide tools to extend and support R, and services to help people use R.
In this talk, I will give an overview of the R Ecosystem and describe how it has been a critical component of R’s success, and include several examples of Microsoft’s contributions to the ecosystem.
(Presented to EARL London, September 2016)
(Presented by David Smith at useR!2016, June 2016. Recording: https://channel9.msdn.com/Events/useR-international-R-User-conference/useR2016/R-at-Microsoft )
Since the acquisition of Revolution Analytics in April 2015, Microsoft has embarked upon a project to build R technology into many Microsoft products, so that developers and data scientists can use the R language and R packages to analyze data in their data centers and in cloud environments.
In this talk I will give an overview (and a demo or two) of how R has been integrated into various Microsoft products. Microsoft data scientists are also big users of R, and I'll describe a couple of examples of R being used to analyze operational data at Microsoft. I'll also share some of my experiences in working with open source projects at Microsoft, and my thoughts on how Microsoft works with open source communities including the R Project.
Hadoop is famously scalable. Cloud Computing is famously scalable. R – the thriving and extensible open source Data Science software – not so much. But what if we seamlessly combined Hadoop, Cloud Computing, and R to create a scalable Data Science platform? Imagine exploring, transforming, modeling, and scoring data at any scale from the comfort of your favorite R environment. Now, imagine calling a simple R function to operationalize your predictive model as a scalable, cloud-based Web Service. Learn how to leverage the magic of Hadoop on-premises or in the cloud to run your R code, thousands of open source R extension packages, and distributed implementations of the most popular machine learning algorithms at scale.
With rising business challenges in the aftermarket service areas, it becomes imperative for manufacturers to gain actionable intelligence across the warranty management life cycle.
Join Revolution Analytics and Tech Mahindra to hear how to reduce the information visibility gap:
• Identify statistically significant business drivers
• Forecast warranty costs and claims
• Improve Customer Satisfaction
Presented to Chicago R User Group, Jan 29 2015
Good data analysis is reproducible. If someone else can’t independently replicate your results from your data, the consequences can be severe. With R, a major challenge for reproducibility is the ever-changing package ecosystem: it's all too easy to develop an R script using packages, only to find collaborators will download later versions of those packages when they attempt to reproduce your results, and outcome can be unpredictable!
In this talk I'll introduce the Reproducible R Toolkit, and the "checkpoint" package, included with Revolution R Open, and describe some best practices for writing reliable, reproducible R code with packages.
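A minimal use of the checkpoint package looks like the sketch below; the snapshot date here is illustrative (any date covered by the daily CRAN snapshot archive works).

```r
# checkpoint scans the project for library()/require() calls, then installs and
# uses the package versions as they existed on CRAN on the given date.
library(checkpoint)
checkpoint("2015-01-29")   # illustrative snapshot date to reproduce against

library(ggplot2)           # now loads the ggplot2 version from that CRAN snapshot
```

Collaborators who run the same script get the same package versions, which is exactly the reproducibility guarantee the talk describes.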
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
Slides by me and Rik Marselis at the DASA Connect conference, May 30, 2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We also held a lovely workshop with the participants, exploring different ways to think about quality and testing in different parts of the DevOps infinity loop.
Elevating Tactical DDD Patterns Through Object Calisthenics (Dorra BARTAGUIZ)
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Essentials of Automations: Optimizing FME Workflows with Parameters (Safe Software)
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
JMeter webinar - integration with InfluxDB and Grafana (RTTS)
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... (Ramesh Iyer)
In today's fast-changing business world, companies that fail to adapt and embrace new ideas struggle to keep up with the competition. However, fostering a culture of innovation takes much work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Connector Corner: Automate dynamic content and events by pushing a button (DianaGray10)
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work and a knack for helping others understand how things work. He has around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... (UiPathCommunity)
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Accelerate your Kubernetes clusters with Varnish Caching (Thijs Feryn)
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
UiPath Test Automation using UiPath Test Suite series, part 3 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
- UI automation introduction
- UI automation sample
- Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Securing your Kubernetes cluster: a step-by-step guide to success! (KatiaHIMEUR1)
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
3. February 22, 2011: Welcome! Revolution Confidential
Thanks for coming.
Slides and replay available (soon) at: http://bit.ly/z9xUG9
David Smith
VP Marketing & Community, Revolution Analytics
Editor, Revolutions blog: http://blog.revolutionanalytics.com
Twitter: @revodavid
4. In today's webcast:
About Revolution Analytics and R
What Revolution R adds to R
Resources for getting more from R
Q&A
Introducing Revolution R
5. What is R?
- Data analysis software: a programming language, a development platform designed by and for statisticians, an environment
- A huge library of algorithms for data access, data manipulation, analysis and graphics
- An open-source software project: free, open, and active
- A community: thousands of contributors, 2 million users, with resources and help in every domain
Download the "R is Hot" white paper: bit.ly/r-is-hot
6. R is exploding in popularity and functionality
Scholarly Activity (Google Scholar hits, '05-'09 CAGR): R 46%, Stata 10%, S-Plus 0%, SAS -11%, SPSS -27%
“I’ve been astonished by the rate at which R has been adopted. Four years ago, everyone in my economics department [at the University of Chicago] was using Stata; now, as far as I can tell, R is the standard tool, and students learn it first.” (Deputy Editor for New Products at Forbes)
Package Growth (number of R packages listed on CRAN, 2002-2010)
“A key benefit of R is that it provides near-instant availability of new and experimental methods created by its user base — without waiting for the development/release cycle of commercial software. SAS recognizes the value of R to our customer base…” (Product Marketing Manager, SAS Institute, Inc.)
Source: http://r4stats.com/popularity
7. “R is the most powerful & flexible statistical programming language in the world” [1]
Capabilities: sophisticated statistical analyses, predictive analytics, data visualization
Applications: real-time trading, finance, risk assessment, forecasting, bio-technology, drug development, social networks, and more
[Chart: MSFT stock price since 2009, last 29.29]
1. Norman Nie, multiple interviews
8. R User Community
From "The R Ecosystem": bit.ly/R-ecosystem
10. Revolution R Enterprise is …
11. R Productivity Environment (Windows)
- Script editor with type-ahead and code snippets
- Solutions window for organizing code and data
- Sophisticated debugging with breakpoints, variable values, etc.
- Objects loaded in the R environment
- Packages installed and loaded
- Object details
Demo: http://www.revolutionanalytics.com/demos/revolution-productivity-environment/demo.htm
12. Interactive Debugging
- One click to set a breakpoint in an R script
- Step in/out/over, inspect variables
- Eliminate the edit -> browser -> repair cycle
13. Performance: Multi-threaded Math
Computation (4-core laptop)        | Open Source R | Revolution R Enterprise | Speedup
Linear Algebra [1]
  Matrix Multiply                  | 327 sec       | 13.4 sec                | 23x
  Cholesky Factorization           | 31.3 sec      | 1.8 sec                 | 17x
  Linear Discriminant Analysis     | 216 sec       | 74.6 sec                | 2x
General R Benchmarks [2]
  R Benchmarks (Matrix Functions)  | 22 sec        | 3.5 sec                 | 5x
  R Benchmarks (Program Control)   | 5.6 sec       | 5.4 sec                 | Not appreciable
1. http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php
2. http://r.research.att.com/benchmarks/
14. Three Paradigms for Big Data
The standard R engine is constrained by capacity and performance. Revolution R Enterprise offers three methods for big data with R:
- Off-line: high-performance file-based analytics
- Off-line: parallel & distributed analytics
- On-line: in-database analytics (Hadoop, Netezza)
15. Revolution R Enterprise with RevoScaleR: Big Data Statistics in R
www.revolutionanalytics.com/bigdata
Every US airline departure and arrival, 1987-2008:
- File: AirlineData87to08.xdf
- Rows: 123.5 million
- Variables: 29
- Size on disk: 13.2 GB
arrDelayLm2 <- rxLinMod(ArrDelay ~ DayOfWeek:F(CRSDepTime), data = "AirlineData87to08.xdf", cube = TRUE)
16. RevoScaleR: Big Data algorithms
- Data processing (rxDataStep)
- Descriptive statistics (rxSummary)
- Tables and cubes (rxCube, rxCrossTabs)
- Correlations/covariances (rxCovCor, rxCor, rxCov, rxSSCP)
- Linear regressions (rxLinMod)
- Logistic regressions (rxLogit)
- K-means clustering (rxKmeans)
- Predictions (scoring) (rxPredict)
- Custom distributed computing (rxExec)
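The functions listed above share a common calling convention: a formula plus a data source, as in base R's modeling functions. The sketch below is illustrative only (RevoScaleR ships with Revolution R Enterprise, so it is not runnable with open source R alone); it reuses the airline .xdf file named on the earlier slide.

```r
# Illustrative RevoScaleR calls against the airline .xdf file from slide 15
airData <- "AirlineData87to08.xdf"

rxSummary(~ ArrDelay + DayOfWeek, data = airData)      # descriptive statistics
rxCrossTabs(~ DayOfWeek, data = airData)               # tables and cubes
fit  <- rxLinMod(ArrDelay ~ DayOfWeek, data = airData) # out-of-memory regression
pred <- rxPredict(fit, data = airData)                 # scoring (predictions)
```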
17. RevoScaleR: Distributed Computing
- Portions of the data source are made available to each compute node
- RevoScaleR on the master node assigns a task to each compute node
- Each compute node independently processes its data, and returns its intermediate results back to the master node
- The master node aggregates all of the intermediate results from each compute node and produces the final result
[Diagram: a master node running RevoScaleR coordinating compute nodes, each holding its own data partition]
*Available now for Microsoft HPC Server
Video demo: http://bit.ly/ugQ9KR
18. Platform-agnostic Big Data Analytics
- Set the "compute context" to define hardware (one line of code)
- Native job scheduler handles distribution, monitoring, failover, etc.
- Same code runs on other supported architectures: just change the compute context
- Supported architectures: Microsoft HPC Server (Windows); Platform Computing LSF (Linux, coming 2012)
- 42 seconds instead of 6 minutes
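The "one line of code" compute-context switch described above can be sketched as below. This is an assumption-laden sketch for the Microsoft HPC Server backend of this product era: the cluster name and share path are hypothetical, and the exact `RxHpcServer` arguments may differ by RevoScaleR version.

```r
# Point RevoScaleR at an HPC Server cluster (names/paths are hypothetical)
rxSetComputeContext(RxHpcServer(
  headNode = "cluster-head",
  shareDir = "\\\\cluster-head\\share"
))

# The same analysis code now runs distributed across the cluster
fit <- rxLinMod(ArrDelay ~ DayOfWeek, data = "AirlineData87to08.xdf")

# Switching back to local execution is again one line
rxSetComputeContext("local")
```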
19. A common analytic platform across big data architectures
Hadoop | File Based | In-database
20. In-Database Execution with IBM Netezza
More info: http://bit.ly/R-Netezza
21. R and Hadoop
- Hadoop offers a scalable infrastructure for processing massive amounts of data: storage (HDFS, HBASE) and distributed computing (MapReduce)
- R is a statistical programming language for developing advanced analytic applications
- Currently, writing analytics for Hadoop requires a combination of Java, Pig, Python, …
- The RHadoop project makes it possible to write Big Data algorithms for Hadoop using the R language alone
22. RevoConnectR for Hadoop
Write Map-Reduce analytics using only R code with these R packages:
- rhdfs: R and HDFS
- rhbase: R and HBASE (via Thrift)
- rmr: R and MapReduce
[Diagram: a Revolution R client talking to HDFS, HBASE, and the Job Tracker, with rhdfs/rhbase/rmr running Map or Reduce tasks on each Task Node]
More information at: bit.ly/r-hadoop
23. Enterprise Readiness: Revolution R Enterprise Server
- Multi-user support
- Production applications: integrate R analytics into web-based applications
- Data analysis and visualization
- Reporting, dashboards, and interactive applications
Revolution R Enterprise Server with RevoDeployR
24. Enterprise-Wide Deployment
- Production: Revolution R Enterprise Server + Hadoop + IBM Netezza + Windows HPC Server cluster, with a Management Console and the RevoDeployR Web Services API
- Research and development: data scientists / modelers
- End-user deployment: Excel, web server, BI app; analysts / corporate users
26. The Advanced Analytics Stack
- Deployment / Consumption
- Advanced Analytics
- ETL
- Data / Infrastructure
“Open Analytics Stack” White Paper: bit.ly/lC43Kw
27.
- On-Call Technical Support
- Consulting: Migration | Analytics | Applications | Validation
- Training: R | Revolution R | Statistical Topics
- Systems Integration: BI | ERP | Databases | Cloud
29. Why R?
- Every data analysis technique at your fingertips
- Create beautiful and unique data visualizations
- Get better results faster
- Draw on the talents of data scientists worldwide
- R is hot, and growing fast
30. Revolution R Enterprise: Production-Grade Statistical Analysis for the Workplace
- High-performance R for multiprocessor systems
- Modern integrated development environment
- Statistical analysis of terabyte-class data sets
- In-database R analytics with Hadoop and Netezza
- Deploy R applications via Web Services
- Telephone and email technical support
- Training and consulting services
- 100% compatible with R packages
31. Revolution R Enterprise: Free to Academia
- Personal use, research, teaching, and package development
Free Academic Download: www.revolutionanalytics.com/downloads/free-academic.php
Discounted technical support subscriptions available
32. Thank You!
Download slides and replay: http://bit.ly/z9xUG9
Learn more about Revolution R: revolutionanalytics.com/products
Contact Revolution Analytics: http://bit.ly/hey-revo
Feb 29: "Turbo-Charge Your Analytics with IBM Netezza and Revolution R Enterprise: A Step-by-Step Approach for Acceleration and Innovation", presented by William Zanine (IBM Analytics Solutions). www.revolutionanalytics.com/news-events/free-webinars
34. The leading commercial provider of software and support for the popular open source R statistics language.
www.revolutionanalytics.com
+1 (650) 646 9545
Twitter: @RevolutionR