Revolution Confidential
Revolution R: 100% R and More
Presented by: David Smith, VP Marketing and Community, Revolution Analytics
Poll Question: Which stats package do you use most?
February 22, 2011: Welcome!
Thanks for coming.
Slides and replay available (soon) at: http://bit.ly/z9xUG9
David Smith
VP Marketing & Community, Revolution Analytics
Editor, Revolutions blog: http://blog.revolutionanalytics.com
Twitter: @revodavid
In today’s webcast:
About Revolution Analytics and R
What Revolution R adds to R
Resources for getting more from R
Q&A
Introducing Revolution R
What is R?
Download the white paper "R is Hot": bit.ly/r-is-hot
Data analysis software
A programming language: development platform designed by and for statisticians
An environment: huge library of algorithms for data access, data manipulation, analysis and graphics
An open-source software project: free, open, and active
A community: thousands of contributors, 2 million users; resources and help in every domain
R is exploding in popularity and functionality
Scholarly activity: Google Scholar hits (’05-’09 CAGR)
  R        46%
  SAS     -11%
  SPSS    -27%
  S-Plus    0%
  Stata    10%
“I’ve been astonished by the rate at which R has been adopted. Four years ago, everyone in my economics department [at the University of Chicago] was using Stata; now, as far as I can tell, R is the standard tool, and students learn it first.”
  Deputy Editor for New Products at Forbes
Package growth: number of R packages listed on CRAN [chart, 2002 to 2010]
“A key benefit of R is that it provides near-instant availability of new and experimental methods created by its user base - without waiting for the development/release cycle of commercial software. SAS recognizes the value of R to our customer base…”
  Product Marketing Manager, SAS Institute, Inc.
Source: http://r4stats.com/popularity
“R is the most powerful & flexible statistical programming language in the world” [1]
Capabilities:
  Sophisticated statistical analyses
  Predictive analytics
  Data visualization
Applications:
  Real-time trading
  Finance
  Risk assessment
  Forecasting
  Bio-technology
  Drug development
  Social networks
  ... and more
[Figure: MSFT stock price chart]
1. Norman Nie, multiple interviews
R User Community
From: The R Ecosystem, bit.ly/R-ecosystem
Poll Question: If you're not using R today, what would you most like to use R for?
Revolution R Enterprise is
R Productivity Environment (Windows)
Script with type-ahead and code snippets
Solutions window for organizing code and data
Sophisticated debugging with breakpoints, variable values etc.
Objects loaded in the R environment
Packages installed and loaded
Object details
http://www.revolutionanalytics.com/demos/revolution-productivity-environment/demo.htm
Interactive Debugging
One click to set a breakpoint in an R script
Step in/out/over, inspect variables
Eliminate the edit -> browser -> repair cycle
Performance: Multi-threaded Math

Computation (4-core laptop)        Open Source R   Revolution R Enterprise   Speedup
Linear Algebra [1]
  Matrix Multiply                  327 sec         13.4 sec                  23x
  Cholesky Factorization           31.3 sec        1.8 sec                   17x
  Linear Discriminant Analysis     216 sec         74.6 sec                  2x
General R Benchmarks [2]
  R Benchmarks (Matrix Functions)  22 sec          3.5 sec                   5x
  R Benchmarks (Program Control)   5.6 sec         5.4 sec                   Not appreciable

1. http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php
2. http://r.research.att.com/benchmarks/
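The linear algebra rows above time operations that base R hands off to its BLAS/LAPACK library, which is where a multi-threaded math library can help. As a rough sketch of how such a timing could be taken on your own machine (the matrix size `n` and the object names `A`, `S`, `U` are illustrative assumptions, not the benchmark scripts behind the slide):

```r
# Minimal timing sketch in base R. The size n is an assumption chosen to run
# quickly; it is not the size used in the benchmarks quoted on the slide.
set.seed(42)
n <- 500
A <- matrix(rnorm(n * n), nrow = n)

# Matrix multiply: crossprod(A) computes t(A) %*% A through the BLAS.
t_mult <- system.time(S <- crossprod(A))["elapsed"]

# Cholesky factorization of the symmetric positive-definite matrix S.
t_chol <- system.time(U <- chol(S))["elapsed"]

cat("crossprod:", t_mult, "sec; chol:", t_chol, "sec\n")
```

Elapsed times depend on the hardware and on which BLAS the R build is linked against, so running the same script under two R builds is one way to compare them.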
Three Paradigms for Big Data
Standard R engine is constrained by capacity and performance
Revolution R Enterprise offers three methods for big data with R:
  Off-line: high-performance file-based analytics
  Off-line, parallel & distributed analytics
  On-line, in-database analytics (Hadoop, Netezza)
Revolution R Enterprise with RevoScaleR: Big Data Statistics in R
www.revolutionanalytics.com/bigdata
Data: every US airline departure and arrival, 1987-2008
File: AirlineData87to08.xdf
Rows: 123.5 million
Variables: 29
Size on disk: 13.2 GB
arrDelayLm2 <- rxLinMod(ArrDelay ~ DayOfWeek:F(CRSDepTime), cube = TRUE)
RevoScaleR: Big Data algorithms (Revolution R Enterprise)
Data processing (rxDataStep)
Descriptive statistics (rxSummary)
Tables and cubes (rxCube, rxCrossTabs)
Correlations/covariances (rxCovCor, rxCor, rxCov, rxSSCP)
Linear regressions (rxLinMod)
Logistic regressions (rxLogit)
K means clustering (rxKmeans)
Predictions (scoring) (rxPredict)
Custom distributed computing (RxExec)
RevoScaleR – Distributed Computing
• Portions of the data source are made available to each compute node
• RevoScaleR on the master node assigns a task to each compute node
• Each compute node independently processes its data, and returns its intermediate results back to the master node
• The master node aggregates all of the intermediate results from each compute node and produces the final result
[Diagram: a master node (RevoScaleR) connected to four compute nodes (RevoScaleR), each with its own data partition]
*Available now for Microsoft HPC Server
Video demo: http://bit.ly/ugQ9KR
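The partition, process and aggregate steps described on this slide can be imitated on one machine in base R. The sketch below is not RevoScaleR code and does not claim to show how RevoScaleR works internally: it assumes a simple linear regression on simulated data, where each "compute node" (here just an `lapply` call over hypothetical chunks) returns the cross-products X'X and X'y as its intermediate results, and the "master" sums them and solves the normal equations. All names (`dat`, `chunks`, `node_task`, `partials`, `beta`) are made up for illustration.

```r
# Sketch of the partition -> intermediate results -> aggregate pattern.
# Simulated data; the true intercept and slope are 2 and 3.
set.seed(1)
n <- 10000
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)
dat <- data.frame(x = x, y = y)

# "Partition" the rows into 4 chunks, one per hypothetical compute node.
chunks <- split(dat, rep(1:4, length.out = n))

# Each node computes only its intermediate results: X'X and X'y.
node_task <- function(chunk) {
  X <- cbind(1, chunk$x)                       # design matrix with intercept
  list(xtx = crossprod(X), xty = crossprod(X, chunk$y))
}
partials <- lapply(chunks, node_task)          # lapply stands in for a scheduler

# The "master" sums the intermediate results and solves the normal equations.
xtx  <- Reduce(`+`, lapply(partials, `[[`, "xtx"))
xty  <- Reduce(`+`, lapply(partials, `[[`, "xty"))
beta <- drop(solve(xtx, xty))                  # c(intercept, slope)
print(beta)
```

Because the cross-products are sums over rows, the aggregated result matches a regression fitted on the whole data set, and no node ever has to hold more than its own chunk.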
Platform-agnostic Big Data Analytics
Set “compute context” to define hardware (one line of code)
Native job scheduler handles distribution, monitoring, failover etc.
Same code runs on other supported architectures: just change the compute context
Supported architectures:
  Windows: Microsoft HPC Server
  Linux: Platform Computing LSF (coming 2012)
42 seconds instead of 6 minutes
A common analytic platform across big data architectures
Hadoop
File Based
In-database
In-Database Execution with IBM Netezza
More info: http://bit.ly/R-Netezza
R and Hadoop
Hadoop offers a scalable infrastructure for processing massive amounts of data
  Storage: HDFS, HBASE
  Distributed computing: MapReduce
R is a statistical programming language for developing advanced analytic applications
Currently, writing analytics for Hadoop requires a combination of Java, Pig, Python, …
The RHadoop project makes it possible to write Big Data algorithms for Hadoop using the R language alone.
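For readers new to MapReduce, the shape of such an algorithm can be shown in plain base R, with no Hadoop involved. This sketch does not use the RHadoop packages or their API; the toy records and the names `map_fn`, `reduce_fn`, `pairs`, `grouped` and `result` are hypothetical. It computes a mean delay per carrier in three steps: map each record to a (key, value) pair, group the values by key, then reduce each group to one number.

```r
# Toy records (made-up values, for illustration only).
records <- data.frame(
  carrier = c("AA", "UA", "AA", "DL", "UA", "AA"),
  delay   = c(10, 5, 20, 0, 15, 30),
  stringsAsFactors = FALSE
)

# Map: turn each record into a (key, value) pair.
map_fn <- function(i) list(key = records$carrier[i], value = records$delay[i])
pairs  <- lapply(seq_len(nrow(records)), map_fn)

# Shuffle: group the emitted values by key.
keys    <- vapply(pairs, function(p) p$key, character(1))
values  <- vapply(pairs, function(p) p$value, numeric(1))
grouped <- split(values, keys)

# Reduce: collapse each key's values to a single result.
reduce_fn <- function(v) mean(v)
result    <- vapply(grouped, reduce_fn, numeric(1))
print(result)
```

On a real cluster the map and reduce functions run on many nodes and the framework performs the grouping step; here all three steps run in a single R session.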
RevoConnectR for Hadoop
Write Map-Reduce analytics using only R code with these R packages:
  rhdfs - R and HDFS
  rhbase - R and HBASE
  rmr - R and MapReduce
[Diagram: Revolution R client using rmr, rhdfs and rhbase with the Job Tracker, Map or Reduce task nodes, HDFS, and HBASE (via Thrift)]
More information at: bit.ly/r-hadoop
Enterprise Readiness: Revolution R Enterprise Server
Multi-user support
Production applications
Integrate R analytics into Web-based applications:
  Data analysis and visualization
  Reporting
  Dashboards
  Interactive applications
Revolution R Enterprise Server with RevoDeployR
Enterprise-Wide Deployment
[Diagram labels]
Research and Development: Data Scientists / Modelers
Production: Revolution R Enterprise Server + Hadoop + IBM Netezza + Windows HPC Server cluster
Management Console
RevoDeployR Server, Web Services API
End-User Deployment: Excel, Web App, BI
Analysts / Corporate Users
On-Demand Analytics with RevoDeployR
The Advanced Analytics Stack
Deployment / Consumption
Advanced Analytics
ETL
Data / Infrastructure
“Open Analytics Stack” white paper: bit.ly/lC43Kw
On-Call Technical Support
Consulting: Migration | Analytics | Applications | Validation
Training: R | Revolution R | Statistical Topics
Systems Integration: BI | ERP | Databases | Cloud
Why R?
Every data analysis technique at your fingertips
Create beautiful and unique data visualizations
Get better results faster
Draw on the talents of data scientists worldwide
R is hot, and growing fast
Revolution R Enterprise: Production-Grade Statistical Analysis for the Workplace
High-performance R for multiprocessor systems
Modern Integrated Development Environment
Statistical analysis of terabyte-class data sets
In-database R analytics with Hadoop and Netezza
Deploy R applications via Web Services
Telephone and email technical support
Training and consulting services
100% compatible with R packages
Revolution R Enterprise: Free to Academia
Personal use
Research
Teaching
Package development
Free academic download: www.revolutionanalytics.com/downloads/free-academic.php
Discounted technical support subscriptions available
Thank You!
Download slides, replay: http://bit.ly/z9xUG9
Learn more about Revolution R: revolutionanalytics.com/products
Contact Revolution Analytics: http://bit.ly/hey-revo
Feb 29: Turbo-Charge Your Analytics with IBM Netezza and Revolution R Enterprise: A Step-by-Step Approach for Acceleration and Innovation, presented by William Zanine (IBM Analytics Solutions).
www.revolutionanalytics.com/news-events/free-webinars
Poll Question: What interests you most about Revolution R Enterprise?
The leading commercial provider of software and support for the popular open source R statistics language.
www.revolutionanalytics.com
+1 (650) 646 9545
Twitter: @RevolutionR