1. Revolution R: 100% R and More
Revolution Confidential
Presented by: David Smith, VP Marketing and Community, Revolution Analytics
3. February 22, 2011: Welcome!
Thanks for coming. Slides and replay available (soon) at:
David Smith, VP Marketing & Community, Revolution Analytics; Editor, Revolutions blog; Twitter: @revodavid
4. In today's webcast:
- About Revolution Analytics and R
- What Revolution R adds to R
- Resources for getting more from R
- Q&A
Introducing Revolution R
5. What is R?
- Data analysis software
- A programming language: a development platform designed by and for statisticians
- An environment: a huge library of algorithms for data access, data manipulation, analysis and graphics
- An open-source software project: free, open, and active
- A community: thousands of contributors, 2 million users; resources and help in every domain
Download the white paper: "R is Hot"
6. R is exploding in popularity and functionality
Scholarly activity: Google Scholar hits ('05-'09 CAGR)
- R: 46%
- SAS: -11%
- SPSS: -27%
- S-Plus: 0%
- Stata: 10%
"I've been astonished by the rate at which R has been adopted. Four years ago, everyone in my economics department [at the University of Chicago] was using Stata; now, as far as I can tell, R is the standard tool, and students learn it first." (Deputy Editor for New Products at Forbes)
Package growth: [Chart: number of R packages listed on CRAN, 2002-2010]
"A key benefit of R is that it provides near-instant availability of new and experimental methods created by its user base, without waiting for the development/release cycle of commercial software. SAS recognizes the value of R to our customer base…" (Product Marketing Manager, SAS Institute, Inc.)
Source:
7. "R is the most powerful & flexible statistical programming language in the world"¹
Capabilities:
- Sophisticated statistical analyses
- Predictive analytics
- Data visualization
Applications:
- Real-time trading
- Finance
- Risk assessment
- Forecasting
- Bio-technology
- Drug development
- Social networks
- ... and more
[Chart: MSFT stock price, 2009 onward; last 29.29]
¹ Norman Nie, multiple interviews
8. The R Ecosystem
[Figure: R User Community]
From:
10. Revolution R Enterprise is
11. R Productivity Environment (Windows)
- Script with type-ahead and code snippets
- Solutions window for organizing code and data
- Sophisticated debugging with breakpoints, variable values, etc.
- Objects loaded in the R environment
- Packages installed and loaded
- Object details
12. Interactive Debugging
- One click to set a breakpoint in an R script
- Step in/out/over, inspect variables
- Eliminate the edit -> browser -> repair cycle
13. Performance: Multi-threaded Math

| Computation (4-core laptop) | Open Source R | Revolution R | Speedup |
|---|---|---|---|
| Linear algebra: Matrix multiply | 327 sec | 13.4 sec | 23x |
| Linear algebra: Cholesky factorization | 31.3 sec | 1.8 sec | 17x |
| Linear algebra: Linear discriminant analysis | 216 sec | 74.6 sec | 2x |
| General R benchmarks: Matrix functions | 22 sec | 3.5 sec | 5x |
| General R benchmarks: Program control | 5.6 sec | 5.4 sec | Not appreciable |
14. Three Paradigms for Big Data
The standard R engine is constrained by memory capacity and performance. Revolution R Enterprise offers three methods for big data with R:
- Off-line: high-performance file-based analytics
- Off-line: parallel & distributed analytics
- On-line: in-database analytics (Hadoop, Netezza)
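The off-line, file-based paradigm rests on one idea: read the data from disk a block at a time and update running totals, so memory use depends on the block size rather than the file size. The Python sketch below illustrates that idea for a few summary statistics. It is not RevoScaleR code: it reads a small CSV held in memory rather than an .xdf file, and the `ArrDelay` column, the `NA` missing-value marker, and the `summarize_in_chunks` name are assumptions for illustration.

```python
import csv
import io
from itertools import islice

def summarize_in_chunks(text_stream, column, chunk_size=2):
    """Compute count, mean, min and max of one numeric column,
    holding at most `chunk_size` rows in memory at a time (hypothetical helper)."""
    reader = csv.DictReader(text_stream)
    count, total = 0, 0.0
    lo, hi = float("inf"), float("-inf")
    while True:
        chunk = list(islice(reader, chunk_size))  # next block of rows
        if not chunk:
            break
        for row in chunk:
            value = row[column]
            if value == "NA":  # assumed missing-value marker: skip it
                continue
            x = float(value)
            count += 1
            total += x
            lo, hi = min(lo, x), max(hi, x)
    mean = total / count if count else float("nan")
    return {"count": count, "mean": mean, "min": lo, "max": hi}

data = io.StringIO("ArrDelay,DayOfWeek\n10,Mon\n-5,Tue\nNA,Wed\n25,Mon\n0,Fri\n")
print(summarize_in_chunks(data, "ArrDelay"))
```

Only the running totals survive between chunks, which is what lets this style of computation scale to files larger than RAM.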
15. Revolution R Enterprise with RevoScaleR: Big Data Statistics in R
US airline departure and arrival data, 1987-2008
- File: AirlineData87to08.xdf
- Rows: 123.5 million
- Variables: 29
- Size on disk: 13.2 GB

    # F() converts the numeric CRSDepTime to a factor on the fly
    arrDelayLm2 <- rxLinMod(ArrDelay ~ DayOfWeek:F(CRSDepTime),
                            data = "AirlineData87to08.xdf", cube = TRUE)
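Why can a model like this be fit in one pass over 123.5 million rows? Assuming `cube = TRUE` makes this model use one indicator per (DayOfWeek, CRSDepTime) cell with no intercept (the rxLinMod documentation describes the exact behavior), the least-squares coefficient for each cell is simply the mean of `ArrDelay` within that cell. The hypothetical Python sketch below shows that equivalence on a few made-up rows; it accumulates per-cell sums and counts and does not call RevoScaleR.

```python
from collections import defaultdict

def cell_means(rows):
    """One-pass least-squares fit for a model whose only predictors are
    indicators for each (day, hour) cell, with no intercept: each
    coefficient equals the mean response within its cell."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, hour, delay in rows:
        key = (day, hour)
        sums[key] += delay
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Hypothetical (DayOfWeek, departure hour, ArrDelay) rows
rows = [("Mon", 8, 10.0), ("Mon", 8, 20.0), ("Tue", 17, -4.0),
        ("Mon", 17, 6.0), ("Tue", 17, 2.0)]
coefs = cell_means(rows)
print(coefs[("Mon", 8)], coefs[("Tue", 17)], coefs[("Mon", 17)])  # prints 15.0 -1.0 6.0
```

Because the per-cell sums and counts are additive, the same computation works whether the rows arrive from one file, many chunks, or many nodes.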
16. RevoScaleR: Big Data Algorithms (Revolution R Enterprise)
- Data processing (rxDataStep)
- Descriptive statistics (rxSummary)
- Tables and cubes (rxCube, rxCrossTabs)
- Correlations/covariances (rxCovCor, rxCor, rxCov, rxSSCP)
- Linear regressions (rxLinMod)
- Logistic regressions (rxLogit)
- K-means clustering (rxKmeans)
- Predictions (scoring) (rxPredict)
- Custom distributed computing (rxExec)
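Several of these algorithms need only sums that can be accumulated block by block. For linear regression, the sum-of-squares-and-cross-products quantities X'X and X'y can be added up chunk by chunk and then solved once for the coefficients, beta = (X'X)^-1 X'y. The Python sketch below shows that technique with hypothetical function names and toy data; it illustrates the idea and is not RevoScaleR's implementation of rxLinMod or rxSSCP.

```python
def accumulate(xtx, xty, chunk):
    """Add one chunk's contribution to X'X and X'y.
    Each row is (features, y); an intercept column of 1.0 is prepended."""
    for features, y in chunk:
        x = [1.0] + list(features)
        for i, xi in enumerate(x):
            xty[i] += xi * y
            for j, xj in enumerate(x):
                xtx[i][j] += xi * xj

def solve(a, b):
    """Solve a @ beta = b by Gaussian elimination with partial pivoting.
    Assumes a (here X'X) is nonsingular."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta

def chunked_linear_regression(chunks, n_features):
    p = n_features + 1
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for chunk in chunks:  # only one chunk needs to be in memory at a time
        accumulate(xtx, xty, chunk)
    return solve(xtx, xty)

# Toy data following y = 2 + 3x exactly, split across two chunks
chunks = [[((0.0,), 2.0), ((1.0,), 5.0)], [((2.0,), 8.0), ((3.0,), 11.0)]]
print(chunked_linear_regression(chunks, 1))
```

The p-by-p cross-product matrix is tiny compared with the data, so the expensive part is a single streaming pass.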
17. RevoScaleR: Distributed Computing
[Diagram: a master node (RevoScaleR) connected to four compute nodes, each running RevoScaleR on its own data partition]
- Portions of the data source are made available to each compute node.
- RevoScaleR on the master node assigns a task to each compute node.
- Each compute node independently processes its data and returns its intermediate results to the master node.
- The master node aggregates the intermediate results from all compute nodes and produces the final result.
*Available now for Microsoft HPC Server
Video demo:
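The master/worker pattern on this slide can be sketched in a few lines of Python. In this sketch (a stand-in under stated assumptions, not RevoScaleR), threads play the role of compute nodes: each summarizes its own partition into an intermediate result (count, mean, sum of squared deviations), and the master merges those results with the pairwise update of Chan et al. to get the overall mean and sample variance. Function names and data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def process_partition(partition):
    """Work done on one 'compute node': summarize its own (non-empty) partition.
    Returns the intermediate result (count, mean, sum of squared deviations)."""
    n = len(partition)
    mean = sum(partition) / n
    m2 = sum((x - mean) ** 2 for x in partition)
    return n, mean, m2

def combine(a, b):
    """'Master node' step: merge two intermediate results (Chan et al. update)."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2

def distributed_mean_var(partitions):
    # Threads stand in for compute nodes; each receives one partition.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(process_partition, partitions))
    n, mean, m2 = reduce(combine, partials)  # aggregate on the master
    return mean, m2 / (n - 1)  # overall mean and sample variance

partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]
print(distributed_mean_var(partitions))  # prints (5.0, 7.5)
```

Only three numbers per node travel back to the master, which is why this pattern scales with the number of nodes rather than the size of the data.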
18. Platform-agnostic Big Data Analytics
- Set the "compute context" to define the hardware (one line of code)
- The native job scheduler handles distribution, monitoring, failover, etc.
- The same code runs on other supported architectures: just change the compute context
- Supported architectures:
  - Windows: Microsoft HPC Server
  - Linux: Platform Computing LSF (coming 2012)
42 seconds instead of 6 minutes
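The "just change the compute context" idea amounts to writing the analysis against an abstract execution context. The Python sketch below is hypothetical throughout: `LocalContext`, `ParallelContext`, `count_delayed`, and `analysis` are illustrative names, not RevoScaleR functions, and a thread pool stands in for a cluster job scheduler.

```python
from concurrent.futures import ThreadPoolExecutor

class LocalContext:
    """Hypothetical context: run every task sequentially in this process."""
    def map(self, func, items):
        return [func(item) for item in items]

class ParallelContext:
    """Hypothetical context: run tasks on a worker pool
    (threads stand in for cluster nodes here)."""
    def __init__(self, workers):
        self.workers = workers

    def map(self, func, items):
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(func, items))

def count_delayed(partition, threshold=15):
    """Count flights in one partition delayed by more than `threshold` minutes."""
    return sum(1 for delay in partition if delay > threshold)

def analysis(context, partitions):
    # The analysis code never changes; only the context passed in does.
    return sum(context.map(count_delayed, partitions))

partitions = [[0, 20, 35], [10, 16], [5]]
print(analysis(LocalContext(), partitions))               # prints 3
print(analysis(ParallelContext(workers=3), partitions))   # prints 3
```

Swapping `LocalContext()` for `ParallelContext(workers=3)` is the only change between the two calls, which mirrors the slide's point that the same analysis code runs on different hardware.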
19. A common analytic platform across big data architectures: Hadoop, file-based, in-database
20. In-Database Execution with IBM Netezza
More info:
21. R and Hadoop
Hadoop offers a scalable infrastructure for processing massive amounts of data:
- Storage: HDFS, HBase
- Distributed computing: MapReduce
R is a statistical programming language for developing advanced analytic applications.
Currently, writing analytics for Hadoop requires a combination of Java, Pig, Python, …
The RHadoop project makes it possible to write Big Data algorithms for Hadoop using the R language alone.
22. RevoConnectR for Hadoop
Write MapReduce analytics using only R code, with these R packages:
- rhdfs: R and HDFS
- rhbase: R and HBase
- rmr: R and MapReduce
[Diagram: a client running Revolution R with rhdfs, rhbase and rmr; HDFS; HBase (via Thrift); the Job Tracker; and Map or Reduce task nodes running Revolution R]
More information at:
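rmr's programming model asks you to supply a map function and a reduce function, while Hadoop handles distribution, shuffling, and fault tolerance. The Python sketch below simulates the three phases (map, shuffle, reduce) in a single process to show what those two user-written functions are responsible for; it does not use rmr or Hadoop, and the carrier/delay records are made up.

```python
from collections import defaultdict

def mapper(record):
    """Emit (key, value) pairs: carrier -> arrival delay."""
    carrier, delay = record
    yield carrier, delay

def reducer(key, values):
    """Combine all values for one key: mean arrival delay."""
    values = list(values)
    yield key, sum(values) / len(values)

def map_reduce(records, mapper, reducer):
    # Map phase: apply the mapper to every input record,
    # then shuffle: group the emitted values by key.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce phase: one reducer call per distinct key (sorted, as Hadoop does).
    results = {}
    for key in sorted(groups):
        for out_key, out_value in reducer(key, groups[key]):
            results[out_key] = out_value
    return results

records = [("AA", 10.0), ("UA", -2.0), ("AA", 20.0), ("UA", 6.0), ("DL", 0.0)]
print(map_reduce(records, mapper, reducer))  # prints {'AA': 15.0, 'DL': 0.0, 'UA': 2.0}
```

In a real Hadoop job the map and reduce calls run on different task nodes and the shuffle moves data over the network; the division of labor between the two functions stays the same.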
23. Enterprise Readiness: Revolution R Enterprise Server
- Multi-user support
- Production applications
- Integrate R analytics into web-based applications:
  - Data analysis and visualization
  - Reporting
  - Dashboards
  - Interactive applications
Revolution R Enterprise Server with RevoDeployR
24. Enterprise-Wide Deployment
[Diagram: Research and Development, where data scientists / modelers use Revolution R Enterprise Server + Hadoop + IBM Netezza + Windows HPC Server cluster; Production, where a RevoDeployR Server with a management console and a web services API delivers end-user deployments (Excel, web app, BI) to analysts / corporate users]
25. On-Demand Analytics with RevoDeployR
26. The Advanced Analytics Stack
Layers: Deployment / Consumption; Advanced Analytics; ETL; Data / Infrastructure
"Open Analytics Stack" white paper:
27.
- On-call technical support
- Consulting: Migration | Analytics | Applications | Validation
- Training: R | Revolution R | Statistical Topics
- Systems integration: BI | ERP | Databases | Cloud
29. Why R?
- Every data analysis technique at your fingertips
- Create beautiful and unique data visualizations
- Get better results faster
- Draw on the talents of data scientists worldwide
- R is hot, and growing fast
30. Revolution R Enterprise: Production-Grade Statistical Analysis for the Workplace
- High-performance R for multiprocessor systems
- Modern integrated development environment
- Statistical analysis of terabyte-class data sets
- In-database R analytics with Hadoop and Netezza
- Deploy R applications via web services
- Telephone and email technical support
- Training and consulting services
- 100% compatible with R packages
31. Revolution R Enterprise: Free to Academia
- Personal use
- Research
- Teaching
- Package development
Free academic download; discounted technical support subscriptions available.
32. Thank You!
- Download slides, replay:
- Learn more about Revolution R:
- Contact Revolution Analytics:
Feb 29: Turbo-Charge Your Analytics with IBM Netezza and Revolution R Enterprise: A Step-by-Step Approach for Acceleration and Innovation, presented by William Zanine (IBM Analytics Solutions).
34. The leading commercial provider of software and support for the popular open-source R statistics language.
+1 (650) 646 9545
Twitter: @RevolutionR