100% R and More: Plus What's New in Revolution R Enterprise 6.0



R users already know why the R language is the lingua franca of statisticians today: because it's the most powerful statistical language in the world. Revolution Analytics builds on the power of open source R, and adds performance, productivity and integration features to create Revolution R Enterprise. In this webinar, author and blogger David Smith will introduce the additional capabilities of Revolution R Enterprise.

Dr. Sue Ranney, VP of Product Development, will also provide an overview of the features introduced in Revolution R Enterprise 6.0, including:

1. Big Data Generalized Linear Model, the new RevoScaleR function that provides a fast, scalable, distributable implementation of generalized linear models, offering impressive speed-ups relative to glm on in-memory data frames

2. Platform LSF Cluster Support, which allows you to create a distributed compute context for the Platform LSF workload manager

3. Azure Burst support added to RxHpcServer

4. Updated R engine (R 2.14.2)

5. Ability to use RevoScaleR analysis functions with non-xdf data sources such as SAS, SPSS or text

6. New methods for RxXdfData data sources including head, tail, names, dim, colnames, length, str, and formula

7. New function rxRoc for generating ROC curves
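
To make the last item concrete, here is a minimal, language-neutral sketch in Python of how an ROC curve is built from predicted scores. It is not the rxRoc implementation and does not use RevoScaleR; the function names `roc_curve` and `auc` are illustrative assumptions.

```python
def roc_curve(labels, scores):
    """Return ROC points (false positive rate, true positive rate).

    labels: 0/1 outcomes; scores: predicted scores (higher = more likely 1).
    The threshold is swept from the highest score to the lowest.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Rank observations by descending score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Observations with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

Calling `auc(roc_curve(labels, scores))` gives a single summary number: 1.0 for a perfect ranking and about 0.5 for a ranking no better than chance.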

Published in: Technology

100% R and More: Plus What's New in Revolution R Enterprise 6.0

  1. Revolution Confidential. Revolution R Enterprise 6: 100% R and More. Presented by: David Smith, VP Marketing and Community; Sue Ranney, VP Product Management
  2. Poll Question: Which stats package do you use most?
  3. In today’s webcast:
     • About Open-Source R and Revolution R Enterprise
     • What’s New in Revolution R Enterprise 6
     • Resources, Q&A
  4. What is R? (Download the White Paper “R is Hot”: bit.ly/r-is-hot)
     • Data analysis software
     • A programming language: development platform designed by and for statisticians
     • An environment: huge library of algorithms for data access, data manipulation, analysis and graphics
     • An open-source software project: free, open, and active
     • A community: thousands of contributors, 2 million users; resources and help in every domain
  5. R User Community (from: The R Ecosystem, bit.ly/R-ecosystem)
  6. Revolution R Enterprise is
  7. R Productivity Environment (Windows)
     • Script with type ahead and code snippets
     • Solutions window for organizing code and data
     • Sophisticated debugging with breakpoints, variable values etc.
     • Objects loaded in the R Environment
     • Packages installed and loaded
     • Object details
     http://www.revolutionanalytics.com/demos/revolution-productivity-environment/demo.htm
  8. Performance: Multi-threaded Math
     Computation (4-core laptop) | Open Source R | Revolution R Enterprise | Speedup
     Linear Algebra [1]
     Matrix Multiply | 176 sec | 9.3 sec | 18x
     Cholesky Factorization | 25.5 sec | 1.3 sec | 19x
     Linear Discriminant Analysis | 189 sec | 74 sec | 3x
     General R Benchmarks [2]
     R Benchmarks (Matrix Functions) | 22 sec | 3.5 sec | 5x
     R Benchmarks (Program Control) | 5.6 sec | 5.4 sec | Not appreciable
     [1] http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php
     [2] http://r.research.att.com/benchmarks/
  9. A common analytic platform across big data architectures: File Based, Hadoop, In-database, Cluster
  10. RevoScaleR on Distributed Computing Clusters (Windows HPC Server, Platform LSF)
     [Diagram: a master node distributes partitions of the big data set across compute nodes, each holding one data partition]
     Data Step, Statistical Summary, Tables/Cubes, Covariance, Linear & Logistic Regression, GLM, K-means clustering, …
  11. Scalable distributed computing with Revolution R Enterprise and Hadoop (Map-Reduce). RHadoop: http://bit.ly/RHadoop
  12. In-Database Execution with IBM Netezza. More info: http://bit.ly/R-Netezza
  13. Enterprise-Wide Deployment
     [Diagram labels]
     • Research and Development: Data Scientists / Modelers
     • Production: Revolution R Enterprise Server + Hadoop + IBM Netezza + Server cluster
     • RevoDeployR Server: Web Services API, Management Console
     • End-User Deployment: Excel, Web App, BI (Analysts / Corporate Users)
  14. 14. Revolution Confidential On-Call Technical Support Consulting  Migration | Analytics | Applications | Validation Training  R | Revolution R | Statistical Topics Systems Integration  BI | ERP | Databases | Cloud www.revolutionanalytics.com/services 14
  15. Why Revolution R?
     Feature | Open-Source R | RRE6 Workstation | RRE6 Server
     Interface with multiple data sources | ✓ | ✓✓ | ✓✓
     Exploratory data analysis | ✓✓ | ✓✓ | ✓✓
     Wide range of statistical methods | ✓✓ | ✓✓ | ✓✓
     Parallel Programming | ✓ | ✓ | ✓✓
     Multi-threaded performance | ✘ | ✓ | ✓✓
     Big Data Analytics | ✘ | ✓ | ✓✓
     Distributed Analytics (Grid / Cluster) | ✘ | Client | ✓✓
     Cloud Computing | ✘ | ✘ | ✓✓
     Hadoop Integration | ✘ | Client | ✓✓
     IBM Netezza Integration | ✘ | Client | ✓✓
     Multi-user support | ✘ | ✘ | ✓✓
     Scheduled, monitored batch production | ✘ | ✘ | ✓✓
     Secure code deployment, management | ✘ | ✘ | ✓✓
     Integration into Data Apps | ✘ | ✘ | ✓✓
     http://www.revolutionanalytics.com/why-revolution-r/which-r-is-right-for-me.php
  16. Poll Question: What’s most important to you about Revolution R Enterprise?
  17. What’s new in Revolution R Enterprise 6. Presented by: Sue Ranney, VP Product Development
  18. Revolution R Enterprise 6: Key Areas of Enhancements
     • Latest stable release of open-source R (2.14.2)
     • High Performance Analytics: fast, scalable, distributable, full-featured analysis of huge data sets
     • High Performance Computing: run arbitrary R functions in parallel across cores or nodes of a cluster
  19. R 2.14.2
     • Incorporation of ‘parallel’ as a base package
        • ‘foreach’ users can use the doParallel backend
        • Users of RevoScaleR’s ‘rxExec’ HPC function can use new compute contexts to run arbitrary R functions in parallel: a compute context for the ‘parallel’ package, and a compute context for any ‘foreach’ backend
     • Standard functions and packages in R are pre-compiled into byte-code using the ‘compiler’ package
        • The speed benefit depends on the specific function, but performance can improve by a factor of 2 or more.
  20. High Performance Analytics (HPA) in RevoScaleR
     • High Performance Computing + Data
     • Full-featured, fast, and scalable analysis functions
     • Same code works on small and big data
     • Same code works on a variety of compute contexts: a laptop, server, cluster, or the cloud
     • Scales approximately linearly with the number of observations, without increasing memory requirements
  21. Directly Analyze External Data Sets with RevoScaleR HPA Functions (NEW)
     • The RevoScaleR package provides easy ways to directly access and analyze external data sets (data sources):
        • Delimited ASCII
        • Fixed format ASCII
        • SAS data sets (.sas7bdat)
        • SPSS data sets (.sav)
        • ODBC connections
     • No need to have SAS or SPSS installed to access data in SAS or SPSS file formats.
     • Get started on analyses without first importing data
     • Still have the option of importing into the efficient .xdf file format
  22. RevoScaleR: HPA Algorithms
     • Descriptive statistics (rxSummary)
     • Tables and cubes (rxCube, rxCrossTabs)
     • Correlations/covariances (rxCovCor, rxCor, rxCov, rxSSCP)
     • K means clustering (rxKmeans)
     • Linear regressions (rxLinMod)
     • Logistic regressions (rxLogit)
     • Generalized Linear Models (rxGlm) (NEW)
     • Predictions (scoring) (rxPredict)
  23. Tips for Handling Big Data in R
     • Use algorithms that process data in chunks.
        • The functions provided with RevoScaleR are scalable because they process data in ‘chunks.’
        • If the number of observations doubles, you can still perform the same data analyses with the same amount of memory; it will just take longer
     • Use functions optimized for big data
        • The implementations of RevoScaleR analysis algorithms are all optimized for handling big data.
        • RevoScaleR analysis functions provide significant speed improvements over alternatives, even if you can fit all of your data in memory.
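
The chunking idea on this slide can be sketched in a few lines. The following Python example is an illustration only, not RevoScaleR code: it computes a mean and variance one chunk at a time and merges the per-chunk results, so memory use depends on the chunk size rather than on the number of observations. The names `Summary`, `summarize_chunk`, `merge`, `read_chunks` and `chunked_summary` are assumptions made for this sketch, and `read_chunks` stands in for reading blocks of rows from a file on disk.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """Sufficient statistics for a mean and variance."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean

    def variance(self):
        """Sample variance (n - 1 in the denominator)."""
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


def summarize_chunk(chunk):
    """Summarize one chunk that fits in memory."""
    n = len(chunk)
    if n == 0:
        return Summary()
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return Summary(n, mean, m2)


def merge(a, b):
    """Combine two summaries without revisiting the underlying data."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Summary(n, mean, m2)


def read_chunks(values, chunk_size):
    """Hypothetical stand-in for reading blocks of rows from disk."""
    chunk = []
    for v in values:
        chunk.append(v)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def chunked_summary(values, chunk_size=1000):
    """Only one chunk is held in memory at any time."""
    total = Summary()
    for chunk in read_chunks(values, chunk_size):
        total = merge(total, summarize_chunk(chunk))
    return total
```

Because `merge` combines summaries in any order, the per-chunk work could also be handed to separate cores or nodes and combined afterwards, which is the idea behind processing chunks in parallel.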
  25. Beyond In-Memory Data Analysis
     • RevoScaleR functions can read from data sets on disk in chunks, so you can increase the number of observations in the data set beyond what can be analyzed in memory all at once
     • RevoScaleR analysis functions process chunks of data in parallel, taking greater advantage of your computing resources (Parallel External Memory Algorithms)
        • Multiple cores on a desktop/server
        • Cluster/grids have the added advantage of more hard drives for storing & accessing data
           • Windows HPC Server Cluster
           • “Burst” computations to Azure in the cloud (NEW)
           • IBM Platform LSF Grid (NEW)
  26. ‘Big Data’ Generalized Linear Models (NEW)
     • Relaxes the assumptions for a standard linear model.
     • Used in insurance, finance, biotech, and other industries.
     • Example 1: Count data (Poisson)
        • Number of vehicles an auto policy holder owns
        • Number of credit cards a person holds
        • Number of bacterial colonies in a Petri dish
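
As an illustration of the Poisson case, here is a minimal Python sketch that fits a Poisson regression with a log link and a single predictor. It is not rxGlm and makes no claim about how RevoScaleR fits the model; it uses plain Newton-Raphson steps, which for the log link coincide with the iteratively reweighted least squares procedure commonly used for GLMs. The function name `fit_poisson` and its parameters are assumptions made for this sketch.

```python
import math


def fit_poisson(x, y, iterations=25, tol=1e-10):
    """Fit log(E[y]) = b0 + b1 * x by maximum likelihood.

    x: predictor values; y: non-negative responses with a positive mean.
    Returns the estimated (b0, b1).
    """
    # Start from an intercept-only model: b0 = log(mean of y), b1 = 0.
    b0 = math.log(sum(y) / len(y))
    b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0           # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0   # information matrix (2 x 2, symmetric)
        for xi, yi in zip(x, y):
            mu = math.exp(b0 + b1 * xi)
            resid = yi - mu
            g0 += resid
            g1 += resid * xi
            h00 += mu
            h01 += mu * xi
            h11 += mu * xi * xi
        # Solve the 2 x 2 system H * step = g directly.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1
```

Each pass over the data only accumulates five running sums, so the same loop could be run chunk by chunk over a data set that does not fit in memory.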
  27. GLM: Other Examples
     • Example 2: Positive values with positive skew (Gamma)
        • Value of auto insurance claims for claims filed
     • Example 3: Positive data that also contains exact zeros (Tweedie Model)
        • Data on insured vehicles (claims amount is zero for many vehicles; range of positive claims values for others)
        • Rainfall data
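
For reference, the three example families can be related through the variance function of the Tweedie family. The notation below follows the standard textbook parameterization (mean μ, dispersion φ, variance power p) and is not taken from the slides.

```latex
\begin{align*}
\operatorname{Var}(Y) &= \phi\,\mu^{p},
  && p = 1:\ \text{Poisson (with } \phi = 1\text{)},\quad
     p = 2:\ \text{gamma},\quad
     1 < p < 2:\ \text{compound Poisson-gamma} \\
Y &= \sum_{i=1}^{N} X_i,
  && N \sim \operatorname{Poisson}(\lambda),\quad
     X_i \overset{\text{iid}}{\sim} \operatorname{Gamma} \\
\Pr(Y = 0) &= e^{-\lambda},
  && \lambda = \frac{\mu^{2-p}}{\phi\,(2-p)} \qquad (1 < p < 2)
\end{align*}
```

For 1 < p < 2 the response is a Poisson-distributed number of gamma-distributed amounts, which is why the model places positive probability on exact zeros (no claims, no rain) while remaining continuous on the positive values.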
  28. Quick Demo Incorporating rxGlm
     • Use a 5% sample of the U.S. 2000 Census to look at annual property insurance premiums
     • Data manipulations: sub-sample data and modify categorical data
     • Perform summary statistics; draw histogram
     • Estimate a Tweedie model using rxGlm
     • Estimate predictions for targeted demographic characteristics
     • Visualize the results
     • Analyze a bigger model using a cluster
  29. Cloud Computing with Azure Burst (NEW)
     • Windows Azure is a cloud platform that enables you to manage computations across a global network of Microsoft-managed datacenters
     • Revolution R Enterprise 6.0 can burst computations to Windows Azure from Windows HPC Server
     • Particularly suited to parallel HPC such as simulations
  30. A Simple Simulation Example
     • For each run:
        • Generate data with a known distribution (using code that accompanies the article "Pure Premium Regression with the Tweedie Model" by Glenn Meyers, Actuarial Review, May 2009)
        • Estimate the model using rxGlm
     • Compare the means of the estimated coefficients with the known parameters of the underlying distribution
     • Do a small number of runs locally
     • Do a large number of runs ‘bursting’ to the Azure cloud (monitor jobs with HPC Job Scheduler, just as with on-premises nodes)
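
The shape of such a simulation can be sketched as follows. This Python example is a simplified stand-in, not the webinar's code: it swaps the Tweedie model for an ordinary linear model with Gaussian noise so the estimator has a closed form, and it runs sequentially rather than on a cluster. The true parameter values and the names `simulate_run` and `run_simulation` are assumptions made for this sketch.

```python
import random
import statistics

# Known parameters of the data-generating process (illustrative values).
TRUE_INTERCEPT = 2.0
TRUE_SLOPE = 0.5


def simulate_run(seed, n=200):
    """One run: generate data from the known model, then estimate it."""
    rng = random.Random(seed)  # independent, reproducible stream per run
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [TRUE_INTERCEPT + TRUE_SLOPE * x + rng.gauss(0.0, 1.0) for x in xs]

    # Ordinary least squares for a single predictor.
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def run_simulation(n_runs):
    """Average the estimated coefficients over all runs."""
    results = [simulate_run(seed) for seed in range(n_runs)]
    mean_intercept = statistics.fmean(r[0] for r in results)
    mean_slope = statistics.fmean(r[1] for r in results)
    return mean_intercept, mean_slope
```

Each run depends only on its own seed, so the runs are independent of one another. That independence is what makes this kind of workload easy to spread across local cores for a few runs or across many remote nodes for a large number of runs.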
  31. Poll Question: What new feature of Revolution R Enterprise 6 is most interesting to you?
  32. Thank You!
     • Download slides, replay from today’s webinar: http://bit.ly/z9xUG9
     • Learn more about Revolution R Enterprise
        • Overview: revolutionanalytics.com/products
        • New feature videos: http://www.revolutionanalytics.com/products/new-features.php
     • Contact Revolution Analytics: http://bit.ly/hey-revo
     • June 28: Achieving High-Performing, Simulation-Based Operational Risk Measurement with RevoScaleR. David Humke, Vice President, The Northern Trust Company. www.revolutionanalytics.com/news-events/free-webinars
  33. The leading commercial provider of software and support for the popular open source R statistics language. www.revolutionanalytics.com, +1 (650) 646 9545, Twitter: @RevolutionR