Turbo-Charge Your Analytics with IBM Netezza and Revolution R Enterprise: A Step-by-Step Approach for Acceleration and Innovation

Everyone involved in high-stakes analytics wants power, speed and flexibility regardless of the size of the data set and complexity of the analysis. Trailblazing organizations that have deployed IBM Netezza Analytics with their IBM Netezza data warehouse appliances (TwinFin) with Revolution R Enterprise are getting all three.

Published in: Technology

  1. Revolution R Enterprise for IBM Netezza. Revolution Confidential. © 2012 IBM Corporation
  2. IBM Netezza with Revolution Analytics
     - High-performance, in-database analytics platform for Big Data
       - Massively parallel processing delivers 10-100x performance
       - Run analytics in-database and eliminate data movement
       - Scalable architecture fosters experimentation
     - Innovation with Advanced Analytics
       - Analytic modeling with the most current statistical methods and 2,500+ open source packages
     - Enterprise-ready advanced analytics software, services and support
       - Security, IDE, training, professional services
       - Web Services stack enables integration with a front-end presentation layer
  3. Revolution Analytics (March 1, 2012)
  4. What is R?
     - Data analysis software
     - A programming language
       - Development platform designed by and for statisticians
       - Object-oriented: vector, matrix, model, …
       - Built-in libraries of algorithms
     - An environment
       - Huge library of algorithms for data access, data manipulation, analysis and graphics
     - An open-source software project
       - Free, open, and active
     - A community
       - Thousands of contributors, 2 million users
       - Resources and help in every domain
     - Download the white paper "R is Hot": bit.ly/r-is-hot
  5. R at a glance
     - Most advanced statistical analysis software available
     - Half the cost of commercial alternatives
     - "The professor who invented analytic software for the experts now wants to take it to the masses"
     - 2M+ users, 2,500+ applications
     - Power, Productivity, Enterprise Readiness
     - Uses: Statistics, Predictive Analytics, Data Mining, Visualization
     - Industries: Finance, Life Sciences, Manufacturing, Retail, Telecom, Social Media, Government
  6. Revolution R Enterprise has the Open-Source R Engine at the core
     - 2,500 community packages and growing exponentially
     - Diagram labels: Multi-Threaded Math Libraries, Technology Partners, Web Services API, Big Data Analysis, Parallel Tools, Revolution Productivity Environment, Technical Support, Build Assurance, Open Source R Packages, R Engine Language Libraries
  7. Working with Revolution R Enterprise for IBM Netezza (March 1, 2012)
  8. Revolution R Enterprise for IBM Netezza inside the IBM Netezza Architecture
     - [Architecture diagram: IBM Netezza Analytics]
  9. In-Database Paradigms for using R
     - In-database Scoring
       - Family of apply functions which score analytic models by using data parallelism
       - Underlying truism is that there is a fact that can be applied across all data
       - Examples: customer lifetime value, credit score, affinity, good stock/bad stock
     - Big Data Analytics
       - Family of parallelized, in-database analytics that have R wrappers and work on the entire data set
       - Underlying truism exists across all data
       - Examples: clustering of all data to determine groupings; models that are applied across a whole data set (decision trees); data transformation (variable selection, correlation)
     - Grouped by Row (tapply)
       - Data and task parallelism: a data flow technique to apply analytics to naturally occurring groups of data using non-parallelized analytics
       - Underlying relationship in data is by a group
       - Examples: forecasting by store, stock symbol, etc.; building a model for each customer, product, etc.
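To make the scoring and grouped-apply paradigms above concrete, here is a minimal local sketch in base R. It makes no Netezza calls: the built-in `mtcars` data frame stands in for a database table, and the scoring rule and its coefficients are made up purely for illustration.

```r
# Local, base-R illustration of two of the paradigms above.
# Assumption: the built-in mtcars data frame stands in for a database table.

# 1. Scoring: apply one fixed rule to every row independently.
#    The coefficients are hypothetical, chosen only for illustration.
score_row <- function(row) 40 - 5 * row[["wt"]]
scores <- apply(mtcars[, c("wt", "hp")], 1, score_row)

# 2. Grouped apply: fit a separate model for each naturally occurring group
#    (here, number of cylinders), analogous to "a model per store".
per_group <- lapply(split(mtcars, mtcars$cyl),
                    function(d) coef(lm(mpg ~ wt, data = d)))

length(scores)     # one score per row
names(per_group)   # one fitted model per group
```

Because each row (or each group) is processed independently, this kind of work can in principle be distributed across data slices, which is the property the in-database paradigms rely on.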
  10. Access In-Database Language Support from R
     - SQL, Java, C, Python, Fortran, C++
  11. Open Source R Package Support (2,500+ community packages)
     - Horizontal: Bayesian, Cluster, Distributions, Graphics, Graphical Models, Machine Learning, Multivariate, Natural Language Processing, Optimization, Robust Statistical Metrics, Spatial, Survival Analysis, Time Series
     - Vertical: Econometrics, Experimental Design, Computational Physics, Clinical Trials, Environmetrics, Finance, Genetics, Medical Imaging, Pharmacokinetics, Phylogenetics, Psychometrics, Social Sciences
  12. Using Revolution R Enterprise with IBM Netezza
     - [Architecture diagram, top to bottom]
       - Business Intelligence, Excel or third-party application, connecting over HTTP
       - RevoDeployR Server: Web Services interface for R
       - Revolution R Enterprise - Workstation and Revolution R Enterprise - Server, each connecting through RODBC and nzODBC
       - R packages integrate and push analytics processing in-database
       - IBM Netezza Analytics Host
       - Five IBM Netezza Analytics S-Blades
  13. Deploying Revolution R Enterprise to IBM Netezza
     - Open a remote terminal connection to the Host
     - Create your R script
     - Compile and register your R script as an AE (UDAP)
     - Execute SQL that will invoke the registered AE
     - Go back to the Revolution R client to retrieve results and continue additional analysis
     - [Diagram: IBM Netezza Analytics Host above five IBM Netezza Analytics S-Blades]
  14. Revolution R Enterprise Client Configuration
     - Revolution R Enterprise
       - Productivity Environment
     - R package dependencies
       - RODBC, caTools, tree, bitops, e1071, rgl, ca, MASS, XML
     - Netezza ODBC drivers
     - 'nz' R packages
       - nzA, nzR, nzMatrix
  15. IBM Netezza In-Database Analytics from Revolution R
     - nzR Package
       - Encapsulates the database and exposes "R"-like constructs
       - R data.frame = database table
       - Apply an R function to a row of data or grouped rows of data
     - nzA Package
       - Entry point to the nzAnalytics
       - Explicitly parallelized algorithms that run in database
     - nzMatrix Package
       - Encapsulation of matrices and operations in the database
       - nz.matrix construct in R to access matrices in the database
       - R operations on nz.matrix translate to matrix stored procedure operations
  16. nzR Package
     - Basic Functions
       - Database Connection: nzConnect, nzConnectDSN
       - SQL Execution: nzQuery, nzScalarQuery, nzDeleteTable
       - Data Management: as.nz.data.frame, nz.data.frame
       - Apply an R function: nzApply, nzTApply, nzGroupedApply
       - R Package Management: nzInstallPackages, nzIsPackageInstalled
     - Sample Code

           #load packages
           library(nzr)
           #connect to a database via ODBC
           nzConnect("admin", "xyz", "127.0.0.1", "iclasstest")
           #load the iris table
           nzdf <- nz.data.frame("iris")
           #run a nzTApply against the nz dataframe
           fun <- function(x) max(x[,1])
           nzTApply(nzdf, nzdf[,5], fun)
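For readers without access to an appliance, the grouped apply in the sample code can be approximated locally. The sketch below is a base R analogue, not the nzR API: it assumes R's built-in `iris` data frame has the same layout as the "iris" table in the slide (four numeric columns followed by the species label in column 5).

```r
# Local base-R analogue of the nzTApply call in the slide's sample code.
# Assumption: the built-in iris data frame matches the "iris" table layout
# (four numeric columns, then the species label in column 5).
fun <- function(x) max(x[, 1])   # same function as in the slide

# Split the rows by column 5 and apply fun to each group's rows.
res <- sapply(split(iris, iris[, 5]), fun)
print(res)   # largest Sepal.Length within each species
```

The function receives all rows of one group as a data frame, which is why it can index the first column with `x[, 1]`.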
  17. nzA Package: Data Manipulation
     - Moments: nz.moments
     - Quantiles: nz.quantile, nz.quartile
     - Outlier Detection: nz.outliers
     - Frequency Table: nz.bitable
     - Histogram: nz.hist
     - Pearson's Correlation: nz.corr
     - Spearman's Correlation: nz.spearman.corr, nz.spearman.corr.s
     - Covariance: nz.cov, nz.cov.matrix
     - Mutual Information: nz.mutualinfo
     - Chi-Square Test: nzChisq.test, nz.chisq.test
     - t-Test: t.ls.test, t.me.test, t.pmd.test, t.umd.test
     - Mann-Whitney-Wilcoxon Test: nz.mww.test
     - Wilcoxon Test: nz.wilcoxon.test
     - Canonical Correlation: nz.canonical.corr
     - One-Way ANOVA: nzAnova, nz.anova.CRD.test, nz.anova.RBD.test
     - Principal Component Analysis: nzPCA
     - Tree-Shaped Bayesian Networks: nz.TBNet Apply, nz.TBNet Grow, nz.BigBNControl, nz.TBNet1g2p, nz.TBNet1g, nz.TBNet2g
  18. nzA Package: Data Transformations and Model Diagnostics
     - Data Transformations
       - Discretization: nz.efdisc, nz.emdisc, nz.ewdisc
       - Standardization and Normalization: nz.std.norm
       - Data Imputation: nz.impute.data
     - Model Diagnostics
       - Misclassification Error: nz.cerror
       - Confusion Matrix: nz.acc, nz.CMATRIX STATS
       - Mean Absolute Error: nz.mae
       - Mean Square Error: nz.mse
       - Relative Absolute Error: nz.rae
       - Percentage Split: nz.percentage.split
       - Cross-Validation: nz.cross.validation
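The regression diagnostics named above have standard textbook definitions, sketched here in base R on a tiny made-up vector. The helper names `mae`, `mse`, and `rae` are hypothetical local functions; whether `nz.mae`, `nz.mse`, and `nz.rae` use exactly these definitions is an assumption.

```r
# Textbook definitions of three regression diagnostics (local helpers only;
# these are not the nzA functions).
mae <- function(actual, predicted) mean(abs(actual - predicted))
mse <- function(actual, predicted) mean((actual - predicted)^2)
# Relative absolute error: total absolute error divided by the total absolute
# error of always predicting the mean of the actual values.
rae <- function(actual, predicted)
  sum(abs(actual - predicted)) / sum(abs(actual - mean(actual)))

actual    <- c(3, 5, 7, 9)     # made-up values
predicted <- c(2, 5, 8, 11)

mae(actual, predicted)   # 1
mse(actual, predicted)   # 1.5
rae(actual, predicted)   # 0.5
```

A relative absolute error below 1 means the predictions beat the naive "always predict the mean" baseline.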
  19. nzA Package: Modeling
     - Classification
       - Naive Bayes: nzNaiveBayes, nz.naivebayes, nz.predict.naivebayes
       - Decision Trees: nzDecTree, nz.dectree, nz.grow.dectree, nz.print.dectree, nz.prune.dectree, nz.predict.dectree
       - Nearest Neighbors: nz.knn
     - Clustering
       - K-Means Clustering: nzKMeans, nz.kmeans, nz.predict.kmeans
       - Divisive Clustering: nz.divcluster, nz.predict.divcluster
     - Associative Rule Mining
       - FP-Growth: nz.fpgrowth, nz.prepare.fpgrowth
     - Regression
       - Linear Regression: nzLm
       - Regression Trees: nzRegTree, nz.regtree, nz.grow.regtree, nz.print.regtree, nz.predict.regtree
  20. nzMatrix Package: Data Manipulation
     - Coerce or point to a nz.matrix: as.nz.matrix, as.nz.matrix.matrix, nz.matrix
     - Combine Matrices: nzCBind, nzRBind
     - Create Matrices From Tables: nzCreateMatrixFromTable, nzCreateTableFromMatrix
     - Create Special Matrices: nzIdentityMatrix, nzNormalMatrix, nzOnesMatrix, nzRandomMatrix, nzVecToDiag
     - Decomposition: nzSVD, svd, nzEigen
     - Delete Matrices: nzDeleteMatrix, nzDeleteMatrixByName
     - Dimensions: dim, NCOL, ncol, NROW, nrow
     - Mathematical Functions: abs, add, aubtr, ceiling, div, exp, floor, ln, log10, mod, mult, nzPowerMatrix, pow, rounding, sqrt, trunc
     - Matrix Engine Initialization: nzMatrixEngineInitialization
     - Matrix Info: is.nz.matrix, isSparse, nzExistMatrix, nzExistMatrixByName, nzGetValidMatrixName
     - Operators: *, +, -, <, ==, >, nzKronecker, nzPMax, nzPMin, nzSetValue, [, scale, t
     - Printing Matrices: print.nz.matrix
     - Solve: nzInv, nzSolve, nzSolveLLS
     - Sparse Matrices: isSparse, nzSparse2matrix
     - Summaries: nzAll, nzAny, nzMax, nzMin, nzSsq, nzSum, nzTr
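Several entries in the nzMatrix list have familiar base R counterparts. The sketch below runs only local base R operations on a small made-up matrix; the pairing of `solve`, `svd`, and `kronecker` with `nzSolve`, `nzSVD`, and `nzKronecker` is inferred from the names alone and is an assumption, not documented behavior.

```r
# Base-R operations that (by name) resemble some nzMatrix functions.
# Assumption: the correspondence to nzSolve / nzSVD / nzKronecker is by name
# only; nothing here touches a database.
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # small made-up symmetric matrix
b <- c(1, 2)

x <- solve(A, b)                       # solve the linear system A %*% x = b

s <- svd(A)                            # singular value decomposition
A_rebuilt <- s$u %*% diag(s$d) %*% t(s$v)

K <- kronecker(diag(2), A)             # Kronecker product with a 2x2 identity

x                                      # 0.2 0.6
all.equal(A, A_rebuilt)                # the SVD factors reproduce A
dim(K)                                 # 4 4
```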
  21. Demonstration: Using Revolution R with IBM Netezza (March 1, 2012)
  22. Turbo-Charge Your Analytics with IBM Netezza and Revolution R Enterprise
     - Presented by: Derek M Norton, Senior Sales Engineer
  23. Use Case: Credit Risk
     - We have a dataset comprised of individuals and their credit risk
       - Stored on the Netezza appliance
     - The goal is to model if someone is "approvable" for a loan.
     - This use case will follow a modeling process (though condensed) from start to finish.
     - I will discuss each of the parts, and at the end there will be a demo of the code.
  24. Modeling Exercise
     1. Learning more about the data
     2. Prepare the data for modeling
     3. Fit models to the data
     4. Model performance
  25. Step 1: Learning more about the data
     - Connect to the IBM Netezza appliance
     - Summarize the data
     - Visualize the data
     - [Figure: two frequency plots. "Continuous Variable": histogram of x against Frequency. "Discrete Variable": counts by education level (High School Diploma, Bachelors Degree, Masters Degree, Professional Degree, PhD).]
  26. Step 2: Prepare the data for modeling
     - Split the data into 70/30 training/test sets
     - Transform some variables
       - Discretize numeric variables for later use
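The preparation step above can be sketched locally in base R. This is an illustration under stated assumptions, not the demo's code: the built-in `iris` data frame stands in for the credit data, the seed is arbitrary, and equal-width binning into three bins is just one possible discretization.

```r
# Local sketch of a 70/30 split plus equal-width discretization.
# Assumptions: iris stands in for the credit data; the seed and the choice of
# three equal-width bins are arbitrary.
set.seed(42)                                   # reproducible split
n <- nrow(iris)
train_idx <- sample(n, size = round(0.7 * n))  # 70% of row indices
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Derive bin edges once from the full data so both sets share the same bins.
breaks <- seq(min(iris$Sepal.Length), max(iris$Sepal.Length), length.out = 4)
train$SL_bin <- cut(train$Sepal.Length, breaks = breaks, include.lowest = TRUE)
test$SL_bin  <- cut(test$Sepal.Length,  breaks = breaks, include.lowest = TRUE)

nrow(train)        # 105
nrow(test)         # 45
table(train$SL_bin)
```

Computing the bin edges once and reusing them keeps the discretized variable comparable between the training and test sets.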
  27. Step 3: Fit models to the data
     - Build two different models to predict if an individual is "approvable"
       - Decision Tree
       - Naïve Bayes
  28. Step 4: Model performance
     - Examine confusion matrices to determine:
       - Training performance
       - Test performance
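A confusion matrix is a cross-tabulation of actual against predicted classes. The sketch below builds one in base R from made-up labels (not results from the demo) and derives accuracy from its diagonal.

```r
# Confusion matrix and accuracy from made-up labels (illustrative only).
lv <- c("no", "yes")
actual    <- factor(c("yes", "yes", "no", "no", "yes", "no",  "no", "yes"), levels = lv)
predicted <- factor(c("yes", "no",  "no", "no", "yes", "yes", "no", "yes"), levels = lv)

cm <- table(Actual = actual, Predicted = predicted)
accuracy <- sum(diag(cm)) / sum(cm)   # correct predictions / all predictions

print(cm)
accuracy                              # 0.75
```

Running the same calculation once on the training set and once on the test set gives the two performance figures the slide refers to; a large gap between them suggests overfitting.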
  29. Demo
  30. Summary
     - Familiar environment for R developers
       - World-class productivity tools
       - Enterprise-class service, support and integration
     - Execution of analytics in-database
       - Analytic computing distributed across Netezza nodes and run in a massively parallel manner
       - Each Netezza node gets a data slice, and analytics are pushed down from the Host to the individual nodes
     - Capabilities
       - R code executed on Netezza nodes row by row or on groups of rows
       - Access to explicitly parallelized algorithms running on the entire data set
       - Large-scale parallel matrix operations on database tables
     - Performance
       - 10-100x performance improvements
  31. Contact Us
     - Bill Zanine, Business Solutions Executive, Analytics Solutions, IBM Netezza: wzanine@us.ibm.com
     - Derek Norton, Solutions Executive, Revolution Analytics: derek.norton@revolutionanalytics.com
     - www.revolutionanalytics.com, +1 (650) 646 9545, Twitter: @RevolutionR