Turbo-Charge Your Analytics with IBM Netezza and Revolution R Enterprise: A Step-by-Step Approach for Acceleration and Innovation
 


Everyone involved in high-stakes analytics wants power, speed and flexibility regardless of the size of the data set and complexity of the analysis. Trailblazing organizations that have deployed IBM Netezza Analytics with their IBM Netezza data warehouse appliances (TwinFin) with Revolution R Enterprise are getting all three.

Presentation Transcript

    • Revolution R Enterprise for IBM Netezza (Revolution Confidential, © 2012 IBM Corporation)
    • IBM Netezza with Revolution Analytics
      • High-performance, in-database analytics platform for Big Data
        – Massively parallel processing delivers 10-100x performance
        – Run analytics in-database and eliminate data movement
        – Scalable architecture fosters experimentation
      • Innovation with Advanced Analytics
        – Analytic modeling with most current statistical methods and 2,500+ open source packages
      • Enterprise ready advanced analytics software, services & support
        – Security, IDE, training, professional services
        – Web Services stack enables integration with front-end presentation layer
    • Revolution Analytics (March 1, 2012)
    • What is R?
      • Data analysis software
      • A programming language
        – Development platform designed by and for statisticians
        – Object-oriented: vector, matrix, model, …
        – Built-in libraries of algorithms
      • An environment
        – Huge library of algorithms for data access, data manipulation, analysis and graphics
      • An open-source software project
        – Free, open, and active
      • A community
        – Thousands of contributors, 2 million users
        – Resources and help in every domain
      • Download the white paper "R is Hot": bit.ly/r-is-hot
    • Revolution ConfidentialMost advanced statisticalanalysis software available The professor who invented analytic software forHalf the cost of the experts now wants to take it to the massescommercial alternatives2M+ Users Power2,500+ Applications Finance Statistics Life Sciences Predictive Manufacturing Analytics Productivity Retail Data Mining Telecom Enterprise Social Media Readiness Visualization Government 5
    • Revolution R Enterprise has the Open-Source R Engine at the core
      • 2,500 community packages and growing exponentially
      • [Diagram; component labels as recovered from the slide: Multi-Threaded Math Libraries, Parallel Tools, Big Data Analysis, Web Services API, Technology Partners, Technical Support, Revolution Productivity Environment, Build Assurance, Open Source R Packages, R Engine Language Libraries]
    • Working with Revolution R Enterprise for IBM Netezza
    • Revolution R Enterprise for IBM Netezza inside the IBM Netezza Architecture [architecture diagram: IBM Netezza Analytics]
    • In-Database Paradigms for using R
      • In-database Scoring
        – Family of apply functions which score analytic models by using data parallelism
        – Underlying truism is that there is a fact that can be applied across all data
        – Examples: customer lifetime value, credit score, affinity, good stock/bad stock
      • Big Data Analytics
        – Family of parallelized, in-database analytics that have R wrappers and work on the entire data set
        – Underlying truism exists across all data
        – Examples: clustering of all data to determine groupings; models that are applied across a whole data set (decision trees); data transformation (variable selection, correlation)
      • Grouped by Row (tapply): Data and Task Parallelism
        – Data flow technique to apply analytics to naturally occurring groups of data using non-parallelized analytics
        – Underlying relationship in data is by a group
        – Examples: forecasting by store, stock symbol, etc.; building a model for each customer, product, etc.
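The three paradigms on this slide can be mimicked on a laptop with base R. The sketch below is only a local analogy, assuming R's built-in `iris` data set stands in for a database table. It does not call any Netezza function, and the scoring formula is a made-up rule chosen purely for illustration.

```r
# Local, base-R analogy of the three in-database paradigms (no Netezza involved).
data(iris)

# 1. Scoring: one fixed rule applied independently to every row.
#    The coefficients are hypothetical, not a fitted model.
scores <- apply(iris[, 1:4], 1, function(r) {
  0.5 * r[["Sepal.Length"]] + 2 * r[["Petal.Width"]]
})

# 2. Whole-data analytics: a single model that needs all rows at once.
set.seed(1)
km <- kmeans(iris[, 1:4], centers = 3)

# 3. Grouped apply: a separate, ordinary (non-parallel) model per natural group.
models <- lapply(split(iris, iris$Species), function(d) {
  lm(Petal.Length ~ Sepal.Length, data = d)
})
slopes <- sapply(models, function(m) coef(m)[["Sepal.Length"]])

print(head(scores))
print(table(km$cluster))
print(slopes)
```

In cases 1 and 3 each row or group is processed independently, which is what lets an appliance spread that work across nodes. Case 2 needs an algorithm that was itself written to run in parallel.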
    • Access In-Database Language Support from R
      • SQL, Java, C, Python, Fortran, C++
    • Open Source R Package Support (2,500+ community packages)
      • Horizontal: Bayesian, Cluster, Distributions, Graphics, Graphical Models, Machine Learning, Multivariate, Natural Language Processing, Optimization, Robust Statistical Metrics, Spatial, Survival Analysis, Time Series
      • Vertical: Econometrics, Experimental Design, Computational Physics, Clinical Trials, Environmetrics, Finance, Genetics, Medical Imaging, Pharmacokinetics, Phylogenetics, Psychometrics, Social Sciences
    • Using Revolution R Enterprise with IBM Netezza
      • [Architecture diagram, top to bottom]
        – Business Intelligence, Excel or third-party application (HTTP)
        – RevoDeployR Server: Web Services interface for R
        – Revolution R Enterprise - Workstation and Revolution R Enterprise - Server (RODBC & nzODBC)
        – R packages integrate and push analytics processing in-database
        – IBM Netezza Analytics Host
        – Five S-Blades, each running IBM Netezza Analytics
    • Deploying Revolution R Enterprise to IBM Netezza
      • Open a remote terminal connection to the Host
      • Create your R script
      • Compile and register your R script as an AE (UDAP)
      • Execute SQL that will invoke the registered AE
      • Go back to the Revolution R client to retrieve results and continue additional analysis
      • [Diagram: IBM Netezza Analytics Host above five S-Blades, each running IBM Netezza Analytics]
    • Revolution R Enterprise Client Configuration
      • Revolution R Enterprise
        – Productivity Environment
      • Netezza ODBC drivers
      • 'nz' R packages
        – nzA, nzR, nzMatrix
      • R package dependencies
        – RODBC, caTools, tree, bitops, e1071, rgl, ca, MASS, XML
    • IBM Netezza In-Database Analytics from Revolution R
      • nzR package
        – Encapsulates the database and exposes "R"-like constructs
        – R data.frame = database table
        – Apply an R function to a row of data or grouped rows of data
      • nzA package
        – Entry point to the nzAnalytics
        – Explicitly parallelized algorithms that run in database
      • nzMatrix package
        – Encapsulation of matrices and operations in database
        – nz.matrix construct in R to access matrices in the database
        – R operations on nz.matrix translate to matrix stored procedure operations
    • nzR Package
      • Basic functions
        – Database connection: nzConnect, nzConnectDSN
        – SQL execution: nzQuery, nzScalarQuery, nzDeleteTable
        – Data management: as.nz.data.frame, nz.data.frame
        – Apply an R function: nzApply, nzTApply, nzGroupedApply
        – R package management: nzInstallPackages, nzIsPackageInstalled
      • Sample code
          #load packages
          library(nzr)
          #connect to a database via ODBC
          nzConnect("admin", "xyz", "127.0.0.1", "iclasstest")
          #load the iris table
          nzdf <- nz.data.frame("iris")
          #run a nzTApply against the nz dataframe
          fun <- function(x) max(x[,1])
          nzTApply(nzdf, nzdf[,5], fun)
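For readers without an appliance at hand, the sample's grouped apply can be approximated locally. This sketch assumes the Netezza `iris` table has the same column order as R's built-in `iris` (sepal length first, species fifth). It also assumes `nzTApply` hands each group's rows to the function as a data-frame-like object, which the `x[,1]` indexing in the sample suggests but the slide does not state. It uses only base R `split` and `sapply`.

```r
# Local stand-in for the slide's nzTApply call, using the built-in iris data.
data(iris)

# Same function as on the slide: the maximum of the first column of a group.
fun <- function(x) max(x[, 1])

# Split rows by column 5 (Species) and apply fun to each group's rows.
result <- sapply(split(iris, iris[, 5]), fun)
print(result)
#     setosa versicolor  virginica
#        5.8        7.0        7.9
```

In the in-database version the same per-group function would run next to the data rather than on the client. The output format of the real call is not shown in the slides, so it may differ from the named vector above.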
    • nzA Package Revolution Confidential Data Manipulation Moments nz.moments Quantiles nz.quantile, nz.quartile Outlier Detection nz.outliers Frequency Table nz.bitable Histogram nz.hist Pearsons Correlation nz.corr Spearmans Correlation nz.spearman.corr, nz.spearman.corr.s Covariance nz.cov, nz.cov.matrix Mutual Information nz.mutualinfo Chi-Square Test nzChisq.test, nz.chisq.test t -Test t.ls.test, t.me.test, t.pmd.test, t.umd.test Mann-Whitney-Wilcoxon Test nz.mww.test Wilcoxon Test nz.wilcoxon.test Canonical Correlation nz.canonical.corr One-Way ANOVA nzAnova, nz.anova.CRD.test, nz.anova.RBD.test Principal Component Analysis nzPCA Tree-Shaped Bayesian Networks nz.TBNet Apply, nz.TBNet Grow, nz.BigBNControl, nz.TBNet1g2p, nz.TBNet1g,nz.TBNet2g 17 © 2012 IBM Corporation
    • nzA Package Revolution Confidential Data Transformations Discretization nz.efdisc, nz.emdisc, nz.ewdisc Standardization and Normalization nz.std.norm Data Imputation nz.impute.data Model Diagnostics Misclassification Error nz.cerror Confusion Matrix nz.acc, nz.CMATRIX STATS Mean Absolute Error nz.mae Mean Square Error nz.mse Relative Absolute Error nz.rae Percentage Split nz.percentage.split Cross-Validation nz.cross.validation 18 © 2012 IBM Corporation
    • nzA Package Revolution Confidential  Classification  Clustering Naive Bayes nzNaiveBayes, K-Means Clustering nzKMeans, nz.kmeans, nz.naivebayes, nz.predict.kmeans nz.predict.naivebayes Divisive Clustering nz.divcluster, Decision Trees nzDecTree, nz.predict.divcluster nz.dectree, nz.grow.dectree, nz.print.dectree, nz.prune.dectree, nz.predict.dectree Nearest Neighbors nz.knn  Associative Rule Mining  Regression FP-Growth nz.fpgrowth, Linear Regression nzLm nz.prepare.fpgrowth Regression Trees nzRegTree, nz.regtree, nz.grow.regtree, nz.print.regtree, nz.predict.regtree 19 © 2012 IBM Corporation
    • nzMatrix Package: Data Manipulation
      • Coerce or point to a nz.matrix: as.nz.matrix, as.nz.matrix.matrix, nz.matrix
      • Combine Matrices: nzCBind, nzRBind
      • Create Matrices From Tables: nzCreateMatrixFromTable, nzCreateTableFromMatrix
      • Create Special Matrices: nzIdentityMatrix, nzNormalMatrix, nzOnesMatrix, nzRandomMatrix, nzVecToDiag
      • Decomposition: nzSVD, svd, nzEigen
      • Delete Matrices: nzDeleteMatrix, nzDeleteMatrixByName
      • Dimensions: dim, NCOL, ncol, NROW, nrow
      • Mathematical Functions: abs, add, aubtr, ceiling, div, exp, floor, ln, log10, mod, mult, nzPowerMatrix, pow, rounding, sqrt, trunc
      • Matrix Engine Initialization: nzMatrixEngineInitialization
      • Matrix Info: is.nz.matrix, isSparse, nzExistMatrix, nzExistMatrixByName, nzGetValidMatrixName
      • Operators: *, +, -, <, ==, >, nzKronecker, nzPMax, nzPMin, nzSetValue, [, scale, t
      • Printing Matrices: print.nz.matrix
      • Solve: nzInv, nzSolve, nzSolveLLS
      • Sparse Matrices: isSparse, nzSparse2matrix
      • Summaries: nzAll, nzAny, nzMax, nzMin, nzSsq, nzSum, nzTr
    • Demonstration: Using Revolution R with IBM Netezza
    • Turbo-Charge Your Analytics with IBM Netezza and Revolution R Enterprise
      • Presented by: Derek M Norton, Senior Sales Engineer
    • Use Case: Credit Risk
      • We have a dataset comprised of individuals and their credit risk
        – Stored on the Netezza appliance
      • The goal is to model if someone is "approvable" for a loan.
      • This use case will follow a modeling process (though condensed) from start to finish.
      • I will discuss each of the parts and at the end there will be a demo of the code.
    • Modeling Exercise
      1. Learning more about the data
      2. Prepare the data for modeling
      3. Fit models to the data
      4. Model performance
    • 1. Learning more about the data
      • Connect to the IBM Netezza appliance
      • Summarize the data
      • Visualize the data
      • [Figure: two frequency plots. "Continuous Variable": histogram of x, with Frequency on the y-axis. "Discrete Variable": bar chart of counts for High School Diploma, Bachelors Degree, Masters Degree, Professional Degree, PhD]
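The summarize and visualize steps might look like the following in plain R once a sample of the data is on the client. This is a sketch on simulated data. The column names `x` and `education` and the distributions are assumptions taken only from the figure labels, and none of the numbers it produces are results from the presentation.

```r
# Simulated stand-in for the credit data; values are made up for illustration.
set.seed(42)
edu_levels <- c("High School Diploma", "Bachelors Degree", "Masters Degree",
                "Professional Degree", "PhD")
credit <- data.frame(
  x         = rgamma(1000, shape = 4, rate = 0.5),
  education = factor(sample(edu_levels, 1000, replace = TRUE),
                     levels = edu_levels)
)

# Summarize: five-number summary for numeric columns, counts for factors.
print(summary(credit))

# Visualize a continuous variable: bin counts for a histogram.
# plot = FALSE returns the bins without opening a graphics device;
# drop it to draw the chart.
h <- hist(credit$x, plot = FALSE)

# Visualize a discrete variable: frequency table, ready for barplot(edu_counts).
edu_counts <- table(credit$education)
print(edu_counts)
```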
    • 2. Prepare the data for modeling
      • Split the data into 70/30 training/test sets
      • Transform some variables
        – Discretize numeric variables for later use
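A 70/30 split and a simple discretization can be sketched in base R as follows. The deck's nzA slides list `nz.percentage.split` and `nz.efdisc` for these steps but give no signatures, so this example deliberately avoids them. The data, the column names `age` and `income`, and the choice of four equal-frequency bins are all assumptions made for illustration.

```r
# Simulated numeric predictors; values are made up for illustration.
set.seed(1)
n <- 1000
credit <- data.frame(
  age    = round(runif(n, 18, 75)),
  income = round(rlnorm(n, meanlog = 10.5, sdlog = 0.5))
)

# 70/30 split: sample 70% of the row indices for training, keep the rest for test.
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- credit[train_idx, ]
test  <- credit[-train_idx, ]

# Equal-frequency discretization: cut points are the training-set quartiles.
# The outer breaks are widened to +/- Inf so unseen test values still get a bin.
breaks <- quantile(train$income, probs = seq(0, 1, by = 0.25))
breaks[1] <- -Inf
breaks[length(breaks)] <- Inf
bin_labels <- c("low", "mid-low", "mid-high", "high")

train$income_bin <- cut(train$income, breaks = breaks, labels = bin_labels)
test$income_bin  <- cut(test$income,  breaks = breaks, labels = bin_labels)

print(table(train$income_bin))
print(table(test$income_bin))
```

Computing the cut points on the training set and reusing them on the test set keeps information about the test data out of the preparation step.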
    • 3. Fit models to the data
      • Build two different models to predict if an individual is "approvable"
        – Decision Tree
        – Naïve Bayes
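To make the Naïve Bayes step concrete, here is a small categorical naive Bayes classifier written from scratch in base R. It is a teaching sketch, not the in-database `nzNaiveBayes` implementation, whose arguments are not shown in the slides. The helper names `nb_fit` and `nb_predict`, the simulated data, and the column names are hypothetical. The decision tree model is not covered here.

```r
# Fit: class priors and per-class level frequencies, with add-one smoothing.
nb_fit <- function(data, target, predictors) {
  y <- data[[target]]
  list(
    target_levels = levels(y),
    predictors    = predictors,
    log_prior     = setNames(log(as.vector(table(y)) / length(y)), levels(y)),
    # One table per predictor: rows are classes, columns are predictor levels.
    log_lik = lapply(data[predictors], function(x) {
      counts <- table(y, x) + 1
      log(counts / rowSums(counts))
    })
  )
}

# Predict: sum log prior and log likelihoods per class, pick the largest.
nb_predict <- function(model, newdata) {
  scores <- do.call(cbind, lapply(model$target_levels, function(cl) {
    s <- rep(model$log_prior[[cl]], nrow(newdata))
    for (p in model$predictors) {
      s <- s + model$log_lik[[p]][cl, as.character(newdata[[p]])]
    }
    s
  }))
  factor(model$target_levels[max.col(scores, ties.method = "first")],
         levels = model$target_levels)
}

# Simulated data: approval probability rises with the income bin (made up).
set.seed(7)
n <- 1000
lv_inc <- c("low", "mid", "high")
lv_edu <- c("High School Diploma", "Bachelors Degree", "Masters Degree")
credit <- data.frame(
  income_bin = factor(sample(lv_inc, n, replace = TRUE), levels = lv_inc),
  education  = factor(sample(lv_edu, n, replace = TRUE), levels = lv_edu)
)
p_yes <- 0.2 + 0.3 * (as.integer(credit$income_bin) - 1)
credit$approvable <- factor(ifelse(runif(n) < p_yes, "yes", "no"),
                            levels = c("no", "yes"))

train_idx <- sample(seq_len(n), size = 0.7 * n)
train <- credit[train_idx, ]
test  <- credit[-train_idx, ]

model <- nb_fit(train, "approvable", c("income_bin", "education"))
pred  <- nb_predict(model, test)
print(mean(pred == test$approvable))  # share of test rows classified correctly
```

Naive Bayes on categorical inputs is one reason to discretize numeric variables beforehand: each predictor then contributes a simple table of level frequencies per class.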
    • 4. Model Performance
      • Examine confusion matrices to determine:
        – Training performance
        – Test performance
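A confusion matrix is a cross-tabulation of actual against predicted classes. The sketch below builds one with base R `table` on ten hand-typed labels. The labels are toy values chosen for the example, not output from the demo, and the slides do not show how `nz.acc` or the other nzA diagnostics are called.

```r
# Toy actual and predicted labels (hand-typed for illustration only).
lv <- c("no", "yes")
actual    <- factor(c("yes", "yes", "no", "no", "yes",
                      "no",  "no",  "yes", "no", "no"), levels = lv)
predicted <- factor(c("yes", "no",  "no", "no", "yes",
                      "yes", "no",  "yes", "no", "no"), levels = lv)

# Rows are actual classes, columns are predicted classes.
cm <- table(Actual = actual, Predicted = predicted)
print(cm)

# Accuracy is the share of cases on the diagonal.
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)  # 0.8
```

Running the same calculation on the training predictions and on the test predictions gives the two figures this slide compares. A training accuracy far above the test accuracy usually points to overfitting.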
    • Demo
    • Summary Familiar environment for R Developers – World-class productivity tools – Enterprise class service, support and integration Execution of analytics in-database – Analytic computing distributed across Netezza nodes and run in a massively parallel manner – Each Netezza node gets a data slice and analytics are pushed down from the Host to the individual nodes Capabilities – R Code executed on Netezza nodes in row-by-row fashion or on groups of rows – Enables access to explicitly parallelized algorithms running on entire data set – Large-scale parallel matrix operations on database tables Performance – 10-100x Performance improvements 9 © 2012 IBM Corporation
    • Contact Us
      • Bill Zanine, Business Solutions Executive, Analytics Solutions, IBM Netezza: wzanine@us.ibm.com
      • Derek Norton, Solutions Executive, Revolution Analytics: derek.norton@revolutionanalytics.com
      • www.revolutionanalytics.com, +1 (650) 646 9545, Twitter: @RevolutionR