- 1. 1877 Big Data Analytics on Teradata: An Introduction to Revolution R Enterprise. Bill Jacobs, Dir., Product Marketing, Revolution Analytics
- 2. Demystifying R: What is R? Why is it so popular? Is it only open source?
- 3. 3
- 4. Our view: Big Data meets Big Math = New Business Outcomes. THE PERFECT STORM: + Computing Power + Data + Pace of Business + Customer Expectations + Data Science + Computer Science + Management Science. Better Business Decisions, New Business Outcomes. Confidential to Revolution Analytics and shared with Siemens under the NDA dated 27/9/2013
- 5. Big Analytics Delivers Value from Big Data. The three Vs of Big Data: Volume, Variety, Velocity. Big Analytics: maximizing Value, accommodating data Volatility, while assuring Veracity of insights.
- 6. R Open Source: Language, Community, Collaboration. Robert Gentleman & Ross Ihaka, 1993; Version 1.0 released 2000; 2.5 Million Global Users; Over 4,800 add-on “Packages”. Why R? R in Universities = New Talent; Emerging Modeling/Visualization; Lower Cost Alternative; Open Source = Flexible & Innovative; Access to Free Packages. WELCOME & INTRODUCTIONS
- 7. R is Exploding in Popularity & Functionality. [Four charts comparing R, SAS, SPSS, S-Plus and Stata. Internet Discussion: mean monthly traffic on email discussion list, 1995-2010. Package Growth: number of R packages listed on CRAN. Web Site Popularity: number of links to main web site. Scholarly Activity: Google Scholar hits (’05-’09 CAGR), with R at 46%, SAS at -11% and SPSS at -27%.] Source: http://r4stats.com/popularity
- 8. R is exploding in popularity & functionality. R Usage Growth, Rexer Data Miner Survey, 2007-2013: 70% of data miners report using R; 24% use R as primary tool. “I’ve been astonished by the rate at which R has been adopted. Four years ago, everyone in my economics department [at the University of Chicago] was using Stata; now, as far as I can tell, R is the standard tool, and students learn it first.” (Deputy Editor for New Products at Forbes). “A key benefit of R is that it provides near-instant availability of new and experimental methods created by its user base, without waiting for the development/release cycle of commercial software. SAS recognizes the value of R to our customer base…” (Product Marketing Manager, SAS Institute, Inc). Source: www.rexeranalytics.com
- 9. R Is The Most Commonly Used Primary Analytics Tool: 70% of data miners report using R; 24% use R as primary tool. Source: www.rexeranalytics.com
- 10. Example of advanced visualization with R: Facebook Network Graphic
- 11. R Community, collaboration and breadth: CRAN task views (subset of 4800+ packages). Source: http://www.maths.lancs.ac.uk/~rowlings/R/TaskViews/
- 12. Key Big Data Challenge: The Analytics Talent Pool
- 13. The Analytics Talent Pool With R: 2 Million R Users
- 14. R is open source and drives analytic innovation but… has some limitations for Enterprises
  - Big Data: In-memory bound → Hybrid memory & disk scalability → Operates on bigger volumes & factors
  - Speed of Analysis: Single threaded → Parallel threading → Shrinks analysis time
  - Enterprise Readiness: Community support → Commercial support → Delivers full service production support
  - Analytic Breadth & Depth: 5000+ innovative analytic packages → Leverage open source packages plus Big Data ready packages → Supercharges R
  - Commercial Viability: Risk of deployment of open source → Commercial license → Eliminate risk with open source
- 15. Our History & Our Future. [Timeline figure with year markers 2007, 2013, 2015 and 2017.] Releases: Revolution R Enterprise V1 through V6.1; V6.2 through V9; V10 through v11. Chapters: Chapter 1, Capture Mindshare; Chapter 2, Mobilize with Market Focus; Chapter 3, Scalable Growth. Milestones shown: Company Founding; Relocate HQ to Palo Alto; NA Offices (NYC, Dallas); 250 Customers; 500 Customers; 1000 Customers. Company Confidential – Do not distribute
- 16. 200+ Customer Stories: Finance & Insurance; Academic & Gov’t; Healthcare & Life Sciences; Digital Media & Retail; Manufacturing & High Tech. Revolution Confidential
- 17. Revolution Analytics - Overview. We are the only provider of a commercial analytics platform based on the open source R statistical computing language. Power: distributed, high performance analytical algorithms. Productivity: easier to build and deploy analytic applications. Enterprise Readiness: stable, scalable multi-platform with world-wide support; professional services enablement. World Wide Support Teams: Standard and Premium Programs; Technical Account Managers; Customer Success Managers. Professional Services: Architecture planning; Systems Integration; Advanced analytic applications; Full life cycle projects.
- 18. Customers Revolutionize their Business
  - Power: 4X performance; 50M records scored daily. “…we saw about a 4x performance improvement on 50 million records. It works brilliantly.” (CEO, John Wallace, DataSong)
  - Scalability: TBs of data from 200+ data sources; tens of thousands of attributes; hundreds of millions of scores daily. “We’ve been able to scale our solution to a problem that’s so big that most companies could not address it…” (SVP Analytics, Kevin Lyons, eXelate)
  - Performance: 2X data; 2X attributes; no impact on performance. “We need a high-performance analytics … we can now identify opportunities for our clients that would otherwise be lost.” (Chief Analytics Officer, Leon Zemel, [x+1])
- 19. Revolution R Enterprise: What is Revolution R Enterprise? How does Revolution R Enterprise work with Teradata Database?
- 20. Revolution R Enterprise is… the only big data big analytics platform based on open source R, the de facto statistical computing language for modern analytics. High Performance, Scalable Analytics; Portable Across Enterprise Platforms; Easier to Build & Deploy Analytics.
- 21. How is RRE Used?
  - Discovering Patterns with Big Data: Customer segmentation; Market basket analysis; Social networking analysis; Fraud detection; Marketing attribution; Sentiment analysis; …and much more
  - Building Models Efficiently: Customer lifetime value; Pricing optimization; Recommendation engines; …and much more
  - Flexibly Deploying Models to Consumers: Credit risk; Customer churn; Propensity to buy; Market risk; Operational risk; …and much more
- 22. Introducing Revolution R Enterprise (RRE): The Big Data Big Analytics Platform. Big Data Big Analytics Ready: Enterprise readiness; High performance analytics; Multi-platform architecture; Data source integration; Development tools; Deployment tools. Components shown: DevelopR, ConnectR, DeployR, ScaleR, DistributedR.
- 23. The Platform Step by Step: R Capabilities
  - R+CRAN: Open source R interpreter; UPDATED R 3.0.2; Freely-available R algorithms; Algorithms callable by RevoR; Embeddable in R scripts; 100% compatible with existing R scripts, functions and packages
  - RevoR: Performance enhanced R interpreter; Based on open source R; Adds high-performance math
  - Available on: Platform™ LSF™ Linux®; Microsoft® HPC Clusters; Microsoft Azure Burst; Windows® & Linux Servers; Windows & Linux Workstations; Teradata® Database; IBM® Netezza®; IBM BigInsights™; Cloudera Hadoop®; Hortonworks Hadoop; Intel® Hadoop
- 24. Big Data Speed @ Scale with Revolution R Enterprise (RRE). First, we enhance and accelerate the Open Source R interpreter. Layers shown: In-Hadoop Execution; In-Database Execution; Parallelized User Code; Parallelized Algorithms; Multi-Core Processing; Multi-Threaded Execution; Memory Management; Fast Math Libraries.
- 25. Open Source R Performance: Multi-threaded Math. Customers report 5-50x performance improvements compared to Open Source R, without changing any code. Computation times on a 4-core laptop, Open Source R vs. Revolution R, with speedup:
  - Linear Algebra (1): Matrix Multiply, 176 sec vs. 9.3 sec, 18x
  - Linear Algebra (1): Cholesky Factorization, 25.5 sec vs. 1.3 sec, 19x
  - Linear Algebra (1): Linear Discriminant Analysis, 189 sec vs. 74 sec, 3x
  - General R Benchmarks (2): R Benchmarks (Matrix Functions), 22 sec vs. 3.5 sec, 5x
  - General R Benchmarks (2): R Benchmarks (Program Control), 5.6 sec vs. 5.4 sec, not appreciable
  - Sources: (1) http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php (2) http://r.research.att.com/benchmarks/
- 26. The Platform Step by Step: Parallelization & Data Sourcing
  - ConnectR: High-speed & direct connectors. Available for: High-performance XDF; SAS, SPSS, delimited & fixed format text data files; Hadoop HDFS (text & XDF); Teradata Database & Aster; EDWs and ADWs; ODBC
  - ScaleR: Ready-to-use high-performance big data big analytics; Fully-parallelized analytics; Data prep & data distillation; Descriptive statistics & statistical tests; Correlation & covariance matrices; Predictive models (linear, logistic, GLM); Machine learning; Monte Carlo simulation; NEW tools for distributing customized algorithms across nodes
  - DistributedR: Distributed computing framework; Delivers portability across platforms. Available on: Windows Servers; Red Hat and NEW SuSE Linux Servers; IBM Platform LSF Linux; Microsoft HPC Clusters; Microsoft Azure Burst; NEW Teradata Database; NEW Cloudera Hadoop; NEW Hortonworks Hadoop
- 27. Big Data Speed @ Scale with Revolution R Enterprise (RRE). Second, we built a platform for hosting R with Big Data on a variety of massively parallel platforms. Layers shown: In-Hadoop Execution; In-Database Execution; Parallelized User Code; Parallelized Algorithms; Multi-Core Processing; Multi-Threaded Execution; Memory Management; Fast Math Libraries.
- 28. Revolution R Enterprise Powering Next Generation Analytics: COMBINE INTERMEDIATE RESULTS
- 29. SAS HPA speed comparison*: Logistic Regression. SAS HPA vs. Revolution R, with the relative figure shown on the slide:
  - Rows of data: 1 billion vs. 1 billion
  - Parameters: “just a few” vs. 7 (double)
  - Time: 80 seconds vs. 44 seconds (45%)
  - Data location: in memory vs. on disk
  - Nodes: 32 vs. 5 (1/6th)
  - Cores: 384 vs. 20 (5%)
  - RAM: 1,536 GB vs. 80 GB (5%)
  - Revolution R is faster on the same amount of data, despite using approximately a 20th as many cores, a 20th as much RAM, a 6th as many nodes, and not pre-loading data into RAM. Revolution R Enterprise Delivers Performance at 2% of the Cost.
  - *As published by SAS in HPC Wire, April 21, 2011
- 30. Analytics Layer: High Performance Big Data Analytics with ScaleR. R Data Step; Descriptive Statistics; Statistical Tests; Sampling; Predictive Modeling; Data Visualization; Machine Learning; Simulation.
- 31. ScaleR: Fast Parallel External Memory Algorithms. Data Prep, Distillation & Descriptive Analytics
  - R Data Step: Data import (Delimited, Fixed, SAS, SPSS, ODBC); Variable creation & transformation; Recode variables; Factor variables; Missing value handling; Sort; Merge; Split; Aggregate by category (means, sums). Use any of the functionality of the R language to transform and clean data row by row!
  - Descriptive Statistics: Min / Max; Mean; Median (approx.); Quantiles (approx.); Standard Deviation; Variance; Correlation; Covariance; Sum of Squares (cross product matrix for set variables); Pairwise Cross tabs; Risk Ratio & Odds Ratio; Cross-Tabulation of Data (standard tables & long form); Marginal Summaries of Cross Tabulations
  - Statistical Tests: Chi Square Test; t-Test; F-Test; plus 100’s of other tests available in R!
  - Sampling: Subsample (observations & variables); Random Sampling; High quality, fast, parallel random number generators
- 32. ScaleR: Fast Parallel External Memory Algorithms. Statistical Modeling
  - Predictive Models: Covariance, Correlation, Sums of Squares (cross product matrix for set variables) matrices; Multiple Linear Regression; Generalized Linear Models (GLM), covering all exponential family distributions (binomial, Gaussian, inverse Gaussian, Poisson, Tweedie), standard link functions including cauchit, identity, log, logit and probit, and user defined distributions & link functions; Logistic Regression; Classification & Regression Trees; Decision Forests; Predictions/scoring for models; Residuals for all models
  - Data Visualization: Histogram; Line Plot; Lorenz Curve; ROC Curves (actual data and predicted values); plus numerous tools in R and ScaleR to generate big data visualizations
  - Machine Learning: Cluster Analysis (K-Means); Classification (Decision Trees, Decision Forests)
  - Simulation: High quality, fast, parallel random number generators; use the rich functionality of R for simulations
- 33. The Power of Revolution R Enterprise. [Diagram with axes Value and Performance & Scalability.] ScaleR: moves computation to data; leverage CRAN; labor saving power. DistributedR: maximizes computation; powerful divide & conquer; effective memory utilization. RevoR: 3-50X faster. Open Source: leverage latest innovation.
- 34. Why Teradata And Revolution R Enterprise? Teradata User Demand; Data Movement Penalty Growing; New Analytics Requiring MPP Approach; R Popularity; Open Source Limitations; Arrival of Teradata v14.10.
- 35. Revolution Analytics coupled with the Teradata Unified Data Architecture accelerates big data analytics using the widely-accepted R language.
  - Available Today (Teradata Version 14.0 + Revolution R Enterprise 6.2, High-Speed TPT Connector): Scalable R analytics on servers connected to Teradata; High speed, parallel data transfer, 5x faster than RODBC; Integrated parallel analytics solution
  - Upcoming Capabilities, 4Q13 (Teradata Version 14.10 + Revolution R Enterprise V7): Parallel R in-database for big data analytics on Teradata; R programmers can immediately build parallel R models completely in R; Revolution parallel in-database algorithms exclusively available on Teradata
- 36. Introducing Revolution R Enterprise Version 7 on Teradata Database: New Teradata Table Operators; New Parallelized Algorithms; In-Database Execution of Parallelized Algorithms; Executes R Scripts From R Workstations or Servers; Provides Orders of Magnitude Performance Gains; Supports Multiple Platforms in UDA; Available Late 2013.
- 37. Revolution Analytics in the UDA: UNIFIED DATA ARCHITECTURE With Revolution R Enterprise. RODBC. Seamless use of R analytics across the Teradata UDA.
- 38. HOW DOES IT WORK? Transparent Parallelization of Analytical, Predictive Modeling and Machine Learning in Teradata
- 39. Understanding R’s Compute Workload. Computational Workload Breakdown: R Script (compute burden from script or command), < 1%; Algorithms (compute burden from algorithmic computations), 99.xxx%.
- 40. ScaleR PEMAs: High Performance Analytical Algorithms
  - User’s Script Calls ScaleR PEMA: No unique code or setup for parallelism; ScaleR algorithms are “just another R package”; Using PEMAs is transparent, automatic, fast and scales linearly
  - PEMAs Transparently Parallelize Algorithm Execution: Parallelized versions of statistics, predictive modeling and machine learning algorithms; PEMAs transparently distribute computations across AMPs; Results are consolidated into a single result set; Provides Write Once Deploy Anywhere (WODA) portability
- 41. Transparent Distributed Computing with RRE ScaleR
  - In Revolution R Enterprise: Script calls ScaleR PEMA; Algorithm executes; Algorithm returns to script; Script continues execution
  - Transparent to the script: Algorithm starts a master process; Master identifies environment (Threading? Cores? Chips? Distributed nodes?); Master initializes algorithm and prepares instructions for nodes; Master executes table operators in each VAMP; VAMPs process each data segment (table operator runs in each VAMP and returns an Intermediate Result Object (IRO) to the master process); Master process combines IROs and returns consolidated answer to script
- 42. ScaleR PEMAs on Teradata: Transparent Distribution of R Analytics. For each call to a ScaleR algorithm: One Request, Many Subtasks, One Answer. Diagram labels: Desktops & Servers (Revolution R Enterprise); Corporate Applications (Revolution R Enterprise); ODBC; Teradata Database + Revolution R Enterprise; Extended Stored Procedure; Table Operators; AMPs.
- 43. Revolution R Enterprise Ecosystem: Power of Integration. SI / Service; Deployment / Consumption; MSP / DSP; Advanced Analytics; ETL; Corios; Data / Infrastructure.
- 44. The Platform Step by Step: Tools & Deployment DevelopR DeployR • Freely-available R algorithms • Callable by RevoR • Embeddable in R scripts • Web services software development kit • Integrates R into application infrastructures Available on: • Can be called by RevoR • Can be run single-node using RevoR • Analyze large data using RDataStep package • Run on multiple nodes using rxEXEC package DevelopR DeployR Capabilities: • Invokes R scripts from web services calls • RESTful interface for easy integration • Works with leading desktop & BI tools
- 45. DevelopR Integrated Development Environment: Script with type ahead and code snippets; Sophisticated debugging with breakpoints, variable values etc.; Solutions window for organizing code and data; Objects loaded in the R Environment; Packages installed and loaded; Object details. http://www.revolutionanalytics.com/demos/revolution-productivity-environment/demo.htm
- 46. DeployR. Seamless: bring the power of R to any web enabled application. Simple: leverage common APIs including JS, Java, .NET. Scalable: robustly scale user and compute workloads. Secure: manage enterprise security with LDAP & SSO. Diagram labels: R / Statistical Modeling Expert; Deployment Expert; Data Analysis; Business Intelligence; Mobile; Web Apps; Cloud / SaaS.
- 47. Create Custom, On-Demand Analytical Apps. Some examples: On-demand sales forecasting; Leveraging the power of R from Microsoft tools; Real-time social media sentiment analysis.
- 48. Alteryx and Revolution Analytics: Making Predictive Analytics More Accessible and Scalable. Empowering Analysts with Easy-to-Use Predictive Tools combined with the Leading R Platform. Delivering Enterprise-Scale Predictive Analytics to Line of Business Analysts. Enabling a Broader Audience to Harness the Universe of R.
- 49. Summary
  - R is Hot: Most broadly used analytical language; Its popularity addresses critical talent gap; Vast functionality via CRAN; R needs a platform for Big Data Big Analytics
  - Revolution Provides Enterprise-Capable Platforms for R: High performance; Scalable via transparent distributed execution; Portable, Write Once Deploy Anywhere (WODA); Commercial support & services cut project risks
  - Teradata + Revolution Provide a Robust Solution: Teradata provides a stable, high-performance big data environment; Revolution provides speed, scale, portability and stability for the enterprise
- 50. Next steps? The leading commercial provider of software and support for the popular open source R statistics language. www.revolutionanalytics.com, 650.646.9545, Twitter: @RevolutionR
- 51. Thank You.
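
Slides 40 to 42 describe how a ScaleR PEMA (a parallel external memory algorithm, per slides 31 and 32) turns one request into many subtasks and one answer: each data segment is summarized independently into an Intermediate Result Object (IRO), and a master process combines the IROs. The sketch below illustrates only that pattern, for a mean and standard deviation. It is written in Python rather than R, it is not the ScaleR API, and every name in it (`IRO`, `process_segment`, `combine`, `mean_and_sd`) is hypothetical.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Iterable, List, Sequence, Tuple


@dataclass
class IRO:
    """Hypothetical intermediate result object: summary of one data segment."""
    n: int = 0              # number of observations seen
    total: float = 0.0      # sum of the values
    total_sq: float = 0.0   # sum of the squared values


def process_segment(segment: Sequence[float]) -> IRO:
    """Worker step: summarize one segment without access to the others."""
    return IRO(
        n=len(segment),
        total=float(sum(segment)),
        total_sq=float(sum(x * x for x in segment)),
    )


def combine(iros: Iterable[IRO]) -> IRO:
    """Master step: merge per-segment summaries into one consolidated summary."""
    merged = IRO()
    for iro in iros:
        merged.n += iro.n
        merged.total += iro.total
        merged.total_sq += iro.total_sq
    return merged


def mean_and_sd(segments: List[Sequence[float]]) -> Tuple[float, float]:
    """One request, many subtasks, one answer: mean and sample standard deviation."""
    merged = combine(process_segment(s) for s in segments)
    mean = merged.total / merged.n
    variance = (merged.total_sq - merged.n * mean * mean) / (merged.n - 1)
    return mean, sqrt(variance)


if __name__ == "__main__":
    # Three segments stand in for data held by three separate workers.
    segments = [[1, 2, 3], [4, 5], [6, 7, 8, 9, 10]]
    print(mean_and_sd(segments))
```

Because each summary is small and the merge is a plain sum, the segments can be processed in any order and on any number of workers without changing the answer. The sum-of-squares formula is used here for brevity; it can lose precision on large or tightly clustered values, so a production implementation would likely use a more stable merge.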
