Taking R Analytics to SQL and the Cloud

Presentation by Andrie de Vries at SQL Relay, 6 October 2015

Published in: Technology

  1. 1. WHO: The leading provider of advanced analytics software and services based on open source R, since 2007 • WHAT: REVOLUTION R, the enterprise-grade predictive analytics application platform based on the R language • WHERE: “This acquisition will help customers use advanced analytics within Microsoft data platforms” -- Joseph Sirosh, CVP C+E
  3. 3. • Situation • Complication • Critical question? • Answer
  4. 4. • A high level overview of R • Data science in the cloud • Connecting R to SQL • Scalable R • R in SQL Server • Moving your workflow to the cloud
  5. 5. A high level overview of R
  6. 6. • Most widely used data analysis software • Most powerful statistical programming language • Create beautiful and unique data visualizations • Thriving open-source community • Fills the talent gap www.revolutionanalytics.com/what-is-r
  7. 7. 1993: Research project in Auckland, NZ • 1995: Open source • 1997: R-core • 2000: R-1.0.0 • 2003: R Foundation • 2004: First UseR! • 2009: New York Times • 2015: R-3.2.0, R Consortium • Photo credit: Robert Gentleman
  8. 8. The New York Times • Interactive Features: Election Forecast, Dialect Quiz • Data Journalism: NFL Draft Picks, Wealth distribution in USA
  9. 9. Data science in the Azure cloud
  10. 10. Trends
  11. 11. Software Revenues • New License Revenues • Source: http://redmonk.com/sogrady/2013/11/21/selling-software/
  12. 12. The Azure Cloud (map legend: Operational, Announced) • Central US: Iowa • West US: California • North Europe: Ireland • East US: Virginia • East US 2: Virginia • US Gov Virginia • North Central US: Illinois • US Gov Iowa • South Central US: Texas • Brazil South: Sao Paulo • West Europe: Netherlands • China North*: Beijing • China South*: Shanghai • Japan East: Saitama • Japan West: Osaka • India West: TBD • India East: TBD • East Asia: Hong Kong • SE Asia: Singapore • Australia West: Melbourne • Australia East: Sydney • * Operated by 21Vianet
  13. 13. http://blog.revolutionanalytics.com/2015/06/r-build-keynote.html/
  14. 14. Connecting R to SQL
  15. 15. mran.revolutionanalytics.com
  16. 16. Demo • Using ODBC to connect R to SQL
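
The demo code is not part of this transcript. As a rough sketch of what connecting R to SQL Server over ODBC could look like, the example below assumes the RODBC package (the slides do not say which ODBC package was used). The server name, database name, and table name are hypothetical placeholders.

```r
# Sketch only: the demo code is not in the transcript. RODBC is an assumption,
# and the server, database, and table names are hypothetical placeholders.

# Build an ODBC connection string for SQL Server using Windows authentication.
build_conn_string <- function(server, database, driver = "SQL Server") {
  sprintf("Driver={%s};Server=%s;Database=%s;Trusted_Connection=yes;",
          driver, server, database)
}

# Run a query and return the result as a data frame.
# Requires the RODBC package and a reachable SQL Server instance.
query_sql_server <- function(conn_string, sql) {
  channel <- RODBC::odbcDriverConnect(conn_string)
  if (!inherits(channel, "RODBC")) {
    stop("Could not open the ODBC connection")
  }
  on.exit(RODBC::odbcClose(channel))  # always release the connection
  RODBC::sqlQuery(channel, sql)
}

conn_string <- build_conn_string("localhost", "SalesDb")

# Not run here, because it needs a live database:
# top_rows <- query_sql_server(conn_string, "SELECT TOP 10 * FROM dbo.Orders")
```

With this approach the query runs in the database and only the result set travels to R, so the size of the result is limited by the memory of the R session.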
  17. 17. Solving the scalability problem with R
  18. 18. Revolution R Enterprise (RRE) is the big data, big analytics platform based on open source R
  19. 19. Data Step • Data import: Delimited, Fixed, SAS, SPSS, ODBC • Variable creation & transformation • Recode variables • Factor variables • Missing value handling • Sort, Merge, Split • Aggregate by category (means, sums)
      Descriptive Statistics • Min / Max, Mean, Median (approx.) • Quantiles (approx.) • Standard Deviation • Variance • Correlation • Covariance • Sum of Squares (cross product matrix for set variables) • Pairwise Cross tabs • Risk Ratio & Odds Ratio • Cross-Tabulation of Data (standard tables & long form) • Marginal Summaries of Cross Tabulations
      Statistical Tests • Chi Square Test • Kendall Rank Correlation • Fisher’s Exact Test • Student’s t-Test
      Sampling • Subsample (observations & variables) • Random Sampling
      Predictive Models • Sum of Squares (cross product matrix for set variables) • Multiple Linear Regression • Generalized Linear Models (GLM): exponential family distributions (binomial, Gaussian, inverse Gaussian, Poisson, Tweedie); standard link functions (cauchit, identity, log, logit, probit); user defined distributions & link functions • Covariance & Correlation Matrices • Logistic Regression • Classification & Regression Trees • Predictions/scoring for models • Residuals for all models
      Cluster Analysis • K-Means
      Classification • Decision Trees • Decision Forests • Stochastic Gradient Boosted Decision Trees
      Variable Selection • Stepwise Regression: Linear, Logistic and GLM
      Simulation • Monte Carlo • Parallel Random Number Generation
      Combination • Using Revolution rxDataStep and rxExec functions to combine open source R with Revolution R • PEMA API
  20. 20. Demo • Using RRE to solve the scalability problem
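
The demo code is not part of this transcript. The general idea behind scaling the statistics listed above to data larger than memory is to process the data chunk by chunk, keep only a small intermediate result per chunk, and combine those results at the end. The sketch below illustrates that idea in base R for the mean and variance. It is an illustration under that assumption, not RevoScaleR's implementation, and the function names are hypothetical.

```r
# Sketch only: base R illustration of chunk-wise processing, not RevoScaleR code.
# Each chunk is reduced to three numbers, so the full data never has to be
# held in memory at once.

# Reduce one chunk to its sufficient statistics for mean and variance.
summarise_chunk <- function(x) {
  c(n = length(x), sum = sum(x), sum_sq = sum(x^2))
}

# Combine per-chunk statistics by simple addition, then finish the calculation.
combine_chunks <- function(chunk_stats) {
  total <- Reduce(`+`, chunk_stats)
  n <- total[["n"]]
  mean_x <- total[["sum"]] / n
  var_x <- (total[["sum_sq"]] - n * mean_x^2) / (n - 1)
  list(n = n, mean = mean_x, var = var_x)
}

# Simulate a data source that arrives in chunks of 1,000 rows.
set.seed(42)
x <- rnorm(10000, mean = 5, sd = 2)
chunks <- split(x, ceiling(seq_along(x) / 1000))

result <- combine_chunks(lapply(chunks, summarise_chunk))
```

Because the per-chunk summaries are combined by addition, the chunks can be processed in any order or in parallel. The sum-of-squares formula is used here for brevity; it can lose precision when the mean is large relative to the spread.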
  21. 21. R in SQL Server
  22. 22. SQL Server 2016 • Built-in to SQL Server • Data Scientist: Interact directly with data • Data Developer/DBA: Manage data and analytics together • Example Solutions: Fraud detection, Sales forecasting, Warehouse efficiency, Predictive maintenance • Diagram labels: Relational Data, Analytic Library, T-SQL Interface, Extensibility, R Integration, New R scripts, Microsoft Azure Machine Learning Marketplace
  23. 23. Run R script • Use your preferred R IDE • Set compute context to SQL Server • Use RevoScaleR rx functions
      Create SQL query • Create stored procedure • Execute directly in SSMS query
  24. 24. Demo • Using RRE directly in SQL Server
  25. 25. Demo • Running R inside a SQL stored procedure
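
The demo code is not part of this transcript. In SQL Server 2016, an R script run from T-SQL receives the result of the input query as a data frame, named `InputDataSet` by default, and returns a data frame named `OutputDataSet` by default. The sketch below shows what the R part of such a script might look like. The input is simulated so that the script runs on its own, and the column names and values are hypothetical.

```r
# Sketch only: column names and data are hypothetical.
# Inside SQL Server, InputDataSet would be supplied by the stored procedure;
# here it is simulated so the script can run on its own.
InputDataSet <- data.frame(
  region = c("North", "North", "South", "South", "South"),
  sales  = c(100, 150, 80, 120, 100)
)

# The part that would form the R script: average sales by region.
OutputDataSet <- aggregate(sales ~ region, data = InputDataSet, FUN = mean)
names(OutputDataSet) <- c("region", "mean_sales")
```

Keeping the script to a plain data-frame-in, data-frame-out shape makes it easy to develop in an R IDE first and move into a stored procedure afterwards.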
  27. 27. Moving your workflow to the cloud
  28. 28. Model • Model in cloud • Model in SQL Server using Revolution R • Model on a sample of data
      Score • Score in cloud • Score in SQL Server • Score using R
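
One way to read the options on this slide is that a model can be fitted in one place, for example on a sample of the data, and then used for scoring wherever the full data lives. The sketch below illustrates that pattern in base R with simulated data. It is an assumption about how the workflow could be wired together, not code from the talk.

```r
# Sketch only: base R stand-in for the "model on a sample, score elsewhere"
# pattern, using simulated data.
set.seed(1)
full_data <- data.frame(x = runif(5000, 0, 10))
full_data$y <- 3 + 2 * full_data$x + rnorm(5000, sd = 0.5)

# 1. Fit the model on a 10% sample of the data.
sample_rows <- sample(nrow(full_data), size = 500)
model <- lm(y ~ x, data = full_data[sample_rows, ])

# 2. Serialize the fitted model to raw bytes, as one might before storing it
#    in a database table or sending it to another machine.
model_bytes <- serialize(model, connection = NULL)

# 3. Where the data lives, restore the model and score every row.
restored <- unserialize(model_bytes)
full_data$predicted <- predict(restored, newdata = full_data)
```

Separating fitting from scoring in this way means only the serialized model has to move between environments, not the data.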
  29. 29. Andrie de Vries • Senior Programmer Manager, R Community Projects • @RevoAndrie • adevries@microsoft.com
