R at Microsoft


Presenter: David Smith
Presented to SURF (Sydney R User Group), June 25 2015

Published in: Technology

  1. • Introduction to R
     • Applications of R at Microsoft
     • R Products at Microsoft
     • What’s coming for R at Microsoft
     • Q&A
  2. April 6, 2015: “This acquisition will help customers use advanced analytics within Microsoft data platforms.”
  4. • Most widely used data analysis software
     • Most powerful statistical programming language
     • Create beautiful and unique data visualizations
     • Thriving open-source community
     • Fills the talent gap
  5. • 1993: Research project in Auckland, NZ
     • 1995: Released as open-source software
     • 1997: R core group formed
     • 2000: R 1.0.0 released
     • 2003: R Foundation formed in Austria
     • 2004: First international user conference
     • 2007: Revolution Analytics founded
     • 2009: New York Times article on R
     • 2013: Revolution R Open released
     • 2015: Microsoft acquires Revolution Analytics
     Photo credit: Robert Gentleman
  6. R Usage Growth: Rexer Data Miner Survey, 2007-2013
     R Language Popularity: #9 in IEEE Spectrum Top Programming Languages, July 2014
     Sources:
     • Rexer Data Miner Survey
     • IEEE Spectrum, July 2014
  7. New York Times, June 25 2009 (3 hours after Michael Jackson’s death)
  9. From Traditional BI to Advanced Analytics:
     • What happened?
     • Why did it happen?
     • What will happen?
     • How can we make it happen?
  10. • System monitoring & alerting
      • Capacity planning
  11. • TrueSkill Matchmaking System
      • Player churn
      • Game design
      • In-game purchase optimization
      • Fraud detection
      • Player communities
  13. • Enhanced Open Source R distribution
      • Compatible with all R-related software
      • Multi-threaded for performance
      • Focus on reproducibility
      • Open source (GPLv2 license)
      • Available for Windows, Mac OS X, Ubuntu, Red Hat and OpenSUSE
      • Download from
  14. • Built on latest R engine
      • 100% compatible with
      • Designed to work with RStudio
  15. • Multithreaded library replaces standard BLAS/LAPACK algorithms
      • High-performance algorithms
      • Sequential → Parallel
      • No need to change any R code
      • Included with RRO binary distributions
      More at Revolutions blog
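
The "no need to change any R code" point can be illustrated with a short sketch. The matrix size and variable names are illustrative assumptions, not taken from the slides. The script uses only base R, and any speedup depends on which BLAS/LAPACK library the R build is linked against.

```r
# Dense linear algebra in base R is delegated to whichever BLAS/LAPACK library
# R is linked against, so this script is identical whether that library is
# sequential or multithreaded. (n = 500 is an arbitrary illustrative size.)
set.seed(42)
n <- 500
a <- matrix(rnorm(n * n), nrow = n)
b <- matrix(rnorm(n * n), nrow = n)

# Matrix multiplication (handled by BLAS)
mult_time <- system.time(product <- a %*% b)[["elapsed"]]

# Solving a linear system a %*% x = rhs (handled by LAPACK)
rhs <- rnorm(n)
solve_time <- system.time(x <- solve(a, rhs))[["elapsed"]]

cat(sprintf("multiply: %.3f s, solve: %.3f s\n", mult_time, solve_time))
```

Running the same file under two R installations, one with a reference BLAS and one with a multithreaded library, gives a rough comparison of the two without editing the script.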
  16. Adapted from CC BY-NC 2.5
  17. • Static CRAN mirror
      • Daily CRAN snapshots
      • Easily write and share scripts synced to a specific snapshot
      Example: library(checkpoint); checkpoint("2014-09-17")
      Diagram labels: CRAN, CRAN mirror, daily snapshots (midnight UTC), checkpoint server, checkpoint package
  18. • Easy to use: add 2 lines to the top of each script
      • For the package author:
      • For a script collaborator:
  19. • Download Revolution R Open
      • Learn about R and RRO
      • Daily CRAN snapshots
      • Explore Packages
      • Explore Task Views
  20. Trends
  21. R FOR BIG DATA
  22. • Toolkits for data scientists and numerical analysts to create custom parallel and distributed algorithms
      • Mainly useful for “embarrassingly parallel” problems, where parallel components work with small amounts of data
      • Big Data Predictive Analytics mostly not embarrassingly parallel
      Details at
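
As a rough sketch of what an "embarrassingly parallel" problem looks like, the example below runs independent Monte Carlo estimates of pi on separate worker processes. The transcript does not name the toolkits on this slide, so the use of R's bundled `parallel` package is an assumption, as are the function name `estimate_pi`, the worker count and the sample sizes.

```r
library(parallel)  # ships with R; assumed here as a stand-in for the slide's toolkits

# Each task needs only a seed and a sample size: no shared data, and no
# communication between tasks. That independence is what makes the problem
# embarrassingly parallel. (estimate_pi is a hypothetical helper.)
estimate_pi <- function(seed, n = 1e5) {
  set.seed(seed)
  x <- runif(n)
  y <- runif(n)
  4 * mean(x^2 + y^2 <= 1)  # fraction of points inside the quarter circle
}

cl <- makeCluster(2)                          # two local worker processes
estimates <- parLapply(cl, 1:8, estimate_pi)  # eight independent tasks
stopCluster(cl)

# Combining the results is a trivial final step
pi_hat <- mean(unlist(estimates))
cat(sprintf("estimate of pi: %.4f\n", pi_hat))
```

A model fit over a large data set generally does not split this cleanly, because every step needs information from all of the data. That is the contrast the last bullet draws.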
  23. … is the only big data, big analytics platform based on open source R, the de facto statistical computing language for modern analytics
  24. Data Step
      • Data import: Delimited, Fixed, SAS, SPSS, ODBC
      • Variable creation & transformation
      • Recode variables
      • Factor variables
      • Missing value handling
      • Sort, Merge, Split
      • Aggregate by category (means, sums)
      Descriptive Statistics
      • Min / Max, Mean, Median (approx.)
      • Quantiles (approx.)
      • Standard Deviation
      • Variance
      • Correlation
      • Covariance
      • Sum of Squares (cross product matrix for set variables)
      • Pairwise Cross tabs
      • Risk Ratio & Odds Ratio
      • Cross-Tabulation of Data (standard tables & long form)
      • Marginal Summaries of Cross Tabulations
      Statistical Tests
      • Chi Square Test
      • Kendall Rank Correlation
      • Fisher’s Exact Test
      • Student’s t-Test
      Sampling
      • Subsample (observations & variables)
      • Random Sampling
      Predictive Models
      • Sum of Squares (cross product matrix for set variables)
      • Multiple Linear Regression
      • Generalized Linear Models (GLM): exponential family distributions (binomial, Gaussian, inverse Gaussian, Poisson, Tweedie); standard link functions (cauchit, identity, log, logit, probit); user-defined distributions & link functions
      • Covariance & Correlation Matrices
      • Logistic Regression
      • Classification & Regression Trees
      • Predictions/scoring for models
      • Residuals for all models
      Cluster Analysis
      • K-Means
      Classification
      • Decision Trees
      • Decision Forests
      • Gradient Boosted Decision Trees
      • Naïve Bayes
      Variable Selection
      • Stepwise Regression
      Simulation
      • Simulation (e.g. Monte Carlo)
      • Parallel Random Number Generation
      Combination
      • PEMA-R API
      • rxDataStep
      • rxExec
      Legend: New in v7.3, Coming in v7.4
  25. • ETL
      • Marketing channel data
      • Behavioral variables
      • Promotional data
      • Overlay data
      • Exploratory data analysis
      • Time-to-event models
      • GAM survival models
      • Scoring for inference
      • Scoring for prediction
      • 5 billion scores per day per retailer
      Diagram labels: CUSTOM DATA FORMAT, CUSTOM VARIABLES (PMML)
  26. R IN THE CLOUD
  27. • Exposing the expertise of data scientists as APIs
      • Bringing the utility of data science to applications
      • Addressing the Data Science talent gap
  28. Azure: Huge infrastructure scale
      19 Regions ONLINE… huge datacenter capacity around the world… and we’re growing
      • 100+ datacenters
      • One of the top 3 networks in the world (coverage, speed, connections)
      • 2x AWS and 6x Google in number of regions offered
      • G Series: largest VM available in the market (32 cores, 448 GB RAM, SSD…)
      Regions (map legend: Operational, Announced):
      Central US (Iowa); West US (California); North Europe (Ireland); East US (Virginia); East US 2 (Virginia); US Gov (Virginia); North Central US (Illinois); US Gov (Iowa); South Central US (Texas); Brazil South (Sao Paulo); West Europe (Netherlands); China North * (Beijing); China South * (Shanghai); Japan East (Saitama); Japan West (Osaka); India West (TBD); India East (TBD); East Asia (Hong Kong); SE Asia (Singapore); Australia West (Melbourne); Australia East (Sydney)
      * Operated by 21Vianet
  32. SQL Server 2016: Built-in in-database analytics
      • Data Scientist: Interact directly with data
      • Data Developer/DBA: Manage data and analytics together
      • Built in to SQL Server
      Example Solutions
      • Fraud detection
      • Sales forecasting
      • Warehouse efficiency
      • Predictive maintenance
      Diagram labels: Relational Data, Analytic Library, T-SQL Interface, Extensibility, R Integration, Microsoft Azure Machine Learning Marketplace, New R scripts
  33. Chart (axes: rows, minutes) comparing R on a server pulling data via SQL with R on a server invoking RRE ScaleR inside the EDW
  34. Thank you
      Download Revolution R Open:
      More at:
      David Smith, R Community Lead, Revolution Analytics
      @revodavid
  35. More at