SlideShare a Scribd company logo
Scalable R
Michal Marušan, #RSlovakia #PyDataBA19. 6. 2017
(R <- Slovakia Meetup #2, Nervosa)
Scalable R
R Meetup
Michal Marusan @ 2017
What is R?
• How many of you use/know R?
• How many of you get data from database?
• How many of you have to deal with R limitations?
• How many of you know/use Hadoop/Spark?
100%
80%
60%
50%
?
?
Lack of
Commercial
Support
Single-
threaded /
Not
parallelized
Complex
Deployment
Processes
Production
Limited Data
Scale
in
Memory
Challenges posed by open source R
“This acquisition will help customers use advanced analytics within Microsoft data platforms.“
First step (April 6, 2015)
Microsoft R Products
• Free and open source R distribution
• Enhanced and distributed by Microsoft
Microsoft R Open / Client
• Built in Advanced Analytics and Stand Alone Server Capability
• Leverages the Benefits of SQL 2016 Enterprise Edition
SQL Server R Services
• Microsoft R Server for Redhat Linux
• Microsoft R Server for SUSE Linux
• Microsoft R Server for Teradata DB
• Microsoft R Server for Hadoop on Redhat
Microsoft R Server
Microsoft R products
how to tackle scale R?
multi-thread / parallel / memory
MRS extends open-source R to allow:
• Multi-threading
• Matrix operations, linear algebra, and many other math operations run on all
available cores
• Parallel processing
• ScaleR functions utilize all available resources, local or distributed
• On-disk data storage
• RAM limitations lifted
Microsoft R Server
R Open MicrosoftR Server
DeployR
ConnectR
• High-speed & direct
connectors
Available for:
• High-performance XDF
• SAS, SPSS, delimited & fixed
format text data files
• Hadoop HDFS (text & XDF)
• Teradata Database & Aster
• EDWs and ADWs
• ODBC
ScaleR
• Ready-to-Use high-performance
big data big analytics
• Fully-parallelized analytics
• Data prep & data distillation
• Descriptive statistics & statistical tests
• Range of predictive functions
• User tools for distributing customized R algorithms
across nodes
• Wide data sets supported – thousands of variables
DistributedR
• Distributed computing framework
• Delivers cross-platform portability
R+CRAN
• Open source R interpreter
• R 3.1.2
• Freely-available huge range of R
algorithms
• Algorithms callable by RevoR
• Embeddable in R scripts
• 100% Compatible with existing R scripts,
functions and packages
RevoR
• Performance enhanced R
interpreter
• Based on open source R
• Adds high-performance
math library to speed up
linear algebra functions
MRS Components
https://info.microsoft.com/rs/157-GQE-382/images/EN-WBNR-Slidedeck-Using-Microsoft-
Platforms
CRAN, MRO, MRS Comparison
Datasize
In-memory
In-memory In-Memory or Disk Based
Speed of Analysis
Single threaded* Multi-threaded*
Multi-threaded, parallel
processing 1:N servers
Support
Community Community Community + Commercial
Analytic Breadth
& Depth 7500+ innovative analytic
packages
7500+ innovative analytic
packages
7500+ innovative packages +
commercial parallel high-speed
functions
License
Open Source
Open Source
Commercial license.
Supported release with
indemnity
Microsoft
R Open
Microsoft
R Server
CRAN, MRO, MRS Comparison
Open source R
"http://www.ats.ucla.edu/stat/data/binary.csv"
R Server
“/data/binary.csv”
Demo
GLM
R Server parallelized by Spark
“/data/binary.csv”
R Server parallelized by SQL Server
“binary-table”
R Server platform options
Cloud
RDBMS
Servers
On-prem Hadoop
EDW
Write once – deploy anywhere
Desktops
Demo
WODA
rxComputeContext
High-Performance Compute & Analytics
• HPA – “Big Model” and/or “Big Data”
Not CommonCommon
• HPC – “Many Models”
Demo
rxExec
scale R for the enterprise use:
how do we operationalise R?
?
Some key operationalisation questions
How can we
embed R-based
analytics into
business
applications like
CRM?
How can we
build R-based
analytics into BI
dashboards and
deploy quickly?
How can we
maintain data
security for R
users?
How do we
allow data
scientists to
share code and
version control?
Microsoft R Open
Microsoft R Server
R IDE
Data Scientist
Workstation SQL Server 2016Script
Results
Execution
1
2
3
sqlCompute <- RxInSqlServer()
rxSetComputeContext(sqlCompute)
linModObj <- rxLinMod()
Advanced Analytics
Extensions
Model Development (R Users)
Working from R IDE on a local workstation, execute an R script that runs in-database on remote SQL
server, and get the results back.
Microsoft R Open
Microsoft R Server
Application
SQL Server 2016
2
exec sp_execute_external_script
@language = ‘R’
, @script =
-- R code --
Advanced Analytics
Extensions
Microsoft R Open
Mcirosoft R Server
Call System Stored Procedure1
Results: scores, plots3
The stored procedure
contains R code and
executes in-database
Model Operationalization with SQL R Services
Call a T-SQL System Stored Procedure to generate features and train (or retrain) the model
Call a T-SQL System Stored Procedure from my application and have it trigger R script execution in-
database to predict on new dataset. Results are then returned to my application (predictions, plots).
R Code->T-SQL Stored Proc
R script usage from SQL Server
Original R script:
IrisPredict <- function(data, model){
library(e1071)
predicted_species <- predict(model, data)
return(predicted_species)
}
library(RODBC)
conn <- odbcConnect("MySqlAzure", uid = myUser, pwd =
myPassword);
Iris_data <-sqlFetch(conn, "Iris_Data");
Iris_model <-sqlQuery(conn, "select model from my_iris_model");
IrisPredict (Iris_data, model);
Calling R script from SQL Server:
/* Input table schema */
create table Iris_Data (name varchar(100), length int, width int);
/* Model table schema */
create table my_iris_model (model varbinary(max));
declare @iris_model varbinary(max) = (select model from
my_iris_model);
exec sp_execute_external_script
@language = 'R'
, @script = '
IrisPredict <- function(data, model){
library(e1071)
predicted_species <- predict(model, data)
return(predicted_species)
}
IrisPredict(input_data_1, model);
'
, @parallel = default
, @input_data_1 = N'select * from Iris_Data'
, @params = N'@model varbinary(max)'
, @model = @iris_model
with result sets ((name varchar(100), length int, width int
, species varchar(30)));
Values highlighted in yellow are SQL queries embedded in the original R script
Values highlighted in aqua are R variables that bind to SQL variables by name
Advanced analytics
Why In-Database Analytics with SQL 2016 & R?
Leverage Full Capability of R:
• Rich Statistical, Visualization & Predictive Analytics
• A Large and Growing Skill Base
… including Microsoft R Servers Big Data Capabilities:
• Scalable Computation
• Scalable Data Size
… all Running In-Database:
• Divide Work Between Data Scientists and Data Engineers
• Reduce Data Duplication
• Reduce Data Movement
… While Protecting Information:
• Eliminate Data Movement & Unnecessary Copying
• Leverage Database Data Protections
+
SQL Server
2016
Enterprise
Edition
Copyright Microsoft Corporation. All rights reserved.
What’s new?
R Server 9.1
SQL Server Machine Learning Services
Best of Microsoft Innovation Best of Open Source
SQL
Server
Microsoft R Server 9.1 brings pre-trained cognitive models that accelerate
time to value and can be re-trained with your data and optimized for your
business.
Sentiment Analysis
Enables you to assess the
sentiment of an English
sentence/paragraph with just a
few lines of code.
Image Featurizer
Derive up to 5,000 features on a given image,
and use that to compare similarity between
two images.
This model can be used in healthcare,
research and quality control
For more information on Microsoft ML Libraries please visit: MicrosoftML Library
One of the most popular advanced analytics use cases is Pleasingly Parallel
(also known as Embarrassingly or Perfectly Parallel) where clients run massively
parallel computations on partitions that are grouped by one or more attributes.
These embarrassingly parallel use cases are common across industries:
• Life sciences simulations to identify the best drug for a given situation
• custom transformation, enrichment and featurization
• Portfolio analysis to identify the right investment for each portfolio
• Utilities to forecast energy consumption for each cohort
• Shipping to forecast demand for various container types
For detailed best practices visit: R Server Blog on embarrassingly-parallel
Easy, secure, and high-performance operationalization is essential for Tier-1 enterprises, at scale, to derive
maximum value from their analytics investments. Microsoft R Server 9.1 release continues strengthening the
power of operationalization.
- Real time web services: realize 10X to 100X boost in scoring performance, scoring speeds at <10ms.
Currently on Windows platform; other platforms will be supported soon.
- Role Based Access Control: enables admins to control who can publish, update, delete or consume web
services
- Asynchronous batch processing: speed up the scoring performance for the web services with large input
data sets and long-running jobs
- Asynchronous remote execution: run scripts in background mode on a remote server, without having to
wait for the job to complete
- Dynamic scaling of operationalization grid with Azure VMs: easily spin up a set of R Server VMs in
Azure, configure them as a grid for operationalization, and scale it up and down based on CPU / Memory
usage
• Support for R and now Python
languages
• 80+ of the most popular Python
packages are parallelized and
scalable to help tackle big data
• Enables users to work with
preferred tools and push
intelligence to where data lives
• Advanced machine learning
algorithms with GPUs
Predicting loan defaults
Store
Predictions
In-memory
OLTP
ColumnStore Power BI
dashboard
R
Business user
Prepare
for analytics
Visualize
Data from 8M
vehicle loans
Predicts loan
defaults
Age, original balance, interest rate,
loan remaining months, credit score
1M
predictions
per sec
Python
Real-time scoring
• Real-time scoring supported on models trained using RevoScaleR and MicrosoftML algorithms
& transforms.
• SQL Server understands these models natively and scores inputs without the need of R
interpreter and overhead delivering significantly better performance, assured reliability, lower
resource consumption.
Flexible R package management
• Updated RevoScaleR package enables users to install, uninstall and manage packages on SQL
Server without administrative access to the SQL Server machine.
• Data scientists and other non-admin users can install packages in databases, user or group
scope.
• Updated rxSyncPackages API to ensure that the user-installed packages are not lost if SQL
Server node goes down or if the database is migrated.
• The list of packages and the permissions is maintained in a server table and this API ensures
that the required packages are installed on the file system
Wrap Up
MRS extends open-source R to allow:
• Multi-threading
• Matrix operations, linear algebra, and many other math operations run on all available cores
• Parallel processing
• ScaleR functions utilize all available resources (rxExec)
• local or distributed (locally parallel, Hadoop, Spark, SQL Server, Teradata)
• On-disk data storage
• RAM limitations lifted -> XDF, XDFd
Q&A?
Thank you!
Michal.Marusan@Microsoft.com

More Related Content

What's hot

Modernization sql server 2016
Modernization   sql server 2016Modernization   sql server 2016
Modernization sql server 2016
Kiki Noviandi
 
Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15
Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15
Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15
Revolution Analytics
 
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi
Vinoth Chandar
 
Real Time Machine Learning Visualization with Spark
Real Time Machine Learning Visualization with SparkReal Time Machine Learning Visualization with Spark
Real Time Machine Learning Visualization with Spark
DataWorks Summit/Hadoop Summit
 
What's new in SQL on Hadoop and Beyond
What's new in SQL on Hadoop and BeyondWhat's new in SQL on Hadoop and Beyond
What's new in SQL on Hadoop and Beyond
DataWorks Summit/Hadoop Summit
 
Timeline Service v.2 (Hadoop Summit 2016)
Timeline Service v.2 (Hadoop Summit 2016)Timeline Service v.2 (Hadoop Summit 2016)
Timeline Service v.2 (Hadoop Summit 2016)
Sangjin Lee
 
Kafka for data scientists
Kafka for data scientistsKafka for data scientists
Kafka for data scientists
Jenn Rawlins
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
DataWorks Summit/Hadoop Summit
 
Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data AnalyticsAnalysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
DataWorks Summit/Hadoop Summit
 
Big Data Education Webcast: Introducing DMX and DMX-h Release 8
Big Data Education Webcast: Introducing DMX and DMX-h Release 8Big Data Education Webcast: Introducing DMX and DMX-h Release 8
Big Data Education Webcast: Introducing DMX and DMX-h Release 8
Precisely
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop EcosystemLarge-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
Gyula Fóra
 
YARN Ready: Apache Spark
YARN Ready: Apache Spark YARN Ready: Apache Spark
YARN Ready: Apache Spark
Hortonworks
 
Splice machine-bloor-webinar-data-lakes
Splice machine-bloor-webinar-data-lakesSplice machine-bloor-webinar-data-lakes
Splice machine-bloor-webinar-data-lakes
Edgar Alejandro Villegas
 
Family data sheet HP Virtual Connect(May 2013)
Family data sheet HP Virtual Connect(May 2013)Family data sheet HP Virtual Connect(May 2013)
Family data sheet HP Virtual Connect(May 2013)E. Balauca
 
Combining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache SparkCombining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache Spark
DataWorks Summit/Hadoop Summit
 
OracleStore: A Highly Performant RawStore Implementation for Hive Metastore
OracleStore: A Highly Performant RawStore Implementation for Hive MetastoreOracleStore: A Highly Performant RawStore Implementation for Hive Metastore
OracleStore: A Highly Performant RawStore Implementation for Hive Metastore
DataWorks Summit
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
DataWorks Summit
 
Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4
Chris Nauroth
 
Solr + Hadoop: Interactive Search for Hadoop
Solr + Hadoop: Interactive Search for HadoopSolr + Hadoop: Interactive Search for Hadoop
Solr + Hadoop: Interactive Search for Hadoop
gregchanan
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
DataWorks Summit
 

What's hot (20)

Modernization sql server 2016
Modernization   sql server 2016Modernization   sql server 2016
Modernization sql server 2016
 
Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15
Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15
Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15
 
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi
 
Real Time Machine Learning Visualization with Spark
Real Time Machine Learning Visualization with SparkReal Time Machine Learning Visualization with Spark
Real Time Machine Learning Visualization with Spark
 
What's new in SQL on Hadoop and Beyond
What's new in SQL on Hadoop and BeyondWhat's new in SQL on Hadoop and Beyond
What's new in SQL on Hadoop and Beyond
 
Timeline Service v.2 (Hadoop Summit 2016)
Timeline Service v.2 (Hadoop Summit 2016)Timeline Service v.2 (Hadoop Summit 2016)
Timeline Service v.2 (Hadoop Summit 2016)
 
Kafka for data scientists
Kafka for data scientistsKafka for data scientists
Kafka for data scientists
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
 
Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data AnalyticsAnalysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
 
Big Data Education Webcast: Introducing DMX and DMX-h Release 8
Big Data Education Webcast: Introducing DMX and DMX-h Release 8Big Data Education Webcast: Introducing DMX and DMX-h Release 8
Big Data Education Webcast: Introducing DMX and DMX-h Release 8
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop EcosystemLarge-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
 
YARN Ready: Apache Spark
YARN Ready: Apache Spark YARN Ready: Apache Spark
YARN Ready: Apache Spark
 
Splice machine-bloor-webinar-data-lakes
Splice machine-bloor-webinar-data-lakesSplice machine-bloor-webinar-data-lakes
Splice machine-bloor-webinar-data-lakes
 
Family data sheet HP Virtual Connect(May 2013)
Family data sheet HP Virtual Connect(May 2013)Family data sheet HP Virtual Connect(May 2013)
Family data sheet HP Virtual Connect(May 2013)
 
Combining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache SparkCombining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache Spark
 
OracleStore: A Highly Performant RawStore Implementation for Hive Metastore
OracleStore: A Highly Performant RawStore Implementation for Hive MetastoreOracleStore: A Highly Performant RawStore Implementation for Hive Metastore
OracleStore: A Highly Performant RawStore Implementation for Hive Metastore
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
 
Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4
 
Solr + Hadoop: Interactive Search for Hadoop
Solr + Hadoop: Interactive Search for HadoopSolr + Hadoop: Interactive Search for Hadoop
Solr + Hadoop: Interactive Search for Hadoop
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
 

Similar to Michal Marušan: Scalable R

Advanced analytics with R and SQL
Advanced analytics with R and SQLAdvanced analytics with R and SQL
Advanced analytics with R and SQL
MSDEVMTL
 
microsoft r server for distributed computing
microsoft r server for distributed computingmicrosoft r server for distributed computing
microsoft r server for distributed computing
BAINIDA
 
DataMass Summit - Machine Learning for Big Data in SQL Server
DataMass Summit - Machine Learning for Big Data  in SQL ServerDataMass Summit - Machine Learning for Big Data  in SQL Server
DataMass Summit - Machine Learning for Big Data in SQL Server
Łukasz Grala
 
Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...
Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...
Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...
Rui Quintino
 
Revolution R Enterprise - Portland R User Group, November 2013
Revolution R Enterprise - Portland R User Group, November 2013Revolution R Enterprise - Portland R User Group, November 2013
Revolution R Enterprise - Portland R User Group, November 2013
Revolution Analytics
 
Introduction to Microsoft R Services
Introduction to Microsoft R ServicesIntroduction to Microsoft R Services
Introduction to Microsoft R Services
Gregg Barrett
 
Machine learning services with SQL Server 2017
Machine learning services with SQL Server 2017Machine learning services with SQL Server 2017
Machine learning services with SQL Server 2017
Mark Tabladillo
 
Ml2
Ml2Ml2
Intro to big data analytics using microsoft machine learning server with spark
Intro to big data analytics using microsoft machine learning server with sparkIntro to big data analytics using microsoft machine learning server with spark
Intro to big data analytics using microsoft machine learning server with spark
Alex Zeltov
 
20160317 - PAZUR - PowerBI & R
20160317  - PAZUR - PowerBI & R20160317  - PAZUR - PowerBI & R
20160317 - PAZUR - PowerBI & R
Łukasz Grala
 
Decision trees in hadoop
Decision trees in hadoopDecision trees in hadoop
Decision trees in hadoop
Revolution Analytics
 
Analytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using RAnalytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using R
Alex Palamides
 
Red Hat Summit 2017 - Intro to SQL Server on RHEL and Open Shift
Red Hat Summit 2017 - Intro to SQL Server on RHEL and Open ShiftRed Hat Summit 2017 - Intro to SQL Server on RHEL and Open Shift
Red Hat Summit 2017 - Intro to SQL Server on RHEL and Open Shift
Travis Wright
 
Using R services with Machine Learning
Using R services with Machine LearningUsing R services with Machine Learning
Using R services with Machine Learning
Eng Teong Cheah
 
In-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and RevolutionIn-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and Revolution
Revolution Analytics
 
Bluegranite AA Webinar FINAL 28JUN16
Bluegranite AA Webinar FINAL 28JUN16Bluegranite AA Webinar FINAL 28JUN16
Bluegranite AA Webinar FINAL 28JUN16Andy Lathrop
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
Debraj GuhaThakurta
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
Debraj GuhaThakurta
 
What's New in Revolution R Enterprise 6.2
What's New in Revolution R Enterprise 6.2What's New in Revolution R Enterprise 6.2
What's New in Revolution R Enterprise 6.2Revolution Analytics
 
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
Revolution Analytics
 

Similar to Michal Marušan: Scalable R (20)

Advanced analytics with R and SQL
Advanced analytics with R and SQLAdvanced analytics with R and SQL
Advanced analytics with R and SQL
 
microsoft r server for distributed computing
microsoft r server for distributed computingmicrosoft r server for distributed computing
microsoft r server for distributed computing
 
DataMass Summit - Machine Learning for Big Data in SQL Server
DataMass Summit - Machine Learning for Big Data  in SQL ServerDataMass Summit - Machine Learning for Big Data  in SQL Server
DataMass Summit - Machine Learning for Big Data in SQL Server
 
Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...
Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...
Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...
 
Revolution R Enterprise - Portland R User Group, November 2013
Revolution R Enterprise - Portland R User Group, November 2013Revolution R Enterprise - Portland R User Group, November 2013
Revolution R Enterprise - Portland R User Group, November 2013
 
Introduction to Microsoft R Services
Introduction to Microsoft R ServicesIntroduction to Microsoft R Services
Introduction to Microsoft R Services
 
Machine learning services with SQL Server 2017
Machine learning services with SQL Server 2017Machine learning services with SQL Server 2017
Machine learning services with SQL Server 2017
 
Ml2
Ml2Ml2
Ml2
 
Intro to big data analytics using microsoft machine learning server with spark
Intro to big data analytics using microsoft machine learning server with sparkIntro to big data analytics using microsoft machine learning server with spark
Intro to big data analytics using microsoft machine learning server with spark
 
20160317 - PAZUR - PowerBI & R
20160317  - PAZUR - PowerBI & R20160317  - PAZUR - PowerBI & R
20160317 - PAZUR - PowerBI & R
 
Decision trees in hadoop
Decision trees in hadoopDecision trees in hadoop
Decision trees in hadoop
 
Analytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using RAnalytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using R
 
Red Hat Summit 2017 - Intro to SQL Server on RHEL and Open Shift
Red Hat Summit 2017 - Intro to SQL Server on RHEL and Open ShiftRed Hat Summit 2017 - Intro to SQL Server on RHEL and Open Shift
Red Hat Summit 2017 - Intro to SQL Server on RHEL and Open Shift
 
Using R services with Machine Learning
Using R services with Machine LearningUsing R services with Machine Learning
Using R services with Machine Learning
 
In-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and RevolutionIn-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and Revolution
 
Bluegranite AA Webinar FINAL 28JUN16
Bluegranite AA Webinar FINAL 28JUN16Bluegranite AA Webinar FINAL 28JUN16
Bluegranite AA Webinar FINAL 28JUN16
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
 
What's New in Revolution R Enterprise 6.2
What's New in Revolution R Enterprise 6.2What's New in Revolution R Enterprise 6.2
What's New in Revolution R Enterprise 6.2
 
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
 

Recently uploaded

Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 

Recently uploaded (20)

Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 

Michal Marušan: Scalable R

  • 1. Scalable R Michal Marušan, #RSlovakia #PyDataBA19. 6. 2017 (R <- Slovakia Meetup #2, Nervosa)
  • 2. Scalable R R Meetup Michal Marusan @ 2017
  • 3. What is R? • How many of you use/know R? • How many of you get data from database? • How many of you have to deal with R limitations? • How many of you know/use Hadoop/Spark? 100% 80% 60% 50%
  • 5. “This acquisition will help customers use advanced analytics within Microsoft data platforms.“ First step (April 6, 2015)
  • 6. Microsoft R Products • Free and open source R distribution • Enhanced and distributed by Microsoft Microsoft R Open / Client • Built in Advanced Analytics and Stand Alone Server Capability • Leverages the Benefits of SQL 2016 Enterprise Edition SQL Server R Services • Microsoft R Server for Redhat Linux • Microsoft R Server for SUSE Linux • Microsoft R Server for Teradata DB • Microsoft R Server for Hadoop on Redhat Microsoft R Server Microsoft R products
  • 7. how to tackle scale R? multi-thread / parallel / memory
  • 8. MRS extends open-source R to allow: • Multi-threading • Matrix operations, linear algebra, and many other math operations run on all available cores • Parallel processing • ScaleR functions utilize all available resources, local or distributed • On-disk data storage • RAM limitations lifted Microsoft R Server
  • 9. R Open MicrosoftR Server DeployR ConnectR • High-speed & direct connectors Available for: • High-performance XDF • SAS, SPSS, delimited & fixed format text data files • Hadoop HDFS (text & XDF) • Teradata Database & Aster • EDWs and ADWs • ODBC ScaleR • Ready-to-Use high-performance big data big analytics • Fully-parallelized analytics • Data prep & data distillation • Descriptive statistics & statistical tests • Range of predictive functions • User tools for distributing customized R algorithms across nodes • Wide data sets supported – thousands of variables DistributedR • Distributed computing framework • Delivers cross-platform portability R+CRAN • Open source R interpreter • R 3.1.2 • Freely-available huge range of R algorithms • Algorithms callable by RevoR • Embeddable in R scripts • 100% Compatible with existing R scripts, functions and packages RevoR • Performance enhanced R interpreter • Based on open source R • Adds high-performance math library to speed up linear algebra functions MRS Components https://info.microsoft.com/rs/157-GQE-382/images/EN-WBNR-Slidedeck-Using-Microsoft- Platforms
  • 10. CRAN, MRO, MRS Comparison Datasize In-memory In-memory In-Memory or Disk Based Speed of Analysis Single threaded* Multi-threaded* Multi-threaded, parallel processing 1:N servers Support Community Community Community + Commercial Analytic Breadth & Depth 7500+ innovative analytic packages 7500+ innovative analytic packages 7500+ innovative packages + commercial parallel high-speed functions License Open Source Open Source Commercial license. Supported release with indemnity Microsoft R Open Microsoft R Server CRAN, MRO, MRS Comparison
  • 14. R Server parallelized by Spark “/data/binary.csv”
  • 15. R Server parallelized by SQL Server “binary-table”
  • 16. R Server platform options Cloud RDBMS Servers On-prem Hadoop EDW Write once – deploy anywhere Desktops
  • 18. High-Performance Compute & Analytics • HPA – “Big Model” and/or “Big Data” Not CommonCommon • HPC – “Many Models”
  • 20. scale R for the enterprise use: how do we operationalise R?
  • 21. ? Some key operationalisation questions How can we embed R-based analytics into business applications like CRM? How can we build R-based analytics into BI dashboards and deploy quickly? How can we maintain data security for R users? How do we allow data scientists to share code and version control?
  • 22. Microsoft R Open Microsoft R Server R IDE Data Scientist Workstation SQL Server 2016Script Results Execution 1 2 3 sqlCompute <- RxInSqlServer() rxSetComputeContext(sqlCompute) linModObj <- rxLinMod() Advanced Analytics Extensions Model Development (R Users) Working from R IDE on a local workstation, execute an R script that runs in-database on remote SQL server, and get the results back. Microsoft R Open Microsoft R Server
  • 23. Application SQL Server 2016 2 exec sp_execute_external_script @language = ‘R’ , @script = -- R code -- Advanced Analytics Extensions Microsoft R Open Mcirosoft R Server Call System Stored Procedure1 Results: scores, plots3 The stored procedure contains R code and executes in-database Model Operationalization with SQL R Services Call a T-SQL System Stored Procedure to generate features and train (or retrain) the model Call a T-SQL System Stored Procedure from my application and have it trigger R script execution in- database to predict on new dataset. Results are then returned to my application (predictions, plots). R Code->T-SQL Stored Proc
  • 24. R script usage from SQL Server Original R script: IrisPredict <- function(data, model){ library(e1071) predicted_species <- predict(model, data) return(predicted_species) } library(RODBC) conn <- odbcConnect("MySqlAzure", uid = myUser, pwd = myPassword); Iris_data <-sqlFetch(conn, "Iris_Data"); Iris_model <-sqlQuery(conn, "select model from my_iris_model"); IrisPredict (Iris_data, model); Calling R script from SQL Server: /* Input table schema */ create table Iris_Data (name varchar(100), length int, width int); /* Model table schema */ create table my_iris_model (model varbinary(max)); declare @iris_model varbinary(max) = (select model from my_iris_model); exec sp_execute_external_script @language = 'R' , @script = ' IrisPredict <- function(data, model){ library(e1071) predicted_species <- predict(model, data) return(predicted_species) } IrisPredict(input_data_1, model); ' , @parallel = default , @input_data_1 = N'select * from Iris_Data' , @params = N'@model varbinary(max)' , @model = @iris_model with result sets ((name varchar(100), length int, width int , species varchar(30))); Values highlighted in yellow are SQL queries embedded in the original R script Values highlighted in aqua are R variables that bind to SQL variables by name Advanced analytics
  • 25. Why In-Database Analytics with SQL 2016 & R? Leverage Full Capability of R: • Rich Statistical, Visualization & Predictive Analytics • A Large and Growing Skill Base … including Microsoft R Servers Big Data Capabilities: • Scalable Computation • Scalable Data Size … all Running In-Database: • Divide Work Between Data Scientists and Data Engineers • Reduce Data Duplication • Reduce Data Movement … While Protecting Information: • Eliminate Data Movement & Unnecessary Copying • Leverage Database Data Protections + SQL Server 2016 Enterprise Edition Copyright Microsoft Corporation. All rights reserved.
  • 26. What’s new? R Server 9.1 SQL Server Machine Learning Services
  • 27. Best of Microsoft Innovation Best of Open Source SQL Server
  • 28. Microsoft R Server 9.1 brings pre-trained cognitive models that accelerate time to value and can be re-trained with your data and optimized for your business. Sentiment Analysis Enables you to assess the sentiment of an English sentence/paragraph with just a few lines of code. Image Featurizer Derive up to 5,000 features on a given image, and use that to compare similarity between two images. This model can be used in healthcare, research and quality control For more information on Microsoft ML Libraries please visit: MicrosoftML Library
  • 29. One of the most popular advanced analytics use cases is Pleasingly Parallel (also known as Embarrassingly or Perfectly Parallel) where clients run massively parallel computations on partitions that are grouped by one or more attributes. These embarrassingly parallel use cases are common across industries: • Life sciences simulations to identify the best drug for a given situation • custom transformation, enrichment and featurization • Portfolio analysis to identify the right investment for each portfolio • Utilities to forecast energy consumption for each cohort • Shipping to forecast demand for various container types For detailed best practices visit: R Server Blog on embarrassingly-parallel
  • 30. Easy, secure, and high-performance operationalization is essential for Tier-1 enterprises, at scale, to derive maximum value from their analytics investments. Microsoft R Server 9.1 release continues strengthening the power of operationalization. - Real time web services: realize 10X to 100X boost in scoring performance, scoring speeds at <10ms. Currently on Windows platform; other platforms will be supported soon. - Role Based Access Control: enables admins to control who can publish, update, delete or consume web services - Asynchronous batch processing: speed up the scoring performance for the web services with large input data sets and long-running jobs - Asynchronous remote execution: run scripts in background mode on a remote server, without having to wait for the job to complete - Dynamic scaling of operationalization grid with Azure VMs: easily spin up a set of R Server VMs in Azure, configure them as a grid for operationalization, and scale it up and down based on CPU / Memory usage
  • 31.
  • 32. • Support for R and now Python languages • 80+ of the most popular Python packages are parallelized and scalable to help tackle big data • Enables users to work with preferred tools and push intelligence to where data lives • Advanced machine learning algorithms with GPUs Predicting loan defaults Store Predictions In-memory OLTP ColumnStore Power BI dashboard R Business user Prepare for analytics Visualize Data from 8M vehicle loans Predicts loan defaults Age, original balance, interest rate, loan remaining months, credit score 1M predictions per sec Python
  • 33. Real-time scoring • Real-time scoring supported on models trained using RevoScaleR and MicrosoftML algorithms & transforms. • SQL Server understands these models natively and scores inputs without the need of R interpreter and overhead delivering significantly better performance, assured reliability, lower resource consumption. Flexible R package management • Updated RevoScaleR package enables users to install, uninstall and manage packages on SQL Server without administrative access to the SQL Server machine. • Data scientists and other non-admin users can install packages in databases, user or group scope. • Updated rxSyncPackages API to ensure that the user-installed packages are not lost if SQL Server node goes down or if the database is migrated. • The list of packages and the permissions is maintained in a server table and this API ensures that the required packages are installed on the file system
  • 34. Wrap Up MRS extends open-source R to allow: • Multi-threading • Matrix operations, linear algebra, and many other math operations run on all available cores • Parallel processing • ScaleR functions utilize all available resources (rxExec) • local or distributed (locally parallel, Hadoop, Spark, SQL Server, Teradata) • On-disk data storage • RAM limitations lifted -> XDF, XDFd