SQL Server 2017 Machine Learning Services


SQL Server Machine Learning Services is an embedded, predictive analytics and data science engine that can execute R and Python code within a SQL Server database as stored procedures, as T-SQL script containing R or Python statements, or as R or Python code containing T-SQL. The key value proposition of Machine Learning Services is the power of its proprietary packages to deliver advanced analytics at scale, and the ability to bring calculations and processing to where the data resides, eliminating the need to pull data across the network.


Platforms: SQL Server ML Services, Linux, Hadoop, Teradata, Windows
Editions: Commercial (R Server), Community (R Open)

Installed Packages

Base: stats, graphics, grDevices, utils, datasets, methods, base
Recommended: boot, class, cluster, codetools, foreign, KernSmooth, lattice, MASS, Matrix, mgcv, nlme, nnet, rpart, spatial, survival
Microsoft (developed / maintained): checkpoint, deployrRserve, doParallel, foreach, jsonlite, iterators, MicrosoftR, RevoIOQ, RevoMods, RevoUtils, RODBC, RevoUtilsMath, AzureML, rmr2, rhdfs, rhbase, plyrmr
Additional CRAN R: curl, jsonlite, png, R6, RODBC
Microsoft (developed / maintained): RevoScaleR, MicrosoftML, CompatibilityAPI, mrupdate, RevoIOQ, RevoTreeView, mrsdeploy, sqlrutils, olapR

Distribution tiers:
#1 Open-source R
#2 Microsoft R Open (Intel MKL)
#3 Microsoft R Client (free)
#4 Microsoft R Server
Licensing labels: Open-Source; Commercially licensed & supported

R Server Technology: Linux, Windows, Hadoop & Teradata

[Figure: a Microsoft R Server “Client” (console, R IDE or command line) uses a remote compute context to “pack and ship” requests to a Microsoft R Server “Server”. There, a master algorithm distributes work over the big data: it loads one block at a time, analyzes blocks in parallel with the predictive algorithm, compiles the results and returns them to the client.]
Copyright Microsoft Corporation. All rights reserved.

Components: R+CRAN, MicrosoftR, DistributedR, DeployR, DevelopR, ScaleR, ConnectR
Supported platforms:
• Clusters: Cloudera, Hortonworks, MapR, Apache Spark, IBM Platform LSF, Microsoft HPC
• Database: SQL Server, Teradata
• Servers: Red Hat, SuSE
• Windows

Compute context R script (sets where the model will run)

In Hadoop:
### SETUP HADOOP ENVIRONMENT VARIABLES ###
myHadoopCC <- RxHadoopMR()
### HADOOP COMPUTE CONTEXT ###
rxSetComputeContext(myHadoopCC)
### CREATE HDFS, DIRECTORY AND FILE OBJECTS ###
hdfsFS <- RxHdfsFileSystem()
hdfsFS

Local parallel processing (Linux or Windows):
### SETUP LOCAL ENVIRONMENT VARIABLES ###
myLocalCC <- "localpar"
### LOCAL COMPUTE CONTEXT ###
rxSetComputeContext(myLocalCC)
### CREATE LINUX, DIRECTORY AND FILE OBJECTS ###
localFS <- RxNativeFileSystem()
AirlineDataSet <- RxXdfData("AirlineDemoSmall.xdf", fileSystem = localFS)

Functional model R script (does not need to change to run in Hadoop)
### ANALYTICAL PROCESSING ###
### AirlineDataSet must exist before this script runs; the slide defines it only for the local context
### Statistical summary of the data
rxSummary(~ ArrDelay + DayOfWeek, data = AirlineDataSet, reportProgress = 1)
### Cross-tabulate the data
rxCrossTabs(ArrDelay ~ DayOfWeek, data = AirlineDataSet, means = TRUE)
### Linear model and plot
hdfsXdfArrLateLinMod <- rxLinMod(ArrDelay ~ DayOfWeek + 0, data = AirlineDataSet)
plot(hdfsXdfArrLateLinMod$coefficients)
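
The split between a compute-context script and a functional script can be illustrated outside R as well. The following is a minimal Python sketch of the idea only, not the RevoScaleR API: `set_compute_context`, `run_local`, `run_local_parallel` and `mean_delay` are hypothetical names invented for this illustration, and the thread pool merely stands in for a real parallel or remote back end.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical compute contexts: each one only decides HOW blocks are processed.
def run_local(func, blocks):
    """Process the blocks one after another in the current thread."""
    return [func(block) for block in blocks]

def run_local_parallel(func, blocks):
    """Process the blocks concurrently with a small thread pool."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(func, blocks))

_CONTEXTS = {"local": run_local, "localpar": run_local_parallel}
_active = "local"

def set_compute_context(name):
    """Select where the analysis runs (illustrative stand-in, not rxSetComputeContext)."""
    global _active
    if name not in _CONTEXTS:
        raise ValueError("unknown compute context: " + name)
    _active = name

def mean_delay(blocks):
    """The 'functional script': identical code whichever context is active."""
    partials = _CONTEXTS[_active](lambda b: (sum(b), len(b)), blocks)
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

# Made-up delay values, split into three blocks.
delays = [[1, 2, 3], [4, 5], [6]]
set_compute_context("local")
sequential = mean_delay(delays)
set_compute_context("localpar")
parallel = mean_delay(delays)
print(sequential, parallel)
```

Only the `set_compute_context` call changes between the two runs; `mean_delay` is untouched, which mirrors the slide's point that the functional script does not need to change.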

EXECUTE sp_execute_external_script
    @language = N'R'
  , @script = N'x <- as.matrix(InputDataSet);
                y <- array(dim1:dim2);
                OutputDataSet <- as.data.frame(x %*% y);'
  , @input_data_1 = N'SELECT [Col1] FROM MyData;'
  , @params = N'@dim1 int, @dim2 int'
  , @dim1 = 12, @dim2 = 15
WITH RESULT SETS (([Col1] int, [Col2] int, [Col3] int, [Col4] int));
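
To see why the result set declares four columns, it may help to trace the arithmetic of the embedded script. The sketch below mirrors it in plain Python under two assumptions: the input values are made up (the contents of MyData are not shown on the slide), and `outer_product` is a hypothetical helper name. The single input column is multiplied by the sequence dim1..dim2 (12, 13, 14, 15), so each input row yields one output row with four values.

```python
def outer_product(column, dim1, dim2):
    """Multiply every input value by each integer in dim1..dim2 (inclusive)."""
    y = list(range(dim1, dim2 + 1))  # plays the role of array(dim1:dim2) in the R script
    return [[x * v for v in y] for x in column]

# Made-up stand-in for SELECT [Col1] FROM MyData
input_col1 = [1, 2, 3]

for row in outer_product(input_col1, 12, 15):
    print(row)
```

With @dim1 = 12 and @dim2 = 15 the sequence has four elements, which is what the four int columns in WITH RESULT SETS correspond to.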

[Figure: sp_execute_external_script runs in sqlservr.exe (MSSQLSERVER service; SQLOS, XEvent), which talks over a named pipe to launchpad.exe (MSSQLLAUNCHPAD service) and tells it “what” and “how” to “launch”. A “launcher” then starts Windows “satellite” processes, which load sqlsatellite.dll. Each SQL instance has a launchpad.]

3. Familiar, Scalable, Secure

12. Iterate/ Sequence

17. Selecting Features

18. Selecting Features

19. New Services

20. Accepting License Agreements

Editor's Notes

Slide objective: Show the three pillars of Microsoft Advanced Analytics.
Talking points:
- Microsoft’s Advanced Analytics products work with all your current investments: we support different platforms such as Windows, Linux, SQL, Teradata and even big data. They work both on premises and in the cloud.
- Microsoft has long been investing in innovative artificial intelligence technologies and baking them into products like Cortana, HoloLens, Bing and Skype. We are now commercializing these technologies through our advanced analytics products, including Microsoft R.
- Microsoft wants to help you accelerate the process of generating value from your data, which is why we are not only building the tools but also investing heavily in solutions that help you drive value.
What does R mean in “R Services”? R is an open-source programming language for statistical computing; Microsoft R Open is Microsoft’s enhanced distribution of it.
Last but not least, customers need flexibility in their choice of platform, programming languages and data infrastructure to get the most from their data. Why? In most IT environments, platforms, technologies and skills are as diverse as they have ever been. The data platform of the future needs to let you build intelligent applications on any data, any platform and any language, on premises and in the cloud. SQL Server manages your data across platforms, with any skills, on premises and in the cloud. Our goal is to meet you where you are, on any platform, anywhere, with the tools and languages of your choice. SQL Server now supports Windows, Linux and Docker containers, and it lets you use the language of your choice for advanced analytics: R or Python.
Slide objective: Show broad commitment to R by preserving freely available, enhanced editions; Windows and SQL Server editions; and R Server editions for leading EDWs, Linux and Hadoop platforms. Differentiate the free, open editions from the commercial ones by mentioning the availability of commercial 24x7 support and the enhancements that support very-large-scale data analytics at speed.
Slide objective: Illustrate the scale benefits possible with R Server’s ScaleR algorithms. Show a representative example and explain the three mechanisms that help achieve the improvements.
Talking points: We tested the improved data and computational scale of R Server’s ScaleR library of enhanced, parallelized algorithms. This is an example.
Speed: On a 4-core laptop with 8 GB of RAM, open-source R could process about 300,000 events in a particular data set before exhausting available memory. The test took about 77 seconds to run the most commonly used R linear regression algorithm, “lm”. We then ran the same test using our parallelized linear regression module, rxLinMod, which is rewritten in C++.
Data scale: Algorithms in the ScaleR library are also rewritten to analyze data in “chunks”, which removes the memory limits of typical open-source R algorithms. Where the open-source lm exhausted memory at about 300,000 events, rxLinMod was still working fine at 5 million events, where we stopped testing. The result is a 50x performance improvement over open-source linear regression, and no memory limits.
Parallel scale: This example shows only the effects of running optimized, compiled code on all cores of a laptop. Greater benefits are available. What is not shown is that the work done to parallelize across 4 cores can also be used to scale across many nodes in systems such as EDWs and Hadoop. While results vary, the system, as you can see, responds linearly with respect to data size. Rehosting using R Server for Hadoop can provide even more dramatic speed and scale results.
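
The “chunks” idea in the data-scale point can be sketched in a few lines of Python. This is an illustration of the general technique, not how rxLinMod is implemented: it fits a one-predictor least-squares line by accumulating five running sums, so only one chunk has to be in memory at a time. The function names and the synthetic data (y = 2x + 1) are assumptions made for the example.

```python
def read_in_chunks(xs, ys, size):
    """Yield (x_chunk, y_chunk) pairs; stands in for reading blocks from disk."""
    for start in range(0, len(xs), size):
        yield xs[start:start + size], ys[start:start + size]

def chunked_linear_fit(chunks):
    """Least-squares fit of y = intercept + slope * x from streamed chunks."""
    n = sx = sy = sxx = sxy = 0.0
    for xs, ys in chunks:
        # Only these five running sums survive from one chunk to the next.
        for x, y in zip(xs, ys):
            n += 1
            sx += x
            sy += y
            sxx += x * x
            sxy += x * y
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope

# Synthetic data: y = 2x + 1 for x = 0..9, processed three rows at a time.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
intercept, slope = chunked_linear_fit(read_in_chunks(xs, ys, 3))
print(intercept, slope)
```

Memory use depends on the chunk size rather than on the total number of rows, which is the property the note credits for removing the memory limit.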
- Wrangle data, experiment with models, and test models from a workstation
- Use your favorite IDE or notebook service

- Train models on big data, at speed, in parallel
- Transform large data sets using T-SQL, R, and Python
- Repeatedly score and rescore large data assets

- Embed R or Python in T-SQL
- Execute using T-SQL BI, reporting & app dev tools

- Embed R and Python within T-SQL scripts
- Makes R & Python callable from traditional applications
- Deploy smart apps using existing skills & tools

- Run trained models in real-time with low latency
- Detect anomalies at speed
Microsoft R Server is a broadly deployable, enterprise-class analytics platform based on R that is supported, scalable and secure. With a variety of big data statistics, predictive modeling and machine learning capabilities, R Server supports the full range of analytics: exploration, analysis, visualization and modeling.
Slide objective: Introduce the high-level value of R Server and R Services over other instantiations of the R language.
Talking points:
- R Server products provide an enhanced experience for the R user without loss of compatibility.
- R Server products are “open core”: they use the open-source R product in its entirety and build new capabilities around that core without impacting compatibility.
- Users of R Server products enjoy full compatibility with open-source R and with the entire (and vast) collection of algorithms, connectors and visualization tools shared openly via CRAN, Bioconductor and other shared resources like GitHub.
- Key extensions enable R to tackle big data challenges that exceed the capacity of open-source R.
- R scripts built for one platform using R Server can easily be run on another platform running R Server. We call it WODA: write once, deploy anywhere. Two key contributions: build on any version of the product and deploy using other versions, and investment protection as platform choices change.
- Develop on the desktop and immediately deploy to an RDBMS (SQL Server), an EDW (SQL Server and Teradata) or Hadoop (Microsoft, Cloudera, Hortonworks and MapR).
Slide objective: Present the range of already parallelized functions and algorithms available with RevoScaleR.
Talking points: This list shows the functions and algorithms that are available with all versions of R Server. We call this the ScaleR library. Each function can:
- Execute work steps in parallel or in serial, as needed
- Process work using multiple threads, cores, sockets or nodes
- Process one or more data blocks in each thread, core, socket or node
- Combine the results into a single, mathematically correct answer
- Do the work either locally or ship the request to another system for completion remotely
- Completely hide the complexity of parallelization, multiple steps and iterations from the R programmer
Four functions (rxDataStep, rxExec, PEMA-R and the newest, rxExecBy) provide frameworks for users to write their own routines, functions and algorithms using parallelization. While this is more difficult than using pre-written PEMAs, the results are portable (usable on multiple systems), and it is easier than writing directly to the platform to create custom algorithms.
Notes: One algorithm framework, the PEMA-R API, is not available in clustered systems, which is an exception to portability across systems.
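
The step “combine the results into a single, mathematically correct answer” is the part that is easy to get wrong, so a small sketch may help. The Python below is an illustration only and does not use RevoScaleR: each block is summarized independently as (count, mean, sum of squared deviations), and the partial summaries are merged with the standard pairwise update for mean and variance. The function names are hypothetical, the thread pool stands in for cores or nodes, and blocks are assumed to be non-empty.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def summarize_block(block):
    """Per-block summary: (count, mean, sum of squared deviations from the mean)."""
    n = len(block)
    mean = sum(block) / n
    m2 = sum((v - mean) ** 2 for v in block)
    return n, mean, m2

def combine(a, b):
    """Merge two block summaries into the summary of their union."""
    na, mean_a, m2a = a
    nb, mean_b, m2b = b
    n = na + nb
    delta = mean_b - mean_a
    mean = mean_a + delta * nb / n
    m2 = m2a + m2b + delta * delta * na * nb / n
    return n, mean, m2

def parallel_mean_variance(blocks, workers=4):
    """Summarize blocks concurrently, then reduce to one mean and sample variance."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(summarize_block, blocks))
    n, mean, m2 = reduce(combine, partials)
    return mean, m2 / (n - 1)

# The values 1..10 split unevenly across three blocks.
mean, variance = parallel_mean_variance([[1, 2, 3, 4], [5, 6], [7, 8, 9, 10]])
print(mean, variance)
```

Simply averaging the per-block variances would give a different, incorrect number when block means differ; the cross term in `combine` is what makes the merged result agree with a single pass over all the data.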
sp_execute_external_script is an example of a special procedure, or “specproc”. The source code of the procedure can’t be found in the resource database; it is implemented in our source code.
Talk about other “extensible” environments we have used in the past: xproc, sp_OA, linked servers, full-text.
Follow the instructions in insidesqlr\readme.txt
