R is rapidly becoming the leading language in Data Science and statistics.
This session shows how Microsoft SQL Server can help meet the demands of an increasingly “predictive” world by supporting the R language inside the database.
It includes a demonstration of R and SQL Server R Services applied to the rental industry.
2. Speaker
• Fisnik Doko
Microsoft Certified Trainer
• 21 active certificates - Microsoft
• MCSD: App Builder | Web applications
• MCSE: Data Management and Analytics | Data Platform
• MCSE: Cloud Platform and Infrastructure
• Software Architect
• Consultant
• Speaker
3. Content
• Advanced Analytics Introduction
• What is R?
• What is Microsoft R?
• Microsoft SQL Server R Services
• R for machine learning
• Demo
• R and Python
6. Typical Predictive Analytics Process
• Prepare: Assemble, cleanse, profile and transform diverse data relevant to the subject
[Diagram: Prepare, Model, Operationalize; labels: SQL Query, Data Science, R in Database]
7. What is R?
• A statistics programming language
• A data visualization tool
• Open source
• 2.5+M users
• Taught in most universities
• Thriving user groups worldwide
• 15 000+ free algorithms in CRAN
• 400+ R packages for machine learning
• Scalable to big data
• Used by new and recent graduates
• Rich application & platform integration
[Diagram labels: Language, Platform, Community, Ecosystem]
10. Microsoft R Client
• Freely available and based on Microsoft R Open
• Runs locally
• Can install any open-source R package
• Limited to two threads
• Datasets must fit in memory
• Chunking data is not available
• Can interact with R Server
11. What is Microsoft R Server?
• Renamed to Microsoft Machine Learning Server (SQL 2017)
• Adds support for the full data science lifecycle in Python
• Multithreaded performance, parallelization, and distributed processing
• RevoScaleR package for machine learning; supports data science at scale
• MicrosoftML package for distributed machine learning
• Operationalization functions for deploying to remote servers
12. What is SQL Server R Services?
• An implementation of Microsoft R Server, optimized for SQL Server
• Intended to run R code stored within the database
• Supports enterprise-scale data science
• Helps you embrace the highly popular open source R language in your business
• R processes execute outside of the database engine
• Security is handled by SQL Server Trusted Launchpad
13. Set up SQL Server R Services (In-Database)
• Step 1: Install R Services (In-Database) on SQL Server 2017
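• Step 2: Enable R Services (set 'external scripts enabled' to 1 with sp_configure)
• Step 3: Restart SQL Server and confirm that the SQL Server Launchpad service is running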
17. Running R code from SQL Server
• Run R code from SQL Server using the sp_execute_external_script stored procedure
• You can:
• Run arbitrary R code
• Provide input parameters that can be referenced by the R code
• Specify an input dataset
• Return an output dataset, plot or model
18. New stored procedure
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'[R code goes here]',
    @input_data_1 = N'[SQL input]'
    [ , @input_data_1_name = N'InputDataSet' ]
    [ , @output_data_1_name = N'OutputDataSet' ]
    [ , @params = N'parameter' ]
WITH RESULT SETS (([SQL output]));
• @input_data_1_name and @output_data_1_name are optional and default to InputDataSet and OutputDataSet respectively
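A minimal sketch of how the template above could be filled in from client code. It is written in Python, which the deck notes is also supported by Machine Learning Server. The helper names tsql_literal and build_external_script_call, the sample R script, and the sample query are hypothetical; only the procedure name and its arguments come from the slide. The sketch only builds the statement text and does not connect to a server.

```python
def tsql_literal(text: str) -> str:
    """Return text as a T-SQL Unicode string literal, doubling single quotes."""
    return "N'" + text.replace("'", "''") + "'"


def build_external_script_call(r_script: str, input_query: str, result_columns: str) -> str:
    """Assemble an EXEC sp_execute_external_script statement as a string.

    The helper and its parameters are hypothetical; the procedure name and
    the @language, @script and @input_data_1 arguments come from the slide.
    """
    return (
        "EXEC sp_execute_external_script\n"
        f"    @language = {tsql_literal('R')},\n"
        f"    @script = {tsql_literal(r_script)},\n"
        f"    @input_data_1 = {tsql_literal(input_query)}\n"
        f"WITH RESULT SETS (({result_columns}));"
    )


# Illustrative values: the R script simply echoes the input data set.
statement = build_external_script_call(
    r_script="OutputDataSet <- InputDataSet",
    input_query="SELECT 1 AS value",
    result_columns="value INT",
)
print(statement)
```

Doubling single quotes keeps an R script that contains quotes from terminating the T-SQL literal early.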
19. Operationalized R
1. Transform data
2. Evaluate data
3. Build model
4. Save model to stored procedure
[Diagram: SQL Server 2017, Web App, Deploy]
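A minimal sketch of the "save model" step, assuming the trained model is serialized to bytes so that it can be stored in the database (for example in a VARBINARY(MAX) column) and reloaded for scoring. Whether the deck's demo stores its model this way is an assumption, and the coefficients below are hypothetical placeholders rather than a real trained model.

```python
import pickle

# Hypothetical fitted coefficients standing in for a trained model.
model = {"intercept": 20.0, "slope": 5.0}

# Serialize the model to bytes; a byte string like this could be stored
# in a binary column and loaded again at prediction time.
model_bytes = pickle.dumps(model)


def predict(serialized_model: bytes, x: float) -> float:
    """Deserialize the stored model and score a single input value."""
    restored = pickle.loads(serialized_model)
    return restored["intercept"] + restored["slope"] * x


print(predict(model_bytes, 10.0))
```

Keeping the model as bytes lets a stored procedure or web app score new inputs without retraining.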
20. Powerful R Capabilities in SQL Server
• Meeting the Needs of R and SQL Users With One Platform
• R users can:
• Load, transform, visualize, learn from data assets in SQL
• Create or “train” predictive models
• Scale R analytics to big data using SQL Server R Services
• Connect to SQL from R Tools for Visual Studio or third-party IDEs (RStudio)
• Deploy and operationalize applications that use these predictions
• SQL Users can:
• Embed R to access predictive analytics from SQL
• Run R scripts and modeling algorithms from T-SQL scripts and within stored procedures
• Extend R capabilities to data engineers and application developers
• Easily embed prediction into BI and custom applications
21. Demonstration
• Build a predictive model using R and SQL Server ML Services
• Ski rental business: predict the number of rentals that we will have on a future date
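A minimal sketch of the kind of model such a demo might build: an ordinary least-squares line fitted in plain Python. The single feature (snowfall) and all data values are made up for illustration and are not taken from the demo; fit_line is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up illustration data (not from the demo): snowfall in cm and
# the number of rentals observed on the same day.
snowfall_cm = [0, 2, 4, 6, 8]
rentals = [20, 30, 40, 50, 60]

intercept, slope = fit_line(snowfall_cm, rentals)

# Predict rentals for a future day with 10 cm of forecast snowfall.
predicted = intercept + slope * 10
print(predicted)
```

A real model would likely use more features, such as the date, weekday or holidays, and would be validated on held-out data.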
22. Why use R for machine learning?
• Statistical analysis comprises three common tasks:
• Data transformation
• Data visualization
• Data modeling
• R provides an array of packages to help you perform these tasks
• R also provides programming constructs to build a workflow of operations
• R is interactive; you can quickly prototype your operations
• R packages can be written using compiled languages, for speed
• View the results using the Visualize command on the output ports
23. Why use Python for machine learning?
• Fully-fledged programming language
• Portable, and runs on many different operating systems
• Frequently used to provide the glue to integrate components developed in different languages
• Excellent for transforming data between formats
• More complex than R; it supports advanced OO features
• Packages developed in other compiled languages can be easily incorporated
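A minimal sketch of the "transforming data between formats" point, using only the Python standard library to turn CSV text into JSON. The helper name csv_to_json and the sample rows are hypothetical.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows)


# Made-up sample rows for illustration only.
sample = "date,rentals\n2017-01-01,42\n2017-01-02,37\n"
print(csv_to_json(sample))
```

Note that csv.DictReader reads every value as a string, so numeric columns would need an explicit conversion before modeling.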
24. Selecting the appropriate language
• R is favored by data scientists because it expresses statistical concepts concisely
• Python is favored by programmers because it is more general-purpose and powerful
• R has a broader range of statistical packages available
• Python has a more consistent syntax
• Both languages can interoperate with each other