Microsoft R Server for Data Science

Data Science Team
• Data Engineering
• Data Science
• Application Development
• Business Acumen
• Data Management
Data Dividend

Typical advanced analytics lifecycle
Steps: Ingest, Transform, Explore, Model, Score, Deploy, Visualize, Measure
Phases: Preparation, Modeling, Operationalization

Data scientists should be creating and testing models
Data scientists are rare and expensive

But the reality is different …
[Chart: data scientist focus time (80%, 5%, 15%) across Preparation, Model, Operationalize and Decisions]

• Embrace Open Source
• Evolutionary Path to Cloud
• Democratize Data Science
• Skill Re-Use
• Transparent Scaling
• Facilitate Collaboration
• Decouple Data Science from Platforms
• Leverage Hybrid Cloud Architecture
• Accelerate Experimentation
• Streamline Deployment
Broaden the Talent Pool
Increase Productivity
Modernize Infrastructure
Maximize Innovation
Drive Down TCO

From Data to Action On Premises
Data: data sources, apps, sensors and devices
Intelligence: Microsoft R Server & SQL R Services
Action: people, apps, automated systems
Cortana Intelligence

Challenges posed by open source R
• Lack of commercial support
• Inadequate modeling performance
• Complex deployment processes
• Limited data scale

R from Microsoft brings
• Peace of mind
• Efficiency
• Speed and scalability
• Flexibility and agility

High-performance, Scalable R
Linux, Windows, Hadoop & Teradata
R Server Technology

Open community: Revolution R Open is now Microsoft R Open
Commercial: Revolution R Enterprise is now Microsoft R Server

• Escapes R’s traditional memory limits
• Scales predictive modeling using parallelization
• Distributes computation across cores and nodes
• Minimizes data movement using in-database, in-MapReduce and in-Apache Spark execution
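Escaping R's memory limits rests on processing data one chunk at a time instead of loading it all at once. The base-R sketch below illustrates that idea only; it is not how ScaleR is implemented. The helper `chunked_mean()` is hypothetical, and it assumes a comma-separated file with an unquoted header row and numeric columns. Only one chunk of lines is held in memory while a running sum and count are updated.

```r
# Minimal sketch (not the ScaleR implementation): stream a CSV through a
# connection and update running totals one chunk at a time.
# chunked_mean() is a hypothetical helper; it assumes an unquoted header row
# and comma-separated numeric columns.
chunked_mean <- function(path, column, chunk_size = 1000) {
  con <- file(path, open = "r")
  on.exit(close(con))
  header <- strsplit(readLines(con, n = 1), ",", fixed = TRUE)[[1]]
  total <- 0
  count <- 0
  repeat {
    lines <- readLines(con, n = chunk_size)  # only this chunk is in memory
    if (length(lines) == 0) break            # end of file reached
    chunk <- read.csv(text = lines, header = FALSE, col.names = header)
    total <- total + sum(chunk[[column]])
    count <- count + nrow(chunk)
  }
  total / count
}

# Usage with a small synthetic file (made-up data, not the airline dataset)
set.seed(42)
demo <- data.frame(id = 1:5000, delay = rnorm(5000, mean = 10, sd = 5))
path <- tempfile(fileext = ".csv")
write.table(demo, path, sep = ",", row.names = FALSE, quote = FALSE)
streamed <- chunked_mean(path, "delay", chunk_size = 700)
```

The result matches an in-memory `mean()` while never reading more than `chunk_size` data rows at once; the same accumulate-then-combine shape extends to sums of squares, counts per category, and similar statistics.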

• Remote execution
• Transparent parallelization
• Shared resource management
[Diagram: desktops & servers and corporate applications (via direct web services) connect to Microsoft R Server running on the Hadoop data nodes]

Distributed R: How does a remote compute context work?
• The algorithm master distributes the work of the predictive algorithm and compiles the results
• Big data is loaded one block at a time, and blocks are analyzed in parallel
• Requests are "packed and shipped" to remote environments, which send back results
Microsoft R Server functions
• A compute context defines where processing happens, e.g. a remote context such as Hadoop MapReduce
• Microsoft R functions are prefixed with rx
• The currently set compute context determines the processing location
Copyright Microsoft Corporation. All rights reserved.
[Diagram: Microsoft R Server "client" (console, R IDE or command line) sends work through a remote compute context to Microsoft R Server "server"]
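The "load a block at a time, analyze blocks in parallel, compile results" pattern can be illustrated in base R with least-squares regression, because the sufficient statistics X'X and X'y computed on separate blocks simply add up. This is a minimal sketch under stated assumptions, not how ScaleR works internally: `block_stats()`, `combine_stats()` and `fit_blocks()` are hypothetical helpers, the data are synthetic, and the swappable `executor` argument stands in for a compute context (plain `lapply` for sequential processing, a two-worker `parallel` cluster for local parallelism).

```r
# Minimal sketch of "distribute work, compile results" in base R.
# block_stats(), combine_stats() and fit_blocks() are hypothetical helpers.

# Per-block work: least-squares sufficient statistics X'X and X'y
block_stats <- function(block) {
  X <- cbind(1, block$x)                  # intercept column plus predictor
  list(XtX = crossprod(X), Xty = crossprod(X, block$y))
}

# Master step: partial results from independent blocks add up
combine_stats <- function(a, b) {
  list(XtX = a$XtX + b$XtX, Xty = a$Xty + b$Xty)
}

# The executor plays the role of a compute context: it decides how blocks
# are processed, while the model code above stays unchanged
fit_blocks <- function(blocks, executor = lapply) {
  partials <- executor(blocks, block_stats)
  total <- Reduce(combine_stats, partials)
  as.vector(solve(total$XtX, total$Xty))  # c(intercept, slope)
}

# Synthetic data split into contiguous blocks of 2,500 rows
set.seed(1)
n <- 10000
dat <- data.frame(x = rnorm(n))
dat$y <- 2 + 3 * dat$x + rnorm(n)
blocks <- split(dat, ceiling(seq_len(n) / 2500))

seq_fit <- fit_blocks(blocks)             # sequential, one block at a time
par_fit <- fit_blocks(blocks, executor = function(X, FUN) {
  cl <- parallel::makeCluster(2)          # two local worker processes
  on.exit(parallel::stopCluster(cl))
  parallel::parLapply(cl, X, FUN)
})
```

Only the executor differs between the two calls; the per-block model code is untouched, which mirrors the separation between the compute-context setting and the rx analysis functions.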

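### Run ONE of the two compute-context sections, then the analytical section ###

### LOCAL PARALLEL PROCESSING (LINUX OR WINDOWS) ###
### SETUP LOCAL ENVIRONMENT VARIABLES ###
myLocalCC <- "localpar"
### LOCAL COMPUTE CONTEXT ###
rxSetComputeContext(myLocalCC)
### CREATE LINUX, DIRECTORY AND FILE OBJECTS ###
localFS <- RxNativeFileSystem()
AirlineDataSet <- RxXdfData("AirlineDemoSmall.xdf", fileSystem = localFS)

### IN-HADOOP PROCESSING ###
### SETUP HADOOP ENVIRONMENT VARIABLES ###
myHadoopCC <- RxHadoopMR()
### HADOOP COMPUTE CONTEXT ###
rxSetComputeContext(myHadoopCC)
### CREATE HDFS, DIRECTORY AND FILE OBJECTS ###
hdfsFS <- RxHdfsFileSystem()
hdfsFS
# Assumption: the data has already been copied to HDFS; this path is hypothetical
AirlineDataSet <- RxXdfData("/share/AirlineDemoSmall.xdf", fileSystem = hdfsFS)

### ANALYTICAL PROCESSING (IDENTICAL IN BOTH COMPUTE CONTEXTS) ###
### Statistical summary of the data
rxSummary(~ArrDelay + DayOfWeek, data = AirlineDataSet, reportProgress = 1)
### Cross-tabulate the data (mean arrival delay per day of week)
rxCrossTabs(ArrDelay ~ DayOfWeek, data = AirlineDataSet, means = TRUE)
### Linear model (no intercept) and plot of its coefficients
hdfsXdfArrLateLinMod <- rxLinMod(ArrDelay ~ DayOfWeek + 0, data = AirlineDataSet)
plot(hdfsXdfArrLateLinMod$coefficients)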
Local parallel processing (Linux or Windows) and in-Hadoop processing
ScaleR models can be deployed from a server or edge node to run in Hadoop without recoding the functional R model for MapReduce:
• Compute context R script: sets where the model will run
• Functional model R script: does not need to change to run in Hadoop
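Why the cross-tab and the linear model in the script above tell the same story: with a categorical predictor and no intercept (`+ 0`), least-squares coefficients are exactly the per-category means of the response. Assuming `DayOfWeek` is stored as a factor, `plot(hdfsXdfArrLateLinMod$coefficients)` therefore shows the mean arrival delay per day, the same numbers `rxCrossTabs(..., means = TRUE)` reports. The sketch below checks this identity in base R with `lm()` and `tapply()` on made-up data; it does not use RevoScaleR, and the column names only mirror the airline example.

```r
# Base-R sketch (no RevoScaleR): with a factor predictor and no intercept,
# least-squares coefficients equal the per-level means of the response.
set.seed(7)
days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")
flights <- data.frame(
  DayOfWeek = factor(sample(days, 2000, replace = TRUE), levels = days),
  ArrDelay  = rnorm(2000, mean = 8, sd = 20)  # made-up delays in minutes
)

# Analogue of rxLinMod(ArrDelay ~ DayOfWeek + 0, ...)
fit <- lm(ArrDelay ~ DayOfWeek + 0, data = flights)

# Analogue of rxCrossTabs(ArrDelay ~ DayOfWeek, means = TRUE)
day_means <- tapply(flights$ArrDelay, flights$DayOfWeek, mean)

comparison <- cbind(coefficient = unname(coef(fit)),
                    mean = as.vector(day_means))
rownames(comparison) <- days
print(comparison)
```

Dropping the intercept keeps one indicator column per level, so no day is absorbed into a baseline and each coefficient is directly interpretable as that day's mean.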

DeployR
• Web services software development kit for integrating analytics via APIs:
  • Java
  • JavaScript
  • .NET
• Integrates R into application infrastructures
Capabilities:
• Enterprise authentication & security
• Horizontal scaling
• Invokes R scripts from web service calls
• RESTful interface for easy integration
• Works with:
  • Web & mobile apps
  • Leading BI & visualization tools
  • Business rules and streaming engines

On-demand sales forecasting
Real-time social media analysis
Leveraging the power of Office 365

Microsoft R Server provides a unique opportunity to deliver advanced analytics capabilities to customers who have already invested in storing their data on non-Microsoft platforms such as Hadoop, Teradata and Linux.
Hadoop: Cloudera CDH, Hortonworks HDP and HDInsight

Write Once, Deploy Anywhere
R Server portfolio: Cloud, RDBMS, Desktops & Servers, Hadoop & Spark, EDW
R Server Technology

Included in SQL Server 2016
• Reuse and optimize existing R code
• Eliminate data movement
• In-database deployment
• Memory and disk scalability
• No R memory limits
• Write once, deploy anywhere
• Enterprise speed and scale
• Near-DB analytics
• Parallel threading and processing
• Reuse SQL skills for data engineering
Benefits: cost effectiveness, scalability and choice, simplicity and agility

• The industry’s broadest R-based platform
• Enterprise scale atop Spark, Hadoop, RDBMSs & EDWs
• Freedom from memory limits
• Choice of Windows and Linux IDEs
• Stable deployment
• Write-once-deploy-anywhere portability
• Investment protection
• Hybrid cloud evolution

This walkthrough introduces the following topics:
1. Creating an R Server on a Spark HDInsight cluster
2. Installing RStudio for the cluster
3. Running R in RStudio on the web
Reference: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-r-server-get-started/

Get Essentials Microsoft Developer Resources and R Server Developer Edition: aka.ms/ch9.th
Microsoft R Server on-premises: www.microsoft.com/R-Server
Microsoft R Server on Azure (Cloud): https://azure.microsoft.com/en-us/marketplace/partners/microsoft-r-products/microsoft-r-server/

What is R?
A language, a platform, a community, an ecosystem
• A statistics programming language
• A data visualization tool
• Open source
• 2.5+M users
• Taught in most universities
• Thriving user groups worldwide
• 7000+ free algorithms in CRAN
• Scalable to big data
• New and recent grads use it
• Rich application & platform integration

Convergence with Flexibility
Scalable Algorithms
R: Write Once Deploy Anywhere
Templates & Samples
Microsoft R Server Family
R & Python to AML Interop.
Cortana Intelligence

Components: DistributedR, ScaleR, ConnectR, DevelopR
Code Portability Across Platforms
• In the cloud: Azure HDInsight / Spark
• Workstations & servers: Linux, Windows
• Clustered systems: Linux clusters (LSF for now), Microsoft HPC
• EDW: Teradata
• Hadoop: Hortonworks, Cloudera, MapR & HDInsight

Microsoft R stack: R + CRAN, DistributedR, ScaleR, ConnectR, DeployR, DevelopR
DistributedR delivers high-performance, parallel, distributed analytics across individual and clustered systems:
• Cloudera, Hortonworks, MapR, Apache Spark
• Clusters: IBM Platform LSF, Microsoft HPC
• Database: Teradata
• Servers: Red Hat, SuSE, Windows

RevoDeployR Web Services
[Diagram: desktop applications (e.g. Excel), business intelligence tools (Power BI) and interactive web or mobile applications call RevoDeployR web services through client libraries (JavaScript, Java, .NET) over HTTP/HTTPS with JSON/XML. The server provides session management, authentication, data/script management and administration, and runs R scripts on grid nodes. Roles shown: end user, application developer, admin, data scientist.]
