SlideShare a Scribd company logo
1 of 44
The Microsoft
Data Platform
MobileReports
Natural
language
queryDashboardsApplications
StreamingRelational
Internal &
externalNon-relational NoSQL
Orchestration
Machine
learningModeling
Information
management
Complex event
processing
Data
The Microsoft
Data Platform
MobileReports
Natural
language
queryDashboardsApplications
StreamingRelational
Internal &
externalNon-relational NoSQL
Orchestration
Machine
learningModeling
Information
management
Complex event
processing
Data
What
happened?
Why did
it happen?
What will
happen?
How can we
make it happen?
Traditional BI Advanced Analytics
• A Language Platform…
• A Community…
• An Ecosystem
Download the White Paper
R is Hot
bit.ly/r-is-hot
blog.revolutionanalytics.com/popularity
R Usage Growth
Rexer Data Miner Survey, 2007-2013
• Rexer Data Miner Survey • IEEE Spectrum, July 2014
#9: R
Language Popularity
IEEE Spectrum Top Programming Languages
•
•
Who We Are
Only provider of commercial big data big analytics platform based on
open source R statistical computing language
Our Software Delivers
Scalable Performance: Distributed & parallelized analytics
Cross Platform: Write once, deploy anywhere
Productivity: Easily build & deploy with latest modern analytics
Our Services Deliver
Knowledge: Our experts enable you to be experts
Time-to-Value: Our Quickstart program gives you a jumpstart
Guidance: Our customer support team is here to help you
Global Industries Served
Financial Services
Digital Media
Government
Health & Life Sciences
High Tech
Manufacturing
Retail
Telco
Customers
300+ Global 2000
Global Presence
North America / EMEA / APAC
Our Vision:
• R becomes the de-facto
standard for enterprise
predictive analytics
Our Mission:
• Drive enterprise adoption
of R by providing enhanced
R products tailored to meet
enterprise challenges
Support & Services
• Commercial Support Programs
• Training Programs
• Professional Services
Community Programs
• Academic Support Programs
• Contributions to Open Source R
• Open Source Extensions
• Sponsorship of R User Groups
Software Products
• Stable Distributions
• Broad Platform Support
• Big Data Analytics
• Application Integration
• Deployment Integration
• Agile Development Tooling
• Future Platform Support
• Free and open source R distribution
• Enhanced and distributed by Revolution Analytics
Revolution R Open
• Secure, Scalable and Supported Distribution of R
• With proprietary components created by Revolution Analytics
Revolution R Enterprise
• Simplicity
• Speed
• Capability
• Speed
• Scale
• Stability
• Time-to-Results
• Compatibility
 Naïve Bayes
 Data import – Delimited, Fixed, SAS, SPSS,
OBDC
 Variable creation & transformation
 Recode variables
 Factor variables
 Missing value handling
 Sort, Merge, Split
 Aggregate by category (means, sums)
 Min / Max, Mean, Median (approx.)
 Quantiles (approx.)
 Standard Deviation
 Variance
 Correlation
 Covariance
 Sum of Squares (cross product matrix for set
variables)
 Pairwise Cross tabs
 Risk Ratio & Odds Ratio
 Cross-Tabulation of Data (standard tables & long
form)
 Marginal Summaries of Cross Tabulations
 Chi Square Test
 Kendall Rank Correlation
 Fisher’s Exact Test
 Student’s t-Test
 Subsample (observations & variables)
 Random Sampling
Data Step Statistical Tests
Sampling
Descriptive Statistics
 Sum of Squares (cross product matrix for set
variables)
 Multiple Linear Regression
 Generalized Linear Models (GLM) exponential
family distributions: binomial, Gaussian, inverse
Gaussian, Poisson, Tweedie. Standard link
functions: cauchit, identity, log, logit, probit. User
defined distributions & link functions.
 Covariance & Correlation Matrices
 Logistic Regression
 Classification & Regression Trees
 Predictions/scoring for models
 Residuals for all models
Predictive Models  K-Means
 Decision Trees
 Decision Forests
 Gradient Boosted Decision Trees
Cluster Analysis
Classification
Simulation
Variable Selection
 Stepwise Regression
 Simulation (e.g. Monte Carlo)
 Parallel Random Number Generation
Combination
New in
v7.3
 PEMA-R API
 rxDataStep
 rxExec
Coming
in v7.4
File Name
Compressed
File Size (MB) No. Rows
Open
Source R
(secs)
Revolution R
(secs)
Tiny 0.3 1,235 0.00 0.05
V. Small 0.4 12,353 0.21 0.05
Small 1.3 123,534 0.03 0.03
Medium 10.7 1,235,349 1.94 0.08
Large 104.5 12,353,496 60.69 0.42
Big (full) 12,960.0 123,534,969 Memory! 4.89
V.Big 25,919.7 247,069,938 Memory! 9.49
Huge 51,840.2 494,139,876 Memory! 18.92
 Public US Flight Data
 Linear Regression on Arrival Delay
 Run on 4 core laptop, 16GB RAM and 500GB SSD
19
1. Starts A Master Process
2. Distribute Work
1. Threading? Cores? Sockets? Nodes?
2. Available RAM?
3. Location of Data?
3. Master Tasks for Each Node
4. Master Initiates Distributed Work
1. Hadoop Schedules Mapper for Each Split
2. Algorithm Computes Intermediate Result
3. Reducer Combines Intermediate Results
5. Master Process Evaluates Completion
6. Returns Consolidated Answer to Script
In Revolution R Enterprise:
Task
Task
Task
Finalizer
Initiator
 … load a large dataset into Hadoop
rxSetComputeContext(RxHadoopMR(…)
Model_obj <- rxLinMod(…)
 … use model object to predict…
Big
DataStub
Remote
Predictive
Algorithms
Move
Logic To
The Data
Task
Task
Task
Finalizer
Initiator
rows
minutes
R on a
server
pulling data
via SQL
R on a server
Invoking RRE
ScaleR Inside
the EDW
Generalised Linear Model
• 150 million observations
• 70 degrees of freedom
So what have we learned:
• SAS works, but is slow.
• The data is too big for open-source R, even on a
very large server.
• Hadoop is not a right fit
• Revolution R Enterprise gets the same results as
SAS, but about 50x faster.
Software Platform Time to fit
SAS 16-core Sun Server 5 hours
rmr / map-reduce 10-node (8 cores /
node) Hadoop cluster
> 10 hours
Open source R 250-GB Server Impossible (>
3 days)
RevoScaleR 5-node (4 cores /
node) LSF cluster
5.7 minutes
R+CRAN
RevolutionR
DistributedR
DeployR DevelopR
ScaleR
ConnectR
Delivers High Performance Parallel Distributed
Analytics Across Individual and Clustered Systems
• Cloudera
• Hortonworks
• MapR
• Apache Spark
(coming soon)
• IBM Platform LSF
• Microsoft HPC
Clusters
• Teradata
Database
• Red Hat
• SuSE Servers
• Windows
Task
Task
Task
Finalizer
Initiator
Features
•
•
•
Data Sources
•
•
•
• Compressed
• Chunkwise-columnar storage
• Optimized for R:
• Enables 20+x speedups
29
Task
Task
Task
Finalizer
Initiator
Features
•
•
Data Sources
•
Task
Task
Task
Finalizer
Initiator
Features
•
•
Data Sources
•
•
•
Task
Task
Task
Finalizer
Initiator
• HDFS data source
• Local “share” directory
• HDFS “share” directory
Share directories must be writable by “you” from both
the edge node and the data node.
• Distributed shell
• Distributed copy
• Distributed and install R packages
• Deploy/Remove/Update Revolution R Enterprise
• Add/Remove users from nodes
Features
•
•
•
Data Sources
•
Inside Hadoop Beside Hadoop
• ScaleR on Spark
• ScaleR on SQL Server
Coming Soon…
42
Cortana Analytics Workshop: On-Premises Hadoop and Revolution R -- Architectures and Solutions
Cortana Analytics Workshop: On-Premises Hadoop and Revolution R -- Architectures and Solutions

More Related Content

More from MSAdvAnalytics

Cortana Analytics Workshop: Azure Data Catalog
Cortana Analytics Workshop: Azure Data CatalogCortana Analytics Workshop: Azure Data Catalog
Cortana Analytics Workshop: Azure Data CatalogMSAdvAnalytics
 
Cortana Analytics Workshop: Connecting Cortana Analytics Faster -- Any Source...
Cortana Analytics Workshop: Connecting Cortana Analytics Faster -- Any Source...Cortana Analytics Workshop: Connecting Cortana Analytics Faster -- Any Source...
Cortana Analytics Workshop: Connecting Cortana Analytics Faster -- Any Source...MSAdvAnalytics
 
Cortana Analytics Workshop: Real-World Data Collection for Cortana Analytics
Cortana Analytics Workshop: Real-World Data Collection for Cortana AnalyticsCortana Analytics Workshop: Real-World Data Collection for Cortana Analytics
Cortana Analytics Workshop: Real-World Data Collection for Cortana AnalyticsMSAdvAnalytics
 
Cortana Analytics Workshop: Insights and Predictions -- Integrating and Deplo...
Cortana Analytics Workshop: Insights and Predictions -- Integrating and Deplo...Cortana Analytics Workshop: Insights and Predictions -- Integrating and Deplo...
Cortana Analytics Workshop: Insights and Predictions -- Integrating and Deplo...MSAdvAnalytics
 
Cortana Analytics Workshop: Cortana Analytics -- Security, Privacy & Compliance
Cortana Analytics Workshop: Cortana Analytics -- Security, Privacy & ComplianceCortana Analytics Workshop: Cortana Analytics -- Security, Privacy & Compliance
Cortana Analytics Workshop: Cortana Analytics -- Security, Privacy & ComplianceMSAdvAnalytics
 
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...MSAdvAnalytics
 
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...MSAdvAnalytics
 
Cortana Analytics Workshop: Developing for Power BI
Cortana Analytics Workshop: Developing for Power BICortana Analytics Workshop: Developing for Power BI
Cortana Analytics Workshop: Developing for Power BIMSAdvAnalytics
 
Cortana Analytics Workshop: Milliman Integrate for Cortana Analytics
Cortana Analytics Workshop: Milliman Integrate for Cortana AnalyticsCortana Analytics Workshop: Milliman Integrate for Cortana Analytics
Cortana Analytics Workshop: Milliman Integrate for Cortana AnalyticsMSAdvAnalytics
 
Cortana Analytics Workshop: Intelligent Retail -- The Machine Learning Approach
Cortana Analytics Workshop: Intelligent Retail -- The Machine Learning ApproachCortana Analytics Workshop: Intelligent Retail -- The Machine Learning Approach
Cortana Analytics Workshop: Intelligent Retail -- The Machine Learning ApproachMSAdvAnalytics
 
Cortana Analytics Workshop: Azure Data Lake
Cortana Analytics Workshop: Azure Data LakeCortana Analytics Workshop: Azure Data Lake
Cortana Analytics Workshop: Azure Data LakeMSAdvAnalytics
 
Cortana Analytics Workshop: Using the Cortana Analytics Process
Cortana Analytics Workshop: Using the Cortana Analytics ProcessCortana Analytics Workshop: Using the Cortana Analytics Process
Cortana Analytics Workshop: Using the Cortana Analytics ProcessMSAdvAnalytics
 
Cortana Analytics Workshop: Building Next-Generation Smart Grids
Cortana Analytics Workshop: Building Next-Generation Smart GridsCortana Analytics Workshop: Building Next-Generation Smart Grids
Cortana Analytics Workshop: Building Next-Generation Smart GridsMSAdvAnalytics
 
Cortana Analytics Workshop: Deep Neural Networks
Cortana Analytics Workshop: Deep Neural NetworksCortana Analytics Workshop: Deep Neural Networks
Cortana Analytics Workshop: Deep Neural NetworksMSAdvAnalytics
 
Cortana Analytics Workshop: AI -- Assistive Intelligence
Cortana Analytics Workshop: AI -- Assistive IntelligenceCortana Analytics Workshop: AI -- Assistive Intelligence
Cortana Analytics Workshop: AI -- Assistive IntelligenceMSAdvAnalytics
 
Cortana Analytics Workshop: Big Data @ Microsoft
Cortana Analytics Workshop: Big Data @ MicrosoftCortana Analytics Workshop: Big Data @ Microsoft
Cortana Analytics Workshop: Big Data @ MicrosoftMSAdvAnalytics
 
Cortana Analytics Workshop: Power BI 2.0
Cortana Analytics Workshop: Power BI 2.0Cortana Analytics Workshop: Power BI 2.0
Cortana Analytics Workshop: Power BI 2.0MSAdvAnalytics
 
Cortana Analytics Workshop: Demystifying Cortana Analytics
Cortana Analytics Workshop: Demystifying Cortana AnalyticsCortana Analytics Workshop: Demystifying Cortana Analytics
Cortana Analytics Workshop: Demystifying Cortana AnalyticsMSAdvAnalytics
 
Cortana Analytics Workshop: The Future of Analytics
Cortana Analytics Workshop: The Future of AnalyticsCortana Analytics Workshop: The Future of Analytics
Cortana Analytics Workshop: The Future of AnalyticsMSAdvAnalytics
 

More from MSAdvAnalytics (19)

Cortana Analytics Workshop: Azure Data Catalog
Cortana Analytics Workshop: Azure Data CatalogCortana Analytics Workshop: Azure Data Catalog
Cortana Analytics Workshop: Azure Data Catalog
 
Cortana Analytics Workshop: Connecting Cortana Analytics Faster -- Any Source...
Cortana Analytics Workshop: Connecting Cortana Analytics Faster -- Any Source...Cortana Analytics Workshop: Connecting Cortana Analytics Faster -- Any Source...
Cortana Analytics Workshop: Connecting Cortana Analytics Faster -- Any Source...
 
Cortana Analytics Workshop: Real-World Data Collection for Cortana Analytics
Cortana Analytics Workshop: Real-World Data Collection for Cortana AnalyticsCortana Analytics Workshop: Real-World Data Collection for Cortana Analytics
Cortana Analytics Workshop: Real-World Data Collection for Cortana Analytics
 
Cortana Analytics Workshop: Insights and Predictions -- Integrating and Deplo...
Cortana Analytics Workshop: Insights and Predictions -- Integrating and Deplo...Cortana Analytics Workshop: Insights and Predictions -- Integrating and Deplo...
Cortana Analytics Workshop: Insights and Predictions -- Integrating and Deplo...
 
Cortana Analytics Workshop: Cortana Analytics -- Security, Privacy & Compliance
Cortana Analytics Workshop: Cortana Analytics -- Security, Privacy & ComplianceCortana Analytics Workshop: Cortana Analytics -- Security, Privacy & Compliance
Cortana Analytics Workshop: Cortana Analytics -- Security, Privacy & Compliance
 
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
 
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
 
Cortana Analytics Workshop: Developing for Power BI
Cortana Analytics Workshop: Developing for Power BICortana Analytics Workshop: Developing for Power BI
Cortana Analytics Workshop: Developing for Power BI
 
Cortana Analytics Workshop: Milliman Integrate for Cortana Analytics
Cortana Analytics Workshop: Milliman Integrate for Cortana AnalyticsCortana Analytics Workshop: Milliman Integrate for Cortana Analytics
Cortana Analytics Workshop: Milliman Integrate for Cortana Analytics
 
Cortana Analytics Workshop: Intelligent Retail -- The Machine Learning Approach
Cortana Analytics Workshop: Intelligent Retail -- The Machine Learning ApproachCortana Analytics Workshop: Intelligent Retail -- The Machine Learning Approach
Cortana Analytics Workshop: Intelligent Retail -- The Machine Learning Approach
 
Cortana Analytics Workshop: Azure Data Lake
Cortana Analytics Workshop: Azure Data LakeCortana Analytics Workshop: Azure Data Lake
Cortana Analytics Workshop: Azure Data Lake
 
Cortana Analytics Workshop: Using the Cortana Analytics Process
Cortana Analytics Workshop: Using the Cortana Analytics ProcessCortana Analytics Workshop: Using the Cortana Analytics Process
Cortana Analytics Workshop: Using the Cortana Analytics Process
 
Cortana Analytics Workshop: Building Next-Generation Smart Grids
Cortana Analytics Workshop: Building Next-Generation Smart GridsCortana Analytics Workshop: Building Next-Generation Smart Grids
Cortana Analytics Workshop: Building Next-Generation Smart Grids
 
Cortana Analytics Workshop: Deep Neural Networks
Cortana Analytics Workshop: Deep Neural NetworksCortana Analytics Workshop: Deep Neural Networks
Cortana Analytics Workshop: Deep Neural Networks
 
Cortana Analytics Workshop: AI -- Assistive Intelligence
Cortana Analytics Workshop: AI -- Assistive IntelligenceCortana Analytics Workshop: AI -- Assistive Intelligence
Cortana Analytics Workshop: AI -- Assistive Intelligence
 
Cortana Analytics Workshop: Big Data @ Microsoft
Cortana Analytics Workshop: Big Data @ MicrosoftCortana Analytics Workshop: Big Data @ Microsoft
Cortana Analytics Workshop: Big Data @ Microsoft
 
Cortana Analytics Workshop: Power BI 2.0
Cortana Analytics Workshop: Power BI 2.0Cortana Analytics Workshop: Power BI 2.0
Cortana Analytics Workshop: Power BI 2.0
 
Cortana Analytics Workshop: Demystifying Cortana Analytics
Cortana Analytics Workshop: Demystifying Cortana AnalyticsCortana Analytics Workshop: Demystifying Cortana Analytics
Cortana Analytics Workshop: Demystifying Cortana Analytics
 
Cortana Analytics Workshop: The Future of Analytics
Cortana Analytics Workshop: The Future of AnalyticsCortana Analytics Workshop: The Future of Analytics
Cortana Analytics Workshop: The Future of Analytics
 

Recently uploaded

Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 

Recently uploaded (20)

Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 

Cortana Analytics Workshop: On-Premises Hadoop and Revolution R -- Architectures and Solutions

  • 1.
  • 2.
  • 3. The Microsoft Data Platform MobileReports Natural language queryDashboardsApplications StreamingRelational Internal & externalNon-relational NoSQL Orchestration Machine learningModeling Information management Complex event processing Data
  • 4. The Microsoft Data Platform MobileReports Natural language queryDashboardsApplications StreamingRelational Internal & externalNon-relational NoSQL Orchestration Machine learningModeling Information management Complex event processing Data
  • 5. What happened? Why did it happen? What will happen? How can we make it happen? Traditional BI Advanced Analytics
  • 6.
  • 7. • A Language Platform… • A Community… • An Ecosystem Download the White Paper R is Hot bit.ly/r-is-hot
  • 8.
  • 9. blog.revolutionanalytics.com/popularity R Usage Growth Rexer Data Miner Survey, 2007-2013 • Rexer Data Miner Survey • IEEE Spectrum, July 2014 #9: R Language Popularity IEEE Spectrum Top Programming Languages
  • 11. Who We Are Only provider of commercial big data big analytics platform based on open source R statistical computing language Our Software Delivers Scalable Performance: Distributed & parallelized analytics Cross Platform: Write once, deploy anywhere Productivity: Easily build & deploy with latest modern analytics Our Services Deliver Knowledge: Our experts enable you to be experts Time-to-Value: Our Quickstart program gives you a jumpstart Guidance: Our customer support team is here to help you Global Industries Served Financial Services Digital Media Government Health & Life Sciences High Tech Manufacturing Retail Telco Customers 300+ Global 2000 Global Presence North America / EMEA / APAC
  • 12. Our Vision: • R becomes the de-facto standard for enterprise predictive analytics Our Mission: • Drive enterprise adoption of R by providing enhanced R products tailored to meet enterprise challenges
  • 13. Support & Services • Commercial Support Programs • Training Programs • Professional Services Community Programs • Academic Support Programs • Contributions to Open Source R • Open Source Extensions • Sponsorship of R User Groups Software Products • Stable Distributions • Broad Platform Support • Big Data Analytics • Application Integration • Deployment Integration • Agile Development Tooling • Future Platform Support
  • 14. • Free and open source R distribution • Enhanced and distributed by Revolution Analytics Revolution R Open • Secure, Scalable and Supported Distribution of R • With proprietary components created by Revolution Analytics Revolution R Enterprise
  • 15. • Simplicity • Speed • Capability • Speed • Scale • Stability • Time-to-Results • Compatibility
  • 16.  Naïve Bayes  Data import – Delimited, Fixed, SAS, SPSS, OBDC  Variable creation & transformation  Recode variables  Factor variables  Missing value handling  Sort, Merge, Split  Aggregate by category (means, sums)  Min / Max, Mean, Median (approx.)  Quantiles (approx.)  Standard Deviation  Variance  Correlation  Covariance  Sum of Squares (cross product matrix for set variables)  Pairwise Cross tabs  Risk Ratio & Odds Ratio  Cross-Tabulation of Data (standard tables & long form)  Marginal Summaries of Cross Tabulations  Chi Square Test  Kendall Rank Correlation  Fisher’s Exact Test  Student’s t-Test  Subsample (observations & variables)  Random Sampling Data Step Statistical Tests Sampling Descriptive Statistics  Sum of Squares (cross product matrix for set variables)  Multiple Linear Regression  Generalized Linear Models (GLM) exponential family distributions: binomial, Gaussian, inverse Gaussian, Poisson, Tweedie. Standard link functions: cauchit, identity, log, logit, probit. User defined distributions & link functions.  Covariance & Correlation Matrices  Logistic Regression  Classification & Regression Trees  Predictions/scoring for models  Residuals for all models Predictive Models  K-Means  Decision Trees  Decision Forests  Gradient Boosted Decision Trees Cluster Analysis Classification Simulation Variable Selection  Stepwise Regression  Simulation (e.g. Monte Carlo)  Parallel Random Number Generation Combination New in v7.3  PEMA-R API  rxDataStep  rxExec Coming in v7.4
  • 17.
  • 18. File Name Compressed File Size (MB) No. Rows Open Source R (secs) Revolution R (secs) Tiny 0.3 1,235 0.00 0.05 V. Small 0.4 12,353 0.21 0.05 Small 1.3 123,534 0.03 0.03 Medium 10.7 1,235,349 1.94 0.08 Large 104.5 12,353,496 60.69 0.42 Big (full) 12,960.0 123,534,969 Memory! 4.89 V.Big 25,919.7 247,069,938 Memory! 9.49 Huge 51,840.2 494,139,876 Memory! 18.92  Public US Flight Data  Linear Regression on Arrival Delay  Run on 4 core laptop, 16GB RAM and 500GB SSD
  • 19. 19
  • 20. 1. Starts A Master Process 2. Distribute Work 1. Threading? Cores? Sockets? Nodes? 2. Available RAM? 3. Location of Data? 3. Master Tasks for Each Node 4. Master Initiates Distributed Work 1. Hadoop Schedules Mapper for Each Split 2. Algorithm Computes Intermediate Result 3. Reducer Combines Intermediate Results 5. Master Process Evaluates Completion 6. Returns Consolidated Answer to Script In Revolution R Enterprise: Task Task Task Finalizer Initiator  … load a large dataset into Hadoop rxSetComputeContext(RxHadoopMR(…) Model_obj <- rxLinMod(…)  … use model object to predict… Big DataStub Remote Predictive Algorithms Move Logic To The Data
  • 21.
  • 23. rows minutes R on a server pulling data via SQL R on a server Invoking RRE ScaleR Inside the EDW
  • 24. Generalised Linear Model • 150 million observations • 70 degrees of freedom So what have we learned: • SAS works, but is slow. • The data is too big for open-source R, even on a very large server. • Hadoop is not a right fit • Revolution R Enterprise gets the same results as SAS, but about 50x faster. Software Platform Time to fit SAS 16-core Sun Server 5 hours rmr / map-reduce 10-node (8 cores / node) Hadoop cluster > 10 hours Open source R 250-GB Server Impossible (> 3 days) RevoScaleR 5-node (4 cores / node) LSF cluster 5.7 minutes
  • 25. R+CRAN RevolutionR DistributedR DeployR DevelopR ScaleR ConnectR Delivers High Performance Parallel Distributed Analytics Across Individual and Clustered Systems • Cloudera • Hortonworks • MapR • Apache Spark (coming soon) • IBM Platform LSF • Microsoft HPC Clusters • Teradata Database • Red Hat • SuSE Servers • Windows
  • 26.
  • 29. • Compressed • Chunkwise-columnar storage • Optimized for R: • Enables 20+x speedups 29
  • 30.
  • 33.
  • 36.
  • 38. • HDFS data source • Local “share” directory • HDFS “share” directory Share directories must be writable by “you” from both the edge node and the data node.
  • 39. • Distributed shell • Distributed copy • Distributed and install R packages • Deploy/Remove/Update Revolution R Enterprise • Add/Remove users from nodes
  • 42. • ScaleR on Spark • ScaleR on SQL Server Coming Soon… 42