Scalable Machine Learning using R and Azure HDInsight - Parashar

You can find more at https://gallery.cortanaintelligence.com/experiments

Prepare, Model, Operationalize
Details - https://azure.microsoft.com/en-us/documentation/learning-paths/data-science-process/

What is R?
Language / Platform
• The most popular statistical programming language
• A data visualization tool
• Open source
Community
• 3+ million users
• Taught in most universities
• Thriving user groups worldwide
• New and recent graduates use it
Ecosystem
• 9000+ contributed packages
• Rich application & platform integration

• In-Memory Operation
• Lack of Parallelism
• Lack of Guaranteed Support / No SLA

• Any code/package that works today with R will work in R Server.
• Ideal for parameter sweeps, simulation, scoring.
• Transformations: rxDataStep(), Statistics: rxChiSquaredTest(), Algorithms: rxLinMod(), Parallelism: rxSetComputeContext()
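The four function families named above can be sketched together on a small built-in dataset. This is a minimal illustration, not the presenter's demo. It assumes Microsoft R Server or R Client is installed (both ship the RevoScaleR package), and the derived column name `wt_kg` is made up for the example.

```r
# Assumption: RevoScaleR is available (Microsoft R Server or R Client).
library(RevoScaleR)

# Parallelism: pick where computations run. "local" is the default context.
rxSetComputeContext("local")

# Transformations: add a derived column (mtcars$wt is in units of 1000 lb).
cars <- rxDataStep(inData = mtcars,
                   transforms = list(wt_kg = wt * 453.592))

# Algorithms: fit a linear model on the transformed data.
fit <- rxLinMod(mpg ~ wt_kg + hp, data = cars)
print(summary(fit))

# Statistics: chi-squared test of independence on a cross-tabulation.
# rxCrossTabs needs factor columns, so convert them first.
cars_f <- rxFactors(cars, factorInfo = c("gear", "vs"))
tab <- rxCrossTabs(~ gear:vs, data = cars_f, returnXtabs = TRUE)
print(rxChiSquaredTest(tab))
```

Only the compute context would need to change to run the same modeling calls on a cluster.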

• Provisions Azure compute resources with Spark installed and configured.
• Data is stored in Azure Blob storage (wasb://) or Azure Data Lake Store (adl://)
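For orientation, the two URI schemes usually take the shapes built below. This is a small sketch in base R. The container, account, and file names are hypothetical placeholders, not values from the talk.

```r
# Hypothetical names, for illustration only.
container    <- "mycontainer"
blob_account <- "mystorageaccount"
adls_account <- "mydatalake"
data_path    <- "example/data/sample.csv"

# Azure Blob storage: wasb://<container>@<account>.blob.core.windows.net/<path>
wasb_uri <- sprintf("wasb://%s@%s.blob.core.windows.net/%s",
                    container, blob_account, data_path)

# Azure Data Lake Store: adl://<account>.azuredatalakestore.net/<path>
adl_uri <- sprintf("adl://%s.azuredatalakestore.net/%s",
                   adls_account, data_path)

print(wasb_uri)
print(adl_uri)
```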

[Diagram: RStudio -> HDInsight Gateway -> R process on Edge Node (R Server) -> Data in Distributed Storage]

[Diagram: RStudio -> HDInsight Gateway -> Master R process on Edge Node (R Server) -> Apache YARN and Spark -> Worker R processes on Data Nodes -> Data in Distributed Storage]
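Switching from the single edge-node process to the distributed layout is, roughly, a change of compute context. The sketch below assumes it runs on the edge node of an R Server on HDInsight cluster with RevoScaleR loaded. The file path and column names are hypothetical, and a real session may need additional `RxSpark()` arguments for its cluster.

```r
# Assumption: executed on the edge node of an R Server on HDInsight cluster.
library(RevoScaleR)

# Point RevoScaleR at Spark; later rx* calls are distributed through YARN
# to worker R processes on the data nodes.
spark_cc <- RxSpark(consoleOutput = TRUE)
rxSetComputeContext(spark_cc)

# Hypothetical CSV file in the cluster's distributed storage.
hdfs    <- RxHdfsFileSystem()
flights <- RxTextData("/example/data/flights.csv", fileSystem = hdfs)

# Same modeling call as on a single machine (hypothetical numeric columns).
model <- rxLinMod(ArrDelay ~ CRSDepTime, data = flights)
print(summary(model))

# Return to local execution on the edge node.
rxSetComputeContext("local")
```

The modeling code itself does not change between local and cluster runs. Only the compute context and the data source definition do.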

Configuration                        Elapsed time
R Server (single thread, local)      471 sec
R Server on HDInsight (4 nodes)      144 sec (-70%)
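The quoted reduction can be checked from the two timings with a few lines of base R. The variable names are illustrative.

```r
# Timings reported on the slide, in seconds.
local_sec   <- 471
cluster_sec <- 144

# Relative reduction in elapsed time and the equivalent speedup factor.
reduction <- (local_sec - cluster_sec) / local_sec
speedup   <- local_sec / cluster_sec

cat(sprintf("Time reduction: %.1f%%\n", 100 * reduction))
cat(sprintf("Speedup: %.2fx\n", speedup))
```

The reduction works out to about 69.4%, which the slide rounds to 70%, and it corresponds to a speedup of roughly 3.3x.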

https://www.microsoft.com/en-us/cloud-platform/r-server
https://azure.microsoft.com/en-us/documentation/learning-paths/data-science-process/
https://gallery.cortanaintelligence.com/experiments
https://www.visualstudio.com/vs/rtvs/
https://blogs.technet.microsoft.com/dataplatforminsider/2017/04/19/introducing-microsoft-r-server-9-1-release/
https://blogs.technet.microsoft.com/machinelearning/2016/12/07/introducing-microsoft-r-server-9-0/
https://blogs.msdn.microsoft.com/rserver/2017/04/19/microsoft-ml-on-spark-and-hadoop/
https://msdn.microsoft.com/en-us/microsoft-r/scaler/packagehelp/rxcomputecontext
https://blogs.msdn.microsoft.com/microsoftrservertigerteam/2017/04/19/new-features-in-9-1-microsoft-r-server-with-sparklyr-interoperability/
https://msdn.microsoft.com/en-us/microsoft-r/scaler-spark-getting-started
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-r-server-get-started
