You can find more at https://gallery.cortanaintelligence.com/experiments
Prepare, Model, Operationalize
Details - https://azure.microsoft.com/en-us/documentation/learning-paths/data-science-process/
What is R?
• The most popular statistical programming language
• A data visualization tool
• Open source
• 3+ Million users
• Taught in most universities
• Thriving user groups worldwide
• 9000+ contributed packages
• New and recent grads use it
Language
Platform
Community
Ecosystem
• Rich application & platform integration
• In-Memory Operation
• Lack of Parallelism
• Lack of Guaranteed Support / No SLA
• Any code/package that works today with R will work in R Server.
• Ideal for parameter sweeps, simulation, scoring.
• Transformations: rxDataStep(), Statistics: rxChiSquaredTest(), Algorithms: rxLinMod(), Parallelism: rxSetComputeContext()
• Provisions Azure compute resources with Spark installed and configured.
• Data is stored in Azure Blob storage (wasb://) or Azure Data Lake Store (adl://)
[Diagram: R Server on HDInsight, single process. Components: RStudio, HDInsight Gateway, R process on Edge Node, Data in Distributed Storage.]
[Diagram: R Server on HDInsight, distributed. Components: RStudio, HDInsight Gateway, Master R process on Edge Node, Apache YARN and Spark, Worker R processes on Data Nodes, Data in Distributed Storage.]
R Server (single thread, local): 471 sec
R Server on HDInsight (4 nodes): 144 sec (-70%)
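The -70% figure can be checked from the two timings. The sketch below is only arithmetic on the numbers shown on the slide; the variable names are illustrative, and nothing here reruns the benchmark.

```python
# Timings copied from the slide (seconds).
local_single_thread_sec = 471
hdinsight_4_nodes_sec = 144

# Fraction of run time saved, relative to the local single-thread baseline.
reduction = (local_single_thread_sec - hdinsight_4_nodes_sec) / local_single_thread_sec

# How many times faster the 4-node run finished.
speedup = local_single_thread_sec / hdinsight_4_nodes_sec

print(f"Run time reduction: {reduction:.1%}")
print(f"Speedup: {speedup:.2f}x")
```

The reduction works out to roughly 69.4%, which the slide rounds to -70%, and corresponds to a speedup of about 3.3x.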
https://www.microsoft.com/en-us/cloud-platform/r-server
https://azure.microsoft.com/en-us/documentation/learning-paths/data-science-process/
https://gallery.cortanaintelligence.com/experiments
https://www.visualstudio.com/vs/rtvs/
https://blogs.technet.microsoft.com/dataplatforminsider/2017/04/19/introducing-microsoft-r-server-9-1-release/
https://blogs.technet.microsoft.com/machinelearning/2016/12/07/introducing-microsoft-r-server-9-0/
https://blogs.msdn.microsoft.com/rserver/2017/04/19/microsoft-ml-on-spark-and-hadoop/
https://msdn.microsoft.com/en-us/microsoft-r/scaler/packagehelp/rxcomputecontext
https://blogs.msdn.microsoft.com/microsoftrservertigerteam/2017/04/19/new-features-in-9-1-microsoft-r-server-with-sparklyr-interoperability/
https://msdn.microsoft.com/en-us/microsoft-r/scaler-spark-getting-started
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-r-server-get-started
Scalable Machine Learning using R and Azure HDInsight - Parashar
