Microsoft R Server on Spark
Purpose:
This lab demonstrates how to use Microsoft R Server on a Spark cluster. It outlines the
steps to spin up the cluster in Azure and to install RStudio with R Server, then walks
through an example of using ScaleR to analyze data in a Spark cluster.
Prerequisites
1. Be sure to have your Azure subscription enabled.
2. You will need a Secure Shell (SSH) client to remotely connect to the
HDInsight cluster and run commands directly on it. This is needed because the
cluster runs on Linux. The recommended client is PuTTY. Use the following link
to download and install PuTTY: PuTTY Download
a. Optionally, you can create an SSH key to connect to your cluster. The
steps below assume that you are using a password. The following links have more
information on how to create and use SSH keys with HDInsight:
Use SSH with Linux-based Hadoop on HDInsight from Windows
Use SSH with Linux-based Hadoop on HDInsight from Linux, Unix, or OS X
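On Linux, Unix, or OS X, the OpenSSH tools can stand in for PuTTY and its key generator. The sketch below assumes OpenSSH's `ssh-keygen` is installed. It creates a key pair whose public half can be uploaded on the Credentials blade during cluster creation. The function name `make_cluster_key`, the default key path, and the key comment are illustrative choices and are not part of the lab.

```shell
#!/usr/bin/env bash
# Sketch (assumption: OpenSSH's ssh-keygen is on PATH).
# make_cluster_key is a hypothetical helper, not part of the lab.
# Creates a 2048-bit RSA key pair: $1 (private key) and $1.pub (public key).
make_cluster_key() {
  local keyfile="${1:-$HOME/.ssh/hdinsight_rsa}"  # illustrative default path
  local passphrase="${2-}"  # empty means no passphrase: convenient, but less secure
  mkdir -p "$(dirname "$keyfile")"
  ssh-keygen -t rsa -b 2048 -f "$keyfile" -N "$passphrase" -C "hdinsight" -q
}

# Usage: make_cluster_key "$HOME/.ssh/hdinsight_rsa" 'a good passphrase'
# Paste the contents of the .pub file into the Credentials blade and keep the
# private key on your machine; connect later with: ssh -i <private key> user@host
```

Passing the passphrase as an argument keeps the sketch non-interactive. If you omit `-N`, `ssh-keygen` prompts for the passphrase instead, which avoids leaving it in shell history.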
Creating the R Server on Spark Cluster
1. In the Azure portal, select New > Data + Analytics > HDInsight.
2. Enter a name in the Cluster Name field and select the appropriate Azure
subscription in the Subscription field.
3. Click Select Cluster Type. On the Cluster Type blade, select the following
options:
a. Cluster Type: R Server on Spark
b. Cluster Tier: Premium
Click Select to save the cluster type configuration.
4. Click Credentials to create the cluster login username and password and the SSH
username and password. This is also where you can upload a key instead of using
a username/password for SSH authentication.
5. Click the Data Source field. Create a new storage account and a default container
for the cluster to use.
6. Click the Pricing field. Here you can specify the number of Worker
nodes, the size of the Worker nodes, the size of the Head nodes, and the R Server
node size (this is the edge node that you will connect to using SSH to run your R
code). For demo purposes, you can leave the default settings in place.
7. Optionally, you can select External Metastores for Hive and Oozie in the Optional
Configuration field if you have created SQL databases to store Hive/Oozie job
metadata. For this demo, leave this option blank.
8. Either create a new resource group or select an existing one in the Resource
Group field.
9. Click Create to create the cluster.
Installing RStudio with R Server on HDInsight
The following steps assume that you have downloaded and installed PuTTY. Please refer
to the Prerequisites section at the top of this document for the link to download PuTTY.
1. Identify the edge node of the cluster. To find the name of the edge node, select
the recently created HDInsight cluster in the HDInsight Clusters blade. From
there, select Settings > Applications > R Server for HDInsight. The SSH
Endpoint is the name of the edge node for the cluster.
2. SSH into the edge node. Use the following steps to connect to the edge node:
a. To connect to the edge node, open PuTTY. The following is a screenshot of
PuTTY when it is first opened:
b. In the Category pane, select Session. Enter the SSH address of the
HDInsight server in the Host Name (or IP address) text box. This address
could be either the address of the head node or the address of the edge
node. Use the address of the edge node to connect to the edge node and
configure RStudio. Click Open to connect to the cluster.
c. Log in with the SSH credentials that were created when the cluster was
created.
3. Once connected, become a root user on the cluster. Use the following command
in the SSH session:
sudo su -
4. Download the custom script to install RStudio. Use the following command in the
SSH session:
wget http://mrsactionscripts.blob.core.windows.net/rstudio-server-community-v01/InstallRStudio.sh
5. Change the permissions on the custom script file and run the script. Use the
following commands:
chmod 755 InstallRStudio.sh
./InstallRStudio.sh
6. Create an SSH tunnel to the cluster by mapping localhost:8787 on the HDInsight
Cluster to the client machine. This can be done through PuTTY.
a. Open PuTTY, and enter your connection information.
b. In the Category pane, expand Connection, expand SSH, and select
Tunnels.
c. Enter 8787 as the Source port and localhost:8787 as the Destination.
Click Add and then click Open to open an SSH connection.
d. When prompted, log in to the server with your SSH credentials. This will
establish an SSH session and enable the tunnel.
7. Open a web browser and enter the following URL based on the port entered for
the tunnel:
http://localhost:8787/
8. You will be prompted to enter the SSH username and password to connect to the
cluster.
9. The following command downloads a test script that executes R-based Spark
jobs on the cluster. Run this command from the PuTTY session:
wget http://mrsactionscripts.blob.core.windows.net/rstudio-server-community-v01/testhdi_spark.r
10. In RStudio, you will see the test script that was just downloaded in the lower-right
pane. Double-click the file to open it, and click Run to run the code.
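On Linux, Unix, or OS X, the PuTTY steps for connecting to the edge node and forwarding port 8787 can be done with the OpenSSH command-line client. The sketch below rests on two assumptions you should check. First, it assumes the edge node's SSH endpoint follows the pattern `CLUSTERNAME-ed-ssh.azurehdinsight.net`; compare this with the SSH Endpoint shown in step 1. Second, the helper names are hypothetical. The `-L` and `-N` options are standard OpenSSH flags.

```shell
#!/usr/bin/env bash
# Sketch of the PuTTY steps with OpenSSH. Helper names are hypothetical.

# ASSUMPTION: the R Server edge node endpoint is CLUSTERNAME-ed-ssh.azurehdinsight.net.
# Verify against Settings > Applications > R Server for HDInsight in the portal.
edge_node_host() {
  printf '%s-ed-ssh.azurehdinsight.net\n' "$1"
}

# Step 2: open an interactive SSH session on the edge node.
connect_edge_node() {
  local cluster="$1" user="$2"
  ssh "${user}@$(edge_node_host "$cluster")"
}

# Forwarding spec for ssh -L, in the form LOCAL_PORT:localhost:REMOTE_PORT.
# "localhost" is resolved on the edge node, where RStudio Server listens.
tunnel_spec() {
  local local_port="${1:-8787}" remote_port="${2:-8787}"
  printf '%s:localhost:%s\n' "$local_port" "$remote_port"
}

# Step 6: forward a local port to RStudio Server (port 8787) on the edge node.
# -N runs no remote command; the tunnel stays open until you press Ctrl+C.
open_rstudio_tunnel() {
  local cluster="$1" user="$2" local_port="${3:-8787}"
  ssh -N -L "$(tunnel_spec "$local_port" 8787)" "${user}@$(edge_node_host "$cluster")"
}

# Usage (hypothetical names):
#   open_rstudio_tunnel mycluster sshuser
#   then browse to http://localhost:8787/
```

If local port 8787 is already in use, pass a different local port as the third argument and browse to that port instead. If the test script from step 9 does not appear in RStudio, one possible cause is that it was downloaded from the root shell opened in step 3. In that case it would be in root's home directory rather than the SSH user's.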
Use a compute context and simple statistics with ScaleR
A compute context allows you to control whether computation will be performed locally
on the edge node, or whether it will be distributed across the nodes in the HDInsight
cluster.
1. From the R console, use the following to load example data into the default
storage for HDInsight.
# Set the HDFS (WASB) location of example data
bigDataDirRoot <- "/example/data"
# Create a local folder for storing the data temporarily
source <- "/tmp/AirOnTimeCSV2012"
dir.create(source)
# Download the 12 monthly files (airOT201201.csv ... airOT201212.csv) to the tmp folder
remoteDir <- "http://packages.revolutionanalytics.com/datasets/AirOnTimeCSV2012"
for (fileName in sprintf("airOT2012%02d.csv", 1:12)) {
  download.file(file.path(remoteDir, fileName), file.path(source, fileName))
}
# Set directory in bigDataDirRoot to load the data into
inputDir <- file.path(bigDataDirRoot, "AirOnTimeCSV2012")
# Make the directory
rxHadoopMakeDir(inputDir)
# Copy the data from the local folder to inputDir
rxHadoopCopyFromLocal(source, bigDataDirRoot)
2. Next, let's create the column information and define two data sources (one in HDFS,
one on the local file system) so that we can work with the data.
# Define the HDFS (WASB) file system
hdfsFS <- RxHdfsFileSystem()
# Create info list for the airline data
airlineColInfo <- list(
DAY_OF_WEEK = list(type = "factor"),
ORIGIN = list(type = "factor"),
DEST = list(type = "factor"),
DEP_TIME = list(type = "integer"),
ARR_DEL15 = list(type = "logical"))
# Get the names of the columns defined above
varNames <- names(airlineColInfo)
# Define the text data source in HDFS
airOnTimeData <- RxTextData(inputDir, colInfo = airlineColInfo, varsToKeep = varNames,
fileSystem = hdfsFS)
# Define the text data source on the local file system
airOnTimeDataLocal <- RxTextData(source, colInfo = airlineColInfo, varsToKeep = varNames)
# Formula to use for the logistic regression
formula <- "ARR_DEL15 ~ ORIGIN + DAY_OF_WEEK + DEP_TIME + DEST"
3. Let's run a logistic regression over the data using the local compute context.
# Set a local compute context
rxSetComputeContext("local")
# Run a logistic regression
system.time(
modelLocal <- rxLogit(formula, data = airOnTimeDataLocal)
)
# Display a summary
summary(modelLocal)
4. Next, let's run the same logistic regression using the Spark context. The Spark
context will distribute the processing over all the worker nodes in the HDInsight
cluster.
# Define the Spark compute context
mySparkCluster <- RxSpark()
# Set the compute context
rxSetComputeContext(mySparkCluster)
# Run a logistic regression
system.time(
modelSpark <- rxLogit(formula, data = airOnTimeData)
)
# Display a summary
summary(modelSpark)
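Step 1 above stages the data from R. The sketch below does the same staging from the SSH session, using `wget` and the Hadoop file system shell (`hdfs dfs`). The local and HDFS paths mirror the R code, and the function names are hypothetical. It assumes the dataset URL is still reachable and that `hdfs` is on the edge node's PATH.

```shell
#!/usr/bin/env bash
# Sketch: stage the AirOnTimeCSV2012 files from the shell instead of R.
# Paths mirror the R code; function names are hypothetical.
REMOTE_DIR="http://packages.revolutionanalytics.com/datasets/AirOnTimeCSV2012"
LOCAL_DIR="/tmp/AirOnTimeCSV2012"
HDFS_ROOT="/example/data"

# Print airOT201201.csv through airOT201212.csv, one per line.
monthly_files() {
  local m
  for m in $(seq -w 1 12); do   # seq -w zero-pads to equal width: 01 ... 12
    printf 'airOT2012%s.csv\n' "$m"
  done
}

stage_airline_data() {
  local f
  mkdir -p "$LOCAL_DIR"
  for f in $(monthly_files); do
    wget -q -O "$LOCAL_DIR/$f" "$REMOTE_DIR/$f" || return 1
  done
  # Copies the folder itself, producing $HDFS_ROOT/AirOnTimeCSV2012.
  # If that HDFS directory already exists (for example, from step 1),
  # remove it first; otherwise the copy may not behave as intended.
  hdfs dfs -mkdir -p "$HDFS_ROOT"
  hdfs dfs -copyFromLocal "$LOCAL_DIR" "$HDFS_ROOT"
}
```

Generating the file names in one function keeps the 12 downloads in a single loop, much like the R `for` loop in step 1. Using `wget` rather than `curl` simply matches the commands used earlier in the lab.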
ScaleR Example with Linear Regression and Plots
This example shows how to use different compute contexts, how to fit a linear regression
with RevoScaleR, and how to create simple plots. It uses airline delay data for airports
across the United States.
# Copy the AirlineDemoSmall.csv sample file that ships with RevoScaleR to HDFS
rxHadoopMakeDir("/share")
rxHadoopCopyFromLocal(system.file("SampleData/AirlineDemoSmall.csv", package = "RevoScaleR"),
"/share")
myNameNode <- "default"
myPort <- 0
# Location of the data
bigDataDirRoot <- "/share"
# define HDFS file system
hdfsFS <- RxHdfsFileSystem(hostName=myNameNode, port=myPort)
# specify the input file in HDFS to analyze
inputFile <- file.path(bigDataDirRoot, "AirlineDemoSmall.csv")
# create Factors for days of the week
colInfo <- list(DayOfWeek = list(type = "factor",
levels = c("Monday","Tuesday","Wednesday",
"Thursday","Friday","Saturday","Sunday")))
# define the data source
airDS <- RxTextData(file = inputFile, missingValueString = "M",
colInfo = colInfo, fileSystem = hdfsFS)
# First test the "local" compute context
rxSetComputeContext("local")
# Run a linear regression
system.time(
model <- rxLinMod(ArrDelay~CRSDepTime+DayOfWeek, data = airDS)
)
# display a summary of model
summary(model)
# define MapReduce compute context
myHadoopMRCluster <- RxHadoopMR(consoleOutput=TRUE,
nameNode=myNameNode,
port=myPort,
hadoopSwitches="-libjars /etc/hadoop/conf")
# set compute context
rxSetComputeContext(myHadoopMRCluster)
# Run a linear regression
system.time(
model1 <- rxLinMod(ArrDelay~CRSDepTime+DayOfWeek, data = airDS)
)
# display a summary of model
summary(model1)
# Plot arrival delay by day of week
rxLinePlot(ArrDelay ~ DayOfWeek, data = airDS)
# define Spark compute context
mySparkCluster <- RxSpark(consoleOutput=TRUE)
# set compute context
rxSetComputeContext(mySparkCluster)
# Run a linear regression
system.time(
model2 <- rxLinMod(ArrDelay~CRSDepTime+DayOfWeek, data = airDS)
)
# display a summary of model
summary(model2)
# Run 4 tasks via rxExec; each returns the name of the node it ran on
rxExec(function() { Sys.info()["nodename"] }, timesToRun = 4)
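The first lines of the example copy RevoScaleR's bundled AirlineDemoSmall.csv into /share from R. The shell sketch below does the same step and assumes `Rscript` and RevoScaleR are available on the edge node. It adds two conveniences: it stops with a message if the sample file cannot be found, and it overwrites any earlier copy so it can be re-run. `sample_csv_path` and `stage_sample_csv` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: stage AirlineDemoSmall.csv from the SSH session instead of R.
# Rscript is used only to ask R where the RevoScaleR sample file lives;
# system.file() returns "" when the file or the package is not found.
SHARE_DIR="/share"

sample_csv_path() {
  Rscript -e 'cat(system.file("SampleData/AirlineDemoSmall.csv", package = "RevoScaleR"))'
}

stage_sample_csv() {
  local src
  src="$(sample_csv_path)"
  if [ -z "$src" ]; then
    echo "AirlineDemoSmall.csv not found; is RevoScaleR installed?" >&2
    return 1
  fi
  hdfs dfs -mkdir -p "$SHARE_DIR"
  # -f overwrites an existing copy so the function can be re-run.
  hdfs dfs -copyFromLocal -f "$src" "$SHARE_DIR"
  # -test -e exits 0 when the path exists; that becomes the function's status.
  hdfs dfs -test -e "$SHARE_DIR/AirlineDemoSmall.csv"
}
```

The check at the end gives a clear pass/fail status before the models run. This helps because a missing input file would otherwise surface only as an error from `RxTextData` or `rxLinMod`.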
Wrap Up
This lab was meant to demonstrate how to use Microsoft R Server on a Spark cluster. For
more information, refer to the references listed in the References section.
References
1. https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-r-server-get-started/
2. https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-r-server-install-r-studio/
3. https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-linux-use-ssh-windows/#connect-to-a-linux-based-hdinsight-cluster

Microsoft R server for distributed computing
The First NIDA Business Analytics and Data Sciences Contest/Conference
1-2 September 2016, Nawaminthrathirat Building (อาคารนวมินทราธิราช), National Institute of Development Administration (NIDA)
- Introduction to Microsoft R Server
- How distributed computing works and what its benefits are
- Introduction to configuration for distributed computing
https://businessanalyticsnida.wordpress.com
https://www.facebook.com/BusinessAnalyticsNIDA/
กฤษฏิ์ คำตื้อ,
Technical Evangelist,
Microsoft (Thailand)
- Distributed computing and Big Data
- Analytics on R Server
- Demonstration and teaching in a hands-on workshop format
Computer Lab 2, 10th floor, Siam Boromrajakumari Building (อาคารสยามบรมราชกุมารี)
1 September 2016, 9:00-12:30

More Related Content

What's hot
- Apache Spark Overview (Carol McDonald)
- R at Microsoft (useR! 2016) (Revolution Analytics)
- Microsoft and Revolution Analytics -- what's the add-value? 20150629 (Mark Tabladillo)
- Introduction to TitanDB (Knoldus Inc.)
- High Performance Predictive Analytics in R and Hadoop (Revolution Analytics)
- How to use Parquet as a Basis for ETL and Analytics (DataWorks Summit)
- An introduction to Apache Hadoop Hive (Mike Frampton)
- MATLAB and Scientific Data: New Features and Capabilities (The HDF-EOS Tools and Information Center)
- R at Microsoft (Revolution Analytics)
- What's New in Spark 2.0: Structured Streaming and Datasets - StampedeCon 2016 (StampedeCon)
- Is Revolution R Enterprise Faster than SAS? Benchmarking Results Revealed (Revolution Analytics)
- Introduction to Spark on Hadoop (Carol McDonald)
- NoSQL no more: SQL on Druid with Apache Calcite (gianmerlino)
- Insights into Customer Behavior from Clickstream Data by Ronald Nowling (Spark Summit)
- Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S... (Spark Summit)
- Matlab, Big Data, and HDF Server (The HDF-EOS Tools and Information Center)
- The Future of Sharding (EDB)
- Big Data Everywhere Chicago: SQL on Hadoop (BigDataEverywhere)
- HDFCloud Workshop: HDF5 in the Cloud (The HDF-EOS Tools and Information Center)
- R & Python on Hadoop (Ming Yuan)


Viewers also liked
- microsoft r server for distributed computing (BAINIDA)
- Performance and Scale Options for R with Hadoop: A comparison of potential ar... (Revolution Analytics)
- All thingspython@pivotal (Srivatsan Ramanujam)
- Python Powered Data Science at Pivotal (PyData 2013) (Srivatsan Ramanujam)
- Hadoop, MapReduce and R = RHadoop (Victoria López)
- Accelerating R analytics with Spark and Microsoft R Server for Hadoop (Willy Marroquin (WillyDevNET))
- Distributed Computing Patterns in R (armstrtw)
- Nida event oracle business analytics 1 sep2016 (BAINIDA)
- Distance learning system using cloud technology, by Assoc. Prof. Dr. พิพัฒน์ หิรัญวณิชช... (BAINIDA)
- Second prize data analysis @ the First NIDA business analytics and data scie... (BAINIDA)
- R Tool for Visual Studio and teamwork, by เฉลิมวงศ์ วิจิตรปิยะกุ... (BAINIDA)
- Simple Reproducibility with the checkpoint package (Revolution Analytics)
- Big Data Analysis With RHadoop (David Chiu)
- Oracle Enterprise Performance Management (BAINIDA)
- DeployR: Revolution R Enterprise with Business Intelligence Applications (Revolution Analytics)
- Tableau for statistical graphic and data visualization (BAINIDA)
- Second prize business plan @ the First NIDA business analytics and data scien... (BAINIDA)
- Data analysis results of the winning team, The First NIDA Business Analyti... (BAINIDA)
- In-Database Analytics Deep Dive with Teradata and Revolution (Revolution Analytics)
- cybersecurity regulation for thai capital market, Dr. กำพล ศรธนะรัตน์ ผู้อำนวย... (BAINIDA)


Similar to R server and spark
- Quick-Start Guide: Deploying Your Cloudian HyperStore Hybrid Storage Service (Cloudian)
- Big datademo
- Configuring Your First Hadoop Cluster On EC2 (benjaminwootton)
- linux installation.pdf (MuhammadShoaibHussai2)
- Book (luis_lmro)
- How to become cloud backup provider with Cloudian HyperStore and CloudBerry L... (Cloudian)
- One-Man Ops (Jos Boumans)
- How to become cloud backup provider (CLOUDIAN KK)
- How to Become Cloud Backup Provider (Cloudian)
- reModernize-Updating and Consolidating MySQL (Amazon Web Services)
- Usage Note of SWIG for PHP (William Lee)
- Lab Manual reModernize - Updating and Consolidating MySQL (Amazon Web Services)
- Recipe to build open splice dds 6.3.xxx Hello World example over Qt 5.2 (Adil Khan)
- Content server installation guide (Naveed Bashir)
- Hands-on Lab: re-Modernize - Updating and Consolidating MySQL (Amazon Web Services)
- Drupal Continuous Integration with Jenkins - Deploy (John Smith)
- Get started with Microsoft SQL Polybase (Henk van der Valk)
- Securing Windows Remote Desktop With Copssh (Crismer La Pignola)
- Cloud init and cloud provisioning [openstack summit vancouver] (Joshua Harlow)
- [Devconf.cz][2017] Understanding OpenShift Security Context Constraints (Alessandro Arrichiello)
