This document demonstrates using Hadoop, R, and Google Chart Tools for data visualization. It describes preparing the environment by installing the necessary software, then walks through writing an R script that analyzes birth data stored on HDFS using MapReduce. The results are loaded into a Shiny application, which renders interactive visualizations with the googleVis package. This showcases an end-to-end workflow for analyzing large datasets with R on Hadoop and visualizing the results.
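The slides are only summarized here, so the following is a minimal sketch of the final visualization step, assuming the MapReduce stage has already produced an aggregated data frame; the object and column names (births_by_year, year, births) and the placeholder values are illustrative, not taken from the document.

```r
library(shiny)
library(googleVis)

# Placeholder result of the MapReduce aggregation; real values would come
# from the Hadoop job, e.g. via rmr2::from.dfs().
births_by_year <- data.frame(year   = 2001:2005,
                             births = c(120, 135, 150, 160, 170))

ui <- fluidPage(
  titlePanel("Births per year"),
  htmlOutput("birthChart")              # googleVis charts render as HTML
)

server <- function(input, output) {
  output$birthChart <- renderGvis({
    gvisColumnChart(births_by_year,
                    xvar = "year", yvar = "births",
                    options = list(width = 800, height = 400))
  })
}

shinyApp(ui = ui, server = server)
```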
Hadoop, being a disruptive data processing framework, has made a large impact on today's data ecosystems. Enabling business users to translate existing skills to Hadoop is necessary to encourage adoption and to let businesses get value out of their Hadoop investment quickly. R, a prolific and rapidly growing data analysis language, now has a place in the Hadoop ecosystem. With the advent of technologies such as RHadoop, optimizing R workloads for use on Hadoop has become much easier. This session will help you understand how RHadoop projects such as RMR and RHDFS work with Hadoop, and will show you examples of using these technologies on the Hortonworks Data Platform.
Cloudera Sessions - Clinic 1 - Getting Started With Hadoop (Cloudera, Inc.)
If you are interested in Hadoop and its capabilities but are not sure where to begin, this is the session for you. Learn the basics of Hadoop, see how to spin up a development cluster in the cloud or on premises, and start exploring ETL processing with SQL and other familiar tools.
Integrating R & Hadoop - Text Mining & Sentiment Analysis (Aravind Babu)
The document discusses integrating R and Hadoop for big data analytics. It notes that existing statistical applications like R are incapable of handling big data, while data management tools lack analytical capabilities. Integrating R with Hadoop bridges this gap by leveraging R's analytics and statistics functionality with Hadoop's ability to process and store distributed data. RHadoop is introduced as an open source project that allows R programmers to directly use MapReduce functionality in R code. Specific RHadoop packages like rhdfs and rmr2 are described that enable interacting with HDFS and performing statistical analysis via MapReduce on Hadoop clusters. Text analytics use cases with R and Hadoop like sentiment analysis are also briefly outlined.
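As a rough illustration of how the rhdfs and rmr2 packages mentioned above are typically used together, here is a hedged R sketch; the HADOOP_CMD path, the HDFS paths, and the file names are assumptions, not taken from the document.

```r
Sys.setenv(HADOOP_CMD = "/usr/bin/hadoop")   # assumption: location of the hadoop binary
library(rhdfs)
library(rmr2)

hdfs.init()                                  # connect the R session to HDFS
hdfs.ls("/user/analyst")                     # browse HDFS from R
hdfs.put("tweets.txt", "/user/analyst/tweets.txt")   # copy a local file into HDFS

# Word count over the uploaded text: the usual first rmr2 MapReduce job
wc <- mapreduce(
  input        = "/user/analyst/tweets.txt",
  input.format = "text",
  map    = function(k, lines) {
    words <- unlist(strsplit(tolower(lines), "\\s+"))
    keyval(words, 1)
  },
  reduce = function(word, counts) keyval(word, sum(counts))
)

head(values(from.dfs(wc)))                   # pull the (word, count) pairs back into R
```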
Hadoop World 2011: Mike Olson Keynote Presentation (Cloudera, Inc.)
Now in its fifth year, Apache Hadoop has firmly established itself as the platform of choice for organizations that need to efficiently store, organize, analyze, and harvest valuable insight from the flood of data that they interact with. Since its inception as an early, promising technology that inspired curiosity, Hadoop has evolved into a widely embraced, proven solution used in production to solve a growing number of business problems that were previously impossible to address. In his opening keynote, Mike will reflect on the growth of the Hadoop platform due to the innovative work of a vibrant developer community and on the rapid adoption of the platform among large enterprises. He will highlight how enterprises have transformed themselves into data-driven organizations, highlighting compelling use cases across vertical markets. He will also discuss Cloudera’s plans to stay at the forefront of Hadoop innovation and its role as the trusted solution provider for Hadoop in the enterprise. He will share Cloudera’s view of the road ahead for Hadoop and Big Data and discuss the vital roles for the key constituents across the Hadoop community, ecosystem and enterprises.
Speaking of big data analysis, what comes to mind is probably HDFS and MapReduce within Hadoop. But to write a MapReduce program, one must first learn to write native Java. One might wonder: is it possible to use R, the language most widely adopted by data scientists, to implement MapReduce programs? And through the integration of R and Hadoop, can one truly unleash the power of parallel computing for big data analysis?
This slide deck introduces how to install RHadoop step by step and how to write a MapReduce program in R. More importantly, it discusses whether RHadoop is truly a guiding light for big data analysis, or just another way to write MapReduce programs.
Please email me if you find any problems with the slides. EMAIL: tr.ywchiu@gmail.com
When it comes to big data, what usually comes to mind is Hadoop's MapReduce and HDFS; but to write MapReduce, one has to learn Java or go through a Thrift interface. Can R actually run on Hadoop? And does R + Hadoop really combine R's powerful analytics with the ability to analyze massive data?
This talk introduces, step by step, how to install the RHadoop packages on Hadoop and how to write MapReduce programs in R. More importantly, it explores whether RHadoop is a guiding light for big data analysis, or merely another way of implementing MapReduce.
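For context on the step-by-step installation those slides cover, here is a minimal R sketch of a typical RHadoop environment setup; every path and package version shown is a placeholder to adapt to your own cluster.

```r
# Tell R where the Hadoop binaries and the streaming jar live (placeholder paths)
Sys.setenv(HADOOP_CMD       = "/usr/lib/hadoop/bin/hadoop")
Sys.setenv(HADOOP_STREAMING = "/usr/lib/hadoop-mapreduce/hadoop-streaming.jar")

# CRAN dependencies commonly needed by rmr2 and rhdfs
install.packages(c("rJava", "RJSONIO", "itertools", "digest",
                   "Rcpp", "functional", "bitops", "caTools", "stringr"))

# The RHadoop packages themselves are installed from downloaded source tarballs
# (version numbers are examples only)
install.packages("rmr2_3.3.1.tar.gz",  repos = NULL, type = "source")
install.packages("rhdfs_1.0.8.tar.gz", repos = NULL, type = "source")

library(rhdfs)
hdfs.init()   # quick smoke test that the R session can reach HDFS
```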
Hw09 Production Deep Dive With High Availability (Cloudera, Inc.)
ContextWeb is an online advertising company that processes large volumes of log data using Hadoop. They process up to 120GB of raw log files per day. Their Hadoop cluster consists of 40 nodes and processes around 2000 MapReduce jobs per day. They developed techniques for partitioning data by date/time and using file revisions to allow incremental processing while ensuring data consistency and freshness of reports.
Common and unique use cases for Apache Hadoop (Brock Noland)
The document provides an overview of Apache Hadoop and common use cases. It describes how Hadoop is well-suited for log processing due to its ability to handle large amounts of data in parallel across commodity hardware. Specifically, it allows processing of log files to be distributed per unit of data, avoiding bottlenecks that can occur when trying to process a single large file sequentially.
The document discusses high availability in Hadoop 2.0 and YARN. It describes the differences between Hadoop 1.0 and 2.0, including changes to configuration files and directories. It then explains the components and workflow of YARN, including how it separates resource management and scheduling from job execution. Finally, it discusses setting up high availability for the NameNode using shared storage and Zookeeper.
The document outlines an introduction to big data analytics with R and Hadoop. It discusses big data concepts, Hadoop, R, and RHadoop. It provides instructions on installing and configuring Java, Hadoop, and R. It describes using RHadoop to perform MapReduce jobs in R to analyze large datasets stored in Hadoop.
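As a small illustration of the kind of MapReduce job in R that this document describes, the sketch below uses rmr2's to.dfs()/from.dfs() round trip; the data and the grouping are synthetic placeholders.

```r
library(rmr2)
# rmr.options(backend = "local")   # handy for testing the job without a cluster

# Ship a synthetic (group, value) dataset into HDFS as key-value pairs
small <- to.dfs(keyval(sample(1:5, 1000, replace = TRUE), rnorm(1000)))

# Compute the mean value per group with a map and a reduce function
group_means <- mapreduce(
  input  = small,
  map    = function(k, v) keyval(k, v),
  reduce = function(k, v) keyval(k, mean(v))
)

from.dfs(group_means)   # returns $key (group ids) and $val (group means)
```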
1) Hadoop is a framework for distributed processing of large datasets across clusters of computers using a simple programming model.
2) Virtualizing Hadoop enables rapid deployment, high availability, elastic scaling, and consolidation of big data workloads on a common infrastructure.
3) Serengeti is a tool that automates the deployment and management of Hadoop clusters on vSphere in under 30 minutes through simple commands.
Hadoop is commonly used for processing large swaths of data in batch. While many of the necessary building blocks for data processing exist within the Hadoop ecosystem – HDFS, MapReduce, HBase, Hive, Pig, Oozie, and so on – it can be a challenge to assemble and operationalize them as a production ETL platform. This presentation covers one approach to data ingest, organization, format selection, process orchestration, and external system integration, based on collective experience acquired across many production Hadoop deployments.
Yahoo! is one of the most-visited web sites in the world. It runs one of the largest private cloud infrastructures, one that operates on petabytes of data every day. Being able to store and manage that data well is essential to the efficient functioning of Yahoo!'s Hadoop clusters. A key component that enables this efficient operation is data compression. With regard to compression algorithms, there is an underlying tension between compression ratio and compression performance. Consequently, Hadoop provides support for several compression algorithms, including gzip, bzip2, Snappy, LZ4 and others. This plethora of options can make it difficult for users to select appropriate codecs for their MapReduce jobs. This talk attempts to provide guidance in that regard. Performance results with Gridmix and with several corpuses of data are presented. The talk also describes enhancements we have made to the bzip2 codec that improve its performance. This will be of particular interest to the increasing number of users operating on “Big Data” who require the best possible ratios. The impact of using the Intel IPP libraries is also investigated; these have the potential to improve performance significantly. Finally, a few proposals for future enhancements to Hadoop in this area are outlined.
Speaker: Govind Kamat, Member of Technical Staff, Yahoo!
Pattern: an open source project for migrating predictive models onto Apache H... (Paco Nathan)
This document discusses Pattern, an open source project for migrating predictive models onto Apache Hadoop. It aims to provide enterprise data workflows, sample code, and a roadmap for migrating predictive model markup language (PMML) models to run at scale on Hadoop clusters. The document also discusses how Pattern customers have experimented with using it for their predictive analytics needs.
Managing growth in Production Hadoop Deployments (DataWorks Summit)
This document discusses managing growth in production Hadoop deployments as more users and workloads are added to an initial Hadoop cluster. It begins by describing how an initial small cluster for ETL workloads can succeed but then struggle as data scientists, BI teams, and mobile teams also start using the cluster. The document then covers three main failure categories that can occur due to too much data, too many jobs, or too many users accessing the cluster. It provides examples of pressure points within these categories and recommends strategies to address them like using quotas, optimizing small files, tuning queues, and implementing access controls.
Hadoop is an open-source framework for distributed processing of large datasets across clusters of computers. It allows for the parallel processing of large datasets stored across multiple servers. Hadoop uses HDFS for reliable storage and MapReduce as a programming model for distributed computing. HDFS stores data reliably in blocks across nodes, while MapReduce processes data in parallel using map and reduce functions.
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli... (lucenerevolution)
Presented by M.C. Srivas | MapR. See conference video - http://www.lucidimagination.com/devzone/events/conferences/lucene-revolution-2012
This session addresses the biggest issue facing Big Data: Search, Discovery and Analytics need to be integrated. While creating and maintaining separate SOLR and Hadoop clusters is time-consuming, error-prone and difficult to keep in sync, most Hadoop installations do not integrate SOLR within the same cluster. Find out how to easily integrate these capabilities into a single cluster. The session will also touch on some of the technical aspects of Big Data Search, including how to: protect against silent index corruption that permeates large distributed clusters, overcome the shard distribution problem by leveraging Hadoop to ensure accurate distributed search results, and provide real-time indexing for distributed search including support for streaming data capture. Srivas will also share relevant experiences from his days at Google, where he ran one of the major search infrastructure teams and GFS, BigTable and MapReduce were used extensively.
This document discusses integrating Hadoop and Oracle databases. It begins with an introduction of the presenter and their experience. It then provides an overview of Hadoop and Exadata architectures. Several options for integrating the two are presented, including Fuse, Sqoop, and Oracle connectors. A case study is described where unused Exadata storage servers were configured as a Hadoop cluster, called Exadoop. Testing showed improved performance by adding parallelism and Fuse. Overall the document evaluates using Hadoop and Oracle together.
Pig Tutorial | Twitter Case Study | Apache Pig Script and Commands | Edureka (Edureka!)
This Edureka Pig Tutorial ( Pig Tutorial Blog Series: https://goo.gl/KPE94k ) will help you understand the concepts of Apache Pig in depth.
Check our complete Hadoop playlist here: https://goo.gl/ExJdZs
Below are the topics covered in this Pig Tutorial:
1) Entry of Apache Pig
2) Pig vs MapReduce
3) Twitter Case Study on Apache Pig
4) Apache Pig Architecture
5) Pig Components
6) Pig Data Model
7) Running Pig Commands and Pig Scripts (Log Analysis)
Hadoop Summit San Jose 2013: Compression Options in Hadoop - A Tale of Tradeo... (Sumeet Singh)
Yahoo! is one of the most-visited web sites in the world. It runs one of the largest private cloud infrastructures, one that operates on petabytes of data every day. Being able to store and manage that data well is essential to the efficient functioning of Yahoo's Hadoop clusters. A key component that enables this efficient operation is data compression.
With regard to compression algorithms, there is an underlying tension between compression ratio and compression performance. Consequently, Hadoop provides support for several compression algorithms, including gzip, bzip2, Snappy, LZ4 and others. This plethora of options can make it difficult for users to select appropriate codecs for their MapReduce jobs. This paper attempts to provide guidance in that regard. Performance results with Gridmix and with several corpuses of data are presented.
The paper also describes enhancements we have made to the bzip2 codec that improve its performance. This will be of particular interest to the increasing number of users operating on "Big Data" who require the best possible ratios. The impact of using the Intel IPP libraries is also investigated; these have the potential to improve performance significantly. Finally, a few proposals for future enhancements to Hadoop in this area are outlined.
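To show how the codec trade-offs discussed above surface from an R-based MapReduce job (keeping with the R focus of this listing), here is a hedged sketch that passes Hadoop compression properties through rmr2's backend.parameters; verify the exact parameter-list structure against your rmr2 version, and treat the paths as placeholders.

```r
library(rmr2)

# Identity map job whose only purpose is to rewrite the input with chosen codecs;
# the -D properties below are standard Hadoop 2 compression settings.
job <- mapreduce(
  input  = "/data/raw/logs",
  output = "/data/compressed/logs",
  map    = function(k, v) keyval(k, v),
  backend.parameters = list(hadoop = list(
    # Snappy for intermediate map output: cheap CPU, modest ratio
    D = "mapreduce.map.output.compress=true",
    D = "mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec",
    # gzip (or bzip2) for the final output, where ratio matters more
    D = "mapreduce.output.fileoutputformat.compress=true",
    D = "mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec"
  ))
)
```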
Extend starfish to Support the Growing Hadoop Ecosystem (Fei Dong)
The document discusses extending the Starfish system to optimize multi-job workflows, iterative workflows, and key-value stores like HBase. It describes applying Starfish's profiling and optimization capabilities to the Cascading workflow system and supporting iterative MapReduce jobs. Optimizing the Alidade geolocation application is used as a case study.
This document summarizes a presentation about integrating Hadoop and Oracle technologies. It discusses how Hadoop is growing rapidly in popularity and can be used to manage large volumes of data more cost effectively. It then outlines several options for integrating Hadoop and Oracle, including using Oracle's Fuse, Sqoop and Big Data Connectors. A case study is presented where unused Exadata storage servers were configured as a Hadoop cluster to analyze HDFS data using Oracle SQL and external tables. Testing showed the Fuse integration performed the best for loading data between the systems.
NYC-Meetup- Introduction to Hadoop Echosystem (AL500745425)
This document provides an overview of the Hadoop ecosystem. It discusses what big data is, characteristics of big data like volume, velocity and variety. It then describes the problems with legacy solutions and how Hadoop provides an improved approach by processing data locally, expecting hardware failures and handling failover elegantly. The core components of Hadoop like HDFS, MapReduce, YARN and HBase are explained. Finally, it discusses what's next in terms of downloading Hadoop, using support and training resources from Hortonworks.
RHive aims to integrate R and Hive by allowing analysts to use R's familiar environment while leveraging Hive's capabilities for big data analysis. RHive allows R functions and objects to be used in Hive queries through RUDFs and RUDAFs. It also provides functions like napply to analyze big data in HDFS using R in a distributed manner. RHive provides a bridge between the two environments without requiring users to learn MapReduce programming.
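A hedged sketch of the RHive usage pattern described above; the host name, table, and query are placeholders, and the exact function signatures should be checked against your RHive release.

```r
library(RHive)

rhive.init()                                   # locate the Hive/Hadoop libraries
rhive.connect(host = "hive-server.example.com")

# An ordinary HiveQL query, returned to R as a data frame
top_terms <- rhive.query("
  SELECT term, COUNT(*) AS n
  FROM   tweets
  GROUP  BY term
  ORDER  BY n DESC
  LIMIT  20")

head(top_terms)
rhive.close()
```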
The document discusses Jeff Hammerbacher's presentation on Hadoop, Cloudera, and eBay. It begins with introductions and an outline of topics to be covered, including what Hadoop is (its core components HDFS and MapReduce), how it is being used at Facebook and eBay, and what Cloudera is building. The presentation then goes into detailed explanations of Hadoop, its architecture and subprojects, as well as use cases at large companies.
Pig Latin is a language for analyzing large datasets that combines aspects of SQL and MapReduce. It allows users to specify data transformations through a sequence of steps, similar to writing a program. Each step performs a single operation like filtering, grouping, or aggregation. Pig Latin programs are compiled into MapReduce jobs that can run on Hadoop. The language aims to be more intuitive for programmers compared to SQL while avoiding the rigidity of MapReduce. It supports user-defined functions, nested data types, and operating directly on files without importing data.
Ever wonder what Hadoop might look like in 12 months, 24 months, or longer? Apache Hadoop MapReduce has undergone a complete overhaul to emerge as Apache Hadoop YARN, a generic compute fabric that supports MapReduce and other application paradigms. As a result, Hadoop looks very different from itself 12 months ago. This talk will take you through some ideas for YARN itself and the myriad ways it is really moving the needle for MapReduce, Pig, Hive, Cascading and other data-processing tools in the Hadoop ecosystem.
OSDC 2013 | Introduction into Hadoop by Olivier Renault (NETWAYS)
Hortonworks is a company that was founded in 2011 to focus on developing and supporting Apache Hadoop for enterprise use. They distribute the Hortonworks Data Platform (HDP), which is the only 100% open source enterprise Hadoop distribution. HDP includes core Hadoop components like HDFS, YARN, MapReduce as well as data services like Hive, Pig, HBase. It also includes operational services to help manage and monitor large Hadoop clusters.
Tez: Accelerating Data Pipelines - fifthel (t3rmin4t0r)
This document provides an overview of Tez, an Apache project that provides a framework for executing data processing jobs on Hadoop clusters. Tez allows expressing data processing jobs as directed acyclic graphs (DAGs) of tasks and executes these tasks in an optimized manner. It addresses limitations of MapReduce by providing a more flexible execution engine that can optimize performance and resource utilization.
Hortonworks' mission is to enable modern data architectures by delivering an enterprise-ready Apache Hadoop platform. They contribute the majority of code to Apache Hadoop and its related projects. Hortonworks develops the Hortonworks Data Platform (HDP), which provides core Hadoop services along with operational and data services to make Hadoop an enterprise data platform. Hortonworks aims to power data architectures by enabling Hadoop as a multi-purpose platform for batch, interactive, streaming and other workloads through projects like YARN, Tez, and improvements to Hive.
This document provides an overview of installing and programming with Apache Spark on the Hortonworks Data Platform (HDP). It discusses how Spark fits within HDP and can be used for batch processing, streaming, SQL queries and machine learning. The document outlines how to install Spark on HDP using Ambari and describes Spark programming with Resilient Distributed Datasets (RDDs), transformations, actions and caching/persistence. It provides examples of Spark APIs and programming patterns.
Pig is a platform for analyzing large datasets that sits between low-level MapReduce programming and high-level SQL queries. It provides a language called Pig Latin that allows users to specify data analysis programs without dealing with low-level details. Pig Latin scripts are compiled into sequences of MapReduce jobs for execution. HCatalog allows data to be shared between Pig, Hive, and other tools by reading metadata about schemas, locations, and formats.
Hortonworks tech workshop in-memory processing with spark (Hortonworks)
Apache Spark offers unique in-memory capabilities and is well suited to a wide variety of data processing workloads including machine learning and micro-batch processing. With HDP 2.2, Apache Spark is a fully supported component of the Hortonworks Data Platform. In this session we will cover the key fundamentals of Apache Spark and operational best practices for executing Spark jobs along with the rest of Big Data workloads. We will also provide a working example to showcase micro-batch and machine learning processing using Apache Spark.
This document provides an overview of installing and programming with Apache Spark on Hortonworks Data Platform (HDP). It introduces Spark and its components, benefits over other frameworks, and Hortonworks' commitment to Spark. The document outlines an example Spark programming workflow using Resilient Distributed Datasets (RDDs) in Scala, and covers common RDD transformations, actions, and persistence methods. It also discusses Spark deployment modes like standalone and on YARN, and reference HDP architectures using Spark.
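The deck itself uses the Scala RDD API; to stay with the single language used for examples in this listing, here is a SparkR sketch of an analogous read / cache / aggregate workflow (the DataFrame API rather than RDDs), with placeholder paths and column names.

```r
library(SparkR)

sparkR.session(master = "yarn", appName = "hdp-sparkr-sketch")

# Placeholder dataset; assumes a CSV with a `type` column
events <- read.df("/data/events.csv", source = "csv",
                  header = "true", inferSchema = "true")

cache(events)                                    # keep the data in memory across actions

by_type <- summarize(groupBy(events, events$type),
                     n = count(events$type))

head(arrange(by_type, desc(by_type$n)))          # action: triggers the computation

sparkR.session.stop()
```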
1) The webinar covered Apache Hadoop on the open cloud, focusing on key drivers for Hadoop adoption like new types of data and business applications.
2) Requirements for enterprise Hadoop include core services, interoperability, enterprise readiness, and leveraging existing skills in development, operations, and analytics.
3) The webinar demonstrated Hortonworks Apache Hadoop running on Rackspace's Cloud Big Data Platform, which is built on OpenStack for security, optimization, and an open platform.
This document provides an introduction to Apache Pig, including:
- Pig is a system for processing large unstructured data using HDFS and MapReduce. It uses a high-level data flow language called Pig Latin.
- Pig aims to increase programmer productivity by abstracting low-level MapReduce jobs and providing a procedural language for parallel data flows.
- Pig components include the Pig engine for parsing, optimizing, and executing queries, and the Grunt shell for running interactive commands.
- The document then covers Pig data types, input/output, relational operations, user-defined functions, and new features in Pig version 0.10.0.
Introduction to the Hortonworks YARN Ready Program (Hortonworks)
The recently launched YARN Ready Program will accelerate multi-workload Hadoop in the Enterprise. The program enables developers to integrate new and existing applications with YARN-based Hadoop. We will cover:
--the program and its benefits
--why it is important to customers
--tools and guides to help you get started
--technical resources to support you
--marketing recognition you can leverage
Storm Demo Talk - Colorado Springs May 2015 (Mac Moore)
The document discusses real-time processing capabilities in Hadoop and Hortonworks Data Platform (HDP). It begins with an introduction to Hortonworks and an overview of real-time streaming architectures on HDP. It then demonstrates streaming capabilities with and without predictive analytics additions. The document highlights how HDP provides a centralized architecture and open data platform to enable real-time and batch processing of any type of data for analytics applications.
Spark is an open-source software framework for rapid calculations on in-memory datasets. It uses Resilient Distributed Datasets (RDDs) that can be recreated if lost and supports transformations and actions on RDDs. Spark is useful for batch, interactive, and real-time processing across various problem domains like SQL, streaming, and machine learning via MLlib.
Hadoop 2.0 - Solving the Data Quality Challenge (Inside Analysis)
The Briefing Room with Dr. Claudia Imhoff and RedPoint Global
Live Webcast on July 22, 2014
Watch the archive:
https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=7bb4cbc33402c3b5f649343052cb9a6d
Whether data is big or small, quality remains the critical characteristic. While traditional approaches to cleansing data have made strides, nonetheless, data quality remains a serious hurdle for all organizations. This is especially true for identity resolution in customer data, but also for a range of other data sets, including social, supply chain, financial and other domains. One of the most promising approaches for solving this decades-old challenge incorporates the power of massive parallel processing, a la Hadoop.
Register for this episode of The Briefing Room to learn from veteran Analyst Dr. Claudia Imhoff, who will explain how Hadoop 2.0 and its YARN architecture can make a serious impact on the previously intractable problem of data quality. She’ll be briefed by George Corugedo of RedPoint Global, who will show how his company’s platform can serve as a super-charged marshaling area for accessing, cleansing and delivering high-quality data. He’ll explain how RedPoint was one of the first applications to be certified for running on YARN, which is the latest rendition of the now-ubiquitous Hadoop.
Visit InsideAnalysis.com for more information.
Supporting Financial Services with a More Flexible Approach to Big Data (Hortonworks)
The document discusses how Hortonworks Data Platform (HDP) enables a modern data architecture with Apache Hadoop. HDP provides a common data set stored in HDFS that can be accessed through various applications for batch, interactive, and real-time processing. This allows organizations to store all their data in one place and access it simultaneously through multiple means. YARN is the architectural center of HDP and enables this modern data architecture. HDP also provides enterprise capabilities like security, governance, and operations to make Hadoop suitable for business use.
Introduction to Microsoft HDInsight and BI Tools (DataWorks Summit)
This document discusses Hortonworks Data Platform (HDP) for Windows. It includes an agenda for the presentation which covers an introduction to HDP for Windows, integrating HDP with Microsoft tools, and a demo. The document lists the speakers and provides information on Windows support for Hadoop components. It describes what is included in HDP for Windows, such as deployment choices and full interoperability across platforms. Integration with Microsoft tools like SQL Server, Excel, and Power BI is highlighted. A demo of using Excel to interact with HDP is promised.
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0 (Adam Muise)
The document discusses Hadoop 2.2.0 and new features in YARN and MapReduce. Key points include: YARN introduces a new application framework and resource management system that replaces the jobtracker, allowing multiple data processing engines besides MapReduce; MapReduce is now a library that runs on YARN; Tez is introduced as a new data processing framework to improve performance beyond MapReduce.
Discover HDP 2.1: Apache Solr for Hadoop Search (Hortonworks)
This document appears to be a presentation about Apache Solr for Hadoop search using the Hortonworks Data Platform (HDP). The agenda includes an overview of Apache Solr and Hadoop search, a demo of Hadoop search, and a question and answer section. The presentation discusses how Solr provides scalable indexing of data stored in HDFS and powerful search capabilities. It also includes a reference architecture showing how Solr integrates with Hadoop for search and indexing.
This document discusses Apache Slider, an open source project that aims to make it easy to deploy existing applications onto YARN clusters. Slider allows long-running applications to be managed on YARN through components like Application Masters, agents, and application packages. It also integrates with cluster management frameworks like Ambari to provision, monitor and manage applications running on YARN. The document outlines key Slider concepts and provides examples of application specification and configuration files used by Slider.
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level (Hortonworks)
The HDF 3.3 release delivers several exciting enhancements and new features. But, the most noteworthy of them is the addition of support for Kafka 2.0 and Kafka Streams.
https://hortonworks.com/webinar/hortonworks-dataflow-hdf-3-3-taking-stream-processing-next-level/
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy (Hortonworks)
Forrester forecasts* that direct spending on the Internet of Things (IoT) will exceed $400 Billion by 2023. From manufacturing and utilities, to oil & gas and transportation, IoT improves visibility, reduces downtime, and creates opportunities for entirely new business models.
But successful IoT implementations require far more than simply connecting sensors to a network. The data generated by these devices must be collected, aggregated, cleaned, processed, interpreted, understood, and used. Data-driven decisions and actions must be taken, without which an IoT implementation is bound to fail.
https://hortonworks.com/webinar/iot-predictions-2019-beyond-data-heart-iot-strategy/
Getting the Most Out of Your Data in the Cloud with Cloudbreak (Hortonworks)
Cloudbreak, part of Hortonworks Data Platform (HDP), simplifies provisioning and cluster management in any cloud environment, helping your business along its path to a hybrid cloud architecture.
https://hortonworks.com/webinar/getting-data-cloud-cloudbreak-live-demo/
Johns Hopkins - Using Hadoop to Secure Access Log Events (Hortonworks)
In this webinar, we talk with experts from Johns Hopkins as they share techniques and lessons learned in real-world Apache Hadoop implementation.
https://hortonworks.com/webinar/johns-hopkins-using-hadoop-securely-access-log-events/
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys (Hortonworks)
Cybersecurity today is a big data problem. There's a ton of data landing on you faster than you can load it, let alone search it. To make sense of it, we need to act on data in motion, using both machine learning and the most advanced pattern-recognition system on the planet: your SOC analysts. Advanced visualization makes your analysts more efficient, helping them find the hidden gems, or bombs, in masses of logs and packets.
https://hortonworks.com/webinar/catch-hacker-real-time-live-visuals-bots-bad-guys/
We have introduced several new features as well as delivered some significant updates to keep the platform tightly integrated and compatible with HDP 3.0.
https://hortonworks.com/webinar/hortonworks-dataflow-hdf-3-2-release-raises-bar-operational-efficiency/
Curing Kafka Blindness with Hortonworks Streams Messaging Manager (Hortonworks)
With the growth of Apache Kafka adoption in all major streaming initiatives across large organizations, the operational and visibility challenges associated with Kafka are on the rise as well. Kafka users want better visibility in understanding what is going on in the clusters as well as within the stream flows across producers, topics, brokers, and consumers.
With no tools in the market that readily address the challenges of the Kafka Ops teams, the development teams, and the security/governance teams, Hortonworks Streams Messaging Manager is a game-changer.
https://hortonworks.com/webinar/curing-kafka-blindness-hortonworks-streams-messaging-manager/
Interpretation Tool for Genomic Sequencing Data in Clinical Environments (Hortonworks)
The healthcare industry—with its huge volumes of big data—is ripe for the application of analytics and machine learning. In this webinar, Hortonworks and Quanam present a tool that uses machine learning and natural language processing in the clinical classification of genomic variants to help identify mutations and determine clinical significance.
Watch the webinar: https://hortonworks.com/webinar/interpretation-tool-genomic-sequencing-data-clinical-environments/
IBM+Hortonworks = Transformation of the Big Data Landscape (Hortonworks)
Last year IBM and Hortonworks jointly announced a strategic and deep partnership. Join us as we take a close look at the partnership accomplishments and the conjoined road ahead with industry-leading analytics offers.
View the webinar here: https://hortonworks.com/webinar/ibmhortonworks-transformation-big-data-landscape/
The document provides an overview of Apache Druid, an open-source distributed real-time analytics database. It discusses Druid's architecture, including segments, indexing, and node types such as brokers, historicals, and coordinators. It also covers integrating Druid with Hortonworks Data Platform for unified querying and visualization of streaming and historical data.
Accelerating Data Science and Real Time Analytics at Scale (Hortonworks)
Gaining business advantage from big data is moving beyond efficient storage and deep analytics on diverse data sources to applying AI methods and analytics to streaming data, catching insights and taking action at the edge of the network.
https://hortonworks.com/webinar/accelerating-data-science-real-time-analytics-scale/
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA (Hortonworks)
Thanks to sensors and the Internet of Things, industrial processes now generate a sea of data. But are you plumbing its depths to find the insight it contains, or are you just drowning in it? Now, Hortonworks and Seeq team up to bring advanced analytics and machine learning to time-series data from manufacturing and industrial processes.
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ... (Hortonworks)
Trimble Transportation Enterprise is a leading provider of enterprise software to over 2,000 transportation and logistics companies. They have designed an architecture that leverages Hortonworks Big Data solutions and Machine Learning models to power up multiple Blockchains, which improves operational efficiency, cuts down costs and enables building strategic partnerships.
https://hortonworks.com/webinar/blockchain-with-machine-learning-powered-by-big-data-trimble-transportation-enterprise/
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense (Hortonworks)
For years, the healthcare industry has had problems of data scarcity and latency. Clearsense solved the problem by building an open-source Hortonworks Data Platform (HDP) solution while providing decades' worth of clinical expertise. Clearsense is delivering smart, real-time streaming data to its healthcare customers, enabling mission-critical data to feed clinical decisions.
https://hortonworks.com/webinar/delivering-smart-real-time-streaming-data-healthcare-customers-clearsense/
Making Enterprise Big Data Small with Ease (Hortonworks)
Every division in an organization builds its own database to keep track of its business. As the organization grows, those individual databases grow as well. The data in each database can become siloed, with no view into the data held in the other databases.
https://hortonworks.com/webinar/making-enterprise-big-data-small-ease/
Driving Digital Transformation Through Global Data Management (Hortonworks)
Using your data smarter and faster than your peers could be the difference between dominating your market and merely surviving. Organizations are investing in IoT, big data, and data science to drive better customer experience and create new products, yet these projects often stall in the ideation phase due to a lack of global data management processes and technologies. Your new data architecture may be taking shape around you, but your goal of globally managing, governing, and securing your data across a hybrid, multi-cloud landscape can remain elusive. Learn how industry leaders are developing their global data management strategy to drive innovation and ROI.
Presented at Gartner Data and Analytics Summit
Speaker:
Dinesh Chandrasekhar
Director of Product Marketing, Hortonworks
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features (Hortonworks)
Hortonworks DataFlow (HDF) is the complete solution that addresses the most complex streaming architectures of today’s enterprises. More than 20 billion IoT devices are active on the planet today and thousands of use cases across IIOT, Healthcare and Manufacturing warrant capturing data-in-motion and delivering actionable intelligence right NOW. “Data decay” happens in a matter of seconds in today’s digital enterprises.
To meet all the needs of such fast-moving businesses, we have made significant enhancements and new streaming features in HDF 3.1.
https://hortonworks.com/webinar/series-hdf-3-1-technical-deep-dive-new-streaming-features/
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A... (Hortonworks)
Join the Hortonworks product team as they introduce HDF 3.1 and the core components for a modern data architecture to support stream processing and analytics.
You will learn about the three main themes that HDF addresses:
Developer productivity
Operational efficiency
Platform interoperability
https://hortonworks.com/webinar/series-hdf-3-1-redefining-data-motion-modern-data-architectures/
Unlock Value from Big Data with Apache NiFi and Streaming CDC (Hortonworks)
The document discusses Apache NiFi and streaming change data capture (CDC) with Attunity Replicate. It provides an overview of NiFi's capabilities for dataflow management and visualization. It then demonstrates how Attunity Replicate can be used for real-time CDC to capture changes from source databases and deliver them to NiFi for further processing, enabling use cases across multiple industries. Examples of source systems include SAP, Oracle, SQL Server, and file data, with targets including Hadoop, data warehouses, and cloud data stores.