Hdp r-google charttools-webinar-3-5-2013 (2)
- 1. Quick Housekeeping Rules
• A Q&A panel is available if you have any questions during the webinar
• There will be time for Q&A at the end
• We will record the webinar for future viewing
• All attendees will receive a copy of the slides and recording
© Hortonworks Inc. 2013
- 2. Hadoop, R, and Google Chart Tools
Data Visualization for Application Developers
Jeff Markham
Solution Engineer
jmarkham@hortonworks.com
- 3. Agenda
• Introductions
• Use Case Description
• Preparation
• Demo
• Review
• Q&A
- 4. Use Case Description
• Visualizing data
• Tools vs. application development
• Choosing the technology
• Hortonworks Data Platform
• RHadoop
• Google Charts
- 5. Preparation: Install HDP
[Diagram: Hortonworks Data Platform (HDP), Enterprise Hadoop. Layers: Operational Services (manage and operate at scale), Data Services (store, process and access data), Hadoop Core (distributed storage and processing), Platform Services (enterprise readiness: HA, DR, Snapshots, Security, …). Components shown: Ambari, Oozie, Flume, Sqoop, Hive, Pig, HBase, HCatalog, WebHDFS, HDFS, MapReduce, YARN (in 2.0). Runs on: OS, Cloud, VM, Appliance.]
• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability
- 6. Preparation: Install R
• Install R language
• Install appropriate packages
– rhdfs
– rmr2
– googleVis
– shiny
– Dependencies for all above
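Before going further, it can help to confirm which of these packages are already present. The following is a minimal sketch in base R, assuming the package names are exactly those listed above; `required_pkgs` and `missing_pkgs` are names introduced here for illustration. Note that rhdfs and rmr2 belong to the RHadoop project and are generally installed from that project's downloads rather than from CRAN.

```r
# Packages named on this slide
required_pkgs <- c("rhdfs", "rmr2", "googleVis", "shiny")

# Names of all packages installed in the current R library paths
installed <- rownames(installed.packages())

# Which required packages still need to be installed
missing_pkgs <- setdiff(required_pkgs, installed)

if (length(missing_pkgs) == 0) {
  message("All required packages are installed.")
} else {
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
}
```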
- 7. Preparation
• rmr2
– Functions to allow for MapReduce in R apps
• rhdfs
– Functions allowing HDFS access in R apps
• googleVis
– Use of Google Chart Tools in R apps
• shiny
– Interactive web apps for R developers
- 8. Demo Walkthrough
Using Hadoop, R, and Google Chart Tools
© Hortonworks Inc. 2012
- 9. Visualization Use Case
• Data from CDC
– Vital statistics publicly available data
– 2010 US birth data file
SAMPLE RECORD
S 201001 7 2 2 30105
2 011 06 1 123 3405 1 06 01 2 2
0321 1006 314 2000 2 222 22
2 2 2 122222 11 3 094 1 M 04 200940 39072 3941
083 22 2 2 22 110 110 00
0000000 00 000000000 000000 000 000000000000000000011
101 1 111 10 1 1 1 111111 11 1 1 11
source: http://www.cdc.gov/nchs/data_access/vitalstatsonline.htm
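Each record in the data file is a fixed-width line, so a field is identified by its character positions rather than by a delimiter. The sketch below shows how one field can be pulled out with base R's `substr()`. The record is synthetic (not real CDC data), and positions 89-90 are used only because the R script later in this deck reads those positions.

```r
# Build a synthetic fixed-width record: 88 filler characters, then "27"
# in positions 89-90, then 10 more filler characters. NOT real CDC data.
record <- paste0(strrep("0", 88), "27", strrep("0", 10))

# substr() uses 1-based, inclusive start and stop positions
field <- substr(record, 89, 90)

# Convert the two-character field to an integer
value <- as.integer(field)

print(value)
```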
- 10. Visualization Use Case
• Put data into HDFS
– Create input directory
– Put data into input directory
CREATE HDFS DIR
> hadoop fs -mkdir /user/jeff/natality
PUT DATA INTO HDFS
> hadoop fs -put ~/VS2010NATL.DETAILUS.DAT /user/jeff/natality/
- 11. Visualization Use Case
• Write R script
– Specify use of RHadoop packages
– Initialize HDFS
– Specify data input and output location
R SCRIPT
#!/usr/bin/env Rscript
require('rmr2')
require('rhdfs')
hdfs.init()
hdfs.data.root = 'natality'
hdfs.data = file.path(hdfs.data.root, 'VS2010NATL.DETAILUS.DAT')
hdfs.out.root = hdfs.data.root
hdfs.out = file.path(hdfs.out.root, 'out')
...
- 12. Visualization Use Case
• Write R script
– Write mapper function
– Write reducer function
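R SCRIPT
...
# Map: for each input line, emit the integer in characters 89-90 as the key
# and 1 as the value
mapper = function(k, fields) {
  keyval(as.integer(substr(fields, 89, 90)), 1)
}
# Reduce: sum the values for each key to get a count per key
reducer = function(key, vv) {
  keyval(key, sum(as.numeric(vv), na.rm = TRUE))
}
...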
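The mapper and reducer together amount to "count the records per key". The sketch below imitates that logic locally in base R, with no Hadoop or rmr2 involved, which can be a handy way to check the field positions before submitting a job. The input lines are synthetic, and `make_line` is a hypothetical helper introduced only for this illustration.

```r
# Hypothetical helper: build a synthetic fixed-width line whose characters
# 89-90 hold the given two-digit key. NOT real CDC data.
make_line <- function(key) {
  paste0(strrep("0", 88), sprintf("%02d", key), strrep("0", 10))
}
lines <- vapply(c(25, 30, 25, 30, 25), make_line, character(1))

# Map step: one key per line, each with a value of 1
map_keys <- as.integer(substr(lines, 89, 90))
map_vals <- rep(1, length(map_keys))

# Shuffle and reduce step: group the values by key and sum each group
counts <- tapply(map_vals, map_keys, sum, na.rm = TRUE)

print(counts)
```

On a real cluster the grouping is done by the MapReduce framework between the map and reduce phases; `tapply()` only stands in for it here.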
- 13. Visualization Use Case
• Write R script
– Write job function
R SCRIPT
...
job = function (input, output) {
  mapreduce(input = input,
            output = output,
            input.format = "text",
            map = mapper,
            reduce = reducer,
            combine = TRUE)
}
...
- 14. Visualization Use Case
• Write R script
– Write result to HDFS output directory
R SCRIPT
...
out = from.dfs(job(hdfs.data, hdfs.out))
results.df = as.data.frame(out, stringsAsFactors = FALSE)
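Once the results are in a data frame, a little tidying makes them easier to chart. The sketch below assumes the object returned by `from.dfs()` is a list with `key` and `val` elements; a hand-built stand-in with made-up counts is used so the example runs without Hadoop. The column names `age` and `births` are hypothetical, chosen here to match the axis titles used later in the deck.

```r
# Stand-in for the object returned by from.dfs(); the counts are made up
out <- list(key = c(30L, 25L, 35L), val = c(200, 150, 90))

results.df <- as.data.frame(out, stringsAsFactors = FALSE)

# Give the columns descriptive names (hypothetical names, chosen here)
names(results.df) <- c("age", "births")

# Sort by age so a line chart reads left to right
results.df <- results.df[order(results.df$age), ]
rownames(results.df) <- NULL

print(results.df)
```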
- 15. Visualization Use Case
• Create Shiny application
– Create directory
– Create ui.R
– Create server.R
SHINY APP DIR
> mkdir ~/my-shiny-app
- 16. Visualization Use Case
• Create Shiny application
– Create ui.R
UI.R SOURCE
shinyUI(pageWithSidebar(
  # Application title
  headerPanel("2010 US Births"),
  sidebarPanel(. . .),
  mainPanel(
    tabsetPanel(
      tabPanel("Line Chart", htmlOutput("lineChart")),
      tabPanel("Column Chart", htmlOutput("columnChart"))
    )
  )
))
- 17. Visualization Use Case
• Create Shiny application
– Create server.R
SERVER.R SOURCE
library(googleVis)
library(shiny)
library(rmr2)
library(rhdfs)
hdfs.init()
hdfs.data.root = 'natality'
hdfs.data = file.path(hdfs.data.root, 'out')
df = as.data.frame(from.dfs(hdfs.data))
...
- 18. Visualization Use Case
• Create Shiny application
– Create server.R
SERVER.R SOURCE
...
shinyServer(function(input, output) {
  output$lineChart <- renderGvis({
    gvisLineChart(df, options=list(
      vAxis="{title:'Number of Births'}",
      hAxis="{title:'Age of Mother'}",
      legend="none"
    ))
  })
...
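The chart call can also be tried outside Shiny, which may make it easier to experiment with options. The sketch below builds a small made-up data frame and passes it to `gvisLineChart()` with the same options as the slide. The column names `age` and `births` and the data values are assumptions for illustration, and the call is guarded so the snippet still runs where googleVis is not installed.

```r
# Made-up data frame standing in for the MapReduce output
df <- data.frame(age = c(20, 25, 30, 35), births = c(120, 150, 200, 90))

if (requireNamespace("googleVis", quietly = TRUE)) {
  chart <- googleVis::gvisLineChart(
    df, xvar = "age", yvar = "births",
    options = list(
      vAxis = "{title:'Number of Births'}",
      hAxis = "{title:'Age of Mother'}",
      legend = "none"
    )
  )
  # In an interactive session, plot(chart) opens the chart in a browser
  print(class(chart))
} else {
  message("googleVis is not installed; skipping the chart sketch.")
}
```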
- 19. Visualization Use Case
• Run Shiny application
RUN SHINY APP
> shiny::runApp('~/my-shiny-app')
Loading required package: shiny
Welcome to googleVis version 0.4.0
...
HADOOP_CMD=/usr/bin/hadoop
Be sure to run hdfs.init()
Listening on port 8100
- 21. Demo Live
Using Hadoop, R, and Google Chart Tools
- 22. Visualization Use Case
• Architecture recap
– Analyze data sets with R on Hadoop
– Choose RHadoop packages
– Visualize data with Google Chart Tools via googleVis package
– Render googleVis output in Shiny applications
• Architecture next steps
– Integrate Shiny application into existing web apps
– Create further data models with R
- 23. HDP: Enterprise Hadoop Distribution
[Diagram: Hortonworks Data Platform (HDP), Enterprise Hadoop. Layers: Operational Services (manage and operate at scale), Data Services (store, process and access data), Hadoop Core (distributed storage and processing), Platform Services (enterprise readiness: HA, DR, Snapshots, Security, …). Components shown: Ambari, Oozie, Flume, Sqoop, Hive, Pig, HBase, HCatalog, WebHDFS, HDFS, MapReduce, YARN (in 2.0). Runs on: OS, Cloud, VM, Appliance.]
• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability
Editor's Notes
- Hi, I’m Jeff Markham and I wanted to talk today about
- Agenda points
- Describe the use case and how to choose the tech
- Start by installing HDP
- Install R and dependencies
- Go into more detail on the R packages
- Walk through the demo before actually doing the demo
- Describe the data set
- Start with the very beginning: getting the downloaded data into Hadoop
- Start explaining the R script. Kick it off with explanation of RHadoop packages and what they’re doing
- Explain the mapper and reducer functions
- Explain the job function
- Wrap up with showing where the data lands
- Show how to create the Shiny app. Start with creating the directory.
- This is the entirety of the Shiny UI. Help text in the sidebar is omitted to save space.
- Explain the server.R code. Note the imports of the relevant R packages.
- Move to one of the functions that describes how Shiny wraps googleVis which wraps Google Chart Tools
- Show how to kick off the Shiny app and note the listening port
- Go to the browser and view the Shiny app
- Cut to the live demo.
- Recap what we just saw and suggest possible future steps to further develop the app
- Hammer home HDP as the bedrock for the app
- Suggest getting started with the Sandbox
- Wrap up with Q & A