Process and Visualize Your Data with Revolution R, Hadoop and GoogleVis

In this session, attendees will learn how to use R in the distributed environment of Hadoop using the rmr package. The googleVis R package will also be used to show how application development teams can quickly and easily bring the power of R and of Google Chart Tools into their applications. The result is a rich, custom data visualization that needs far less code than would otherwise be required. The session will begin with R basics and then move to concrete examples of statistical analysis on data sets. These are followed by an application development example that shows a custom visualization of the analysis using googleVis: a browser-based app that both kicks off the analysis of the data set in R and visualizes the result. Visualization examples will use both googleVis and basic Google Chart Tools. Attendees will leave the session with a concrete example of how to incorporate R into their existing application development practices and how to use Hadoop and its ecosystem to build custom visualizations.

Published in: Education

Speaker notes
  • Hi, I’m Jeff Markham and I wanted to talk today about
  • Agenda points
  • Describe the use case and how to choose the tech
  • Start by installing HDP
  • Install R and dependencies
  • Go into more detail on the R packages
  • Walk through the demo before actually doing the demo
  • Describe the data set
  • Start with the very beginning: getting the downloaded data into Hadoop
  • Start explaining the R script. Kick it off with an explanation of the RHadoop packages and what they do
  • Explain the mapper and reducer functions
  • Explain the job function
  • Wrap up with showing where the data lands
  • Show how to create the Shiny app. Start with creating the directory.
  • This is the entirety of the Shiny UI. Help text in the sidebar is omitted to save space.
  • Explain the server.R code. Note the imports of the relevant R packages.
  • Move to one of the functions that shows how Shiny wraps googleVis, which in turn wraps Google Chart Tools
  • Show how to kick off the Shiny app and note the listening port
  • Go to the browser and view the Shiny app
  • Cut to the live demo.
  • Recap what we just saw and suggest possible future steps to further develop the app
  • Hammer home HDP as the bedrock for the app
  • Suggest getting started with the Sandbox
  • Wrap up with Q & A

Transcript

    • 1. Quick Housekeeping Rules
      – A Q&A panel is available if you have any questions during the webinar
      – There will be time for Q&A at the end
      – We will record the webinar for future viewing
      – All attendees will receive a copy of the slides and recording
      © Hortonworks Inc. 2013
    • 2. Hadoop, R, and Google Chart Tools: Data Visualization for Application Developers
      Jeff Markham, Solution Engineer, jmarkham@hortonworks.com
    • 3. Agenda
      – Introductions
      – Use Case Description
      – Preparation
      – Demo
      – Review
      – Q&A
    • 4. Use Case Description
      – Visualizing data
      – Tools vs. application development
      – Choosing the technology
        – Hortonworks Data Platform
        – RHadoop
        – Google Charts
    • 5. Preparation: Install HDP
      [Diagram: Hortonworks Data Platform (HDP) layers: Operational Services (Manage & Operate at Scale), Data Services (Store, Process and Access Data), Hadoop Core (Distributed Storage & Processing), Platform Services (Enterprise Readiness: HA, DR, Snapshots, Security, …). Components shown: Ambari, Oozie, Flume, Sqoop, Hive, Pig, HBase, HCatalog, WebHDFS, MapReduce, HDFS, YARN (in 2.0). Deployment options: OS, Cloud, VM, Appliance.]
      – The ONLY 100% open source and complete distribution
      – Enterprise grade, proven and tested at scale
      – Ecosystem endorsed to ensure interoperability
    • 6. Preparation: Install R
      – Install the R language
      – Install the appropriate packages
        – rhdfs
        – rmr2
        – googleVis
        – shiny
        – Dependencies for all of the above
    • 7. Preparation
      – rmr2: functions that allow MapReduce in R apps
      – rhdfs: functions that allow HDFS access from R apps
      – googleVis: use of Google Chart Tools in R apps
      – shiny: interactive web apps for R developers
    • 8. Demo Walkthrough: Using Hadoop, R, and Google Chart Tools
    • 9. Visualization Use Case
      – Data from the CDC
        – Publicly available vital statistics data
        – 2010 US birth data file
      Sample record:
        S 201001 7 2 2 30105 2 011 06 1 123 3405 1 06 01 2 2 0321 1006 314 2000 2 222 22 2 2 2 122222 11 3 094 1 M 04 200940 39072 3941 083 22 2 2 22 110 110 00 0000000 00 000000000 000000 000 000000000000000000011 101 1 111 10 1 1 1 111111 11 1 1 11
      Source: http://www.cdc.gov/nchs/data_access/vitalstatsonline.htm
    • 10. Visualization Use Case
      – Put data into HDFS
        – Create input directory
        – Put data into input directory
      Create HDFS directory:
        > hadoop fs -mkdir /user/jeff/natality
      Put data into HDFS:
        > hadoop fs -put ~/VS2010NATL.DETAILUS.DAT /user/jeff/natality/
    • 11. Visualization Use Case
      – Write R script
        – Specify use of RHadoop packages
        – Initialize HDFS
        – Specify data input and output location
      R script:
        #!/usr/bin/env Rscript
        require(rmr2)
        require(rhdfs)
        # rhdfs expects HADOOP_CMD to be set, e.g. HADOOP_CMD=/usr/bin/hadoop
        hdfs.init()
        # Relative paths resolve against the HDFS working directory
        # (by default /user/<username>, here /user/jeff)
        hdfs.data.root = "natality"
        hdfs.data = file.path(hdfs.data.root, "VS2010NATL.DETAILUS.DAT")
        hdfs.out.root = hdfs.data.root
        hdfs.out = file.path(hdfs.out.root, "out")
        ...
    • 12. Visualization Use Case
      – Write R script
        – Write mapper function
        – Write reducer function
      R script:
        ...
        # Key each input line by columns 89-90 (the mother's age field)
        # and emit a count of 1 for it
        mapper = function(k, fields) {
          keyval(as.integer(substr(fields, 89, 90)), 1)
        }
        reducer = function(key, vv) {
          # count values for each key
          keyval(key, sum(as.numeric(vv), na.rm = TRUE))
        }
        ...
    • 13. Visualization Use Case
      – Write R script
        – Write job function
      R script:
        ...
        job = function(input, output) {
          mapreduce(input = input, output = output,
                    input.format = "text",
                    map = mapper,
                    reduce = reducer,
                    # Reuse the reducer as a combiner; safe because
                    # partial sums add up to the same total
                    combine = TRUE)
        }
        ...
    • 14. Visualization Use Case
      – Write R script
        – Run the job, which writes its result to the HDFS output directory, then read that result back into a data frame
      R script:
        ...
        out = from.dfs(job(hdfs.data, hdfs.out))
        results.df = as.data.frame(out, stringsAsFactors = FALSE)
    • 15. Visualization Use Case
      – Create Shiny application
        – Create directory
        – Create ui.R
        – Create server.R
      Shiny app directory:
        > mkdir ~/my-shiny-app
    • 16. Visualization Use Case
      – Create Shiny application
        – Create ui.R
      ui.R source:
        shinyUI(pageWithSidebar(
          # Application title
          headerPanel("2010 US Births"),
          # Sidebar help text omitted
          sidebarPanel(. . .),
          mainPanel(
            tabsetPanel(
              # Output ids must match output$lineChart and
              # output$columnChart in server.R
              tabPanel("Line Chart", htmlOutput("lineChart")),
              tabPanel("Column Chart", htmlOutput("columnChart"))
            )
          )
        ))
    • 17. Visualization Use Case
      – Create Shiny application
        – Create server.R
      server.R source:
        library(googleVis)
        library(shiny)
        library(rmr2)
        library(rhdfs)
        hdfs.init()
        # Read the MapReduce output written by the R script
        hdfs.data.root = "natality"
        hdfs.data = file.path(hdfs.data.root, "out")
        df = as.data.frame(from.dfs(hdfs.data))
        ...
    • 18. Visualization Use Case
      – Create Shiny application
        – Create server.R
      server.R source:
        ...
        shinyServer(function(input, output) {
          # Fills htmlOutput("lineChart") in ui.R
          output$lineChart <- renderGvis({
            gvisLineChart(df, options = list(
              vAxis = "{title:'Number of Births'}",
              hAxis = "{title:'Age of Mother'}",
              legend = "none"
            ))
          })
          ...
    • 19. Visualization Use Case
      – Run Shiny application
      Run Shiny app:
        > shiny::runApp("~/my-shiny-app")
        Loading required package: shiny
        Welcome to googleVis version 0.4.0
        ...
        HADOOP_CMD=/usr/bin/hadoop
        Be sure to run hdfs.init()
        Listening on port 8100
    • 20. Visualization Use Case
      – View Shiny application
    • 21. Demo Live: Using Hadoop, R, and Google Chart Tools
    • 22. Visualization Use Case
      – Architecture recap
        – Analyze data sets with R on Hadoop
        – Choose RHadoop packages
        – Visualize data with Google Chart Tools via the googleVis package
        – Render googleVis output in Shiny applications
      – Architecture next steps
        – Integrate the Shiny application into existing web apps
        – Create further data models with R
    • 23. HDP: Enterprise Hadoop Distribution
      [Same HDP platform diagram and bullet points as slide 5.]
    • 24. HDP Sandbox
    • 25. Thank You!
      Jeff Markham, Solution Engineer, jmarkham@hortonworks.com
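The counting job in slides 12 and 13 can be checked without a cluster. The sketch below is a rough, Hadoop-free approximation in base R. It does not use rmr2, and the helpers `map_lines()`, `reduce_pairs()`, `run_job()` and `make_record()` are hypothetical names used only for illustration. It reads the same columns 89-90 as the slide's mapper. It applies the reducer once per simulated input split, which is the role `combine = TRUE` gives it, and then once more over the partial sums. The records it uses are synthetic, not real CDC data.

```r
# Hadoop-free sketch of the slide 12-13 job in base R (no rmr2).
# All helper names here are hypothetical.

# Map: key each fixed-width record by columns 89-90, with a value of 1.
map_lines <- function(lines) {
  keys <- suppressWarnings(as.integer(substr(lines, 89, 90)))
  data.frame(key = keys, val = rep(1, length(lines)))
}

# Reduce (also used as the combiner): sum the values for each key.
# Unlike the slide, records whose field is blank or non-numeric are dropped.
reduce_pairs <- function(pairs) {
  pairs <- pairs[!is.na(pairs$key), , drop = FALSE]
  sums <- tapply(pairs$val, pairs$key, sum, na.rm = TRUE)
  data.frame(key = as.integer(names(sums)), val = as.numeric(sums))
}

# Simulate input splits: combine within each split, then reduce the
# partial sums. Summing is associative, so the totals do not change.
run_job <- function(lines, n_splits = 2) {
  split_id <- rep_len(seq_len(n_splits), length(lines))
  partials <- lapply(split(lines, split_id),
                     function(s) reduce_pairs(map_lines(s)))
  reduce_pairs(do.call(rbind, partials))
}

# Synthetic 100-character record with a two-digit age in columns 89-90
# (not real CDC data; every other column is filler).
make_record <- function(age) {
  paste0(strrep("0", 88), sprintf("%02d", age), strrep("0", 10))
}

ages <- c(25L, 30L, 25L, 41L, 30L, 25L)
lines <- vapply(ages, make_record, character(1))
res <- run_job(lines, n_splits = 3)
print(res)
```

Reusing the reducer as a combiner is safe here only because summing partial counts gives the same total as summing all the 1s at once. A reducer that computed a mean, for example, could not be reused this way. If rmr2 is installed, its local backend (`rmr.options(backend = "local")`) offers another way to exercise the actual mapper and reducer without a cluster.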
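Slides 14, 17 and 18 turn the job output into a data frame and chart it. The following base-R sketch assumes that `from.dfs()` returns a list with `key` and `val` elements, which is rmr2's key/value form, and it substitutes made-up counts for that output. The names `to_chart_df()`, `axis_title()`, `age` and `births` are hypothetical. The sketch sorts the rows by key, because the key order coming back from the job is not guaranteed to be numeric and a line chart draws its points in row order. It also wraps each axis title in quotes, because googleVis inserts option strings such as these into the chart's JavaScript.

```r
# Hypothetical stand-in for from.dfs() output: a key/value list as rmr2
# returns it. The counts are made up for illustration.
out <- list(key = c(30L, 25L, 41L), val = c(2, 3, 1))

# Convert the key/value list to a data frame ordered by key, so a line
# chart draws its points from left to right.
to_chart_df <- function(out) {
  df <- as.data.frame(out, stringsAsFactors = FALSE)
  df <- df[order(df$key), , drop = FALSE]
  names(df) <- c("age", "births")  # hypothetical, more readable names
  rownames(df) <- NULL
  df
}

# Build a Google Charts axis option. The title is quoted because the
# string ends up inside JavaScript; a title containing ' would need escaping.
axis_title <- function(title) sprintf("{title:'%s'}", title)

chart_df <- to_chart_df(out)
opts <- list(
  vAxis = axis_title("Number of Births"),
  hAxis = axis_title("Age of Mother"),
  legend = "none"
)
print(chart_df)
```

With googleVis installed, `gvisLineChart(chart_df, xvar = "age", yvar = "births", options = opts)` would draw the chart. Inside `renderGvis()`, it could take the place of the `gvisLineChart(df, ...)` call in slide 18.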