Quick Housekeeping Rules

• The Q&A panel is available if you have any questions during the webinar
• There will be time for Q&A at the end
• We will record the webinar for future viewing
• All attendees will receive a copy of the slides and recording




                                                                 Page 1
         © Hortonworks Inc. 2013
Hadoop, R, and Google Chart Tools
Data Visualization for Application Developers


Jeff Markham
Solution Engineer

jmarkham@hortonworks.com




Agenda
•   Introductions
•   Use Case Description
•   Preparation
•   Demo
•   Review
•   Q&A




Use Case Description
• Visualizing data
  • Tools vs. application development
  • Choosing the technology
      • Hortonworks Data Platform
      • RHadoop
      • Google Charts




Preparation: Install HDP

[HDP architecture diagram]

• Operational Services (Ambari, Oozie): manage and operate at scale
• Data Services (Flume, Sqoop, Hive, Pig, HBase, HCatalog, WebHDFS): store, process, and access data
• Hadoop Core: HDFS (distributed storage), MapReduce, and YARN (in 2.0)
• Platform Services: enterprise readiness (HA, DR, snapshots, security, …)
• Deployable on OS, cloud, VM, or appliance

Hortonworks Data Platform (HDP): Enterprise Hadoop
• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability




Preparation: Install R
• Install R language

• Install appropriate packages
  – rhdfs
  – rmr2
  – googleVis
  – shiny
  – Dependencies for all above




Preparation
• rmr2
   – Functions to allow for MapReduce in R apps


• rhdfs
   – Functions allowing HDFS access in R apps


• googleVis
   – Use of Google Chart Tools in R apps


• shiny
   – Interactive web apps for R developers




Demo Walkthrough
              Using Hadoop, R, and Google Chart Tools




Visualization Use Case
• Data from the CDC
                – Publicly available vital statistics data
                – 2010 US birth data file




SAMPLE RECORD: each birth is a single fixed-width line of coded numeric fields (the slide shows an excerpt of one 2010 record)




                                              source: http://www.cdc.gov/nchs/data_access/vitalstatsonline.htm

Visualization Use Case
• Put data into HDFS
                     – Create input directory
                     – Put data into input directory
 CREATE HDFS DIR




> hadoop fs -mkdir /user/jeff/natality
PUT DATA INTO HDFS




> hadoop fs -put ~/VS2010NATL.DETAILUS.DAT /user/jeff/natality/




Visualization Use Case
• Write R script
           – Specify use of RHadoop packages
           – Initialize HDFS
           – Specify data input and output location

            #!/usr/bin/env Rscript

            require('rmr2')
            require('rhdfs')
            hdfs.init()
R SCRIPT




            hdfs.data.root = 'natality'
            hdfs.data = file.path(hdfs.data.root, 'VS2010NATL.DETAILUS.DAT')
            hdfs.out.root = hdfs.data.root
            hdfs.out = file.path(hdfs.out.root, 'out')

             ...


Visualization Use Case
• Write R script
           – Write mapper function
           – Write reducer function



            ...

            mapper = function(k, fields) {
              keyval(as.integer(substr(fields, 89, 90)),1)
            }
R SCRIPT




            reducer = function(key, vv) {
            # count values for each key
              keyval(key, sum(as.numeric(vv),na.rm=TRUE))
            }
             ...
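The mapper/reducer pair above can be sketched in plain Python for readers without an R/RHadoop setup. The column positions follow the slide's substr(fields, 89, 90) (R's substr is 1-indexed, so it is line[88:90] in Python); the sample records below are synthetic, not real CDC data.

```python
from collections import Counter

def mapper(line):
    # Emit (age_of_mother, 1); the age is the 2-character fixed-width
    # field at columns 89-90, i.e. line[88:90] with 0-indexing.
    return (int(line[88:90]), 1)

def reducer(pairs):
    # Sum the 1s per key, like keyval(key, sum(as.numeric(vv))).
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

# Synthetic fixed-width records: 88 filler characters, then a 2-digit age.
records = [" " * 88 + "25", " " * 88 + "25", " " * 88 + "31"]
print(reducer(mapper(r) for r in records))   # {25: 2, 31: 1}
```

In the real job, Hadoop runs the mapper on each record in parallel and groups values by key before they reach the reducer; the combine = T shown on the next slide applies the reducer map-side as well.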



Visualization Use Case
• Write R script
           – Write job function




            ...

            job = function (input, output) {
             mapreduce(input = input,
                    output = output,
R SCRIPT




                    input.format = "text",
                    map = mapper,
                    reduce = reducer,
                    combine = T)
            }

             ...




Visualization Use Case
• Write R script
           – Write result to HDFS output directory




            ...
R SCRIPT




            out = from.dfs(job(hdfs.data, hdfs.out))
            results.df = as.data.frame(out,stringsAsFactors=F)
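For readers unfamiliar with from.dfs() and as.data.frame(), a hypothetical Python sketch of the result's shape: the job emits one (age, count) pair per key, and the data frame lays them out one row per age. The numbers below are invented, not CDC results.

```python
# Invented (age_of_mother, births) pairs standing in for the job output.
pairs = [(31, 1500), (25, 2300), (28, 4100)]

# results.df holds one row per key; sorting by age gives a chart-ready table.
results = sorted(pairs)
for age, births in results:
    print(f"age={age}  births={births}")
```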




Visualization Use Case
• Create Shiny application

                – Create directory
                – Create ui.R
                – Create server.R
SHINY APP DIR




                 > mkdir ~/my-shiny-app




Visualization Use Case
• Create Shiny application
              – Create ui.R


               shinyUI(pageWithSidebar(

                # Application title
                headerPanel("2010 US Births"),

                 sidebarPanel(...),
UI.R SOURCE




                 mainPanel(
                   tabsetPanel(
                     tabPanel("Line Chart", htmlOutput("lineChart")),
                     tabPanel("Column Chart", htmlOutput("columnChart"))
                   )
                 )
               ))



Visualization Use Case
• Create Shiny application
                  – Create server.R


                   library(googleVis)
                   library(shiny)
                   library(rmr2)
                   library(rhdfs)
SERVER.R SOURCE




                   hdfs.init()

                   hdfs.data.root = 'natality'
                   hdfs.data = file.path(hdfs.data.root, 'out')
                   df = as.data.frame(from.dfs(hdfs.data))

                    ...




Visualization Use Case
• Create Shiny application
                  – Create server.R



                   ...
                   shinyServer(function(input, output) {

                     output$lineChart <- renderGvis({
SERVER.R SOURCE




                       gvisLineChart(df, options=list(
                         vAxis="{title:'Number of Births'}",
                         hAxis="{title:'Age of Mother'}",
                         legend="none"
                      ))
                     })
                    ...




Visualization Use Case
• Run Shiny application

                > shiny::runApp('~/my-shiny-app')
                Loading required package: shiny

                Welcome to googleVis version 0.4.0
RUN SHINY APP




                ...

                HADOOP_CMD=/usr/bin/hadoop

                Be sure to run hdfs.init()

                Listening on port 8100




Visualization Use Case
• View Shiny application




Demo Live
              Using Hadoop, R, and Google Chart Tools




Visualization Use Case
• Architecture recap
  –   Analyze data sets with R on Hadoop
  –   Choose RHadoop packages
  –   Visualize data with Google Chart Tools via googleVis package
  –   Render googleVis output in Shiny applications


• Architecture next steps
  – Integrate Shiny application into existing web apps
  – Create further data models with R




HDP: Enterprise Hadoop Distribution

[HDP architecture diagram, repeated from the "Preparation: Install HDP" slide]

Hortonworks Data Platform (HDP): Enterprise Hadoop
• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability




HDP Sandbox




Thank You!


Jeff Markham
Solution Engineer

jmarkham@hortonworks.com





TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
 

Hdp r-google charttools-webinar-3-5-2013 (2)

  • 1. Quick Housekeeping Rules • Q&A panel is available if you have any questions during the webinar • There will be time for Q&A at the end • We will record the webinar for future viewing • All attendees will receive a copy of the slides and recording Page 1 © Hortonworks Inc. 2013
  • 2. Hadoop, R, and Google Chart Tools – Data Visualization for Application Developers. Jeff Markham, Solution Engineer, jmarkham@hortonworks.com
  • 3. Agenda • Introductions • Use Case Description • Preparation • Demo • Review • Q&A
  • 4. Use Case Description • Visualizing data • Tools vs. application development • Choosing the technology • Hortonworks Data Platform • RHadoop • Google Charts
  • 5. Preparation: Install HDP. [HDP architecture diagram] Operational Services (Ambari, Oozie: manage & operate at scale); Data Services (Flume, Sqoop, WebHDFS, Hive, Pig, HBase, HCatalog: store, process, and access data); Hadoop Core (HDFS distributed storage, MapReduce processing, YARN in 2.0); Platform Services (enterprise readiness: HA, DR, snapshots, security). Hortonworks Data Platform (HDP), Enterprise Hadoop: the ONLY 100% open source and complete distribution; enterprise grade, proven and tested at scale; ecosystem endorsed to ensure interoperability. Runs on OS, Cloud, VM, Appliance.
  • 6. Preparation: Install R • Install R language • Install appropriate packages – rhdfs – rmr2 – googleVis – shiny – Dependencies for all above
  • 7. Preparation • rmr2 – Functions to allow for MapReduce in R apps • rhdfs – Functions allowing HDFS access in R apps • googleVis – Use of Google Chart Tools in R apps • shiny – Interactive web apps for R developers
  • 8. Demo Walkthrough: Using Hadoop, R, and Google Chart Tools
  • 9. Visualization Use Case • Data from CDC – Vital statistics publicly available data – 2010 US birth data file • SAMPLE RECORD: fixed-width natality record (layout not preserved here) • source: http://www.cdc.gov/nchs/data_access/vitalstatsonline.htm
  • 10. Visualization Use Case • Put data into HDFS – Create input directory – Put data into input directory • CREATE HDFS DIR: > hadoop fs -mkdir /user/jeff/natality • PUT DATA INTO HDFS: > hadoop fs -put ~/VS2010NATL.DETAILUS.DAT /user/jeff/natality/
  • 11. Visualization Use Case • Write R script – Specify use of RHadoop packages – Initialize HDFS – Specify data input and output location • R SCRIPT: #!/usr/bin/env Rscript require('rmr2') require('rhdfs') hdfs.init() hdfs.data.root = 'natality' hdfs.data = file.path(hdfs.data.root, 'VS2010NATL.DETAILUS.DAT') hdfs.out.root = hdfs.data.root hdfs.out = file.path(hdfs.out.root, 'out') ...
  • 12. Visualization Use Case • Write R script – Write mapper function – Write reducer function • R SCRIPT: ... mapper = function(k, fields) { keyval(as.integer(substr(fields, 89, 90)), 1) } reducer = function(key, vv) { # count values for each key keyval(key, sum(as.numeric(vv), na.rm=TRUE)) } ...
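  The mapper's key extraction can be sanity-checked locally in base R, without Hadoop or rmr2. A minimal sketch, assuming (per the slide) that columns 89-90 of each fixed-width record hold the two-digit age field the mapper keys on; the record below is hypothetical, padded so the age lands at those positions:

```r
# Build a hypothetical fixed-width record: 88 filler characters,
# then "27" at columns 89-90, then trailing filler.
record <- paste0(strrep(" ", 88), "27", strrep(" ", 10))

# Same extraction the slide's mapper performs on each input line.
extract_age <- function(fields) as.integer(substr(fields, 89, 90))

extract_age(record)  # 27
```

  Checking the offsets this way before submitting the MapReduce job catches off-by-one errors in the fixed-width layout cheaply.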
  • 13. Visualization Use Case • Write R script – Write job function • R SCRIPT: ... job = function (input, output) { mapreduce(input = input, output = output, input.format = "text", map = mapper, reduce = reducer, combine = T) } ...
  • 14. Visualization Use Case • Write R script – Write result to HDFS output directory • R SCRIPT: ... out = from.dfs(job(hdfs.data, hdfs.out)) results.df = as.data.frame(out, stringsAsFactors=F)
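  The whole map/reduce pipeline (key on the age at columns 89-90, sum a 1 per record) can also be simulated end-to-end in base R on a handful of lines, producing a small data frame shaped like the job's results. A sketch under the same column-layout assumption as the mapper; the three records are hypothetical:

```r
# Hypothetical record builder: pad so a two-digit age sits at columns 89-90.
make_record <- function(age) paste0(strrep(" ", 88), sprintf("%02d", age))

lines <- c(make_record(27), make_record(31), make_record(27))

# Mapper step: one (age, 1) pair per input line.
ages   <- as.integer(substr(lines, 89, 90))
# Reducer step: sum the 1s per key, i.e. count births per mother's age.
counts <- tapply(rep(1, length(ages)), ages, sum)

results.df <- data.frame(key = as.integer(names(counts)),
                         val = as.vector(counts))
results.df  # age 27 -> 2 births, age 31 -> 1 birth
```

  The cluster job does the same aggregation, just distributed; a local dry run like this is a cheap way to validate the logic before pointing it at the full natality file in HDFS.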
  • 15. Visualization Use Case • Create Shiny application – Create directory – Create ui.R – Create server.R • SHINY APP DIR: > mkdir ~/my-shiny-app
  • 16. Visualization Use Case • Create Shiny application – Create ui.R • UI.R SOURCE: shinyUI(pageWithSidebar( # Application title headerPanel("2010 US Births"), sidebarPanel(. . .), mainPanel( tabsetPanel( tabPanel("Line Chart", htmlOutput("lineChart")), tabPanel("Column Chart", htmlOutput("columnChart")) ) ) ))
  • 17. Visualization Use Case • Create Shiny application – Create server.R • SERVER.R SOURCE: library(googleVis) library(shiny) library(rmr2) library(rhdfs) hdfs.init() hdfs.data.root = 'natality' hdfs.data = file.path(hdfs.data.root, 'out') df = as.data.frame(from.dfs(hdfs.data)) ...
  • 18. Visualization Use Case • Create Shiny application – Create server.R • SERVER.R SOURCE: ... shinyServer(function(input, output) { output$lineChart <- renderGvis({ gvisLineChart(df, options=list( vAxis="{title:'Number of Births'}", hAxis="{title:'Age of Mother'}", legend="none" )) }) ...
  • 19. Visualization Use Case • Run Shiny application • RUN SHINY APP: > shiny::runApp('~/my-shiny-app') Loading required package: shiny Welcome to googleVis version 0.4.0 ... HADOOP_CMD=/usr/bin/hadoop Be sure to run hdfs.init() Listening on port 8100
  • 20. Visualization Use Case • View Shiny application (screenshot)
  • 21. Demo Live: Using Hadoop, R, and Google Chart Tools
  • 22. Visualization Use Case • Architecture recap – Analyze data sets with R on Hadoop – Choose RHadoop packages – Visualize data with Google Chart Tools via googleVis package – Render googleVis output in Shiny applications • Architecture next steps – Integrate Shiny application into existing web apps – Create further data models with R
  • 23. HDP: Enterprise Hadoop Distribution. [HDP architecture diagram, as on slide 5] Operational Services (Ambari, Oozie); Data Services (Flume, Sqoop, WebHDFS, Hive, Pig, HBase, HCatalog); Hadoop Core (HDFS, MapReduce, YARN in 2.0); Platform Services (HA, DR, snapshots, security). The ONLY 100% open source and complete distribution; enterprise grade, proven and tested at scale; ecosystem endorsed to ensure interoperability. Runs on OS, Cloud, VM, Appliance.
  • 24. HDP Sandbox
  • 25. Thank You! Jeff Markham, Solution Engineer, jmarkham@hortonworks.com

Editor's Notes

  1. Hi, I’m Jeff Markham and I wanted to talk today about
  2. Agenda points
  3. Describe the use case and how to choose the tech
  4. Start by installing HDP
  5. Install R and dependencies
  6. Go into more detail on the R packages
  7. Walk through the demo before actually doing the demo
  8. Describe the data set
  9. Start with the very beginning: getting the downloaded data into Hadoop
  10. Start explaining the R script. Kick it off with explanation of RHadoop packages and what they’re doing
  11. Explain the mapper and reducer functions
  12. Explain the job function
  13. Wrap up with showing where the data lands
  14. Show how to create the Shiny app. Start with creating the directory.
  15. This is the entirety of the Shiny UI. Help text in the sidebar is omitted for real estate.
  16. Explain the server.R code. Note the imports of the relevant R packages.
  17. Move to one of the functions that describes how Shiny wraps googleVis which wraps Google Chart Tools
  18. Show how to kick off the Shiny app and note the listening port
  19. Go to the browser and view the Shiny app
  20. Cut to the live demo.
  21. Recap what we just saw and suggest possible future steps to further develop the app
  22. Hammer home HDP as the bedrock for the app
  23. Suggest getting started with the Sandbox
  24. Wrap up with Q & A