Hadoop’s Opportunity to Power
Next-Generation Architectures
Shaun Connolly, Hortonworks Strategy
June 13, 2012
How many people are lucky enough to say that they were at the forefront of something big?
Transactions

 Interactions

Observations
Big Data = Transactions + Interactions + Observations

[Chart: data volume (Megabytes, Gigabytes, Terabytes, Petabytes) plotted against Increasing Data Variety and Complexity, with nested tiers listed here from smallest to largest]
ERP: Purchase detail; Purchase record; Payment record
CRM: Segmentation; Offer details; Customer Touches; Support Contacts
WEB: Web logs; Offer history; A/B testing; Dynamic Pricing; Affiliate Networks; Search Marketing; Behavioral Targeting; Dynamic Funnels
BIG DATA: User Generated Content; Sensors / RFID / Devices; Mobile Web; Sentiment; Social Interactions & Feeds; User Click Stream; Spatial & GPS Coordinates; External Demographics; Business Data Feeds; HD Video, Audio, Images; Speech to Text; Product/Service Logs; SMS/MMS
  Source: Contents of above graphic created in partnership with Teradata, Inc.
There is still work to be done to ensure HADOOP powers the BIG DATA WAVE
Many Communities Must Work As One

• Be diligent stewards of the open source core
• Be tireless innovators beyond the core
• Provide robust data platform services & open APIs
• Enable ecosystem at each layer of the stack
• Make platform enterprise-ready & easy to use

[Diagram labels: Open Source, Vendors, End Users]
Top 10 Influencers of the Decade
     1.  Google
     2.  Apple
     3.  Apache Software Foundation
     4.  Microsoft
     5.  Linux Foundation
     6.  Eclipse Foundation
     7.  Twitter
     8.  Free Software Foundation
     9.  Android Project
     10. VMware
Source: SD Times, http://www.sdtimes.com/link/36666
Top 10 Influencers of the Decade
#3: Apache Software Foundation
Source: SD Times, http://www.sdtimes.com/link/36666
Diligent Stewards & Tireless Innovators
Left column: Pig, Hive, HBase, ZooKeeper, HCatalog, Ambari, Sqoop, Oozie, Flume, Mahout
Right column: Avro, Cascading, Accumulo, Whirr, Chukwa, Snappy, Spark, Hama, Giraph, Open MPI
[Timeline labels: 1.0, 2.0, Beyond]
[Integrating Hadoop with existing IT investments is vitally important.]
Larry Feinsmith
Connecting Transactions + Interactions + Observations
[Diagram: data sources feed a Big Data Refinery, which connects to Business Transactions & Interactions and to Business Intelligence & Analytics]
Data sources: Audio, Video, Images; Docs, Text, XML; Web Logs, Clicks; Social, Graph, Feeds; Sensors, Devices, RFID; Spatial, GPS; Events, Other
Business Transactions & Interactions: Web, Mobile, CRM, ERP, SCM, …
Business Intelligence & Analytics: Dashboards, Reports, Visualization, …
1  Classic ETL processing
2  Store, aggregate, and transform multi-structured data to unlock value
3  Share refined data and runtime models
4  Retain runtime models and historical data for ongoing refinement & analysis
5  Retain historical data to unlock additional value
Next-Generation Big Data Architecture
[Diagram: data sources feed a Big Data Refinery, which connects to Business Transactions & Interactions and to Business Intelligence & Analytics]
Data sources: Audio, Video, Images; Docs, Text, XML; Web Logs, Clicks; Social, Graph, Feeds; Sensors, Devices, RFID; Spatial, GPS; Events, Other
Business Transactions & Interactions: Web, Mobile, CRM, ERP, SCM, … (SQL, NoSQL, NewSQL)
Business Intelligence & Analytics: Dashboards, Reports, Visualization, … (EDW, MPP, NewSQL)
Arrows powered by ETL, data movement, and data integration technologies
Data Services & Open APIs are Vital
[Diagram: HCatalog sits between the two groups of labels below]
Left of HCatalog: Raw Hadoop data, Inconsistent metadata, Tool-specific access
Right of HCatalog: Table access, Aligned metadata, RESTful API

Apache HCatalog: Hadoop’s centralized metadata service
✓  Provide consistent metadata and data models across tools
✓  Share data as tables in and out of HDFS
✓  Enable flexible, thin-client access via RESTful APIs
Data Services & Open APIs In Action
[Diagram: Website Interactions (Web Logs), Order Data (DB), and Customer Data (DB) feed the Big Data Refinery]
1  Web Log files via WebHDFS APIs
2  Customer & Order data via Talend & HCatalog for schema
3  Process, analyze, and join data via Talend, Pig, & HCatalog
4  Analyze website visits by the type of end results
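
Step 1 on this slide loads web log files through the WebHDFS REST API. As a rough sketch of what such requests could look like, the snippet below only builds the request URLs and makes no network calls. The NameNode host, the port, the log directory, and the helper name `webhdfs_url` are all assumptions for illustration and do not come from the talk.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, hdfs_path, op, **params):
    """Build a WebHDFS REST URL of the form
    http://<host>:<port>/webhdfs/v1/<path>?op=<OP>&<params>.
    This is a hypothetical helper, not part of any Hadoop client library."""
    if not hdfs_path.startswith("/"):
        raise ValueError("hdfs_path must be absolute")
    # The operation name goes first; extra parameters follow in sorted order.
    query = urlencode([("op", op.upper())] + sorted(params.items()))
    return f"http://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"


# Assumed NameNode address and log directory, for illustration only.
NAMENODE = "namenode.example.com"
PORT = 50070
LOG_DIR = "/data/weblogs/2012-06-13"

# CREATE is sent with HTTP PUT; the NameNode replies with a redirect to a
# DataNode, which then receives the file contents.
create_url = webhdfs_url(NAMENODE, PORT, LOG_DIR + "/access.log", "create",
                         overwrite="false")

# LISTSTATUS is sent with HTTP GET and returns a JSON directory listing.
list_url = webhdfs_url(NAMENODE, PORT, LOG_DIR, "liststatus")

print(create_url)
print(list_url)
```

Because WebHDFS is plain HTTP, any tool that can issue PUT and GET requests can push log files into the refinery this way without a Hadoop client installed.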
Let’s Head to the Demo Kitchen
Ecosystem Completes the Puzzle
Applications, Business Tools, & Dev Tools
Data Management & Movement
Infrastructure & Systems Management
Solution Architectures: Make Hadoop Enterprise-Ready & Easy to Use
Applications, Business Tools, & Dev Tools
Data Management & Movement
Infrastructure & Systems Management
Our Opportunity…and Our Role

By the end of 2015, more than half the world's data will be processed by Apache Hadoop.

1. Be diligent stewards of the open source core
2. Be tireless innovators beyond the core
3. Provide robust data platform services & open APIs
4. Enable the ecosystem at each layer of the stack
5. Make the platform enterprise-ready & easy to use
