
Hadoop's Opportunity to Power Next-Generation Architectures



Shaun Connolly's presentation/keynote at Hadoop Summit 2012.

Published in: Technology


  1. 1. Hadoop’s Opportunity to Power Next-Generation Architectures. Shaun Connolly, Hortonworks Strategy. June 13, 2012
  2. 2. How many people are lucky enough to say that they were at the forefront of something big?
  3. 3. Transactions, Interactions, Observations
  4. 4. Big Data = Transactions + Interactions + Observations. [Graphic: nested tiers by data volume: ERP (Megabytes), CRM (Gigabytes), WEB (Terabytes), BIG DATA (Petabytes); axis: Increasing Data Variety and Complexity. Labels shown: User Generated Content; Sensors / RFID / Devices; Mobile Web; Sentiment; Social Interactions & Feeds; User Click Stream; Spatial & GPS Coordinates; Web logs; A/B testing; External Demographics; Offer history; Dynamic Pricing; Affiliate Networks; Business Data Feeds; Segmentation; Search Marketing; HD Video, Audio, Images; Offer details; Behavioral Targeting; Speech to Text; Purchase detail; Purchase record; Customer Touches; Product/Service Logs; Dynamic Funnels; Payment record; SMS/MMS; Support Contacts.] Source: Contents of above graphic created in partnership with Teradata, Inc.
  5. 5. There is still work to be done to ensure HADOOP powers the BIG DATA WAVE
  6. 6. Many Communities Must Work As One (Open Source, Vendors, End Users): • Be diligent stewards of the open source core • Be tireless innovators beyond the core • Provide robust data platform services & open APIs • Enable ecosystem at each layer of the stack • Make platform enterprise-ready & easy to use
  7. 7. Top 10 Influencers of the Decade: 1. Google; 2. Apple; 3. Apache Software Foundation; 4. Microsoft; 5. Linux Foundation; 6. Eclipse Foundation; 7. Twitter; 8. Free Software Foundation; 9. Android Project; 10. VMware. Source: SD Times,
  8. 8. Top 10 Influencers of the Decade: #3. Source: SD Times,
  9. 9. Diligent Stewards & Tireless Innovators: Pig, Avro, Hive, Cascading, HBase, Accumulo, ZooKeeper, Whirr, HCatalog, Chukwa, Ambari, Snappy, Sqoop, Spark, Oozie, HAMA, Giraph, Flume, OpenMPI, Mahout (1.0, 2.0, Beyond)
  10. 10. [Integrating Hadoop with existing IT investments is vitally important.] Larry Feinsmith
  11. 11. Connecting Transactions + Interactions + Observations. [Diagram: data sources (Audio, Video, Images; Docs, Text, XML; Web Logs, Clicks; Social, Graph, Feeds; Sensors, Devices, RFID; Spatial, GPS; Events, Other) feed a Big Data Refinery, which connects to Business Transactions & Interactions (Web, Mobile, CRM, ERP, SCM, …) and Business Intelligence & Analytics (Dashboards, Reports, Visualization, …).] 1 Classic ETL processing; 2 Store, aggregate, and transform multi-structured data to unlock value; 3 Share refined data and runtime models; 4 Retain runtime models and historical data for ongoing refinement & analysis; 5 Retain historical data to unlock additional value
  12. 12. Next-Generation Big Data Architecture. [Diagram: data sources (Audio, Video, Images; Docs, Text, XML; Web Logs, Clicks; Social, Graph, Feeds; Sensors, Devices, RFID; Spatial, GPS; Events, Other) feed a Big Data Refinery, which connects to Business Transactions & Interactions (Web, Mobile, CRM, ERP, SCM, …; SQL, NoSQL, NewSQL) and Business Intelligence & Analytics (EDW, MPP, NewSQL; Dashboards, Reports, Visualization, …).] Arrows powered by ETL, data movement, and data integration technologies
  13. 13. Data Services & Open APIs are Vital. Without HCatalog: raw Hadoop data, inconsistent metadata, tool-specific access. With HCatalog: table access, aligned metadata, RESTful API. Apache HCatalog: Hadoop’s centralized metadata service. ✓ Provide consistent metadata and data models across tools ✓ Share data as tables in and out of HDFS ✓ Enable flexible, thin-client access via RESTful APIs
  14. 14. Data Services & Open APIs In Action. [Diagram: Website Interactions, Web Logs, Customer DB Data, and Order DB Data feed a Big Data Refinery.] 1 Web Log files via WebHDFS APIs; 2 Customer & Order data via Talend & HCatalog for schema; 3 Process, analyze, and join data via Talend, Pig, & HCatalog; 4 Analyze website visits by the type of end results
  15. 15. Let’s Head to the Demo Kitchen
  16. 16. Ecosystem Completes the Puzzle: Applications, Business Tools, & Dev Tools; Data Management & Movement; Infrastructure & Systems Management
  17. 17. Solution Architectures: Make Hadoop Enterprise-Ready & Easy to Use. Applications, Business Tools, & Dev Tools; Data Management & Movement; Infrastructure & Systems Management
  18. 18. Our Opportunity…and Our Role. By the end of 2015, more than half the world’s data will be processed by Apache Hadoop. 1 Be diligent stewards of the open source core; 2 Be tireless innovators beyond the core; 3 Provide robust data platform services & open APIs; 4 Enable the ecosystem at each layer of the stack; 5 Make the platform enterprise-ready & easy to use
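
Slides 13 and 14 mention thin-client access through RESTful APIs and loading web log files via the WebHDFS APIs. As a rough illustration only, the sketch below builds WebHDFS request URLs of the form http://host:port/webhdfs/v1/<path>?op=<OPERATION>. The host name, the HDFS paths, the user name, and the helper name webhdfs_url are hypothetical, and the default port 50070 is an assumption (it was the usual NameNode HTTP port in Hadoop 1.x and 2.x; a given cluster may differ). Nothing here comes from the demo shown in the talk, and no network request is made.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, port=50070, user=None, **params):
    """Build a WebHDFS REST URL (hypothetical helper, not part of Hadoop).

    host/port: NameNode HTTP address (50070 is an assumed default).
    path:      absolute HDFS path, e.g. "/logs/web".
    op:        WebHDFS operation name such as OPEN or LISTSTATUS.
    user:      optional value for the user.name query parameter.
    params:    any extra query parameters, passed through unchanged.
    """
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    query = {"op": op.upper()}
    if user is not None:
        query["user.name"] = user
    query.update(params)
    # quote() keeps "/" intact and percent-encodes unsafe characters.
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode(query)}"


if __name__ == "__main__":
    # Hypothetical cluster and paths, for illustration only.
    print(webhdfs_url("namenode.example.com", "/logs/web", "liststatus"))
    print(webhdfs_url("namenode.example.com", "/logs/web/access.log",
                      "open", user="analyst"))
```

A client would issue an HTTP GET against the first URL to list the log directory and against the second to read one file; any HTTP library can do that, which is the thin-client point slide 13 makes.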