Hadoop - Now, Next and Beyond


Shaun Connolly of Hortonworks presents at the 2012 Big Analytics Roadshow.


  1. Apache Hadoop: Now, Next, and Beyond. Shaun Connolly, VP Corporate Strategy, Hortonworks. April 19, 2012. © Hortonworks Inc. 2012
  2. Big Data: Transactions + Interactions + Observations. [Chart: data volume (megabytes, gigabytes, terabytes, petabytes) plotted against increasing data variety and complexity, rising from ERP through CRM and Web data to Big Data. Example sources include purchase detail, purchase records, payment records, segmentation, offer details, offer history, customer touches, support contacts, web logs, A/B testing, dynamic pricing, affiliate networks, search marketing, behavioral targeting, dynamic funnels, user click streams, user-generated content, social interactions and feeds, sentiment, mobile web, SMS/MMS, speech to text, spatial and GPS coordinates, sensors/RFID/devices, HD video, audio, and images, external demographics, business data feeds, and product/service logs.]
  3. What is Apache Hadoop? A collection of open source projects from the Apache Software Foundation (ASF), loosely coupled and shipping early and often; one of the best examples of open source driving innovation and creating a market. A solution for big data that stores petabytes of data reliably, runs highly distributed applications, enables a rational economics model, and powers data-driven business.
  4. Key Hadoop Stack Components. Core components: HDFS (Hadoop Distributed File System), MapReduce (Distributed Programming Framework), HCatalog (Table & Schema Management), Pig (Data Flow), Hive (SQL-like Access), HBase (Columnar NoSQL Store), and ZooKeeper (Cluster Coordination). Extended components: Ambari & other monitoring and management tools; Oozie & other workflow scheduling tools; Sqoop & other ingest and ETL tools; Mahout & other libraries.
  5. Hadoop Now, Next, and Beyond. The Apache community, including Hortonworks, is investing to improve Hadoop: make Hadoop an open, extensible, and enterprise-viable platform, and enable more applications to run on Apache Hadoop. “Hadoop.Now” (Hadoop 1.0, HDP 1): the most stable Hadoop ever. “Hadoop.Next” (Hadoop 0.23, HDP 2): next-gen HDFS & MapReduce. “Hadoop.Beyond”: integrate with the ecosystem.
  6. Unifying Classic & Big Data Methods. Classic method (structured & repeatable analysis): the business determines what questions to ask, and IT structures the data to answer those questions; “capture only what’s needed”; SQL provides performance and structure. Big Data method (multi-structured & iterative analysis): IT delivers a platform for storing, refining, and analyzing all data sources, and the business explores the data for questions worth answering; “capture in case it’s needed”; MapReduce provides processing flexibility.
  7. Unified Big Data Architecture. Enable developers, data scientists, and information workers using Java, C/C++, Pig, JavaScript, Python, R, SAS, SQL, Excel, BI tools, reporting, etc. to capture, store, refine, discover, analyze, report, and retain data. Workloads span batch, interactive, and active processing: fast data loading, path & pattern analysis, operational analysis, ELT/ETL and refinement, graph analysis, transactional analysis, image/video analysis, text analysis, high-volume ad-hoc queries, online retention, iterative discovery, and elastic data marts. Data sources: audio, video & images; docs & text; machine logs; coordinates & sensors; social content; web & mobile; CRM; SCM; ERP.
  8. Hortonworks Vision. We believe that by the end of 2015, more than half the world’s data will be processed by Apache Hadoop. Q: How do we achieve that vision? A: Ecosystem enablement around an enterprise-viable open source data platform.
  9. A 2-day event (June 13-14, 2012) in San Jose, CA, with 84 breakout sessions showcasing real-world examples, developments, and best practices of Apache Hadoop, plus a keynote by Geoffrey Moore and more to be announced. Register now at: http://www.hadoopsummit.org
  10. June 13-14, 2012, San Jose, CA