Hadoop World 2011 Keynote: The State of the Apache Hadoop Ecosystem


Published on

Doug Cutting, co-creator of Apache Hadoop, will discuss the current state of the Apache Hadoop open source ecosystem and its trajectory for the future. Doug will highlight trends and changes happening throughout the platform, including new additions and important improvements. He will address the implications of how different components of the stack work together to form a coherent and efficient platform. He will draw particular attention to Big Top, a project initiated by Cloudera to build a community around the packaging and interoperability testing of Hadoop-related projects with the goal of providing a consistent and interoperable framework. Cutting will also discuss the latest additions in CDH4 and the platform roadmap for CDH.

Published in: Technology
1 Comment
  • http://dbmanagement.info/Tutorials/Hadoop.htm
    Are you sure you want to  Yes  No
    Your message goes here
No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Hadoop World 2011 Keynote: The State of the Apache Hadoop Ecosystem

  1. 1. The State of the Apache Hadoop Ecosystem Doug Cutting Cloudera & Apache
  2. 2. Outline● the ecosystem ○ why we need it ○ what it is ○ why its strong ○ how it can evolve● highlights ○ current ○ next● wrap up
  3. 3. Why are we here?Hardware has improved ● exponentially for decades ● both storage and computeWe can now store and process much more! ○ yet have been slow to leverageAnalyzing more data makes us smarter. ○ Norvigs Unreasonable Effectiveness of Data
  4. 4. The Ecosystem is the System● Hadoop has become the kernel ○ of the distributed operating system for Big Data ○ a de-facto industry standard● No one uses the kernel alone● A collection of projects at Apache
  5. 5. Strengths of ApacheMandates diversity & transparency ○ you control your fateInsures against vendor lock-in ○ cant buy the ASFAllows competing projects ○ survival of the fittestEcosystem as loose federation ○ lets platform evolve
  6. 6. Whats new?● Apache Hadoop 0.20.205 ○ append ○ security● CDH3 ○ Mahout included ○ Avro support across components
  7. 7. Whats next?● Apache Hadoop 0.23 ○ HDFS ■ performance ■ scalability (federation) ■ availability (HA) ○ MR2● CDH4 ○ includes Hadoop 0.23 ○ BigTop-based● S4, Giraph, Crunch, Blur, ...
  8. 8. Apache BigTop (incubating)Ecosystem as a project ○ integration tests Includes: ○ compatible versions ● Hadoop ○ common packaging ● HBase ○ release is a set ● Zookeeper ● Avro ● HiveBasis for CDH ● Pig ○ like Fedora is for RHEL ● Oozie ● Flume ● MahoutCommunity driven ● ...
  9. 9. Join the communityHadoop and Big Data are still young. Hardware trends will continue.Hadoop started with just two developers. Now it has hundreds. You can be the next. What do you need?
  10. 10. Thanks!Questions?