• Like
  • Save

Hadoop World 2011 Keynote: The State of the Apache Hadoop Ecosystem

  • 3,358 views
Uploaded on

Doug Cutting, co-creator of Apache Hadoop, will discuss the current state of the Apache Hadoop open source ecosystem and its trajectory for the future. Doug will highlight trends and changes happening …

Doug Cutting, co-creator of Apache Hadoop, will discuss the current state of the Apache Hadoop open source ecosystem and its trajectory for the future. Doug will highlight trends and changes happening throughout the platform, including new additions and important improvements. He will address the implications of how different components of the stack work together to form a coherent and efficient platform. He will draw particular attention to Big Top, a project initiated by Cloudera to build a community around the packaging and interoperability testing of Hadoop-related projects with the goal of providing a consistent and interoperable framework. Cutting will also discuss the latest additions in CDH4 and the platform roadmap for CDH.

More in: Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
3,358
On Slideshare
0
From Embeds
0
Number of Embeds
15

Actions

Shares
Downloads
0
Comments
0
Likes
2

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. The State of the Apache Hadoop Ecosystem Doug Cutting Cloudera & Apache
  • 2. Outline● the ecosystem ○ why we need it ○ what it is ○ why its strong ○ how it can evolve● highlights ○ current ○ next● wrap up
  • 3. Why are we here?Hardware has improved ● exponentially for decades ● both storage and computeWe can now store and process much more! ○ yet have been slow to leverageAnalyzing more data makes us smarter. ○ Norvigs Unreasonable Effectiveness of Data
  • 4. The Ecosystem is the System● Hadoop has become the kernel ○ of the distributed operating system for Big Data ○ a de-facto industry standard● No one uses the kernel alone● A collection of projects at Apache
  • 5. Strengths of ApacheMandates diversity & transparency ○ you control your fateInsures against vendor lock-in ○ cant buy the ASFAllows competing projects ○ survival of the fittestEcosystem as loose federation ○ lets platform evolve
  • 6. Whats new?● Apache Hadoop 0.20.205 ○ append ○ security● CDH3 ○ Mahout included ○ Avro support across components
  • 7. Whats next?● Apache Hadoop 0.23 ○ HDFS ■ performance ■ scalability (federation) ■ availability (HA) ○ MR2● CDH4 ○ includes Hadoop 0.23 ○ BigTop-based● S4, Giraph, Crunch, Blur, ...
  • 8. Apache BigTop (incubating)Ecosystem as a project ○ integration tests Includes: ○ compatible versions ● Hadoop ○ common packaging ● HBase ○ release is a set ● Zookeeper ● Avro ● HiveBasis for CDH ● Pig ○ like Fedora is for RHEL ● Oozie ● Flume ● MahoutCommunity driven ● ...
  • 9. Join the communityHadoop and Big Data are still young. Hardware trends will continue.Hadoop started with just two developers. Now it has hundreds. You can be the next. What do you need?
  • 10. Thanks!Questions?