
Apache Hadoop 0.23



The Apache Hadoop community is gearing up for the upcoming release of Apache Hadoop 0.23, the first major release since 0.20 in 2009. This release brings major enhancements to Hadoop, such as HDFS Federation for hyper-scale and a Next Generation MapReduce framework. Arun, the Apache Hadoop Release Manager for 0.23, will cover the highlights of the release and talk about the efforts undertaken to test, stabilize and release it. The talk also covers some of the timelines for the release and our plans for compatibility and upgrade paths for existing users of Hadoop.

Presented at Bay Area Hadoop User Group at Yahoo on 8/25/2011.



  1. Apache Hadoop 0.23
       • Arun C. Murthy, Hortonworks Founder and Architect
       • @acmurthy (@hortonworks)
       • © Hortonworks Inc. 2011
       • August 25, 2011
  2. Hello! I’m Arun…
       • Architect & Lead, Apache Hadoop MapReduce Development Team at Hortonworks (formerly at Yahoo!)
       • Apache Hadoop Committer and Member of PMC
       • Full-time contributor to Apache Hadoop since early 2006
       • Apache Hadoop Release Manager for hadoop-0.23
  3. hadoop-0.23
       • On track to be the first stable, widely deployed release since hadoop-0.20 in 2009
       • All stable releases of Hadoop today are based on hadoop-0.20
       • Multiple folks and entities collaborating: Hortonworks, Yahoo, Cloudera, eBay, etc.
       • hadoop-0.23 branch in Apache hours away!
  4. Highlights
       • HDFS Federation
       • Next Generation Hadoop MapReduce
       • Coming soon (WIP): HDFS High Availability
  5. More…
       • Build: full Mavenization
       • EditLogs re-write
       • HDFS write pipeline improvements for HBase
           • Append/flush etc.
       • Re-implementation of MapReduce Shuffle
           • 30% performance gain
           • Stability using Netty rather than Jetty
       • Small jobs optimizations
       • …
  6. Deployment goals
       • Clusters of 6,000 machines
       • Each machine with 16 cores, 48G/96G RAM, 24TB/36TB disks
       • 100,000+ concurrent tasks
       • 10,000 concurrent jobs
  7. Testing
       • Currently tested at reasonable scale: ~500 nodes, incl. GridMixv3
       • Continue to improve on performance benchmarks
           • GridMixv3
           • Sort
           • Shuffle
           • HDFS
           • Scan
           • HDFS throughput
           • …
  8. Timelines
       • branch-0.23: August 2011
       • Alpha (hadoop-0.23.0): ~October 2011
       • Production: late Q1 2012
       • YMMV!
  9. Thank You. @acmurthy
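
The deployment goals on slide 6 imply some aggregate cluster figures. The following is a rough back-of-envelope sketch, not something from the talk: it assumes every machine matches the stated spec, pairs the smaller RAM option with the smaller disk option (and likewise for the larger ones), and uses decimal units (1 PB = 1,000 TB). The function and variable names are hypothetical.

```python
# Back-of-envelope aggregates for the slide 6 deployment goals.
# Assumptions (not from the slides): a homogeneous cluster, the 48G RAM
# option paired with 24TB disks and 96G with 36TB, and decimal units.

MACHINES = 6_000
CORES_PER_MACHINE = 16
RAM_GB_OPTIONS = (48, 96)
DISK_TB_OPTIONS = (24, 36)
CONCURRENT_TASKS = 100_000  # the slide states this as a lower bound ("100,000+")


def cluster_totals(machines, cores, ram_gb, disk_tb):
    """Return aggregate cores, RAM (TB), and raw disk (PB) for a homogeneous cluster."""
    return {
        "cores": machines * cores,
        "ram_tb": machines * ram_gb / 1_000,
        "disk_pb": machines * disk_tb / 1_000,
    }


low = cluster_totals(MACHINES, CORES_PER_MACHINE, RAM_GB_OPTIONS[0], DISK_TB_OPTIONS[0])
high = cluster_totals(MACHINES, CORES_PER_MACHINE, RAM_GB_OPTIONS[1], DISK_TB_OPTIONS[1])

# Ratio of the concurrent-task goal to the total core count.
tasks_per_core = CONCURRENT_TASKS / low["cores"]

print(low)
print(high)
print(round(tasks_per_core, 2))
```

Under these assumptions the task goal works out to roughly one concurrent task per core, and the raw disk figures count capacity before any HDFS replication.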