• Save
Hadoop World 2011: Apache Hadoop 0.23 - Arun Murthy, Horton Works
 

Hadoop World 2011: Apache Hadoop 0.23 - Arun Murthy, Horton Works

on

  • 8,324 views

The Apache Hadoop community is gearing up for the upcoming release of Apache Hadoop 0.23.  This release has major enhancements to Hadoop such as HDFS Federation for hyper-scale and a Next Generation ...

The Apache Hadoop community is gearing up for the upcoming release of Apache Hadoop 0.23.  This release has major enhancements to Hadoop such as HDFS Federation for hyper-scale and a Next Generation MapReduce framework. Arun, the Apache Hadoop Release Master for 0.23, will briefly cover the highlights of the release and pay particular attention to the plans and efforts undertaken to test, stabilize and release Hadoop.next. The talk covers some of the timelines for the release, our plans for compatibility and upgrade paths for existing users of Hadoop.

Statistics

Views

Total Views
8,324
Views on SlideShare
4,332
Embed Views
3,992

Actions

Likes
9
Downloads
0
Comments
0

17 Embeds 3,992

http://xlos.tistory.com 3162
http://www.cloudera.com 560
http://d.hatena.ne.jp 149
http://it.gilbird.com 58
http://a0.twimg.com 19
http://www.hanrss.com 14
http://192.168.6.179 7
http://webcache.googleusercontent.com 5
http://dschool.co 5
http://114.111.42.144 3
http://feeds.feedburner.com 2
http://garagekidztweetz.hatenablog.com 2
http://us-w1.rockmelt.com 2
http://cloudera.louddog.net 1
http://cloudera.matt.dev 1
http://groovy.gilbird.com 1
https://www.cloudera.com 1
More...

Accessibility

Categories

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Hadoop World 2011: Apache Hadoop 0.23 - Arun Murthy, Horton Works Hadoop World 2011: Apache Hadoop 0.23 - Arun Murthy, Horton Works Presentation Transcript

  • Apache Hadoop 0.23What it takes and what it means…Arun C. MurthyFounder/Architect, Hortonworks@acmurthy (@hortonworks) Page 1
  • Hello! I’m Arun• Founder/Architect at Hortonworks Inc. – Formerly, Architect Hadoop MapReduce, Yahoo – Responsible for running Hadoop MR as a service for all of Yahoo (50k nodes footprint) – Yes, I took the 3am calls! • Apache Hadoop, ASF – VP, Apache Hadoop, ASF (Chair of Apache Hadoop PMC) – Long-term Committer/PMC member (full time ~6 years) – Release Manager - hadoop-0.23 Page 2
  • Releases so far…• Started for Nutch… Yahoo picked it up in early 2006, hired Doug Cutting• Initially, we did monthly releases (0.1, 0.2 …)• Quarterly after hadoop-0.15 until hadoop-0.20 in 04/2009…• hadoop-0.20 is still the basis of all current, stable, Hadoop distributions – Apache Hadoop 0.20.2xx – CDH3.* – HDP1.*• hadoop-0.20.203 (security) – 05/2011• hadoop-0.20.205 (security + append -> hbase) – 10/2011 hadoop-0.1.0 hadoop-0.10.0 hadoop-0.20.0 hadoop-0.20.205 hadoop-0.23.02006 2009 2012 Page 3
  • hadoop-0.23• First stable release off Apache Hadoop trunk in over 30 months…• Currently alpha (hadoop-0.23.0) is under voting by the Hadoop PMC• Significant major features• Several, several enhancements Page 4
  • HDFS - Federation• Significant scaling…• Separation of Namespace mgmt and Block mgmt• Suresh Srinivas (Hortonworks) – Wed 11am Page 5
  • MapReduce - YARN• NextGen Hadoop Data Processing Framework• Support MR and other paradigms• Mahadev Konar (Hortonworks) – Tue 4.30pm Node Manager Container App Mstr Client Resource Node Manager Manager Client App Mstr Container MapReduce Status Node Manager Job Submission Node Status Resource Request Container Container Page 6
  • Performance• 2x+ across the board• HDFS read/write – CRC32 – fadvise – Shortcut for local reads• MapReduce – Unlock lots of improvements from Terasort record (Owen/Arun, 2009) – Shuffle 30%+ – Small Jobs – Uber AM• Todd Lipcon (Cloudera) – Wed 10am Page 7
  • HDFS NameNode HA• The famous SPOF• https://issues.apache.org/jira/browse/HDFS-1623• Well on the way to fix in hadoop-0.23.½• Suresh Srinivas (Hortonworks), Aaron Myers (Cloudera) – Tue 2.15pm Page 8
  • More…• HDFS Write pipeline improvements for Hbase – Append/flush etc.• Build - Full Mavenization• EditLogs re-write – https://issues.apache.org/jira/browse/HDFS-1073• Tonnes more … Page 9
  • Deployment goals• Clusters of 6,000 machines – Each machine with 16+ cores, 48G/96G RAM, 24TB/36TB disks – 200+ PB (raw) per cluster – 100,000+ concurrent tasks – 10,000 concurrent jobs• Yahoo: 50,000+ machines Page 10
  • What does it take to get there?• Testing, *lots* of it• Benchmarks – At least as good as the last one• Integration testing – HBase – Pig – Hive – Oozie• Deployment discipline Page 11
  • Testing• Why is it hard? – MapReduce is, effectively, very wide api – Add Streaming – Add Pipes – Oh, Pig/Hive etc. etc.• Functional tests – Nightly – Nearly 1000 functional tests for MapReduce alone – Several hundred for Pig/Hive etc.• Scale tests – Simulation• Longevity tests• Stress tests Page 12
  • Benchmarks• Benchmark every part of the HDFS & MR pipeline – HDFS read/write throughput – NN operations – Scan, Shuffle, Sort• GridMixv3 – Run production traces in test clusters – Thousands of jobs – Stress mode v/s Replay mode Page 13
  • Integration Testing• Several projects in the ecosystem – HBase – Pig – Hive – Oozie• Cycle – Functional – Scale – Rinse, repeat Page 14
  • Deployment• Alpha/Test (early UAT) – Starting Nov, 2011 – Small scale (500-800 nodes)• Alpha – Jan, 2012 – Majority of users – 2000 nodes per cluster, > 10,000 nodes in all• Beta – Misnomer: 100s of PB, Millions of user applications – Significantly wide variety of applications and load – 4000+ nodes per cluster, > 20000 nodes in all – Late Q1, 2012• Production – Well, it’s production – Mid-to-late Q2 2012 Page 15
  • Questions? Release Candidate: http://people.apache.org/~acmurthy/hadoop-0.23.0-rc2 Release Documentation: http://people.apache.org/~acmurthy/hadoop-0.23Thank You.@acmurthy Page 16