
Hadoop Futures

Tom White's talk on Hadoop futures

  1. Hadoop Futures: What to watch
     ▪ Tom White, Cloudera
     ▪ Hadoop User Group UK, Bristol, 10 August 2009
  2. About me
     ▪ Apache Hadoop Committer, PMC Member, Apache Member
     ▪ Employed by Cloudera
     ▪ Author of “Hadoop: The Definitive Guide”
       ▪ http://hadoopbook.com
  3. Goals
     ▪ Modular
       ▪ E.g. pluggable block placement algorithm
     ▪ Multiple languages
       ▪ E.g. not just Java for MapReduce
     ▪ Integration with other systems
       ▪ E.g. JMX monitoring hooks
  4. The Project Split
     ▪ Core -> Common, HDFS, MapReduce
     ▪ New repositories
     ▪ New mailing lists
       ▪ {common,hdfs,mapreduce}-{user,dev,issues}@hadoop.apache.org
     ▪ New directory layouts
     ▪ New configuration
       ▪ hadoop-site.xml -> {core,hdfs,mapreduce}-site.xml
     ▪ More information at
       ▪ http://www.cloudera.com/blog/2009/07/17/the-project-split/
       ▪ general@hadoop.apache.org
  5. Releases
     ▪ 0.18.3 - 29 Jan 2009
       ▪ Official “stable” release
       ▪ Probably the most commonly used
       ▪ Basis for first Cloudera distribution
     ▪ 0.19.2 - 23 July 2009
       ▪ 0.19 series is not widely used
     ▪ 0.20.0 - 22 April 2009
       ▪ Expect large adoption with 0.20.1 release in coming weeks
       ▪ Basis for second Cloudera distribution, first Yahoo! distribution
     ▪ 0.21 series - feature freeze end of August 2009
  6. Hadoop 1.0
     ▪ After 0.21 release
     ▪ Need to establish rules about version evolution
     ▪ Hadoop 1.0 Interface Classification - HADOOP-5073
     ▪ API, Data, wire protocol compatibility - HADOOP-5071
  7. Interesting Projects/JIRAs
     ▪ Common
       ▪ Avro for Hadoop RPC - HADOOP-6170
       ▪ Service lifecycle - HDFS-326
       ▪ Distributed configuration - HADOOP-5670
       ▪ 10 minute patch builds - HADOOP-5628, HDFS-458, MAPREDUCE-670
       ▪ Ivy/Maven integration - HADOOP-5107
       ▪ Eclipse plugin
  8. Interesting Projects/JIRAs (continued)
     ▪ MapReduce
       ▪ Metadata in Serialization - HADOOP-6165
       ▪ Compute splits on the cluster - MAPREDUCE-207
       ▪ Context Objects - ongoing migration of libraries/examples
       ▪ Security - HADOOP-4487
       ▪ Schedulers
         ▪ Fair share scheduler - global scheduling, FIFO - MAPREDUCE-548, MAPREDUCE-706
         ▪ Capacity - high RAM jobs - HADOOP-5884
       ▪ Speed: new shuffle
         ▪ See http://sortbenchmark.org/Yahoo2009.pdf
  9. Popular JIRAs
     ▪ http://community.cloudera.com/
  10. Questions?
     ▪ tom@cloudera.com
     ▪ Cloudera’s Distribution for Hadoop
       ▪ http://www.cloudera.com/hadoop
  11. (c) 2009 Cloudera, Inc. or its licensors. "Cloudera" is a registered trademark of Cloudera, Inc. All rights reserved.
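
The brace notation on the Project Split slide is shorthand for several concrete names. As a minimal sketch of how that shorthand expands, the Python below enumerates the nine mailing-list addresses and the three per-project configuration file names that replace hadoop-site.xml. The variable names (`PROJECTS`, `LIST_KINDS`, `CONFIG_PREFIXES`) are illustrative and not from the talk; only the patterns themselves come from the slide.

```python
from itertools import product

# Names taken from the slide's brace patterns; the variable names are illustrative.
PROJECTS = ["common", "hdfs", "mapreduce"]
LIST_KINDS = ["user", "dev", "issues"]
CONFIG_PREFIXES = ["core", "hdfs", "mapreduce"]

# {common,hdfs,mapreduce}-{user,dev,issues}@hadoop.apache.org
mailing_lists = [
    f"{project}-{kind}@hadoop.apache.org"
    for project, kind in product(PROJECTS, LIST_KINDS)
]

# hadoop-site.xml -> {core,hdfs,mapreduce}-site.xml
site_files = [f"{prefix}-site.xml" for prefix in CONFIG_PREFIXES]

if __name__ == "__main__":
    for address in mailing_lists:
        print(address)
    for name in site_files:
        print(name)
```

Each of the three subprojects gets its own user, dev, and issues list, while general@hadoop.apache.org remains a single shared address outside the pattern.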
