Hadoop Futures

Tom White's talk on Hadoop futures

Transcript

  • 1. Hadoop Futures: What to watch
      ▪ Tom White, Cloudera
      ▪ Hadoop User Group UK, Bristol, 10 August 2009
  • 2. About me
      ▪ Apache Hadoop Committer, PMC Member, Apache Member
      ▪ Employed by Cloudera
      ▪ Author of “Hadoop: The Definitive Guide”
      ▪ http://hadoopbook.com
  • 3. Goals
      ▪ Modular
          ▪ E.g. pluggable block placement algorithm
      ▪ Multiple languages
          ▪ E.g. not just Java for MapReduce
      ▪ Integration with other systems
          ▪ E.g. JMX monitoring hooks
  • 4. The Project Split
      ▪ Core -> Common, HDFS, MapReduce
      ▪ New repositories
      ▪ New mailing lists
          ▪ {common,hdfs,mapreduce}-{user,dev,issues}@hadoop.apache.org
      ▪ New directory layouts
      ▪ New configuration
          ▪ hadoop-site.xml -> {core,hdfs,mapreduce}-site.xml
      ▪ More information at
          ▪ http://www.cloudera.com/blog/2009/07/17/the-project-split/
          ▪ general@hadoop.apache.org
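
The brace patterns on slide 4 are shorthand for several concrete names. The sketch below expands them in Python; the variable names are illustrative and not part of the talk. Note that the slide lists the configuration file as core-site.xml, not common-site.xml, even though the subproject is called Common.

```python
from itertools import product

# Names taken from slide 4; the variable names are illustrative only.
SUBPROJECTS = ["common", "hdfs", "mapreduce"]
LIST_KINDS = ["user", "dev", "issues"]

# {common,hdfs,mapreduce}-{user,dev,issues}@hadoop.apache.org
mailing_lists = [
    f"{project}-{kind}@hadoop.apache.org"
    for project, kind in product(SUBPROJECTS, LIST_KINDS)
]

# hadoop-site.xml -> {core,hdfs,mapreduce}-site.xml
# The slide uses "core" here, not "common".
site_files = [f"{name}-site.xml" for name in ["core", "hdfs", "mapreduce"]]

for address in mailing_lists:
    print(address)
for filename in site_files:
    print(filename)
```

The expansion yields nine list addresses (three per subproject) and three configuration files in place of the single hadoop-site.xml.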
  • 5. Releases
      ▪ 0.18.3 - 29 Jan 2009
          ▪ Official “stable” release
          ▪ Probably the most commonly used
          ▪ Basis for first Cloudera distribution
      ▪ 0.19.2 - 23 July 2009
          ▪ 0.19 series is not widely used
      ▪ 0.20.0 - 22 April 2009
          ▪ Expect large adoption with 0.20.1 release in coming weeks
          ▪ Basis for second Cloudera distribution, first Yahoo! distribution
      ▪ 0.21 series - feature freeze end of August 2009
  • 6. Hadoop 1.0
      ▪ After 0.21 release
      ▪ Need to establish rules about version evolution
      ▪ Hadoop 1.0 Interface Classification - HADOOP-5073
      ▪ API, Data, wire protocol compatibility - HADOOP-5071
  • 7. Interesting Projects/JIRAs
      ▪ Common
          ▪ Avro for Hadoop RPC - HADOOP-6170
          ▪ Service lifecycle - HDFS-326
          ▪ Distributed configuration - HADOOP-5670
          ▪ 10 minute patch builds - HADOOP-5628, HDFS-458, MAPREDUCE-670
          ▪ Ivy/Maven integration - HADOOP-5107
          ▪ Eclipse plugin
  • 8. Interesting Projects/JIRAs (continued)
      ▪ MapReduce
          ▪ Metadata in Serialization - HADOOP-6165
          ▪ Compute splits on the cluster - MAPREDUCE-207
          ▪ Context Objects - ongoing migration of libraries/examples
          ▪ Security - HADOOP-4487
          ▪ Schedulers
              ▪ Fair share scheduler - global scheduling, FIFO - MAPREDUCE-548, MAPREDUCE-706
              ▪ Capacity - high RAM jobs - HADOOP-5884
          ▪ Speed: new shuffle
              ▪ See http://sortbenchmark.org/Yahoo2009.pdf
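
Slides 7 and 8 cite issues by JIRA key, and the key prefix reflects the post-split project that tracks each one. As a rough sketch, the Python below groups the keys quoted on those slides by prefix and builds a browse link for each. The URL template assumes the usual Apache JIRA address scheme, which the slides do not state, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: Apache JIRA issues are browsable under this URL scheme.
BROWSE_URL = "https://issues.apache.org/jira/browse/{}"

# Matches keys such as HADOOP-6170, HDFS-326, MAPREDUCE-670.
JIRA_KEY = re.compile(r"\b(HADOOP|HDFS|MAPREDUCE)-(\d+)\b")

# Issue references copied from slides 7 and 8.
SLIDE_TEXT = """
Avro for Hadoop RPC - HADOOP-6170
Service lifecycle - HDFS-326
Distributed configuration - HADOOP-5670
10 minute patch builds - HADOOP-5628, HDFS-458, MAPREDUCE-670
Ivy/Maven integration - HADOOP-5107
Metadata in Serialization - HADOOP-6165
Compute splits on the cluster - MAPREDUCE-207
Security - HADOOP-4487
Fair share scheduler - MAPREDUCE-548, MAPREDUCE-706
Capacity - high RAM jobs - HADOOP-5884
"""


def group_by_project(text):
    """Return {project prefix: [issue keys]} in order of first appearance."""
    groups = defaultdict(list)
    for match in JIRA_KEY.finditer(text):
        project, key = match.group(1), match.group(0)
        if key not in groups[project]:
            groups[project].append(key)
    return dict(groups)


def jira_url(key):
    """Build a browse link for one issue key (URL scheme is an assumption)."""
    return BROWSE_URL.format(key)


if __name__ == "__main__":
    for project, keys in group_by_project(SLIDE_TEXT).items():
        print(project)
        for key in keys:
            print("  ", key, jira_url(key))
```

Several issues listed under Common or MapReduce still carry a HADOOP- key, so the prefix shows where an issue was filed rather than which subproject it affects.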
  • 9. Popular JIRAs
      ▪ http://community.cloudera.com/
  • 10. Questions?
      ▪ tom@cloudera.com
      ▪ Cloudera’s Distribution for Hadoop
      ▪ http://www.cloudera.com/hadoop
  • 11. (c) 2009 Cloudera, Inc. or its licensors. "Cloudera" is a registered trademark of Cloudera, Inc. All rights reserved. 1.0