Milind Bhandarkar's CMU hack suggestions

Milind offered several suggested hacks at the 2009 Yahoo! Hack U at CMU.

Published in: Technology, Business

  1. Milind Bhandarkar (milindb@yahoo-inc.com): Hadoop Hack Ideas
  2. Hadoop Hacks
     • Previous Hack contests
         • Hadoop fs shell with tab completion
         • Simple XML API for getting HDFS listing
         • Counting many digits of Pi
         • Hadoop log debugger
         • Acronym Detection
  3. Some Ideas - 1
     • HDFS
         • Parallel put / get (fast copying in / out of HDFS)
         • Unarchiving HDFS archives
         • Fast HDFS fsck
         • Block-replication policy
         • Find command
     • Map-Reduce
         • Performance analysis: correlating job counters with system metrics
         • AJAXy Web UI
         • Job submission as a web service
         • Splittable gzip
  4. Some Ideas - 2
     • Pig
         • Web UI
         • Parser-less Pig
         • UDFs in BSF scripting languages
         • Real progress reporting
         • Random dataset generation (with specific types and distributions)
  5. Some Ideas - 3
     • % filebug job_200908100525_0814
     • and it'll collect
         • Which task failed 4 times
         • Input to the failed task
         • Counter values
         • Count of exception types (by line number?)
         • List of timestamps when the tasks failed
     • And file a Bugzilla ticket
     • Analyze this job!
  6. Some Ideas - 4
     • Feedback about retention/archival policy
         • Based on namenode audit logs
     • Feedback about data layout
     • iPhone app for monitoring Hadoop jobs
         • Bonus: with push notifications on job completion and failure
  7. Questions?
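
The `filebug` idea on slide 5 lists what the tool should collect for a failed job. As a rough illustration only, the collection step might look like the sketch below. The slides do not specify any log format or implementation, so the one-line-per-failure log layout, the function name `summarize_failures`, and the result keys are all assumptions made for this example. Fetching the task input and counter values, and filing the Bugzilla ticket, are left out.

```python
import re
from collections import Counter

# ASSUMPTION: a hypothetical, simplified log layout with one failure per line:
#   <date> <time> <task_id> FAILED <ExceptionClass> at <File>:<line>
# Real Hadoop task logs look different; adapt the pattern to the actual logs.
FAILURE_RE = re.compile(
    r"^(?P<ts>\S+ \S+) (?P<task>\S+) FAILED (?P<exc>[\w.]+) at (?P<where>\S+:\d+)$"
)


def summarize_failures(log_lines, max_attempts=4):
    """Collect the per-job facts named on the slide from failure log lines.

    Returns a dict with:
      exhausted_tasks    - tasks that failed at least `max_attempts` times
      exception_counts   - Counter keyed by (exception class, file:line)
      failure_timestamps - timestamps of every failure, in input order
    """
    failures_per_task = Counter()
    exception_counts = Counter()
    failure_timestamps = []

    for line in log_lines:
        match = FAILURE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that are not failure records
        failures_per_task[match["task"]] += 1
        exception_counts[(match["exc"], match["where"])] += 1
        failure_timestamps.append(match["ts"])

    exhausted_tasks = sorted(
        task for task, count in failures_per_task.items() if count >= max_attempts
    )
    return {
        "exhausted_tasks": exhausted_tasks,
        "exception_counts": exception_counts,
        "failure_timestamps": failure_timestamps,
    }
```

Keying the exception counter on both the class name and the `file:line` location is one way to read the slide's "by line number?" question: it separates the same exception type raised from different places.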