Milind Bhandarkar's CMU hack suggestions

Milind offered several suggested hacks at the 2009 Yahoo! Hack U at CMU.

Published in: Technology, Business

Transcript

  • 1. Milind Bhandarkar (milindb@yahoo-inc.com) Hadoop Hack Ideas
  • 2. Hadoop Hacks
    • Previous Hack contests
      • Hadoop fs shell with tab completion
      • Simple XML API for getting HDFS listing
      • Counting many digits of Pi
      • Hadoop log debugger
      • Acronym Detection
  • 3. Some Ideas - 1
    • HDFS:
      • Parallel put / get (fast copying in / out of HDFS)
      • Unarchiving HDFS archives
      • Fast HDFS fsck
      • Block-replication policy
      • Find command
    • Map-Reduce
      • Performance analysis: correlating job counters with system metrics
      • AJAXy Web UI
      • Job submission as a web service
      • Splittable gzip
  • 4. Some Ideas - 2
    • Pig
      • Web UI
      • Parser-less Pig
      • UDFs in BSF scripting languages
      • Real Progress reporting
      • Random Dataset generation (with specific types, and distributions)
  • 5. Some Ideas - 3
    • % filebug job_200908100525_0814
    • and it will collect:
      • Which task failed 4 times
      • Input to the failed task
      • Counter values
      • Count of exception types (by line number?)
      • List of timestamps when the tasks failed
    • and file a Bugzilla ticket
    • Analyze this job!
  • 6. Some Ideas - 4
    • Feedback about retention/archival policy
      • Based on namenode audit logs
    • Feedback about data layout
    • iPhone App for monitoring Hadoop Jobs
      • Bonus: push notifications on job completion and failure
  • 7. Questions?
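
As an illustration of the "filebug" idea from the "Some Ideas - 3" slide, here is a minimal sketch of the collection step: finding the task that failed 4 times, counting exception types, and listing failure timestamps. It does not use any Hadoop API. The log line format, the function names `task_of` and `summarize_failures`, and the `max_attempts` parameter are all assumptions made for this example; a real tool would need to read the actual task logs and job history, and filing the Bugzilla ticket is left out.

```python
import re
from collections import Counter

# Hypothetical log line format, assumed only for this sketch:
#   "<YYYY-MM-DD HH:MM:SS> <attempt id> FAILED <exception class>: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<attempt>attempt_\d+_\d+_[mr]_\d+_\d+) FAILED "
    r"(?P<exc>[\w.$]+)"
)


def task_of(attempt_id):
    """Map an attempt id to its task id by dropping the trailing attempt number."""
    return "task_" + attempt_id[len("attempt_"):].rsplit("_", 1)[0]


def summarize_failures(lines, max_attempts=4):
    """Collect the facts a bug report would need from failure log lines."""
    failures_per_task = Counter()
    exception_counts = Counter()
    failure_timestamps = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a failure line in the assumed format
        failures_per_task[task_of(match["attempt"])] += 1
        exception_counts[match["exc"]] += 1
        failure_timestamps.append(match["ts"])
    # Tasks that used up all of their attempts (4 by default, per the slide).
    exhausted = sorted(
        task for task, count in failures_per_task.items() if count >= max_attempts
    )
    return {
        "exhausted_tasks": exhausted,
        "exception_counts": dict(exception_counts),
        "failure_timestamps": failure_timestamps,
    }
```

The returned dictionary could then be rendered into the body of a bug report, together with the job id passed on the command line.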