Milind Bhandarkar's CMU hack suggestions


Milind offered several suggested hacks at the 2009 Yahoo! Hack U at CMU.


Transcript

  • 1. Milind Bhandarkar (milindb@yahoo-inc.com) Hadoop Hack Ideas
  • 2. Hadoop Hacks
    • Previous Hack contests
      • Hadoop fs shell with tab completion
      • Simple XML API for getting HDFS listing
      • Counting many digits of Pi
      • Hadoop log debugger
      • Acronym Detection
  • 3. Some Ideas - 1
    • HDFS:
      • Parallel put / get (fast copying in / out of HDFS)
      • Unarchiving HDFS archives
      • Fast HDFS fsck
      • Block-replication policy
      • Find command
    • Map-Reduce
      • Performance analysis: correlating job counters with system metrics
      • AJAXy Web UI
      • Job submission as a web service
      • Splittable gzip
  • 4. Some Ideas - 2
    • Pig
      • Web UI
      • Parser-less Pig
      • UDFs in BSF scripting languages
      • Real Progress reporting
      • Random dataset generation (with specific types and distributions)
  • 5. Some Ideas - 3
    • % filebug job_200908100525_0814
    • and it'll collect:
      • Which task failed 4 times
      • Input to the failed task
      • Counter values
      • Count of exception types (by line number?)
      • List of timestamps when the tasks failed
    • And file a Bugzilla ticket
    • Analyze this job!
  • 6. Some Ideas - 4
    • Feedback about retention/archival policy
      • Based on namenode audit logs
    • Feedback about data layout
    • iPhone App for monitoring Hadoop Jobs
      • Bonus: with push notifications on job completion and failure
  • 7. Questions?
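
One piece of the `filebug` idea on slide 5, counting exception types by line number, can be sketched in a few lines. This is only an illustration and not part of the original slides. It assumes task logs contain standard Java stack traces, with the exception class at the start of a line and frames on indented `at ...(File.java:N)` lines. The function name `count_exceptions`, the sample log, and the `org.example` classes are all hypothetical.

```python
import re
from collections import Counter

# Assumption: an exception line starts with a fully qualified class name
# ending in "Exception" or "Error", optionally prefixed by "Caused by: ".
EXCEPTION_RE = re.compile(
    r"^(?:Caused by: )?((?:[\w$]+\.)+[\w$]*(?:Exception|Error))\b"
)
# Assumption: a stack frame looks like "    at pkg.Class.method(File.java:42)".
FRAME_RE = re.compile(r"^\s+at .+\((\w+\.java):(\d+)\)")


def count_exceptions(log_text):
    """Count (exception class, top stack frame location) pairs in a log."""
    counts = Counter()
    pending = None  # exception seen, still waiting for its first frame
    for line in log_text.splitlines():
        exc = EXCEPTION_RE.match(line)
        if exc:
            if pending is not None:
                # Previous exception had no frame; record it without a location.
                counts[(pending, None)] += 1
            pending = exc.group(1)
            continue
        frame = FRAME_RE.match(line)
        if frame and pending is not None:
            location = f"{frame.group(1)}:{frame.group(2)}"
            counts[(pending, location)] += 1
            pending = None  # deeper frames of the same trace are ignored
    if pending is not None:
        counts[(pending, None)] += 1
    return counts


# Hypothetical log excerpt, made up for illustration only.
SAMPLE_LOG = """\
java.io.IOException: disk full
    at org.example.MyMapper.map(MyMapper.java:42)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
java.io.IOException: disk full
    at org.example.MyMapper.map(MyMapper.java:42)
java.lang.NullPointerException
    at org.example.MyReducer.reduce(MyReducer.java:17)
"""

if __name__ == "__main__":
    for (exc_class, location), n in count_exceptions(SAMPLE_LOG).most_common():
        print(n, exc_class, location)
```

Grouping on the top frame rather than the whole trace keeps the summary short enough to paste into a bug ticket. A real tool would still have to fetch the logs of the failed task attempts first, which this sketch does not attempt.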