Milind Bhandarkar's CMU hack suggestions

Milind offered several suggested hacks at the 2009 Yahoo! Hack U at CMU.

Presentation Transcript

  • Milind Bhandarkar (milindb@yahoo-inc.com) Hadoop Hack Ideas
  • Hadoop Hacks
    • Hacks from previous contests
      • Hadoop fs shell with tab completion
      • Simple XML API for getting HDFS listing
      • Counting many digits of Pi
      • Hadoop log debugger
      • Acronym Detection
  • Some Ideas - 1
    • HDFS:
      • Parallel put/get (fast copying into and out of HDFS)
      • Unarchiving HDFS archives
      • Fast HDFS fsck
      • Block-replication policy
      • Find command
    • Map-Reduce
      • Performance analysis: correlating job counters with system metrics
      • AJAXy Web UI
      • Job submission as a web service
      • Splittable gzip
  • Some Ideas - 2
    • Pig
      • Web UI
      • Parser-less Pig
      • UDFs in BSF scripting languages
      • Real progress reporting
      • Random dataset generation (with specific types and distributions)
  • Some Ideas - 3
    • % filebug job_200908100525_0814
    • and it'll collect:
      • Which task failed 4 times
      • Input to the failed task
      • Counter values
      • Number of occurrences of each exception type (by line number?)
      • List of timestamps when the tasks failed
    • and file a Bugzilla ticket
    • Analyze this job!
  • Some Ideas - 4
    • Feedback about retention/archival policy
      • Based on namenode audit logs
    • Feedback about data layout
    • iPhone app for monitoring Hadoop jobs
      • Bonus: with push notifications on job completion and failure
  • Questions?
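The "Find command" idea from "Some Ideas - 1" could behave like Unix find run over an HDFS listing. The sketch below is a minimal illustration under stated assumptions: the recursive listing is already available as records, and `FileEntry` and `find` are hypothetical names, not part of any Hadoop API. Only the filtering logic is shown; fetching the listing from the NameNode is left out.

```python
import fnmatch
import posixpath
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class FileEntry:
    """One entry of a recursive HDFS listing (hypothetical record type)."""
    path: str        # absolute path, e.g. /data/logs/a.gz
    size: int        # length in bytes
    mtime: int       # modification time, seconds since the epoch
    is_dir: bool = False


def find(entries: Iterable[FileEntry],
         root: str = "/",
         name: Optional[str] = None,
         min_size: Optional[int] = None,
         older_than: Optional[int] = None,
         files_only: bool = True) -> Iterator[str]:
    """Yield paths at or under `root` that satisfy every predicate given.

    name       -- shell-style glob matched against the last path component
    min_size   -- keep entries of at least this many bytes
    older_than -- keep entries modified before this epoch timestamp
    """
    prefix = root.rstrip("/") + "/"
    for e in entries:
        if e.path != root and not e.path.startswith(prefix):
            continue
        if files_only and e.is_dir:
            continue
        if name is not None and not fnmatch.fnmatchcase(posixpath.basename(e.path), name):
            continue
        if min_size is not None and e.size < min_size:
            continue
        if older_than is not None and e.mtime >= older_than:
            continue
        yield e.path


if __name__ == "__main__":
    listing = [
        FileEntry("/data/logs", 0, 100, is_dir=True),
        FileEntry("/data/logs/a.gz", 5000, 100),
        FileEntry("/data/logs/b.txt", 10, 200),
        FileEntry("/tmp/c.gz", 9000, 50),
    ]
    print(list(find(listing, root="/data", name="*.gz", min_size=1000)))
    # prints ['/data/logs/a.gz']
```

Because `find` is a generator over any iterable, it could consume a listing streamed page by page rather than one held fully in memory.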
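For the "Performance analysis: correlating job counters with system metrics" idea, one simple starting point is a Pearson correlation between a counter sampled over time and each node metric sampled at the same instants. This is a sketch assuming both series are already aligned on the same timestamps; the metric names and sample values are made up for illustration, and a high correlation only points at candidates to investigate, not at a cause.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two series sampled at the same instants."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two aligned series with at least two samples")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def rank_metrics(counter: Sequence[float],
                 metrics: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Order system metrics by how strongly they track one job counter."""
    scored = [(name, pearson(counter, series)) for name, series in metrics.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical per-minute samples for one job on one node.
    bytes_read_mb = [10, 20, 30, 40]
    node_metrics = {
        "disk_util_pct": [15, 25, 35, 45],
        "cpu_idle_pct": [80, 60, 50, 20],
        "net_in_mbps": [5, 5, 6, 5],
    }
    for name, r in rank_metrics(bytes_read_mb, node_metrics):
        print(f"{name:15s} r={r:+.3f}")
```

With these sample values the disk metric ranks first (it moves in lockstep with bytes read), CPU idle second (strongly negative), and network input last.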
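The "Splittable gzip" idea addresses the fact that a plain gzip file is one DEFLATE stream, so a reader cannot begin decompressing at an arbitrary byte offset. One known workaround, used by block-gzip formats such as BGZF, is to compress the data as a series of independent gzip members: concatenated members still form a valid gzip file, and an index of member offsets lets each split start on a member boundary. The sketch below illustrates that approach only; it is an assumption about how the hack might work, not a Hadoop codec, and `compress_splittable` and `read_split` are hypothetical names.

```python
import gzip
from typing import List, Tuple


def compress_splittable(records: List[bytes],
                        records_per_member: int) -> Tuple[bytes, List[Tuple[int, int]]]:
    """Compress newline-terminated records as independent gzip members.

    Returns the concatenated gzip data and an index of (offset, length)
    pairs, one per member. Members always end on a record boundary.
    """
    out = bytearray()
    index: List[Tuple[int, int]] = []
    for start in range(0, len(records), records_per_member):
        chunk = b"".join(records[start:start + records_per_member])
        member = gzip.compress(chunk)
        index.append((len(out), len(member)))
        out += member
    return bytes(out), index


def read_split(data: bytes, index: List[Tuple[int, int]], first: int, last: int) -> bytes:
    """Decompress members first..last (inclusive) without reading the rest."""
    start = index[first][0]
    end = index[last][0] + index[last][1]
    # A run of whole members is itself a valid multi-member gzip stream.
    return gzip.decompress(data[start:end])


if __name__ == "__main__":
    records = [f"record {i}\n".encode() for i in range(10)]
    data, index = compress_splittable(records, records_per_member=4)
    # The whole file still decompresses as one ordinary gzip stream...
    assert gzip.decompress(data) == b"".join(records)
    # ...while a reader assigned member 1 starts at its recorded offset.
    print(read_split(data, index, 1, 1).decode(), end="")
```

Smaller members give finer-grained splits at the cost of compression ratio, since each member restarts its dictionary from scratch.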
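The "Random dataset generation (with specific types and distributions)" idea for Pig might start from a schema that pairs each column with a value generator. This is a minimal sketch under assumptions: the schema, column names and distribution parameters are hypothetical, and output is tab-separated because that is PigStorage's default field delimiter. A fixed seed keeps the generated data reproducible across runs.

```python
import random
from typing import Callable, List, Tuple

# A column is (name, generator); each generator draws one value from rng.
Column = Tuple[str, Callable[[random.Random], object]]


def zipf_choice(values: List[str], s: float = 1.0) -> Callable[[random.Random], str]:
    """Pick from `values` with Zipf-like weights 1/rank**s (rank starts at 1)."""
    weights = [1.0 / (rank ** s) for rank in range(1, len(values) + 1)]
    return lambda rng: rng.choices(values, weights=weights, k=1)[0]


def generate(schema: List[Column], rows: int, seed: int = 42) -> List[str]:
    """Return `rows` tab-separated lines, one field per schema column."""
    rng = random.Random(seed)  # fixed seed makes the dataset reproducible
    lines = []
    for _ in range(rows):
        fields = [str(gen(rng)) for _, gen in schema]
        lines.append("\t".join(fields))
    return lines


# Hypothetical schema: user id (uniform int), query term (Zipf-skewed),
# session length in seconds (exponential, mean 300), score (normal).
schema: List[Column] = [
    ("user_id", lambda rng: rng.randint(1, 10_000)),
    ("term", zipf_choice(["hadoop", "pig", "hdfs", "yahoo", "cmu"])),
    ("session_secs", lambda rng: round(rng.expovariate(1 / 300), 1)),
    ("score", lambda rng: round(rng.gauss(0.5, 0.1), 3)),
]

if __name__ == "__main__":
    for line in generate(schema, rows=3):
        print(line)
```

Keeping generators as plain callables makes it easy to add new types or distributions without touching `generate`.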
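The analysis half of the `filebug` idea in "Some Ideas - 3" can be sketched as below: group failed attempts by task, flag tasks that failed 4 times, count exception types by source line, and list failure timestamps. It assumes the failed attempts, with their stack traces, have already been fetched (`FailedAttempt`, `summarize` and the other names are hypothetical). Collecting task inputs and counters and filing the Bugzilla ticket are left out rather than guessed at. The ID parsing follows Hadoop's `job_<jobtracker id>_<seq>` and `attempt_<jobtracker id>_<seq>_<m|r>_<task>_<n>` naming.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List

ATTEMPT_RE = re.compile(r"^attempt_(\d+)_(\d+)_([mr])_(\d+)_(\d+)$")
# First line of a Java stack trace, e.g. "java.io.IOException: disk full".
EXC_RE = re.compile(r"^([\w$.]+(?:Exception|Error))\b")
# A stack frame with a known source line, e.g. "\tat org.example.Foo.bar(Foo.java:42)".
FRAME_RE = re.compile(r"^\s*at\s+\S+\(([^():]+):(\d+)\)")


@dataclass
class FailedAttempt:
    attempt_id: str   # e.g. attempt_200908100525_0814_m_000003_0
    failed_at: str    # ISO-style timestamp, so string order is time order
    stack_trace: str  # diagnostic text of the failed attempt


def task_id(attempt_id: str) -> str:
    """attempt_<jt>_<job>_<m|r>_<task>_<n>  ->  task_<jt>_<job>_<m|r>_<task>"""
    m = ATTEMPT_RE.match(attempt_id)
    if not m:
        raise ValueError(f"not an attempt id: {attempt_id}")
    jt, job, kind, task, _ = m.groups()
    return f"task_{jt}_{job}_{kind}_{task}"


def exception_key(trace: str) -> str:
    """Exception class plus the file:line of the first frame with a line number."""
    lines = trace.strip().splitlines()
    head = EXC_RE.match(lines[0]) if lines else None
    name = head.group(1) if head else "unknown"
    for line in lines[1:]:
        frame = FRAME_RE.match(line)
        if frame:
            return f"{name} at {frame.group(1)}:{frame.group(2)}"
    return name


def summarize(job_id: str, failures: List[FailedAttempt],
              max_attempts: int = 4) -> Dict[str, object]:
    """Collect the slide's checklist for one job from its failed attempts."""
    # job_<jt>_<seq> owns the attempts named attempt_<jt>_<seq>_...
    prefix = "attempt_" + job_id[len("job_"):] + "_"
    mine = [f for f in failures if f.attempt_id.startswith(prefix)]
    by_task: Dict[str, List[FailedAttempt]] = defaultdict(list)
    for f in mine:
        by_task[task_id(f.attempt_id)].append(f)
    return {
        "job": job_id,
        # A task whose attempts all failed (4 on the slide) usually fails the job.
        "failed_tasks": sorted(t for t, fs in by_task.items() if len(fs) >= max_attempts),
        "exceptions": Counter(exception_key(f.stack_trace) for f in mine),
        "timestamps": sorted(f.failed_at for f in mine),
    }
```

Keying exceptions by class and source line, as the slide's "(by line number?)" suggests, separates two different bugs that happen to throw the same exception class.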
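For "Feedback about retention/archival policy, based on namenode audit logs" from "Some Ideas - 4", a first cut could record the last time each path was read and flag paths untouched since a cutoff. This sketch assumes audit lines that start with a `YYYY-MM-DD HH:MM:SS,mmm` timestamp and carry `key=value` fields such as `cmd=` and `src=`; check that against your cluster's log layout. It also assumes paths contain no spaces. Paths that never appear in the log were not read during the logged window, so a real tool would join this result with a full listing.

```python
import re
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# Assumed line shape:
# 2009-08-10 05:25:00,123 INFO ...audit: ugi=u,g ip=/10.0.0.1 cmd=open src=/a dst=null perm=null
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")
FIELD_RE = re.compile(r"(\w+)=(\S+)")


def last_access(lines: Iterable[str],
                commands: Tuple[str, ...] = ("open",)) -> Dict[str, datetime]:
    """Map each src path to the latest time one of `commands` touched it."""
    latest: Dict[str, datetime] = {}
    for line in lines:
        ts = TS_RE.match(line)
        if not ts:
            continue  # skip lines that are not timestamped audit records
        fields = dict(FIELD_RE.findall(line))
        if fields.get("cmd") not in commands or "src" not in fields:
            continue
        when = datetime.strptime(ts.group(1), "%Y-%m-%d %H:%M:%S")
        path = fields["src"]
        if path not in latest or when > latest[path]:
            latest[path] = when
    return latest


def archival_candidates(latest: Dict[str, datetime],
                        cutoff: datetime) -> List[Tuple[str, datetime]]:
    """Paths not read since `cutoff`, oldest first."""
    stale = [(p, t) for p, t in latest.items() if t < cutoff]
    return sorted(stale, key=lambda item: item[1])
```

Passing a different `commands` tuple (for example, adding write operations) would turn the same scan into a "last modified" report.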