Milind Bhandarkar's CMU hack suggestions

Milind offered several suggested hacks at the 2009 Yahoo! Hack U at CMU.

Milind Bhandarkar's CMU hack suggestions Presentation Transcript

  • 1. Hadoop Hack Ideas, by Milind Bhandarkar (milindb@yahoo-inc.com)
  • 2. Hadoop Hacks
    • Previous hack contests
      • Hadoop fs shell with tab completion
      • Simple XML API for getting HDFS listing
      • Counting many digits of Pi
      • Hadoop log debugger
      • Acronym Detection
  • 3. Some Ideas - 1
    • HDFS:
      • Parallel put / get (fast copying in / out of HDFS)
      • Unarchiving HDFS archives
      • Fast HDFS fsck
      • Block-replication policy
      • Find command
    • Map-Reduce
      • Performance analysis: correlating job counters with system metrics
      • AJAXy Web UI
      • Job submission as a web service
      • Splittable gzip
  • 4. Some Ideas - 2
    • Pig
      • Web UI
      • Parser-less Pig
      • UDFs in BSF scripting languages
      • Real progress reporting
      • Random dataset generation (with specific types and distributions)
  • 5. Some Ideas - 3
    • % filebug job_200908100525_0814
    • and it'll collect:
      • Which task failed 4 times
      • Input to the failed task
      • Counter values
      • Count of exception types (by line number?)
      • List of timestamps when the tasks failed
    • And file a Bugzilla ticket
    • Analyze this job!
  • 6. Some Ideas - 4
    • Feedback about retention/archival policy
      • Based on namenode audit logs
    • Feedback about data layout
    • iPhone app for monitoring Hadoop jobs
      • Bonus: with push notifications on job completion and failure
  • 7. Questions?
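
The "filebug" idea on slide 5 lists what the tool should collect for a failed job. The summarizing step might be sketched roughly as below. This is only an illustration under stated assumptions: the `Attempt` record, `summarize_failures`, and `format_report` are hypothetical names invented here, not Hadoop APIs, and the sketch assumes the task attempt data has already been extracted from the job history by some other means. It does not fetch task input or counter values, and it does not file a Bugzilla ticket.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# The slide mentions a task that "failed 4 times"; treated here as the attempt limit.
MAX_ATTEMPTS = 4


@dataclass
class Attempt:
    """One task attempt. Hypothetical record, not a Hadoop type."""
    task_id: str
    failed: bool
    timestamp: str                   # ISO-style string, so plain sorting is chronological
    exception: Optional[str] = None  # exception class name, if one was recorded


def summarize_failures(attempts):
    """Collect the failure facts listed on the slide from a list of attempts."""
    failures = [a for a in attempts if a.failed]
    failures_per_task = Counter(a.task_id for a in failures)
    return {
        # Tasks that failed on every allowed attempt
        "exhausted_tasks": sorted(
            t for t, n in failures_per_task.items() if n >= MAX_ATTEMPTS
        ),
        # Number of failures per exception type
        "exception_counts": dict(
            Counter(a.exception for a in failures if a.exception)
        ),
        # When each failure happened, oldest first
        "failure_timestamps": sorted(a.timestamp for a in failures),
    }


def format_report(job_id, summary):
    """Render the summary as plain text suitable for pasting into a bug report."""
    lines = [f"Job {job_id} failure summary"]
    exhausted = ", ".join(summary["exhausted_tasks"]) or "none"
    lines.append(f"Tasks that failed {MAX_ATTEMPTS} times: {exhausted}")
    lines.append("Exception counts:")
    for exc, count in sorted(summary["exception_counts"].items()):
        lines.append(f"  {exc}: {count}")
    lines.append("Failure timestamps: " + ", ".join(summary["failure_timestamps"]))
    return "\n".join(lines)
```

A caller would build a list of `Attempt` records for the job, pass it to `summarize_failures`, and hand the result to `format_report` together with the job ID (for example `job_200908100525_0814` from the slide). Keeping collection and formatting separate means the same summary could feed a ticket, a web page, or a notification.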