Convey extra information: virtual time, task execution time, etc.
Keeping up with changes to the base classes may be hard
Example: A new variable added to JobTracker
Make the dependency between map and reduce tasks explicit
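To make the subclassing approach concrete, here is a minimal Java sketch of a subclass that carries simulator-only data (virtual time, task execution time) and of an explicit map-to-reduce dependency. All names (`TaskStatus`, `SimulatedTaskStatus`, `reduceFinishMs`) are hypothetical stand-ins, not Mumak's or Hadoop's actual classes. The reduce model, where the reduce phase can only start after the last map finishes, is an assumption made for illustration.

```java
// Hypothetical sketch: names are illustrative, not Mumak's or Hadoop's real classes.
public class VirtualTimeDemo {

    // Stand-in for a base status class owned by the system under test.
    static class TaskStatus {
        final String taskId;
        TaskStatus(String taskId) { this.taskId = taskId; }
    }

    // Subclass that conveys simulator-only information alongside the base fields.
    static class SimulatedTaskStatus extends TaskStatus {
        final long virtualStartMs; // simulated clock, not wall-clock time
        final long runtimeMs;      // task execution time, e.g. taken from a trace

        SimulatedTaskStatus(String taskId, long virtualStartMs, long runtimeMs) {
            super(taskId);
            this.virtualStartMs = virtualStartMs;
            this.runtimeMs = runtimeMs;
        }

        long virtualFinishMs() { return virtualStartMs + runtimeMs; }
    }

    // Explicit map -> reduce dependency (assumed model): the reduce phase cannot
    // begin before the last map finishes, even if the reduce was scheduled earlier.
    static long reduceFinishMs(long reduceStartMs, long lastMapFinishMs, long reducePhaseMs) {
        return Math.max(reduceStartMs, lastMapFinishMs) + reducePhaseMs;
    }

    public static void main(String[] args) {
        SimulatedTaskStatus map = new SimulatedTaskStatus("m_0", 0, 500);
        System.out.println(map.taskId + " finishes at " + map.virtualFinishMs());
        System.out.println("reduce finishes at " + reduceFinishMs(100, map.virtualFinishMs(), 200));
    }
}
```

Running `main` prints that `m_0` finishes at 500 and the reduce at 700: although the reduce was scheduled at 100, its 200 ms phase only starts once the map completes at 500.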
Mumak as a System Behavior Verifier
Mumak as a JobTracker Debugger
MAPREDUCE-995: “JobHistory should handle cases where task completion events are generated after job completion event”
Discovered while testing the Mumak patch for the 0.21 submission
Introduced by MAPREDUCE-157, committed one day earlier
Manifested as a JobTracker crash due to an IOException
Root cause analysis
The developer made a wrong assumption about the timing of events
Assumed that once a job is marked as finished, no more heartbeat events related to the job would follow
This led to a Closeable object being used after it was closed
Reproducing the bug through benchmarking would require injecting a failed job and hitting the “good” timing, where an outstanding task completes after the job is marked as failed
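The faulty assumption can be reduced to a small, hypothetical Java sketch: a per-job writer is closed when the job is marked failed, and a late task-completion event then tries to write to it. `HistoryWriter`, `onEventBuggy` and `onEventSafe` are illustrative names. This is not the actual MAPREDUCE-995 patch, only one defensive way to tolerate events that arrive after job completion.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the use-after-close ordering bug, not JobTracker code.
public class LateEventDemo {

    // Stand-in for a per-job history writer.
    static class HistoryWriter implements Closeable {
        private boolean closed = false;
        final List<String> lines = new ArrayList<>();

        void write(String event) throws IOException {
            if (closed) {
                throw new IOException("write after close: " + event);
            }
            lines.add(event);
        }

        @Override
        public void close() { closed = true; }

        boolean isClosed() { return closed; }
    }

    // Buggy handler: assumes no events arrive once the job is finished.
    static void onEventBuggy(HistoryWriter w, String event) throws IOException {
        w.write(event);
    }

    // Defensive handler: drops events for a job whose writer is already closed.
    static boolean onEventSafe(HistoryWriter w, String event) throws IOException {
        if (w.isClosed()) {
            return false; // a real fix might log or record the late event elsewhere
        }
        w.write(event);
        return true;
    }

    public static void main(String[] args) throws IOException {
        HistoryWriter w = new HistoryWriter();
        w.write("JOB_FAILED");
        w.close(); // job marked as failed, writer closed
        // An outstanding task completes afterward.
        try {
            onEventBuggy(w, "TASK_COMPLETED");
        } catch (IOException e) {
            System.out.println("buggy: " + e.getMessage());
        }
        System.out.println("safe accepted: " + onEventSafe(w, "TASK_COMPLETED"));
    }
}
```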
Mumak as a JobTracker Profiling Benchmark
Memory allocation pattern similar to a real JobTracker, but at a much faster rate
Mumak's own overhead is less than 20-30%
Limitations: cannot detect synchronization hotspots or suboptimal I/O or network operations
Findings from YourKit profiling
Wasteful String concatenation in Log.debug() calls within mapred.ResourceEstimator.getEstimatedTotalMapOutputSize
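The usual remedy for this kind of waste is to guard debug statements so the message string is only built when debug logging is enabled (with Apache Commons Logging, which Hadoop used, that is an `if (LOG.isDebugEnabled())` check). The self-contained sketch below uses `java.util.logging` instead so it runs without dependencies; `describe` is a hypothetical stand-in for an expensive message, and the counter records how many messages each variant builds while debug-level output is off.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class DebugGuardDemo {
    private static final Logger LOG = Logger.getLogger(DebugGuardDemo.class.getName());
    private static int messagesBuilt = 0;

    // Hypothetical stand-in for an expensive message (e.g. a large object's toString()).
    private static String describe(long bytes) {
        messagesBuilt++;
        return "estimatedTotalMapOutputSize=" + bytes;
    }

    // Unguarded: the argument is concatenated before fine() checks the level.
    static void logUnguarded(long bytes) {
        LOG.fine("estimate: " + describe(bytes));
    }

    // Guarded: the string is only built when FINE (debug) logging is enabled.
    static void logGuarded(long bytes) {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("estimate: " + describe(bytes));
        }
    }

    // Returns {messages built unguarded, messages built guarded} with FINE disabled.
    static int[] run(int calls) {
        LOG.setLevel(Level.INFO); // FINE messages are disabled
        messagesBuilt = 0;
        for (int i = 0; i < calls; i++) logUnguarded(i);
        int unguarded = messagesBuilt;
        messagesBuilt = 0;
        for (int i = 0; i < calls; i++) logGuarded(i);
        return new int[] { unguarded, messagesBuilt };
    }

    public static void main(String[] args) {
        int[] r = run(1000);
        System.out.println("unguarded=" + r[0] + " guarded=" + r[1]);
    }
}
```

`run(1000)` reports 1000 messages built by the unguarded variant and 0 by the guarded one. On Java 8 and later, `LOG.fine(() -> ...)` with a `Supplier<String>` achieves the same laziness without an explicit `if`.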
Repetitive parsing of TaskTracker names to extract hostnames
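One straightforward fix for repetitive parsing is to memoize the hostname per tracker name. The sketch below assumes tracker names of the form `tracker_<hostname>:<rest>`; that format and the `TrackerHostCache` class are assumptions for illustration, not the JobTracker's real code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: parse each tracker name once, then serve hostnames from a cache.
public class TrackerHostCache {
    // Assumed name format: "tracker_<hostname>:<anything>" (illustrative only).
    private static final String PREFIX = "tracker_";

    private final Map<String, String> cache = new HashMap<>();
    private int parses = 0;

    private String parse(String trackerName) {
        parses++;
        String s = trackerName.startsWith(PREFIX)
                ? trackerName.substring(PREFIX.length())
                : trackerName;
        int colon = s.indexOf(':');
        return colon >= 0 ? s.substring(0, colon) : s;
    }

    // Returns the hostname, parsing the tracker name only on first sight.
    public String hostOf(String trackerName) {
        return cache.computeIfAbsent(trackerName, this::parse);
    }

    public int parseCount() { return parses; }
}
```

A plain `HashMap` keeps the sketch simple and single-threaded; if the cache were shared by concurrent heartbeat handlers, a `ConcurrentHashMap` (whose `computeIfAbsent` is atomic) would be the natural swap.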
Unnecessary exceptions from counter-name localization caused by a removed properties file (regression introduced by H-5717)
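If every lookup of a missing resource bundle fails with `MissingResourceException`, constructing that exception on each lookup is pure overhead. A hypothetical sketch of avoiding it: remember the failed lookup once and fall back to the raw counter name. The `CounterNameLocalizer` class and the `<counter>.name` key convention are assumptions for illustration, not Hadoop's actual counter code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.MissingResourceException;
import java.util.Optional;
import java.util.ResourceBundle;

// Hypothetical sketch, not Hadoop's counter localization code.
public class CounterNameLocalizer {
    // Caches both hits and misses, so a missing bundle is looked up only once.
    private final Map<String, Optional<ResourceBundle>> bundles = new HashMap<>();
    private int failedLookups = 0;

    private Optional<ResourceBundle> load(String bundleName) {
        try {
            return Optional.of(ResourceBundle.getBundle(bundleName));
        } catch (MissingResourceException e) {
            failedLookups++; // the exception is created once per bundle, not per counter
            return Optional.empty();
        }
    }

    // Assumed key convention: "<counterName>.name" holds the display name.
    public String displayName(String bundleName, String counterName) {
        Optional<ResourceBundle> bundle = bundles.computeIfAbsent(bundleName, this::load);
        String key = counterName + ".name";
        if (bundle.isPresent() && bundle.get().containsKey(key)) {
            return bundle.get().getString(key);
        }
        return counterName; // fall back to the raw counter name
    }

    public int failedLookups() { return failedLookups; }
}
```

`Optional` is used as the cached value because `computeIfAbsent` does not store a `null` result, so caching a plain `null` for "missing" would retry the lookup every time.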
Conclusions
Mumak: a lightweight, versatile tool for MapReduce verification and debugging
Verification of overall system behavior
A debugger for JobTracker / scheduler
A micro-benchmark to stress CPU and memory allocation
Strengths:
Easy to set up and run
Faster than running a real benchmark: 1 minute of simulation ≈ 2 hours on a 2000-node cluster
Realistically reproduces conditions and tests the actual code
Can easily generate different orderings of distributed events
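To see how quickly the space of event orderings grows, the generic sketch below enumerates every interleaving of two independently ordered event streams (here one job-level event and two task-completion events). It illustrates the search space only; it is not how Mumak itself varies event timing, and the event names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Generic illustration: all merges of two event streams that keep each stream's order.
public class EventInterleavings {

    static List<List<String>> interleavings(List<String> a, List<String> b) {
        List<List<String>> out = new ArrayList<>();
        build(a, 0, b, 0, new ArrayList<>(), out);
        return out;
    }

    // Backtracking: take the next event from either stream, recurse, then undo.
    private static void build(List<String> a, int i, List<String> b, int j,
                              List<String> prefix, List<List<String>> out) {
        if (i == a.size() && j == b.size()) {
            out.add(new ArrayList<>(prefix));
            return;
        }
        if (i < a.size()) {
            prefix.add(a.get(i));
            build(a, i + 1, b, j, prefix, out);
            prefix.remove(prefix.size() - 1);
        }
        if (j < b.size()) {
            prefix.add(b.get(j));
            build(a, i, b, j + 1, prefix, out);
            prefix.remove(prefix.size() - 1);
        }
    }

    public static void main(String[] args) {
        List<String> job = List.of("JOB_FAILED");
        List<String> tasks = List.of("TASK_1_DONE", "TASK_2_DONE");
        for (List<String> order : interleavings(job, tasks)) {
            System.out.println(order);
        }
    }
}
```

With one job event and two task events there are 3 interleavings (in general (m+n)!/(m!·n!)); in two of them a task completes after `JOB_FAILED`, the ordering behind MAPREDUCE-995.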
Limitations: No simulation of system services or threads
Cannot debug synchronization problems among threads