Hw09 Fingerpointing Sourcing Performance Issues



Published in Technology , Education
  • Quick mention verbally of what Hadoop is: Distributed parallel processing runtime with a master-slave architecture. Focus on limping-but-alive: performance degradations not caught by heartbeats
  • Describe x and y axes


  • 1. Jiaqi Tan, Soila Pertet, Xinghao Pan, Mike Kasick, Keith Bare, Eugene Marinelli, Rajeev Gandhi, Priya Narasimhan (Carnegie Mellon University)
  • 2. Automated Problem Diagnosis
    • Diagnosing problems
      • Creates major headaches for administrators
      • Worsens as scale and system complexity grow
    • Goal: automate it and get proactive
      • Failure detection and prediction
      • Problem determination (or “fingerpointing”)
      • Problem visualization
    • How: Instrumentation plus statistical analysis
    Priya Narasimhan © Oct 25, 2009 Carnegie Mellon University
  • 3. Challenges in Problem Analysis
    • Challenging in large-scale networked environment
      • Can have multiple failure manifestations with a single root cause
      • Can have multiple root causes for a single failure manifestation
      • Problems and/or their manifestations can “travel” among communicating components
      • A lot of information from multiple sources – what to use? what to discard?
    • Automated fingerpointing
      • Automatically discover faulty node in a distributed system
  • 4. Exploration of Fingerpointing
    • Current explorations
      • Hadoop
        • Open-source implementation of Map/Reduce (Yahoo!)
      • PVFS
        • High-performance file system (Argonne National Labs)
      • Lustre
        • High-performance file system (Sun Microsystems)
    • Studied
      • Various types of problems
      • Various kinds of instrumentation
      • Various kinds of data-analysis techniques
  • 5. Why?
    • Hadoop is fault-tolerant
      • Heartbeats: detect lost nodes
      • Speculative re-execution: recover work due to lost/laggard nodes
    • Hadoop’s fault-tolerance can mask performance problems
      • Nodes alive but slow
    • Target failures for our diagnosis
      • Performance degradations (slow, hangs)
  • 6. Hadoop Failure Survey
    • Hadoop Issue Tracker from Jan 07 to Dec 08 https://issues.apache.org/jira
    • Targeted Failures: 66%
  • 7. Hadoop Mailing List Survey
    • Hadoop user’s mailing list: Oct 08 to Apr 09
    • Examined queries on optimizing programs
    • Most queries: MapReduce-specific aspects e.g. data skew, number of maps and reduces
  • 8. M45 Job Performance
    • Failures due to bad config (e.g., missing files) detected quickly
    • However, 20-30% of failed jobs run for over 1hr before aborting
    • Early fault detection desirable
  • 9. BEFORE : Hadoop Web Console
    • Admin/user sifts through wealth of information
      • Problem is aggravated in large clusters
      • Multiple clicks to chase down a problem
    • No support for historical comparison
      • Information displayed is a snapshot in time
    • Poor localization of correlated problems
      • Progress indicators for all tasks are skewed by correlated problem
    • No clear indicators of performance problems
      • Are task slowdowns due to data skew or bugs? Unclear
  • 10. AFTER : Goals, Non-Goals
    • Identify the faulty Master/Slave node for the user/admin
    • Target production environment
      • Don’t instrument Hadoop or applications additionally
      • Use Hadoop logs as-is (white-box strategy)
      • Use OS-level metrics (black-box strategy)
    • Work for various workloads and under workload changes
    • Support online and offline diagnosis
    • Non-goals (for now)
      • Tracing problem to offending line of code
  • 11. Target Hadoop Clusters
    • Yahoo!’s 4000-processor M45 cluster
      • Production environment (managed by Yahoo!)
      • Offered to CMU as free cloud-computing resource
      • Diverse kinds of real workloads, problems in the wild
        • Massive machine-learning, language/machine-translation
      • Permission to harvest all logs and OS data each week
    • 100-node cluster on Amazon’s EC2
      • Production environment (managed by Amazon)
      • Commercial, pay-as-you-use cloud-computing resource
      • Workloads under our control, problems injected by us
        • gridmix, nutch, pig, sort, randwriter
      • Can harvest logs and OS data of only our workloads
  • 12. Performance Problems Studied
    • Studied Hadoop Issue Tracker (JIRA) from Jan-Dec 2007
    • Resource contention (fault: description)
      • CPU hog: external process uses 70% of CPU
      • Packet-loss: 5% or 50% of incoming packets dropped
      • Disk hog: 20GB file repeatedly written to
      • Disk full: disk full
    • Application bugs (source: Hadoop JIRA)
      • HADOOP-1036: Maps hang due to unhandled exception
      • HADOOP-1152: Reduces fail while copying map output
      • HADOOP-2080: Reduces fail due to incorrect checksum
      • HADOOP-2051: Jobs hang due to unhandled exception
      • HADOOP-1255: Infinite loop at NameNode
  • 13. Hadoop: Instrumentation
    • Master node: JobTracker, NameNode
    • Slave nodes: TaskTracker, DataNode (Map/Reduce tasks, HDFS blocks)
    • Collected on master and slave nodes: Hadoop logs, OS data
  • 14. How About Those Metrics?
    • White-box metrics (from Hadoop logs)
      • Event-driven (based on Hadoop’s activities)
      • Durations
        • Map-task durations, Reduce-task durations, ReduceCopy-durations, etc.
      • System-wide dependencies between tasks and data blocks
      • Heartbeat information: Heartbeat rates, Heartbeat-timestamp skew between the Master and Slave nodes
    • Black-box metrics (from OS /proc)
      • 64 different time-driven metrics (sampled every second)
      • Memory used, context-switch rate, User-CPU usage, System-CPU usage, I/O wait time, run-queue size, number of bytes transmitted, number of bytes received, pages in, pages out, page faults
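A minimal sketch of how one of the black-box metrics above (user-CPU usage) could be derived from two `/proc/stat` samples. The function names and the sample text are illustrative assumptions, not the authors' collector; only the field order of the aggregate `cpu` line follows the Linux `proc(5)` layout.

```python
from typing import Dict

# Field order of the aggregate "cpu" line in Linux /proc/stat (see proc(5)).
CPU_FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")


def parse_cpu_line(stat_text: str) -> Dict[str, int]:
    """Return cumulative jiffies per CPU state from the text of /proc/stat."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(CPU_FIELDS)]]
            return dict(zip(CPU_FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_fractions(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, float]:
    """Convert two cumulative samples into per-state fractions of the interval."""
    deltas = {k: after[k] - before[k] for k in CPU_FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {k: d / total for k, d in deltas.items()}


# Made-up samples standing in for two reads of /proc/stat one second apart.
SAMPLE_T0 = "cpu  1000 0 500 8000 500 0 0 0 0 0\ncpu0 1000 0 500 8000 500 0 0 0 0 0\n"
SAMPLE_T1 = "cpu  1070 0 510 8010 510 0 0 0 0 0\ncpu0 1070 0 510 8010 510 0 0 0 0 0\n"

fractions = cpu_fractions(parse_cpu_line(SAMPLE_T0), parse_cpu_line(SAMPLE_T1))
print(f"user CPU: {fractions['user']:.0%}, I/O wait: {fractions['iowait']:.0%}")
```

On a live node the two strings would come from reading `/proc/stat` at consecutive sampling ticks; parsing text rather than opening the file keeps the sketch runnable anywhere.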
  • 15. Intuition for Diagnosis
    • Slave nodes are doing approximately similar things for a given job
    • Gather metrics and extract statistics
      • Determine metrics of relevance
      • For both black-box and white-box data
    • Peer-compare histograms, means, etc. to determine “odd-man out”
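The peer-comparison intuition can be sketched as follows. This is an illustrative assumption, not the talk's actual algorithm: it compares each node's mean for one metric against the median across peers, scaled by the median absolute deviation, and the function name, threshold, and sample data are hypothetical.

```python
from statistics import mean, median
from typing import Dict, List


def odd_man_out(samples: Dict[str, List[float]], threshold: float = 3.0) -> List[str]:
    """Return nodes whose mean metric value is far from the peer median."""
    node_means = {node: mean(values) for node, values in samples.items()}
    center = median(node_means.values())
    # Median absolute deviation: a spread estimate that one outlier cannot inflate.
    mad = median(abs(m - center) for m in node_means.values())
    scale = mad if mad > 0 else 1e-9  # identical peers: any deviation is flagged
    return sorted(
        node for node, m in node_means.items()
        if abs(m - center) / scale > threshold
    )


# Made-up Map-task durations (seconds) per slave node for one job.
map_durations = {
    "node1": [10.0, 11.0, 9.0],
    "node2": [10.0, 12.0, 11.0],
    "node3": [9.0, 10.0, 11.0],
    "node4": [30.0, 28.0, 35.0],  # the slow node
}

suspects = odd_man_out(map_durations)
print(suspects)
```

Using the median rather than the mean as the reference keeps a single slow node from dragging the baseline toward itself, which matters when the cluster is small.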
  • 16. Log-Analysis Approach
    • SALSA: Analyzing Logs as StAte Machines [USENIX WASL 2008]
    • Extract state-machine views of execution from Hadoop logs
      • Distributed control-flow view of logs
      • Distributed data-flow view of logs
    • Diagnose failures based on statistics of these extracted views
      • Control-flow based diagnosis
      • Control-flow + data-flow based diagnosis
    • Perform analysis incrementally so that we can support it online
  • 17. Applying SALSA to Hadoop
    • Map task log events
      • [t] Launch Map task
      • :
      • [t] Copy Map outputs
      • :
      • [t] Map task done
    • Data-flow view: transfer of data to other nodes
      • Map outputs to Reduce tasks on other nodes
      • Incoming Map outputs for this Reduce task
    • Reduce task log events
      • [t] Launch Reduce task
      • :
      • [t] Reduce is idling, waiting for Map outputs
      • :
      • [t] Repeat until all Map outputs copied
        • [t] Start Reduce Copy (of completed Map output)
        • :
        • [t] Finish Reduce Copy
      • [t] Reduce Merge Copy
    • Control-flow view: state orders, durations
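A rough sketch of the control-flow idea: pair the start and end events of each state in a log and derive per-state durations. The log format here is invented for illustration (it is not Hadoop's real log format), and `state_durations` is a hypothetical helper, not part of SALSA.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical simplified log: "<seconds> <task-id> <START|END> <state>".
LOG = """\
0.0 map_0 START Map
1.0 reduce_0 START ReduceCopy
4.0 map_0 END Map
6.5 reduce_0 END ReduceCopy
6.5 reduce_0 START ReduceMerge
9.0 reduce_0 END ReduceMerge
"""


def state_durations(log_text: str) -> Dict[Tuple[str, str], List[float]]:
    """Pair START/END events per (task, state) and return the state durations."""
    open_states: Dict[Tuple[str, str], float] = {}
    durations: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for line in log_text.splitlines():
        if not line.strip():
            continue
        timestamp, task, kind, state = line.split()
        key = (task, state)
        if kind == "START":
            open_states[key] = float(timestamp)
        elif kind == "END" and key in open_states:
            # Closing a state yields one duration sample for that state.
            durations[key].append(float(timestamp) - open_states.pop(key))
    return dict(durations)


durations = state_durations(LOG)
for (task, state), values in sorted(durations.items()):
    print(task, state, values)
```

Because events are consumed one line at a time, the same loop could in principle run incrementally over a growing log, which is the property an online analysis would need.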
  • 18. Distributed Control+Data Flow
    • Distributed control-flow
      • Causal flow of task execution across cluster nodes, i.e., Reduces waiting on Maps via Shuffles
    • Distributed data-flow
      • Data paths of Map outputs shuffled to Reduces
      • HDFS data blocks read into and written out of jobs
    • Job-centric data-flows : Fused Control+Data Flows
      • Correlate paths of data and execution
      • Create conjoined causal paths from data source before, to data destination after, processing
  • 19. Intuition: Peer Similarity
    • In fault-free conditions, metrics (e.g., WriteBlock durations) are similar across nodes
    • Faulty node: Same metric is different on faulty node, as compared to non-faulty nodes
    • Kullback-Leibler divergence (comparison of histograms)
    • Figure: histograms (distributions) of WriteBlock durations over a 30-second window, one for the faulty node and two for normal nodes; y-axis: normalized counts (total 1.0)
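The histogram comparison can be illustrated with a small sketch. Assumptions are labeled: the slide names Kullback-Leibler divergence, but the symmetric form, the smoothing constant, the bin edges, the median aggregation, and all function names below are illustrative choices, not the authors' implementation.

```python
import bisect
import math
from statistics import median
from typing import Dict, List, Sequence


def normalized_histogram(values: Sequence[float], edges: List[float],
                         eps: float = 1e-6) -> List[float]:
    """Histogram with counts normalized to total 1.0 (smoothed so KL stays finite)."""
    counts = [0.0] * (len(edges) - 1)
    for v in values:
        # Find the bin whose left edge is <= v; the top edge is inclusive.
        i = min(bisect.bisect_right(edges, v) - 1, len(counts) - 1)
        if i >= 0 and v <= edges[-1]:
            counts[i] += 1
    smoothed = [c + eps for c in counts]
    total = sum(smoothed)
    return [c / total for c in smoothed]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def symmetric_kl(p: List[float], q: List[float]) -> float:
    return kl_divergence(p, q) + kl_divergence(q, p)


def most_divergent_node(durations: Dict[str, Sequence[float]], edges: List[float]) -> str:
    """Return the node whose histogram differs most from its peers' histograms."""
    hists = {n: normalized_histogram(v, edges) for n, v in durations.items()}
    scores = {
        n: median(symmetric_kl(h, other) for m, other in hists.items() if m != n)
        for n, h in hists.items()
    }
    return max(scores, key=scores.get)


# Made-up WriteBlock durations (seconds) observed in one window.
window = {
    "nodeA": [0.5, 0.6, 1.2, 0.7],
    "nodeB": [0.4, 0.8, 1.1, 0.9],
    "nodeC": [3.0, 5.0, 6.5, 2.5],  # the slow node
}
BIN_EDGES = [0.0, 1.0, 2.0, 4.0, 8.0]

culprit = most_divergent_node(window, BIN_EDGES)
print(culprit)
```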
  • 20. What Else Do We Do?
    • Analyze black-box data with similar intuition
      • Derive PDFs and use a clustering approach
        • Distinct behavior profiles of metric correlations
      • Compare them across nodes
      • Technique called Ganesha [ HotMetrics 2009 ]
    • Analyze heartbeat traffic
      • Compare heartbeat durations across nodes
      • Compare heartbeat-timestamp skews across nodes
    • Different metrics, different viewpoints, different algorithms
  • 22. Putting the Elephant Together
    • BliMEy: Blind Men and the Elephant Framework [CMU-CS-09-135]
    • Inputs shown in the diagram
      • TaskTracker heartbeat timestamps
      • JobTracker heartbeat timestamps
      • TaskTracker Durations views
      • JobTracker Durations views
      • Black-box resource usage
      • Job-centric data flows
  • 23. Visualization
    • To uncover Hadoop’s execution in an insightful way
    • To reveal outcome of diagnosis on sight
    • To allow developers/admins to get a handle as the system scales
    • Value to programmers [ HotCloud 2009 ]
      • Allows them to spot issues that might assist them in restructuring their code
      • Allows them to spot faulty nodes
    Priya Narasimhan © Oct 25, 2009 Carnegie Mellon University
  • 24. Visualization (timeseries)
    • DiskHog on slave node visible through lower heartbeat rate for that node
  • 25. Visualization (heatmaps)
    • CPU Hog on node 1 visible on Map-task durations
  • 26. Visualization (swimlanes)
    • Long-tailed map delaying overall job completion time
  • 27. MIROS
    • MIROS (Map Inputs, Reduce Outputs, Shuffles)
    • Aggregates data volumes across:
      • All tasks per node
      • Entire job
    • Shows skewed data flows, bottlenecks
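The aggregation described above might look like the following sketch. The `TaskVolume` record, its field names, and the sample byte counts are hypothetical; the slide only states that data volumes are summed across all tasks per node and across the entire job.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple

FIELDS = ("map_input", "shuffle", "reduce_output")


class TaskVolume(NamedTuple):
    """Bytes moved by one task (hypothetical record layout)."""
    node: str
    map_input: int
    shuffle: int
    reduce_output: int


def aggregate(tasks: Iterable[TaskVolume]) -> Tuple[Dict[str, Dict[str, int]], Dict[str, int]]:
    """Sum Map inputs, Shuffles, and Reduce outputs per node and for the whole job."""
    per_node: Dict[str, Dict[str, int]] = defaultdict(lambda: dict.fromkeys(FIELDS, 0))
    job = dict.fromkeys(FIELDS, 0)
    for task in tasks:
        for field in FIELDS:
            volume = getattr(task, field)
            per_node[task.node][field] += volume
            job[field] += volume
    return dict(per_node), job


# Made-up task records: node2 receives far more shuffle data than node1.
tasks = [
    TaskVolume("node1", map_input=100, shuffle=0, reduce_output=0),
    TaskVolume("node1", map_input=0, shuffle=40, reduce_output=30),
    TaskVolume("node2", map_input=120, shuffle=0, reduce_output=0),
    TaskVolume("node2", map_input=0, shuffle=400, reduce_output=350),
]

per_node, job = aggregate(tasks)
print(per_node["node2"]["shuffle"], job["shuffle"])
```

Comparing each node's share of the job total (here node2 carries 400 of 440 shuffle bytes) is one simple way such a view could expose skewed data flows.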
    Jiaqi Tan © July 09 http://www.pdl.cmu.edu/
  • 28. Current Developments
    • State-machine extraction + visualization being implemented for the Hadoop Chukwa project
      • Collaboration with Yahoo!
    • Web-based visualization widgets for HICC (Hadoop Infrastructure Care Center)
    • “Swimlanes” currently available in Chukwa trunk (CHUKWA-94)
  • 29. Briefly: Online Fingerpointing
    • ASDF: Automated System for Diagnosing Failures
      • Can incorporate any number of different data sources
      • Can use any number of analysis techniques to process this data
    • Can support online or offline analyses for Hadoop
    • Currently plugging in our white-box & black-box algorithms
  • 30. Hard Problems
    • Understanding the limits of black-box fingerpointing
      • What failures are outside the reach of a black-box approach?
      • What are the limits of “peer” comparison?
      • What other kinds of black-box instrumentation exist?
    • Scalability
      • Scaling to run across large systems and understanding “growing pains”
    • Visualization
      • Helping system administrators visualize problem diagnosis
    • Trade-offs
      • More instrumentation and more frequent data can improve accuracy of diagnosis, but at what performance cost?
    • Virtualized environments
      • Do these environments help/hurt problem diagnosis?
  • 31. Summary
    • Automated problem diagnosis
    • Current targets: Hadoop, PVFS, Lustre
    • Initial set of failures
      • Real-world bug databases, problems in the wild
    • Short-term: Transition techniques into Hadoop code-base working with Yahoo!
    • Long-term
      • Scalability, scalability, scalability, ….
      • Expand fault study
      • Improve visualization, working with users
    • Additional details
      • USENIX WASL 2008 (white-box log analysis)
      • USENIX HotCloud 2009 (visualization)
      • USENIX HotMetrics 2009 (black-box metric analysis)
      • HotDep 2009 (black-box analysis for PVFS)
  • 32. priya@cs.cmu.edu Oct 25, 2009 Carnegie Mellon University