
Theius: A Streaming Visualization Suite for Hadoop Clusters

Slides for presentation of "Theius: A Streaming Visualization Suite for Hadoop Clusters", given at IC2E 2013 in San Francisco, California.


  1. Jon Tedesco
     IC2E 2013, San Francisco, CA, USA
     Jon Tedesco, Roman Dudko, Abhishek Sharma, Reza Farivar, Roy Campbell
  2. • Problem
       ◦ System administrators
         ▪ Bottleneck for detecting & responding to failures
         ▪ Communicate state of system quickly
     • Monitoring
       ◦ Streaming, real-time data
       ◦ Ganglia
         ▪ Widely used, scalable, and flexible
     • Visualization
     • Prediction
       ◦ Online prediction algorithms (real-time)
     • Visualization problem
       ◦ Ganglia
         ▪ Static, time-based graphs
  3. (No text on this slide.)
  4. (No text on this slide.)
  5. • Interactive
       ◦ Responsive and controllable
     • Real-time
       ◦ Streaming, real-time, automatic
     • Informative
       ◦ Direct attention to potential problems and artifacts
     • Intuitive
       ◦ Demand skill, not experience
     • Scalable
       ◦ Visualize large clusters without sacrificing usability
  6. • Objectives
       ◦ Streaming data
       ◦ Configurable and interactive
       ◦ Informative
     • Use cases
       ◦ Heterogeneous cluster
       ◦ Rack failure
       ◦ Node failure
       ◦ Uneven load distribution
  7. • Architecture
       ◦ Simulator
         ▪ Generates simulated cluster data
         ▪ Streams data to clients
       ◦ Webpage
         ▪ Asynchronous & interactive
     • Implementation
       ◦ JavaScript
         ▪ d3.js
         ▪ jQuery
       ◦ Python
       ◦ AJAX
  8. • Data
       ◦ Methodology
         ▪ Data types from previous work
         ▪ Heuristic values
       ◦ Examples
         ▪ CPU, memory, context switch rate
         ▪ Log events
         ▪ MapReduce tasks and jobs
         ▪ Failure or event prediction
  9. (No text on this slide.)
  10. Main Visualization
      • Customizable using control panel
      • Aggregate view
        ◦ Summarize and drill down
      • Draws attention to anomalies
  11. • Switch between main visualizations
      • Seamless transitions
        ◦ Uninterrupted data stream
  12. • Hierarchy of nodes, organized by rack
      • Color and size configurable
      • Scalable using summarization and drill-down
      • Identify abnormal rack or nodes
  13. • Hierarchy of nodes, organized by rack
      • Color and size configurable
      • Scalable using summarization and drill-down
      • Identify abnormal rack or nodes
  14. • Grouped by job
      • Color and size configurable
        ◦ Example uses role for color, time remaining for size
      • Identify abnormal jobs or tasks
  15. • Grouped by rack
      • Color and size configurable
        ◦ Example uses CPU usage and rack color coding
      • Identify abnormal nodes or racks
  16. • Identify trends with nodes and racks
      • Color, size, and plots configurable
      • Identify correlations between metrics
  17. • Detailed data for individual node
      • Traditional visualizations for single node
  18. Controls
      • Configure metrics for visualizations
      • Pause and resume data stream
      • Legend for main visualization
  19. Aggregate Data
      • Aggregate data for the cluster
        ◦ Log events stream
        ◦ Global node data
        ◦ Summarization data
  20. History Controls
      • Snapshots of historical data
        ◦ See main visualization and sidebar data at certain time
      • Visualize metric across time
  21. • Scalable
        ◦ Drill-down and summarization
        ◦ Efficient web-based framework
      • Intuitive, informative
        ◦ Topological visualization
        ◦ Draw attention to abnormalities
      • Interactive, real-time
        ◦ Designed for streaming data
        ◦ Configurable visualization
        ◦ Pause, rewind, resume
  22. • Experimental Setup
        ◦ Compare Theius to Ganglia
        ◦ 5 graduate students at UIUC
          ▪ No prior experience with Ganglia or Theius
        ◦ 4 comparative tasks
          ▪ Both Ganglia & Theius
        ◦ 6 scenarios for trends and correlations
          ▪ Theius only
        ◦ Timings & subjective feedback
  23. • Tasks
        ◦ Scenario 1
          ▪ CPU usage in single node
        ◦ Scenario 2
          ▪ Node with highest CPU
        ◦ Scenario 3
          ▪ High memory usage nodes
        ◦ Scenario 4
          ▪ Aggregate cluster use
      [Chart: time in seconds (axis 0 to 60) per scenario, Theius vs. Ganglia]
  24. • Task 1 (2.2 s)
        ◦ Identify abnormal rack in heterogeneous cluster
      • Task 2 (6.2 s)
        ◦ Identify rack with abnormal CPU usage
      • Task 3 (10.0 s)
        ◦ Identify machine that logged the last fatal error
      • Task 4 (67.4 s)
        ◦ Identify machine with high CPU, memory usage, or context switch rate
      • Task 5 (1.2 s)
        ◦ Identify rack with high CPU, memory usage, or context switch rate
      • Task 6 (7.8 s)
        ◦ Identify correlation between context switch rate and CPU usage
  25. • Source Code
        ◦ https://github.com/jtedesco/Theius
      • Future Work
        ◦ User study
          ▪ System administrators
          ▪ Larger group
          ▪ Timing as appropriate metric
        ◦ MapReduce-specific visualizations
        ◦ Scalability experiments
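
The slides describe a simulator that generates cluster data (slides 7 and 8) and a rack hierarchy that scales through summarization and drill-down (slides 12 and 21). The sketch below illustrates that idea in Python, one of the implementation languages the slides name. It is not taken from the Theius source: the function names, metric keys, value ranges, and the fixed threshold are all assumptions made for illustration.

```python
import random
from statistics import mean


def simulate_snapshot(racks, nodes_per_rack, rng, hot_rack=None):
    """Return one simulated sample per node, keyed by (rack, node).

    The distributions are arbitrary placeholders (assumption); the slides
    only say the simulator uses "heuristic values".
    """
    snapshot = {}
    for rack in range(racks):
        for node in range(nodes_per_rack):
            # An optional "hot" rack models the abnormal-rack use case.
            cpu_mean = 0.85 if rack == hot_rack else 0.30
            snapshot[(rack, node)] = {
                "cpu": min(1.0, max(0.0, rng.gauss(cpu_mean, 0.05))),
                "memory": min(1.0, max(0.0, rng.gauss(0.50, 0.10))),
                "context_switches": max(0.0, rng.gauss(2000, 300)),
            }
    return snapshot


def summarize_by_rack(snapshot, metric):
    """Aggregate view: mean of one metric per rack."""
    per_rack = {}
    for (rack, _node), sample in snapshot.items():
        per_rack.setdefault(rack, []).append(sample[metric])
    return {rack: mean(values) for rack, values in per_rack.items()}


def drill_down(snapshot, rack):
    """Detail view: the per-node samples of a single rack."""
    return {node: s for (r, node), s in snapshot.items() if r == rack}


def abnormal_racks(summary, threshold):
    """Racks whose summarized metric exceeds a fixed threshold (assumption)."""
    return sorted(rack for rack, value in summary.items() if value > threshold)


if __name__ == "__main__":
    rng = random.Random(0)
    snapshot = simulate_snapshot(racks=4, nodes_per_rack=8, rng=rng, hot_rack=2)
    summary = summarize_by_rack(snapshot, "cpu")
    for rack in abnormal_racks(summary, threshold=0.6):
        print("rack", rack, "looks abnormal; nodes:", sorted(drill_down(snapshot, rack)))
```

Summarizing first keeps the top-level view at one value per rack regardless of cluster size, and the per-node data is only touched when a rack is opened. A real deployment would likely replace the fixed threshold with something data-driven.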

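
Slides 18, 20 and 21 mention pausing and resuming the data stream, snapshots of historical data, and "pause, rewind, resume". One way such controls could be backed is a bounded buffer of timestamped snapshots. The Python sketch below makes that assumption: the class name, its methods, and the choice to keep recording while the display is paused are hypothetical and are not taken from the Theius source.

```python
from collections import deque


class SnapshotHistory:
    """Bounded history of (timestamp, snapshot) pairs with pause/rewind/resume."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently drops the oldest entry once full.
        self._history = deque(maxlen=capacity)
        self._paused = False
        self._displayed = None

    def push(self, timestamp, snapshot):
        """Record an incoming snapshot; refresh the display unless paused."""
        entry = (timestamp, snapshot)
        self._history.append(entry)
        if not self._paused:
            self._displayed = entry

    def pause(self):
        """Freeze the displayed snapshot; incoming data is still recorded."""
        self._paused = True

    def resume(self):
        """Unfreeze and jump to the most recent recorded snapshot."""
        self._paused = False
        if self._history:
            self._displayed = self._history[-1]

    def rewind_to(self, timestamp):
        """Pause and show the latest snapshot taken at or before timestamp.

        If nothing that old is still in the buffer, the display is unchanged.
        """
        self._paused = True
        candidates = [e for e in self._history if e[0] <= timestamp]
        if candidates:
            self._displayed = candidates[-1]
        return self._displayed

    def displayed(self):
        return self._displayed

    def metric_over_time(self, node, metric):
        """Series of (timestamp, value) for one node and metric."""
        return [(t, snap[node][metric]) for t, snap in self._history if node in snap]


if __name__ == "__main__":
    history = SnapshotHistory(capacity=3)
    for t in range(5):
        history.push(t, {"node-1": {"cpu": t / 10}})
    history.pause()
    history.push(5, {"node-1": {"cpu": 0.5}})
    print("while paused:", history.displayed()[0])
    history.resume()
    print("after resume:", history.displayed()[0])
```

Recording while paused means resuming does not leave a gap in the per-metric time series, at the cost of the buffer continuing to evict old snapshots during a pause.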