SJTU Summary report

  • 1. Summary Report
      Yupeng Chen
      2012.8.15
  • 2. Outline
      • Introduction to Ganglia
      • Problems & Solutions
      • My Harvest
  • 3. Introduction and overview
      • Scalable distributed monitoring system for high-performance computing systems
      • XML - data representation
      • XDR (External Data Representation) - compact, portable data transport
      • RRDTool - data storage and visualization
      • PHP - web frontend interface
  • 4. Ganglia Architecture
      • Gmond - Ganglia Monitoring Daemon
          Metric gathering agent installed on individual servers
      • Gmetad - Ganglia Meta Daemon
          Metric aggregation agent installed on specific servers
      • Apache (or Nginx + php5-fpm) web frontend
          Metric presentation and analysis server
      • Model - Multicast or Unicast
  • 5. Multicast – All gmond nodes are capable of listening to and reporting on the status of the entire cluster.
  • 6. Unicast – Each node sends its local monitoring data to specific machines; sending across network segments is allowed.
  • 7. [Diagram: deployment topology] VMH Test Cluster: gmond on master, slave1, slave2, slave3, slave4, and zookeeper, linked by XDR / UDP. Omnilab Cluster: gmond on dev, ganglia, and omnilab, linked by XDR / UDP. gmetad polls both clusters over XML / TCP; "push" arrows lead to rrdtool and the web-frontend.
  • 8. Gmond – Metric Gathering Agent
      • Built-in metrics
          – Various CPU, Network I/O, Disk and Memory
      • Extensible
          – Gmetric – Out-of-process utility capable of invoking command-line based metric gathering scripts
          – Loadable modules capable of gathering multiple metrics or using advanced metric gathering APIs
      • Works with Hadoop & HBase
          – NameNode, DataNode, JobTracker, TaskTracker, etc.
          – JVM, rpc, etc.
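To illustrate the Gmetric extension point on slide 8, here is a minimal sketch that assembles a `gmetric` command line for one custom metric. The metric name `open_files`, its value, and the helper `build_gmetric_command` are hypothetical and not from the slides. The sketch only builds the argument list; running it against a real cluster assumes `gmetric` is installed and a local gmond is configured.

```python
import shlex


def build_gmetric_command(name, value, metric_type="uint32", units=""):
    """Return the argv list for one gmetric invocation (it is not executed here)."""
    argv = [
        "gmetric",
        "--name", name,          # metric name as it will appear in Ganglia
        "--value", str(value),   # gmetric takes the value as a string
        "--type", metric_type,   # e.g. string, uint32, float, double
    ]
    if units:
        argv += ["--units", units]
    return argv


if __name__ == "__main__":
    # Hypothetical metric, for illustration only.
    cmd = build_gmetric_command("open_files", 128, units="files")
    print(shlex.join(cmd))
```

A gathering script could pass the returned list to `subprocess.run` on a schedule (for example from cron) to publish the metric periodically.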
  • 9. Features & Advantages
      • Based on open standards
      • Low per-node overhead and high concurrency
      • High reliability and independence: failover
      • Data storage and presentation: RRDTool
      • Ported to various platforms (Linux, FreeBSD, Solaris, others)
  • 10. Problems & Bottlenecks
      • Overhead evaluation of the central node
          • CPU (XDR to XML conversion)
          • network I/O
          • disk I/O
      • Gmetad RRD write bottleneck
          • Every metric has a corresponding data file (*.rrd)
          • A large number of small files are written at the same time
      Example: 20 nodes with 500+ metrics each produce 10,000+ read/write requests within a few seconds.
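The arithmetic behind the slide 10 example can be sketched as follows. The node and metric counts come from the slide; the 15-second update interval and the function name `rrd_update_load` are assumptions made for illustration, so substitute the interval your gmetad actually uses.

```python
def rrd_update_load(nodes, metrics_per_node, interval_seconds):
    """Estimate the RRD file count and the average update rate on the gmetad host.

    Assumes one .rrd file per metric per node, each updated once per interval.
    """
    rrd_files = nodes * metrics_per_node
    updates_per_second = rrd_files / interval_seconds
    return rrd_files, updates_per_second


if __name__ == "__main__":
    # 20 nodes and 500 metrics per node are from the slide; 15 s is an assumed interval.
    files, rate = rrd_update_load(nodes=20, metrics_per_node=500, interval_seconds=15)
    print(f"{files} RRD files, about {rate:.0f} small writes per second on average")
```

Because the updates tend to arrive in bursts rather than evenly, the peak rate on disk can be well above this average.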
  • 11. Solutions
      • Distributed monitoring system
      • Separate clusters into smaller pieces
      • Multiple Gmetad instances
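One way to picture the slide 11 idea of splitting clusters and running several Gmetad instances is the sketch below. It is a planning aid under stated assumptions, not Ganglia's own mechanism: the host names, the gmetad names, the cluster size limit, and both helper functions are hypothetical.

```python
def split_into_clusters(hosts, max_hosts_per_cluster):
    """Split a host list into consecutive clusters of at most the given size."""
    if max_hosts_per_cluster < 1:
        raise ValueError("max_hosts_per_cluster must be at least 1")
    return [
        hosts[i:i + max_hosts_per_cluster]
        for i in range(0, len(hosts), max_hosts_per_cluster)
    ]


def assign_to_gmetads(clusters, gmetad_names):
    """Distribute clusters across gmetad instances in round-robin order."""
    plan = {name: [] for name in gmetad_names}
    for index, cluster in enumerate(clusters):
        plan[gmetad_names[index % len(gmetad_names)]].append(cluster)
    return plan


if __name__ == "__main__":
    # Hypothetical inventory: 20 hosts, at most 5 per cluster, two gmetad instances.
    hosts = [f"node{i:02d}" for i in range(20)]
    clusters = split_into_clusters(hosts, max_hosts_per_cluster=5)
    for gmetad, assigned in assign_to_gmetads(clusters, ["gmetad-a", "gmetad-b"]).items():
        print(gmetad, [len(cluster) for cluster in assigned])
```

Each gmetad then writes RRD files only for its own clusters, which spreads the small-file write load across several machines.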
  • 12. Solutions
      • Database should be placed in RAM
          • tmpfs
      • RAID 0
      • Reduce the sampling frequency
  • 13. My Harvest
      • DevOps
          • Linux
          • git
          • wiki
      • Cloud computing
          • OpenStack
          • Virtualization
      • BigData
          • Hadoop
          • HBase
  • 14. Thank you