SJTU Summary report


Published in: Technology
  1. Summary Report
     Yupeng Chen, 2012.8.15
  2. • Introduction to Ganglia
     • Problems & Solutions
     • My Harvest
  3. Introduction and overview
     • Scalable distributed monitoring system for high-performance computing systems
     • XML - data representation
     • XDR (External Data Representation) - compact, portable data transport
     • RRDTool - data storage and visualization
     • PHP - web frontend interface
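
To make the "compact, portable" claim about XDR concrete, here is a minimal sketch of XDR-style encoding using only Python's `struct` module. The basic rules (4-byte big-endian integers, strings as a length word followed by the bytes and zero padding to a multiple of 4) follow the XDR standard. The record layout at the end (name, units, value) is a hypothetical illustration and is not gmond's actual packet format.

```python
import struct


def xdr_uint(value: int) -> bytes:
    """Encode an unsigned 32-bit integer as 4 big-endian bytes."""
    return struct.pack(">I", value)


def xdr_string(text: str) -> bytes:
    """Encode a string as: 4-byte length, raw bytes, zero padding to a multiple of 4."""
    data = text.encode("utf-8")
    padding = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * padding


# Hypothetical record for illustration only (NOT the real gmond wire format).
packet = xdr_string("cpu_user") + xdr_string("%") + xdr_uint(42)
print(len(packet), packet.hex())
```

Every field is aligned to 4 bytes, which is why a receiver on any architecture can decode the datagram without knowing the sender's byte order.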
  4. Ganglia Architecture
     • Gmond - Ganglia Monitoring Daemon
       Metric gathering agent installed on individual servers
     • Gmetad - Ganglia Meta Daemon
       Metric aggregation agent installed on specific servers
     • Apache (Nginx + php5-fpm) web frontend
       Metric presentation and analysis server
     • Model - Multicast or Unicast
  5. Multicast - All gmond nodes are capable of listening to and reporting on the status of the entire cluster.
  6. Unicast - Each node sends its local monitoring data to specific machines; crossing network segments is allowed.
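  7. [Diagram: deployment topology]
     • VMH Test Cluster: master, slave1, slave2, slave3, slave4, zookeeper, each running gmond (XDR / UDP between nodes)
     • Omnilab Cluster: dev, ganglia, omnilab, each running gmond (XDR / UDP between nodes)
     • gmetad polls both clusters (XML / TCP)
     • Arrows labeled "push" lead to rrdtool and the web-frontend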
  8. Gmond - Metric Gathering Agent
     • Built-in metrics
       – Various CPU, network I/O, disk, and memory metrics
     • Extensible
       – Gmetric: out-of-process utility capable of invoking command-line metric gathering scripts
       – Loadable modules capable of gathering multiple metrics or using advanced metric gathering APIs
     • Works with Hadoop & HBase
       – NameNode, DataNode, JobTracker, TaskTracker, etc.
       – JVM, RPC, etc.
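
As an illustration of the Gmetric extension point, the sketch below builds a `gmetric` command line from Python and runs it only if the tool is installed. The options `--name`, `--value`, `--type`, and `--units` are standard gmetric flags; the metric name `queue_depth`, its value, and the helper function names are hypothetical and chosen only for the example.

```python
import shutil
import subprocess


def build_gmetric_command(name, value, metric_type="float", units=""):
    """Return the argument list for one gmetric invocation."""
    cmd = ["gmetric", "--name", name, "--value", str(value), "--type", metric_type]
    if units:
        cmd += ["--units", units]
    return cmd


def publish(name, value, metric_type="float", units=""):
    """Send the metric through gmetric; return False if gmetric is not installed."""
    if shutil.which("gmetric") is None:
        return False
    subprocess.run(build_gmetric_command(name, value, metric_type, units), check=True)
    return True


# Hypothetical custom metric, for illustration only.
cmd = build_gmetric_command("queue_depth", 17, "uint32", "jobs")
print(" ".join(cmd))
```

A script like this can be run periodically (for example from cron) so that the custom metric appears next to the built-in ones.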
  9. Features & Advantages
     • Based on open standards
     • Low per-node overhead and high concurrency
     • High reliability and independence: failover
     • Data storage and presentation: RRDTool
     • Ported to many platforms (Linux, FreeBSD, Solaris, others)
  10. Problems & Bottlenecks
      • Overhead evaluation of the central node
        – CPU (XDR to XML conversion)
        – Network I/O
        – Disk I/O
      • Gmetad RRD write bottleneck
        – Every metric has a corresponding data file (*.rrd)
        – A large number of small files are written at the same time
      • Example: 20 nodes with 500+ metrics each generate 10,000+ read/write requests within a few seconds
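
The figure on this slide can be reproduced with a few lines of arithmetic. The sketch below assumes one RRD file per metric per node, as stated above, and a 15-second polling interval; the interval is an assumption for illustration, since the slide only says "a few seconds".

```python
def rrd_write_load(nodes, metrics_per_node, interval_s):
    """Return (number of RRD files, average file updates per second)."""
    files = nodes * metrics_per_node
    return files, files / interval_s


# 20 nodes x 500 metrics; the 15 s interval is an assumed value.
files, per_second = rrd_write_load(20, 500, 15)
print(files, round(per_second, 1))
```

Each update touches a different small file, so the load on the disk is dominated by seeks rather than by the volume of data written.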
  11. Solutions
      • Distributed monitoring system
      • Separate clusters into smaller pieces
      • Multiple Gmetad instances
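
One possible way to picture "smaller pieces" and "multiple Gmetad" is the sketch below, which splits a host list into fixed-size clusters and assigns them round-robin to aggregators. The host names, the cluster size of 5, and the names `gmetad-a` and `gmetad-b` are all hypothetical; the slides do not say how the split was actually done.

```python
def split_into_clusters(hosts, max_size):
    """Split a host list into consecutive groups of at most max_size hosts."""
    return [hosts[i:i + max_size] for i in range(0, len(hosts), max_size)]


# Hypothetical inventory: 20 nodes, as in the bottleneck example.
hosts = [f"node{i:02d}" for i in range(1, 21)]
clusters = split_into_clusters(hosts, 5)

# Assign clusters to (hypothetical) gmetad instances in round-robin order.
gmetads = ["gmetad-a", "gmetad-b"]
assignment = {name: [] for name in gmetads}
for index, cluster in enumerate(clusters):
    assignment[gmetads[index % len(gmetads)]].append(cluster)

for name, owned in assignment.items():
    print(name, sum(len(c) for c in owned), "hosts")
```

With this split, each aggregator writes RRD files for only half of the hosts, so the per-server write load drops accordingly.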
  12. Solutions
      • Database should be placed in RAM
        – tmpfs
      • RAID 0
      • Reduce the sampling frequency
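
To check whether the RRD directory really lives in RAM, one option is to look up its filesystem type in the mount table. The sketch below parses text in the format of `/proc/mounts` on Linux; the path `/var/lib/ganglia/rrds` and the sample mount table are assumptions for illustration and should be replaced with the values from the actual installation.

```python
def filesystem_type(path, mounts_text):
    """Return the filesystem type of the longest mount point containing path."""
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if len(mount_point) > len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


# Sample mount table (assumed); on a real host read open("/proc/mounts").read().
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
tmpfs /var/lib/ganglia/rrds tmpfs rw,size=512m 0 0
"""

print(filesystem_type("/var/lib/ganglia/rrds", SAMPLE_MOUNTS))
```

Because tmpfs contents are lost on reboot, a setup like this normally needs a periodic copy of the RRD files to persistent storage.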
  13. My Harvest
      • DevOps
        – Linux
        – git
        – wiki
      • Cloud computing
        – OpenStack
        – Virtualization
      • Big Data
        – Hadoop
        – HBase
  14. Thank you