Ganglia Overview-v2

Vladimir Vuksan's presentation on Ganglia at the "Not Nagios" episode of The Bay Area Large-Scale Production Engineering meetup: http://www.meetup.com/SF-Bay-Area-Large-Scale-Production-Engineering/events/15481164/

Published in: Technology

  1. 1. The Ganglia Monitoring Framework
       • Vladimir Vuksan
       • June 2011
       • http://deanspot.org/content/ganglia-references
  2. 2. About this talk
       • Ganglia architecture
       • How to get metrics in
       • How to get metrics out
  3. 3. Getting Ganglia
       • Build from source:
           tar xzvf ganglia-3.1.7.tar.gz
           cd ganglia-3.1.7
           ./configure --with-gmetad
           make
           make install
       • Or use binary packages, e.g.:
           • Ubuntu/Debian: apt-get install ganglia-monitor gmetad
           • Fedora: yum install ganglia-gmond ganglia-gmetad
  4. 4. Ganglia Architecture
       • Two daemons: gmond and gmetad.
       • gmond collects or receives metric data on each node.
       • One gmetad per grid; it polls one gmond per cluster for data.
       • A node belongs to a cluster; a cluster belongs to a grid.
       • The web UI is a separate component: use it or leave it out.
  5. 10. Demo
  6. 11. Custom Graphs
       {
         "report_name": "network_report",
         "report_type": "standard",
         "title": "Network",
         "vertical_label": "Bytes/sec",
         "series": [
           { "metric": "bytes_in", "color": "33cc33", "label": "In", "line_width": "2", "type": "line" },
           { "metric": "bytes_out", "color": "5555cc", "label": "Out", "line_width": "2", "type": "line" }
         ]
       }
  7. 12. Replaces this:
       DEF:'a0'='/var/lib/ganglia/g1/rrds/c-1-2/__SummaryInfo__/bytes_in.rrd':'sum':AVERAGE
       LINE2:'a0'#33cc33:'In '
       VDEF:a0_last=a0,LAST
       VDEF:a0_min=a0,MINIMUM
       VDEF:a0_avg=a0,AVERAGE
       VDEF:a0_max=a0,MAXIMUM
       GPRINT:'a0_last':'Now:%5.1lf%s'
       GPRINT:'a0_min':'Min:%5.1lf%s'
       GPRINT:'a0_avg':'Avg:%5.1lf%s'
       GPRINT:'a0_max':'Max:%5.1lf%s\l'
       DEF:'a1'='/var/lib/ganglia/g1/rrds/c-1-2/__SummaryInfo__/bytes_out.rrd':'sum':AVERAGE
       LINE2:'a1'#5555cc:'Out'
       VDEF:a1_last=a1,LAST
       VDEF:a1_min=a1,MINIMUM
       VDEF:a1_avg=a1,AVERAGE
       VDEF:a1_max=a1,MAXIMUM
       GPRINT:'a1_last':'Now:%5.1lf%s'
       GPRINT:'a1_min':'Min:%5.1lf%s'
       GPRINT:'a1_avg':'Avg:%5.1lf%s'
       GPRINT:'a1_max':'Max:%5.1lf%s\l'
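
How a JSON report like the one on the "Custom Graphs" slide could expand into rrdtool arguments like those above can be sketched in a few lines of Python. This is only an illustration, not the web UI's actual code: `report_to_rrd_args`, `rrd_dir`, and the label-padding rule are assumptions made for the example, and only `line` series are handled.

```python
import json

# Report definition copied from the "Custom Graphs" slide.
REPORT_JSON = """
{
  "report_name": "network_report",
  "report_type": "standard",
  "title": "Network",
  "vertical_label": "Bytes/sec",
  "series": [
    {"metric": "bytes_in", "color": "33cc33", "label": "In",
     "line_width": "2", "type": "line"},
    {"metric": "bytes_out", "color": "5555cc", "label": "Out",
     "line_width": "2", "type": "line"}
  ]
}
"""

# (variable suffix, rrdtool VDEF function, legend prefix)
STATS = [("last", "LAST", "Now"), ("min", "MINIMUM", "Min"),
         ("avg", "AVERAGE", "Avg"), ("max", "MAXIMUM", "Max")]


def report_to_rrd_args(report, rrd_dir):
    """Hypothetical helper: turn a report dict into rrdtool graph arguments."""
    series_list = report["series"]
    # Assumption: pad labels to a common width so the legend columns line up.
    width = max(len(s["label"]) for s in series_list)
    args = []
    for i, s in enumerate(series_list):
        var = f"a{i}"
        label = s["label"].ljust(width)
        args.append(f"DEF:'{var}'='{rrd_dir}/{s['metric']}.rrd':'sum':AVERAGE")
        args.append(f"LINE{s['line_width']}:'{var}'#{s['color']}:'{label}'")
        for suffix, func, _ in STATS:
            args.append(f"VDEF:{var}_{suffix}={var},{func}")
        for suffix, _, legend in STATS:
            # rrdtool's \l marker ends the legend line after the last value.
            end = "\\l" if suffix == "max" else ""
            args.append(f"GPRINT:'{var}_{suffix}':'{legend}:%5.1lf%s{end}'")
    return args


REPORT = json.loads(REPORT_JSON)
ARGS = report_to_rrd_args(
    REPORT, "/var/lib/ganglia/g1/rrds/c-1-2/__SummaryInfo__")
```

Each series costs ten arguments, so the two-series report stands in for twenty hand-written lines.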
  8. 13. A quick word about RRD.
  9. 14.
       • gmetad creates one RRD file for each metric.
       • The default retention schedule is defined in gmetad.conf:

           Store an average every    For this long
           15 sec                    60 min
           6 min                     1 day
           42 min                    1 week
           168 min                   30 days
           1 day                     1 year
  10. 15. Default schedule fits into 12K per metric
       -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 bytes_in.rrd
       -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 bytes_out.rrd
       -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 cpu_aidle.rrd
       -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 cpu_idle.rrd
       -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 cpu_nice.rrd
       -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 cpu_num.rrd
       -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 cpu_speed.rrd
       -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 cpu_system.rrd
       -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 cpu_user.rrd
       -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 cpu_wio.rrd
       -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 disk_free.rrd
       -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 disk_total.rrd
       -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 load_fifteen.rrd
       -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 load_five.rrd
       -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 load_one.rrd
       -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 mem_buffers.rrd
       -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 proc_run.rrd
       -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 proc_total.rrd
       -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 swap_free.rrd
       -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 swap_total.rrd
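
A rough back-of-the-envelope check of the 12K figure, as a Python sketch. It works only from the retention table on the previous slide and makes two assumptions that are not in the talk: each stored average takes 8 bytes (one double), and "1 year" means 365 days.

```python
# (averaging interval, retention period), both in seconds, from the slide.
SCHEDULE = [
    (15, 60 * 60),                    # every 15 sec for 60 min
    (6 * 60, 24 * 3600),              # every 6 min for 1 day
    (42 * 60, 7 * 24 * 3600),         # every 42 min for 1 week
    (168 * 60, 30 * 24 * 3600),       # every 168 min for 30 days
    (24 * 3600, 365 * 24 * 3600),     # every day for 1 year (assumed 365 days)
]

BYTES_PER_VALUE = 8  # assumption: one 8-byte double per stored average


def rows_needed(interval, retention):
    """Number of stored averages needed to cover the retention period."""
    return -(-retention // interval)  # ceiling division


rows = [rows_needed(interval, retention) for interval, retention in SCHEDULE]
data_bytes = sum(rows) * BYTES_PER_VALUE
```

Under these assumptions the schedule needs 1,343 stored values, or 10,744 bytes of data, which leaves a bit under 1.5 KB of each 12,224-byte file for what is presumably header and bookkeeping information.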
  11. 16. If you need more resolution, adjust it in gmetad.conf
       • Getting started with RRD: http://oss.oetiker.ch/rrdtool/tut/rrd-beginners.en.html
       • More help defining RRD files: http://www.cuddletech.com/articles/rrd/ar01s02.html
       • A little more about how RRD works with Ganglia: http://vuksan.com/blog/2010/12/14/misconceptions-about-rrd-storage/
  12. 17. Getting data in
       • Via gmond modules, written in C or Python.
       • Via gmetric, or libraries that implement the gmetric protocol.
       • Via other daemons designed to feed metrics to Ganglia (e.g. sFlow).
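
For the Python route, a gmond module is a plain Python file exposing `metric_init`, `metric_cleanup`, and one callback per metric. The following is a minimal sketch under stated assumptions: the metric `dir_entry_count` and the `path` parameter are made up for illustration, and the module would still need to be registered in the gmond configuration.

```python
import os
import tempfile

# Directory to watch; overridden by the hypothetical "path" module parameter.
_watch_dir = tempfile.gettempdir()


def dir_entry_count(name):
    """Metric callback: receives the metric name, returns the current value."""
    return len(os.listdir(_watch_dir))


def metric_init(params):
    """Called once at load time; returns the list of metric descriptors."""
    global _watch_dir
    _watch_dir = params.get("path", _watch_dir)
    return [{
        "name": "dir_entry_count",        # hypothetical metric name
        "call_back": dir_entry_count,
        "time_max": 90,
        "value_type": "uint",
        "units": "entries",
        "slope": "both",
        "format": "%u",
        "description": "Number of entries in the watched directory",
        "groups": "example",
    }]


def metric_cleanup():
    """Called at shutdown; nothing to release in this sketch."""
    pass


if __name__ == "__main__":
    # Stand-alone smoke test, outside of gmond.
    for descriptor in metric_init({}):
        print(descriptor["name"], descriptor["call_back"](descriptor["name"]))
```

Running the file directly exercises the callback without gmond, which is a convenient way to debug a module before deploying it.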
  13. 18. Zero configuration.
       • Just start sending new metrics.
       • gmetad will create a new RRD file for any new metric it sees.
       • The web UI will draw a basic graph for every metric.
       • You can create nice colored graphs later if you want them.
  14. 19. gmond module
       modules {
         module {
           name = "net_module"
           path = "modnet.so"
         }
       }
       collection_group {
         collect_every = 40
         time_threshold = 300
         metric {
           name = "bytes_out"
           value_threshold = 4096
           title = "Bytes Sent"
         }
         metric {
           name = "bytes_in"
           value_threshold = 4096
           title = "Bytes Received"
         }
       }
  15. 20. gmetric
       • CLI:
           $ gmetric -c /etc/ganglia/gmond.conf --name=foo --value=512 --units=foos --type=uint16 --dmax=60
       • Ruby: [example not captured in this transcript]
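
From a script, the simplest approach is often to shell out to the gmetric binary. The sketch below builds the same command line as the CLI example and runs it only if gmetric is installed; the helper names `gmetric_command` and `send_metric` are assumptions for this example. It uses `uint16` as the type because 512 does not fit in 8 bits.

```python
import shutil
import subprocess


def gmetric_command(name, value, metric_type, units="", dmax=0,
                    conf="/etc/ganglia/gmond.conf"):
    """Build the gmetric argument list for one metric value."""
    return [
        "gmetric",
        "-c", conf,
        f"--name={name}",
        f"--value={value}",
        f"--units={units}",
        f"--type={metric_type}",
        f"--dmax={dmax}",
    ]


def send_metric(name, value, metric_type, **options):
    """Run gmetric if it is on PATH; return False when it is not installed."""
    if shutil.which("gmetric") is None:
        return False
    subprocess.run(gmetric_command(name, value, metric_type, **options),
                   check=True)
    return True


COMMAND = gmetric_command("foo", 512, "uint16", units="foos", dmax=60)
```

Passing the arguments as a list avoids shell quoting problems when metric names or units contain spaces.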
  16. 21. What kind of metrics can I collect?
       • Load time of your home page?
       • Number of active trouble tickets?
       • LOC in your application?
       • rcov coverage statistics?
       • Execution time of your test suite?
       • Number of user logins?
       • Memory usage by a particular process?
       • Many other metric plugins are available at http://github.com/ganglia
  17. 23. Log parsing apps
       • Eat log files and make metrics: http://vuksan.com/linux/ganglia/#Apache_Traffic_Stats
       • ganglia-logtailer: https://bitbucket.org/maplebed/ganglia-logtailer
       • logster: https://github.com/etsy/logster
  18. 24. But I thought Ganglia is only useful for host metrics?
       • No.
       • You can create “non-existent” hosts by spoofing.
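
With gmetric, spoofing is done through the `--spoof` option, which takes an IP address and a host name separated by a colon. A small sketch of building such a command in Python follows; the helper name, the host `helpdesk`, its address, and the metric `active_tickets` are all made up for illustration.

```python
def spoofed_gmetric_command(ip, hostname, name, value, metric_type="uint32"):
    """Build a gmetric command that reports a metric for a spoofed host."""
    return [
        "gmetric",
        f"--spoof={ip}:{hostname}",   # colon-separated IP and host name
        f"--name={name}",
        f"--value={value}",
        f"--type={metric_type}",
    ]


# Hypothetical example: a "host" that stands for the help desk system.
COMMAND = spoofed_gmetric_command("10.0.0.250", "helpdesk",
                                  "active_tickets", 17)
```

The resulting metric shows up under the spoofed host, so application-level numbers can sit next to ordinary host metrics.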
  19. 25. Custom Metric Demo
  20. 26. Integrating Ganglia & Nagios
       $ check_ganglia_metric.py --gmetad_host=gmetad-server.example.com --metric_host=host.example.com --metric_name=cpu_idle --critical=99
       Status Critical, CPU Idle = 99.6 %|cpu_idle=99.6%;;99;;
       • https://github.com/mconigliaro/check_ganglia_metric
       • http://vuksan.com/blog/2011/04/19/use-your-trending-data-for-alerting/
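
The idea behind such a check is small: take a metric value already collected by Ganglia, compare it with thresholds, and report it in Nagios plugin form (exit code 0, 1, or 2, plus performance data after the `|`). The sketch below is not the code of check_ganglia_metric; it is a simplified illustration that assumes thresholds are plain upper bounds, and the function name is made up.

```python
OK, WARNING, CRITICAL = 0, 1, 2          # standard Nagios plugin exit codes
LABELS = {OK: "OK", WARNING: "Warning", CRITICAL: "Critical"}


def check_metric(name, title, value, units="", warning=None, critical=None):
    """Return (exit_code, status_line) for one metric value.

    Assumption: a threshold is breached when value >= threshold.
    """
    if critical is not None and value >= critical:
        status = CRITICAL
    elif warning is not None and value >= warning:
        status = WARNING
    else:
        status = OK
    warn_text = "" if warning is None else f"{warning:g}"
    crit_text = "" if critical is None else f"{critical:g}"
    # Performance data format: label=value[unit];warn;crit;min;max
    perfdata = f"{name}={value:g}{units};{warn_text};{crit_text};;"
    line = f"Status {LABELS[status]}, {title} = {value:g} {units}|{perfdata}"
    return status, line


STATUS, LINE = check_metric("cpu_idle", "CPU Idle", 99.6, "%", critical=99)
```

With the values from the slide this yields exit code 2 and the same status line as shown above.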
  21. 27. Getting Data Out
       • Web UI
           • JSON & CSV export
       • gmond XML
       • gmetad XML
       • gmetad interactive
  22. 28. gmond
       $ telnet localhost 8649
       <GANGLIA_XML VERSION="3.1.2" SOURCE="gmond">
         <CLUSTER NAME="Debug" LOCALTIME="1258504343" OWNER="Alex">
           <HOST NAME="10.0.3.128" IP="10.0.3.128" REPORTED="1258504325"
                 TN="18" TMAX="20" DMAX="0">
             <METRIC NAME="load_five" VAL="0.15" TYPE="float" UNITS=" "
                     TN="30" TMAX="325" DMAX="0" SLOPE="both">
               <EXTRA_DATA>
                 <EXTRA_ELEMENT NAME="GROUP" VAL="load"/>
                 <EXTRA_ELEMENT NAME="DESC" VAL="Five minute load average"/>
                 <EXTRA_ELEMENT NAME="TITLE" VAL="Five Minute Load Average"/>
               </EXTRA_DATA>
             </METRIC>
           </HOST>
         </CLUSTER>
       </GANGLIA_XML>
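
The same dump can be read from a script instead of telnet. Below is a minimal Python sketch using only the standard library; `fetch_xml` and `parse_metrics` are names chosen for this example, and the sample document is a trimmed copy of the output above.

```python
import socket
import xml.etree.ElementTree as ET

# Trimmed copy of the gmond output shown on the slide.
SAMPLE = """<GANGLIA_XML VERSION="3.1.2" SOURCE="gmond">
<CLUSTER NAME="Debug" LOCALTIME="1258504343" OWNER="Alex">
<HOST NAME="10.0.3.128" IP="10.0.3.128" REPORTED="1258504325"
      TN="18" TMAX="20" DMAX="0">
<METRIC NAME="load_five" VAL="0.15" TYPE="float" UNITS=" "
        TN="30" TMAX="325" DMAX="0" SLOPE="both"/>
</HOST>
</CLUSTER>
</GANGLIA_XML>"""


def fetch_xml(host="localhost", port=8649, timeout=5.0):
    """Connect to gmond and read until the daemon closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def parse_metrics(xml_data):
    """Map (host name, metric name) to the reported value, as a string."""
    root = ET.fromstring(xml_data)
    metrics = {}
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            metrics[(host.get("NAME"), metric.get("NAME"))] = metric.get("VAL")
    return metrics


METRICS = parse_metrics(SAMPLE)
```

Against a live daemon, `parse_metrics(fetch_xml())` would return every metric of every host gmond knows about; values stay strings here, with the `TYPE` attribute available for conversion.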
  24. 30. gmetad (non-interactive)
       $ telnet localhost 8651
       <GANGLIA_XML VERSION="3.1.2" SOURCE="gmetad">
         <GRID NAME="unspecified" AUTHORITY="http://localhost/~alex/ganglia/"
               LOCALTIME="1258504287">
           <CLUSTER NAME="Debug" LOCALTIME="1258504282" OWNER="Alex"
                    LATLONG="unspecified" URL="unspecified">
             <HOST NAME="10.0.3.128" IP="10.0.3.128" REPORTED="1258504265"
                   TN="21" TMAX="20" DMAX="0" LOCATION="unspecified"
                   GMOND_STARTED="1258477064">
               <METRIC NAME="load_five" VAL="0.15" TYPE="float" UNITS=" "
                       TN="30" TMAX="325" DMAX="0" SLOPE="both">
                 <EXTRA_DATA>
                   <EXTRA_ELEMENT NAME="GROUP" VAL="load"/>
                   <EXTRA_ELEMENT NAME="DESC" VAL="Five minute load average"/>
                   <EXTRA_ELEMENT NAME="TITLE" VAL="Five Minute Load Average"/>
                 </EXTRA_DATA>
               </METRIC>
             </HOST>
           </CLUSTER>
         </GRID>
       </GANGLIA_XML>
  26. 32. gmetad (interactive)
       $ telnet localhost 8652
       Connected to localhost.
       Escape character is '^]'.
       /cluster_name/host_name/load_five/
       ... receive the same XML format as the normal gmetad port, but limited to the metric you request ...
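
The interactive port is just as easy to script: send one path line, then read XML until the connection closes. A minimal Python sketch follows; `build_request` and `query_gmetad` are names invented for this example, and the path in the usage line mirrors the placeholder path from the slide.

```python
import socket


def build_request(path):
    """Normalise a /cluster/host/metric path into one request line."""
    if not path.startswith("/"):
        path = "/" + path
    return path.encode("ascii") + b"\n"


def query_gmetad(path, host="localhost", port=8652, timeout=5.0):
    """Send a path to gmetad's interactive port and return the XML reply."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(path))
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


REQUEST = build_request("/cluster_name/host_name/load_five/")
```

A call such as `query_gmetad("/cluster_name/host_name/load_five/")` needs a running gmetad; the reply can be handed to any XML parser.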
  27. 33. Is it scalable?
  28. 34. Issues
       • It uses reverse DNS lookups to determine hostnames, which may cause issues in a cloud (workarounds are needed).
       • It doesn't allow an arbitrary metric hierarchy (at this time :-)).
  29. 35. What's next
       • Support for arbitrary grouping beyond clusters, using e.g. tags
       • Better Nagios integration
       • New visualizations, e.g. heatmaps
       • Logstash integration
  30. 36. Questions
       • Twitter: @vvuksan
