Ganglia Monitoring Tool
Monitoring through Ganglia
Slideshow Transcript
- Slide 1: Monitoring Your Data Center
Using Apache and Ganglia
Brad Nicholes
Sr. Software Engineer, Novell
Member Apache Software Foundation
bnicholes@apache.org
- Slide 2: Agenda Ganglia Monitoring
• Introduction and Overview
• Ganglia Architecture
• Apache Web Frontend
• Gmond & Gmetad
• Extending Ganglia
  – GMetrics
  – Module Development
© Novell Inc. All rights reserved
- Slide 3: Introduction and Overview
• Scalable Distributed Monitoring System
• Targeted at monitoring clusters and grids
• Multicast-based Listen/Announce protocol
• Depends on open standards
  – XML
  – XDR compact portable data transport
  – RRDTool – Round Robin Database
  – APR – Apache Portable Runtime
  – Apache HTTPD Server
  – PHP based web interface
• http://ganglia.sourceforge.net or http://www.ganglia.info
- Slide 4: Ganglia Architecture
• Gmond – Metric gathering agent installed on individual servers
• Gmetad – Metric aggregation agent installed on one or more specific task-oriented servers
• Apache Web Frontend – Metric presentation and analysis server
• Attributes
  – Multicast – All gmond nodes are capable of listening to and reporting on the status of the entire cluster
  – Failover – Gmetad has the ability to switch which cluster node it polls for metric data
  – Lightweight and low overhead metric gathering and transport
• Ported to various platforms (Linux, FreeBSD, Solaris, others)
- Slide 5: Ganglia Architecture
[Diagram: a web client connects to the Apache web frontend, which is fed by gmetad. The gmetad daemons poll a gmond node in each of Cluster 1, Cluster 2 and Cluster 3, and can fail over to the other gmond nodes in the same cluster.]
- Slide 6: Ganglia Web Frontend
• Built around Apache HTTPD server using mod_php
• Uses presentation templates so that the web site “look and feel” can be easily customized
• Presents an overview of all nodes within a grid vs all nodes in a cluster
• Ability to drill down into individual nodes
• Presents both textual and graphical views
- Slide 7: Ganglia Customized Web Front-end
- Slide 8: Deploying Ganglia Monitoring
• See http://ganglia.sourceforge.net/docs/ganglia.html
• Install Gmond on all monitored nodes
  – Edit the configuration file
    > Add cluster and host information
    > Configure network udp_send_channel, udp_recv_channel, tcp_accept_channel
    > Start gmond
• Install Gmetad on an aggregation node
  – Edit the configuration file
    > Add data and failover sources
    > Add grid name
    > Start gmetad
• Install the web frontend
  – Install Apache httpd server with mod_php
  – Copy Ganglia web pages and PHP code to the appropriate location
  – Add appropriate authentication configuration for access control
- Slide 9: Gmond Gathering & Gmetad Aggregation Agents
- Slide 10: Gmond – Metric Gathering Agent
• Built-in metrics
  – Various CPU, Network I/O, Disk I/O and Memory
• Extensible
  – Gmetric – Out-of-process utility capable of invoking command line based metric gathering scripts
  – Loadable modules capable of gathering multiple metrics or using advanced metric gathering APIs
• Built on the Apache Portable Runtime
  – Supports Linux, FreeBSD, Solaris and more…
- Slide 11: Gmond – Metric Gathering Agent
• Automatic discovery of nodes
  – Adding a node does not require configuration file changes
  – Each node is configured independently
  – Each node has the ability to listen to and/or talk on the multicast channel
  – Can be configured for unicast connections if desired
  – Heartbeat metric determines the up/down status
• Thread pools
  – Collection threads – Capable of running specialized functions for gathering metric data
  – Multicast listeners – Listen for metric data from other nodes in the same cluster
  – Data export listeners – Listen for client requests for cluster metric data
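The data export listeners can be tried from any TCP client. The following is a minimal Python sketch, not part of the original slides. It assumes gmond writes an XML dump to each client that connects to its tcp_accept_channel port (8649 in the configuration example) and that the dump uses GANGLIA_XML, CLUSTER, HOST and METRIC elements. The helper names fetch_cluster_xml and parse_metrics are illustrative only.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_cluster_xml(host="localhost", port=8649, timeout=5.0):
    """Read the XML dump gmond sends to a TCP client (assumed behaviour)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def parse_metrics(xml_data):
    """Return {host_name: {metric_name: value_string}} from an XML dump."""
    root = ET.fromstring(xml_data)
    result = {}
    for host in root.iter("HOST"):
        metrics = {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
        result[host.get("NAME")] = metrics
    return result


# Hand-written sample in the assumed layout, so the parser can be tried offline.
SAMPLE = """<GANGLIA_XML VERSION="3.1.0" SOURCE="gmond">
<CLUSTER NAME="My Cluster" OWNER="Administrator">
<HOST NAME="node1" IP="192.168.0.4">
<METRIC NAME="load_one" VAL="0.25" TYPE="float" UNITS=""/>
<METRIC NAME="cpu_num" VAL="4" TYPE="uint16" UNITS="CPUs"/>
</HOST>
</CLUSTER>
</GANGLIA_XML>"""

print(parse_metrics(SAMPLE))
```

Against a live node, parse_metrics(fetch_cluster_xml("node1")) would be the equivalent call; the sample above only demonstrates the parsing step.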
- Slide 12: Gmond – Global Configuration
• daemonize - When “yes”, gmond will daemonize
• setuid - When “yes”, gmond will set its effective UID to the uid of the user specified by the user attribute
• debug_level - When set to zero (0), gmond will run normally. When greater than zero, gmond runs in the foreground and outputs debugging information
• mute - When “yes”, gmond will not send data
• deaf - When “yes”, gmond will not receive data
• host_dmax - When set to zero (0), gmond will not delete a host from its list. If set to a positive number, gmond will flush a host after it has not heard from it for N seconds
• cleanup_threshold - Minimum amount of time before gmond will clean up expired data
• gexec - Specify whether gmond will announce the host's availability to run gexec jobs
- Slide 13: Gmond – Cluster Configuration
• name - Specifies the name of the cluster of machines
• owner - Specifies the administrators of the cluster
• latlong - Latitude and longitude GPS coordinates of this cluster on earth
• url - Additional information about the cluster
- Slide 14: Gmond – Network Configuration
• udp_send_channel
  – mcast_join, mcast_if – Multicast address and interface
  – host – Unicast host
  – port – Multicast or Unicast port
• udp_recv_channel
  – mcast_join, mcast_if, port – Multicast address, interface and port
  – bind – Bind a particular local address
  – family – Protocol family
• tcp_accept_channel
  – bind, port, interface – Bind a particular local address, listen port and interface
  – family – Protocol family
  – timeout – Request timeout
- Slide 15: Gmond – Configuration Example
globals {
  daemonize = yes
  setuid = yes
  user = nobody
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  host_dmax = 0 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no
}
cluster {
  name = "My Cluster"
  owner = "Administrator"
  latlong = "N37.37 W122.23"
  url = "http://www.moreinfo.org"
}
udp_send_channel {
  mcast_join = 239.2.11.71
  port = 8649
  ttl = 1
}
udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8649
  bind = 239.2.11.71
}
tcp_accept_channel {
  port = 8649
}
- Slide 16: Gmond – Access Control
• Configured in udp_recv_channel or tcp_accept_channel sections
• Examples:
  – “Deny all” with exceptions
    acl {
      default = "deny"
      access {
        ip = 192.168.0.4
        mask = 32
        action = "allow"
      }
    }
  – “Allow all” with IPv4 & IPv6 exceptions
    acl {
      default = "allow"
      access {
        ip = 192.168.0.0
        mask = 24
        action = "deny"
      }
      access {
        ip = ::ff:1.2.3.0
        mask = 120
        action = "deny"
      }
    }
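How an acl block of this kind might be evaluated can be sketched in Python. This is an illustration rather than gmond's actual code: it assumes that the first access entry whose network contains the client address decides the outcome, and that the default applies when nothing matches. The function name acl_allows is hypothetical.

```python
import ipaddress


def acl_allows(client_ip, default, access_rules):
    """Return True if client_ip is allowed.

    default is "allow" or "deny"; access_rules is a list of
    (ip, mask, action) tuples mirroring the access { } blocks.
    Assumption: the first matching access rule decides the outcome.
    """
    addr = ipaddress.ip_address(client_ip)
    for ip, mask, action in access_rules:
        network = ipaddress.ip_network(f"{ip}/{mask}", strict=False)
        # An IPv4 client never matches an IPv6 rule, and vice versa.
        if addr.version == network.version and addr in network:
            return action == "allow"
    return default == "allow"


# "Deny all" with one exception
deny_all = [("192.168.0.4", 32, "allow")]
print(acl_allows("192.168.0.4", "deny", deny_all))     # True
print(acl_allows("192.168.0.5", "deny", deny_all))     # False

# "Allow all" with one denied subnet
allow_all = [("192.168.0.0", 24, "deny")]
print(acl_allows("192.168.0.77", "allow", allow_all))  # False
print(acl_allows("10.0.0.1", "allow", allow_all))      # True
```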
- Slide 17: Gmond – Metric Collection Groups
• Specify as many collection groups as you like
• Each collection group must contain at least one metric section
• List available metrics by invoking “gmond -m”
• collection_group section:
  – collect_once – Specifies that the group contains static metrics, collected only once
  – collect_every – Collection interval (only valid for non-static metrics)
  – time_threshold – Max data send interval
• metric section:
  – name – Metric name (see “gmond -m”)
  – value_threshold – Metric variance threshold (send if exceeded)
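The interplay of time_threshold and value_threshold can be illustrated with a short Python sketch. It is a simplified reading of the two settings described above (send when the maximum interval has elapsed, or when the value has moved by more than the threshold), not gmond's implementation; the function should_send and its parameters are hypothetical.

```python
def should_send(now, last_sent_at, last_sent_value, value,
                time_threshold, value_threshold=None):
    """Decide whether a freshly collected metric value should be announced.

    Sends when time_threshold seconds have passed since the last send, or
    when the value moved by more than value_threshold since the last send.
    """
    if last_sent_at is None:          # nothing has been sent yet
        return True
    if now - last_sent_at >= time_threshold:
        return True
    if value_threshold is not None:
        return abs(value - last_sent_value) > value_threshold
    return False


# load_one style settings: time_threshold = 90, value_threshold = 1.0
print(should_send(20, 0, 0.5, 0.7, 90, 1.0))   # False: small change, too early
print(should_send(40, 0, 0.5, 1.8, 90, 1.0))   # True: change of 1.3 exceeds 1.0
print(should_send(100, 0, 0.5, 0.5, 90, 1.0))  # True: 90 seconds have elapsed
```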
- Slide 18: Gmond – Configuration Example
collection_group {
  collect_once = yes
  time_threshold = 20
  metric {
    name = "heartbeat"
  }
}
collection_group {
  collect_once = yes
  time_threshold = 1200
  metric {
    name = "cpu_num"
  }
  metric {
    name = "cpu_speed"
  }
  metric {
    name = "mem_total"
  }
  metric {
    name = "swap_total"
  }
  …
}
collection_group {
  collect_every = 20
  time_threshold = 90
  metric {
    name = "load_one"
    value_threshold = "1.0"
  }
  metric {
    name = "load_five"
    value_threshold = "1.0"
  }
  …
}
collection_group {
  collect_every = 80
  time_threshold = 950
  metric {
    name = "proc_run"
    value_threshold = "1.0"
  }
  metric {
    name = "proc_total"
    value_threshold = "1.0"
  }
}
- Slide 19: Gmetad – Metric Aggregation Agent
• Polls a designated cluster node for the status of the entire cluster
  – Data collection thread per cluster
  – Ability to poll gmond or another gmetad for metric data
• Failover capability
• RRDTool – Storage and trend graphing tool
  – Defines fixed size databases that hold data of various granularity
  – Capable of rendering trending graphs from the smallest granularity to the largest (e.g. last hour vs last year)
  – Never grows larger than the predetermined fixed size
  – Database granularity is configurable through gmetad.conf
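The failover idea (poll one designated node, and switch to another node of the same cluster if it does not answer) can be sketched in Python. This is an illustration of the concept under stated assumptions, not gmetad's code; poll_with_failover and fake_fetch are hypothetical names, and the fetch callable stands in for whatever retrieves the cluster data.

```python
def poll_with_failover(sources, fetch):
    """Try each (host, port) in order and return the first successful result.

    sources: list of (host, port) tuples for one cluster.
    fetch:   callable(host, port) returning the cluster data or raising OSError.
    """
    errors = []
    for host, port in sources:
        try:
            return (host, port), fetch(host, port)
        except OSError as exc:
            errors.append(f"{host}:{port}: {exc}")
    raise ConnectionError("all sources failed: " + "; ".join(errors))


# Stand-in fetcher so the sketch runs without a live cluster.
def fake_fetch(host, port):
    if host == "node1":
        raise OSError("connection refused")
    return f"<xml from {host}>"


used, data = poll_with_failover([("node1", 8649), ("node2", 8649)], fake_fetch)
print(used, data)
```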
- Slide 20: Gmetad – Configuration
• Data source and failover designations
  – data_source "my cluster" [polling interval] address1:port address2:port ...
• RRD database storage definition
  – RRAs "RRA:AVERAGE:0.5:1:244" "RRA:AVERAGE:0.5:24:244" "RRA:AVERAGE:0.5:168:244" "RRA:AVERAGE:0.5:672:244" "RRA:AVERAGE:0.5:5760:374"
• Access control
  – trusted_hosts address1 address2 … DN1 DN2 …
  – all_trusted OFF/on
• RRD files location
  – rrd_rootdir "/var/lib/ganglia/rrds"
• Network
  – xml_port 8651
  – interactive_port 8652
- Slide 21: Gmetad – Configuration Example
data_source "my cluster" 10 localhost my.machine.edu:8649 1.2.3.5:8655
data_source "my grid" 50 1.3.4.7:8655 grid.org:8651 grid-backup.org:8651
data_source "another source" 1.3.4.7:8655 1.3.4.8
trusted_hosts 127.0.0.1 169.229.50.165 my.gmetad.org
xml_port 8651
interactive_port 8652
rrd_rootdir "/var/lib/ganglia/rrds"
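To make the data_source syntax concrete, here is a small Python sketch that splits such a line into its parts. It is an illustration only: the defaults of 15 seconds for the polling interval and 8649 for the port are assumptions based on common gmetad.conf documentation, and parse_data_source is a hypothetical helper that does not handle IPv6 literals.

```python
import shlex

DEFAULT_INTERVAL = 15   # seconds; assumed gmetad default
DEFAULT_PORT = 8649     # assumed default when no port is given


def parse_data_source(line):
    """Parse one data_source line into (name, interval, [(host, port), ...])."""
    tokens = shlex.split(line)
    if len(tokens) < 3 or tokens[0] != "data_source":
        raise ValueError(f"not a data_source line: {line!r}")
    name, rest = tokens[1], tokens[2:]
    interval = DEFAULT_INTERVAL
    if rest[0].isdigit():             # optional polling interval
        interval = int(rest[0])
        rest = rest[1:]
    sources = []
    for item in rest:
        host, sep, port = item.rpartition(":")
        if sep:
            sources.append((host, int(port)))
        else:
            sources.append((item, DEFAULT_PORT))
    return name, interval, sources


print(parse_data_source(
    'data_source "my cluster" 10 localhost my.machine.edu:8649 1.2.3.5:8655'))
print(parse_data_source('data_source "another source" 1.3.4.7:8655 1.3.4.8'))
```

The first address in the list is the node that is polled; the remaining addresses are the failover sources for the same cluster.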
- Slide 22: Round-Robin Database Storage
- Slide 23: Round-Robin Database (RRD)
• High performance data logging and graphing system for time series data
• Automatic data consolidation over time
  – Define various Round-Robin Archives (RRA) which hold data points at decreasing levels of granularity
  – Multiple data points from a more granular RRA are automatically consolidated and added to a coarser RRA
• Constant and predictable data storage size
  – Old data is eliminated as new data is added to the RRD file
  – Amount of storage required is defined at the time the RRD file is created
• RRDTool Web site: http://oss.oetiker.ch/rrdtool/
- Slide 24: Ganglia Default RRD Definition
• Definition of the Round-Robin Database format is determined at database creation time
• Default Ganglia RRA definitions:
  – RRA #1 – 15 second average for 61 minutes
  – RRA #2 – 6 minute average for 24.4 hours
  – RRA #3 – 42 minute average for 7.1 days
  – RRA #4 – 2.8 hour average for 28.5 days
  – RRA #5 – 24 hour average for 374 days
• Default largest retrievable time series, ~1 year
• Configurable to whatever you want
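The retention figures above follow from the RRAs line in the gmetad configuration ("RRA:AVERAGE:0.5:steps:rows"). A short Python sketch of the arithmetic, assuming a 15 second base step as implied by RRA #1; rra_coverage is an illustrative helper name.

```python
BASE_STEP = 15  # seconds per primary data point (assumed from RRA #1)

# (steps consolidated per row, number of rows), taken from the RRAs line
RRAS = [(1, 244), (24, 244), (168, 244), (672, 244), (5760, 374)]


def rra_coverage(steps, rows, base_step=BASE_STEP):
    """Return (seconds per row, total seconds retained) for one RRA."""
    resolution = steps * base_step
    return resolution, resolution * rows


for number, (steps, rows) in enumerate(RRAS, start=1):
    resolution, span = rra_coverage(steps, rows)
    print(f"RRA #{number}: {resolution} s per row, "
          f"{span / 86400:.2f} days retained")
```

For example, RRA #2 consolidates 24 × 15 s = 360 s (6 minutes) per row, and 244 rows cover 87,840 s, or 24.4 hours.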
- Slide 25: Retrieving Data, Generating Graphs and Interacting with an RRD File
• RRDFetch – Retrieve time series data from an RRD file for a specific time period
• RRDInfo – Print header data from an RRD file in a parsing friendly format
• RRDGraph – Creates a graphical representation of the specified time series data
• RRDUpdate – Feed new data values into an RRD file
• Other APIs – RRDCreate, RRDDump, RRDFirst, RRDLast, RRDLastupdate, RRDResize, …
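As an illustration of RRDFetch, the following Python sketch builds an "rrdtool fetch" command line for one metric. It only constructs the argument list, so it runs without rrdtool installed. The <rootdir>/<cluster>/<host>/<metric>.rrd layout is an assumption about how gmetad arranges files under rrd_rootdir, and the helper names are hypothetical.

```python
def rrd_path(rootdir, cluster, host, metric):
    """Build the assumed <rootdir>/<cluster>/<host>/<metric>.rrd path."""
    return f"{rootdir}/{cluster}/{host}/{metric}.rrd"


def rrdfetch_command(path, cf="AVERAGE", start="-1h", end="now"):
    """Return the argv list for an 'rrdtool fetch' call over a time window."""
    return ["rrdtool", "fetch", path, cf, "--start", start, "--end", end]


cmd = rrdfetch_command(
    rrd_path("/var/lib/ganglia/rrds", "My Cluster", "node1", "load_one"))
print(cmd)
# To run it for real: subprocess.run(cmd, capture_output=True, text=True)
```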
- Slide 26: Extending the Ganglia Monitoring System
- Slide 27: Gmetric Service Level Metrics Utility
• Extends the available metrics that can be produced through Gmond
• Ability to run specialized metric gathering scripts
• Pushes metric data back through Gmond
• Must be scheduled through cron rather than Gmond
• Gmetric repository on Ganglia project site
  – http://ganglia.sourceforge.net/gmetric/
- Slide 28: Gmetric Command Line
• gmetric --conf=./custom.conf -n "wow" -v "it works" -t "string"
Usage: gmetric [OPTIONS]...
  -h, --help          Print help and exit
  -V, --version       Print version and exit
  -c, --conf=STRING   The configuration file to use for finding send channels (default=`/etc/gmond.conf')
  -n, --name=STRING   Name of the metric
  -v, --value=STRING  Value of the metric
  -t, --type=STRING   Either string|int8|uint8|int16|uint16|int32|uint32|float|double
  -u, --units=STRING  Unit of measure for the value e.g. Kilobytes, Celcius (default=`')
  -s, --slope=STRING  Either zero|positive|negative|both (default=`both')
  -x, --tmax=INT      The maximum time in seconds between gmetric calls (default=`60')
  -d, --dmax=INT      The lifetime in seconds of this metric (default=`0')
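A script scheduled through cron could assemble the gmetric call like this. The Python sketch below uses only the flags listed in the usage text; the metric name users_logged_in and the helper gmetric_command are hypothetical, and the command is built but not executed.

```python
def gmetric_command(name, value, type_, units="", slope="both",
                    tmax=60, dmax=0, conf=None):
    """Return the argv list for one gmetric call (flags from the usage text)."""
    cmd = ["gmetric", "-n", name, "-v", str(value), "-t", type_,
           "-s", slope, "-x", str(tmax), "-d", str(dmax)]
    if units:
        cmd += ["-u", units]
    if conf:
        cmd += ["-c", conf]
    return cmd


# Hypothetical metric: number of logged-in users, pushed once a minute by cron.
cmd = gmetric_command("users_logged_in", 3, "uint16", units="users")
print(cmd)
# To publish it: subprocess.run(cmd, check=True)
```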
- Slide 29: Gmond Pluggable Metric Modules
• Extends the available metrics that can be gathered by Gmond
• Provided as dynamically loadable modules
• Configured through the gmond.conf
• Scheduled through Gmond rather than an external scheduler
• Module development is similar to an Apache module
• Able to produce multiple metrics from a single module
- Slide 30: Gmond Module Development
• Three callback interfaces
  – Init: int (*ex_metric_init)(apr_pool_t *p);
  – Clean up: void (*ex_metric_cleanup)(void);
  – Metric gathering handler: g_val_t (*ex_metric_handler)(int metric_index);
• Metric definition structure
mmodule example_module =
{
    STD_MMODULE_STUFF,    // Internal module definition
    ex_metric_init,       // Metric init callback function
    ex_metric_cleanup,    // Metric cleanup callback function
    ex_metric_info,       // Metric info data structure
    ex_metric_handler,    // Metric handler
};
- Slide 31: Gmond Example Module
mmodule example_module;   /* forward declaration, defined below */

static const Ganglia_25metric ex_metric_info[] =
{
    {0, "Random_Numbers", 90,
     GANGLIA_VALUE_UNSIGNED_INT, "s", "both",
     "%u", UDP_HEADER_SIZE+8,
     "Example module metric (random numbers)"},
    {0, "Constant_Number", 90,
     GANGLIA_VALUE_UNSIGNED_INT, "Num", "zero",
     "%hu", UDP_HEADER_SIZE+8,
     "Example module metric (constant number)"},
    {0, NULL}
};

static int ex_metric_init(apr_pool_t *p)
{
    apr_array_header_t *list_params =
        example_module.module_params_list;
    srand(time(NULL)%99);
    return 0;
}

static void ex_metric_cleanup ( void )
{
}

static g_val_t ex_metric_handler ( int metric_index )
{
    g_val_t val;
    switch (metric_index) {
    case 0:
        val.int32 = rand()%99;
        return val;
    case 1:
        val.int32 = 50;
        return val;
    }
    /* default case */
    val.int32 = 0;
    return val;
}

mmodule example_module =
{
    STD_MMODULE_STUFF,
    ex_metric_init,
    ex_metric_cleanup,
    ex_metric_info,
    ex_metric_handler,
};
- Slide 32: Gmond Example Module Configuration
modules {
  module {
    name = "example_module"
    path = "/usr/lib/ganglia/modexample.so"
    Param RandomMax {
      Value = 75
    }
    Param ConstantValue {
      Value = 25
    }
  }
}
/* Define Collection Groups */
collection_group {
  collect_every = 10
  time_threshold = 50
  metric {
    name = "Random_Numbers"
    value_threshold = 30.0
  }
}
collection_group {
  collect_once = yes
  time_threshold = 20
  metric {
    name = "Constant_Number"
  }
}
- Slide 33: Gmond Python Module Development
• Extends the available metrics that can be gathered by Gmond
• Configured through the Gmond configuration file
• Python module interface is similar to the C module interface
• Ability to save state within the script vs. a persistent data store
• Larger footprint but easier to implement new metrics
- Slide 34: Gmond Python Module Development
• Three mandatory functions
  – metric_init()
    > Called once at module initialization time
    > Must return a metric description dictionary or list of dictionaries
    > Any other module initialization can also take place here
  – metric_handler() – may have multiple handlers
    > Metric gathering handler
    > Must return a single data value of the same type as specified in the metric_init() function
  – metric_cleanup()
    > Called once at module termination time
    > Does not return a value
- Slide 35: Gmond Python Module Development
• Metric definition data dictionary
d = {'name': '<your_metric_name>',
     'call_back': <call_back function>,
     'time_max': int(<your_time_max>),
     'value_type': '<string | uint | float | double>',
     'units': '<your_units>',
     'slope': '<zero | positive | negative | both>',
     'format': '<your_format>',
     'description': '<your_description>'}
• Can be a single dictionary or a list of dictionaries
• Must be returned from the metric_init() function
- Slide 36: Gmond Python Module Development
Curve_Max = 15
v = int(1)
inc = int(1)
count = 0

def metric_init(params):
    global Curve_Max
    if 'CurveMax' in params:
        Curve_Max = int(params['CurveMax'])
    d = {'name': 'Curve_Metric',
         'call_back': curve_handler,
         'time_max': int(60),
         'value_type': 'uint',
         'units': 'Seconds',
         'slope': 'both',
         'format': '%u',
         'description': 'Shows a uniform curve'}
    return d

def curve_handler(name):
    global v, count, inc, Curve_Max
    v += inc
    count += 1
    if count > Curve_Max:
        count = 0
        inc = -inc
    return int(v)

def metric_cleanup():
    pass
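A module of this shape can be exercised outside Gmond with a small driver that mimics the call sequence: metric_init(), then each call_back, then metric_cleanup(). The Python sketch below is self-contained and purely illustrative. Random_Metric, random_handler and run_once are hypothetical names, and the driver is an assumption about how the functions are invoked, based on the interface described in these slides.

```python
import random


def random_handler(name):
    """Callback: receives the metric name and returns one value."""
    return random.randint(0, 99)


def metric_init(params):
    """Return the metric descriptor; params comes from the module's config."""
    return {'name': 'Random_Metric',          # hypothetical metric
            'call_back': random_handler,
            'time_max': 60,
            'value_type': 'uint',
            'units': 'N',
            'slope': 'both',
            'format': '%u',
            'description': 'Random number between 0 and 99'}


def metric_cleanup():
    pass


def run_once(params=None):
    """Mimic one cycle: init, call each handler once, clean up."""
    descriptors = metric_init(params or {})
    if isinstance(descriptors, dict):          # a single dictionary is allowed
        descriptors = [descriptors]
    results = {}
    for d in descriptors:
        value = d['call_back'](d['name'])
        results[d['name']] = d['format'] % value
    metric_cleanup()
    return results


if __name__ == '__main__':
    print(run_once())
```

Running the file directly prints one formatted sample per metric, which is a quick way to catch type or format mismatches before the module is deployed.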
- Slide 37: Gmond Python Module Deployment
• Copy the .py file to a specific directory
  – The python modules directory is defined in the gmond.conf file
• Start Gmond using the -m parameter
  – Shows a list of all available metrics known to Gmond
  – The python based metric should be in the list
• Add the new python metric to a collection group just like any other metric
• Restart Gmond
- Slide 38: Configuring Gmond for Python
• Must load the mod_python.so pluggable module
modules {
  module {
    name = "python_module"
    path = "/usr/lib/ganglia/modpython.so"
    params = "/usr/lib/ganglia/python_modules"
  }
}
• Must specify a python module path
  – The 'params' directive specifies the python module path
  – Mod_python will automatically load any .py module found in the specified path
• Recommend including the python metric module .pyconf files from within the same .conf file that loads the python support module
  – include ('/etc/ganglia/conf.d/*.pyconf')
- Slide 39: Questions
- Slide 41: General Disclaimer
This document is not to be construed as a promise by any participating company to develop, deliver, or market a product. It is not
a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions.
Novell, Inc. makes no representations or warranties with respect to the contents
of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular
purpose. The development, release, and timing of features or functionality described for Novell products remains at the sole
discretion of Novell. Further, Novell, Inc. reserves the right to revise this document and to make changes to its content, at any
time, without obligation to notify any person or entity of such revisions or changes. All Novell marks referenced in this
presentation are trademarks or registered trademarks of Novell, Inc. in the United States and other countries. All third-party
trademarks are the property of their respective owners.