Nagios Conference 2014 - Konstantin Benz - Monitoring Openstack The Relationship Between Nagios and Ceilometer

Monitoring Openstack –
The Relationship Between Nagios and Ceilometer
Konstantin Benz,
Researcher
@ Zurich University of Applied Sciences
benn@zhaw.ch

Introduction & Agenda
•About me
•Working as researcher @
Zurich University of Applied Sciences
•OpenStack / Cloud Computing
•Engaged in monitoring and High Availability systems
•Currently working on a Europe-wide cloud federation:
•XIFI – eXtensible Infrastructure for Future Internet
http://www.fi-xifi.eu
•17 nodes / OpenStack clouds
•Test environment for Future Internet (FI-WARE)
applications
•Infrastructure for smart cities, public healthcare, traffic
management…
•European-wide L2-connected backbone network
•Nagios as main monitoring tool of that project

Introduction & Agenda
•What are you talking about in this presentation?
• How to use Nagios to monitor an OpenStack cloud environment
• Integrate Nagios with OpenStack
•Anything else?
• Cloud monitoring requirements
• OpenStack cloud management software and Ceilometer
• Comparison between Nagios and Ceilometer:
• Technological paradigms
• Commonalities and differences
• How to integrate Nagios with Ceilometer
•Can't wait!

Cloud Monitoring Requirements
Cloud ≈ virtualization + elasticity
•Types of clouds:
• IaaS: virtual machines and network devices, elasticity in number/size of devices
• PaaS: virtual, elastically sized platform
• SaaS: software provided by employing virtual, elastic resources
•Cloud is a collection of virtual resources provided on physical infrastructure
•Cloud provides resources elastically

Cloud Monitoring Requirements
Why should someone use clouds?
•Cloud consumer can outsource IT infrastructure
• No fixed costs for cloud consumer
• Pay for resource utilization
• Cloud provider responsible for building and maintaining physical
infrastructure
•Cloud provider can rent out unused IT infrastructure
• Eliminate waste
• Get money back for overcapacity

Monitoring OpenStack
OpenStack
Architecture
•Open source cloud computing software
•Consists of multiple services:
• Keystone: OpenStack identity services
(authentication, authorization, accounting)
• Cinder: management of block storage volumes
• Nova: management and provision of virtual resources
(VM instances)
• Glance: management of VM images
• Swift: management of object storage
• Neutron: management of network resources (IPs,
routing, connectivity)
• Horizon: GUI dashboard for end users
• Heat: orchestration of virtualized environments
(important for providing elasticity)
• Ceilometer: monitoring of virtual resources

Things to monitor
•Operation of OpenStack itself:
• Services: Cinder, Glance, Nova, Swift ...
• Infrastructure: Hardware, Operating System where OpenStack services are running
•Operation of virtual resources provided by OpenStack:
• Resource availability: VMs, virtual network devices
• Resource utilization: VM uptime, CPU / memory usage
→ Virtual resources are commonly monitored by Ceilometer
→ Ceilometer gathers data through the API of OpenStack services

Why is Ceilometer not enough?
→ Ceilometer monitors virtual resources through APIs of OpenStack
components, BUT NOT operation of the OpenStack components

Comparison Nagios / Ceilometer
Nagios operational model
•Configuration:
• Check interval (and retry interval) to poll system status and update frontend GUI
• Remote execution of monitoring clients (usually Nagios plugins)
• Thresholds that result in "OK", "Warning" or "Critical" status messages, which are sent back to the Nagios server (and "Unknown" if the status cannot be measured)
Main usage:
• Effective monitoring solution for physical servers
• System administration console that allows for fast reaction in case of problems
• Strength: extensibility and customizability
• Nagios must be extended in order to monitor virtual resources inside the administered systems
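The threshold logic described above can be sketched in a few lines of Python. This is only an illustration: the function name `classify` and the sample values are assumptions, while the numeric codes 0 to 3 follow the usual Nagios plugin convention (OK, WARNING, CRITICAL, UNKNOWN).

```python
# Nagios plugin states and their conventional exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(value, warning, critical):
    """Map a measured value to a Nagios state using upper-bound thresholds.

    classify() is a hypothetical helper; None stands for "not measurable".
    """
    if value is None:
        return UNKNOWN
    if value > critical:
        return CRITICAL
    if value > warning:
        return WARNING
    return OK


# Illustrative values only
for sample in (12.5, 63.0, 91.2, None):
    state = classify(sample, warning=50.0, critical=80.0)
    print(sample, STATE_NAMES[state])
```

A real plugin would print one status line and exit with the numeric state, which is what the Nagios server evaluates.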

Comparison Nagios / Ceilometer
Ceilometer operational model
•Configuration:
• Polling services check metrics
• OpenStack objects generate event notifications automatically
• All events and metrics collected in a database
Main usage:
• OpenStack integrated metrics collector and database
• Temporal database that can be used for rating, charging and billing of virtual resource
utilization
• Strength: fully integrated in OpenStack, collecting most important metrics and storing their
change history
• Weakness: Does not monitor physical hosts

Nagios / OpenStack Integration
Alternative 1: Ceilometer Plugin in Nagios
•Use Nagios server as frontend for Ceilometer:
• Nagios plugin that queries Ceilometer database
• Virtual resource utilization data collected by Ceilometer
• Nagios server responsible for monitoring non-virtual resources
Benefits:
• Simple and easy to implement
• No extra Nagios plugins required to monitor virtual devices that are managed within OpenStack
• Ceilometer tool can be left unchanged
Drawbacks:
• Monitoring data is stored in two different places: Nagios flat files and the Ceilometer database

•Implementation:
• Nagios plugin on client which hosts the Ceilometer API (code sample below)
• Initialization with default values, OpenStack authentication:
#!/bin/bash
#initialization with default values
SERVICE='cpu_util'
THRESHOLD='50.0'
CRITICAL_THRESHOLD='80.0'
#get openstack token to access ceilometer-api
export OS_USERNAME="youruser"
export OS_TENANT_NAME="yourtenant"
export OS_PASSWORD="yourpassword"
export OS_AUTH_URL=http://yourkeystoneurl:35357/v2.0/
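Because the plugin depends on these four variables, it may help to check them before calling any OpenStack client. The following Python sketch shows one possible check; `load_credentials` is a hypothetical helper, and treating a missing variable as a reason to report "Unknown" is an assumption, not part of the original plugin.

```python
import os

# The four variables exported in the snippet above
REQUIRED = ("OS_USERNAME", "OS_TENANT_NAME", "OS_PASSWORD", "OS_AUTH_URL")


def load_credentials(environ=os.environ):
    """Return (credentials, missing_names) for the OS_* variables."""
    creds = {name: environ.get(name, "") for name in REQUIRED}
    missing = [name for name in REQUIRED if not creds[name]]
    return creds, missing


# Example with an incomplete environment (placeholder values from the slide)
creds, missing = load_credentials({
    "OS_USERNAME": "youruser",
    "OS_AUTH_URL": "http://yourkeystoneurl:35357/v2.0/",
})
if missing:
    print("UNKNOWN: missing " + ", ".join(missing))
```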

•The plugin should receive parameters for:
• Resource to be monitored (VM)
• Service (Ceilometer metric)
• Warning threshold
• Critical threshold
# -r, -s, -t and -T each take an argument; -h prints the usage text
while getopts ":hr:s:t:T:" opt
do
  case $opt in
    h ) printusage;;
    r ) RESOURCE=${OPTARG};;
    s ) SERVICE=${OPTARG};;
    t ) THRESHOLD=${OPTARG};;
    T ) CRITICAL_THRESHOLD=${OPTARG};;
    * ) printusage;;
  esac
done
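For comparison, the same four parameters could be parsed with Python's `argparse`. This is a sketch only: the option letters and defaults mirror the shell snippet above, but the `dest` names and making `-r` mandatory are assumptions.

```python
import argparse


def parse_args(argv):
    """Parse the plugin options; -h is provided by argparse itself."""
    parser = argparse.ArgumentParser(description="Check a Ceilometer meter of one VM")
    parser.add_argument("-r", dest="resource", required=True, help="VM to be monitored")
    parser.add_argument("-s", dest="service", default="cpu_util", help="Ceilometer meter")
    parser.add_argument("-t", dest="threshold", type=float, default=50.0,
                        help="warning threshold")
    parser.add_argument("-T", dest="critical_threshold", type=float, default=80.0,
                        help="critical threshold")
    return parser.parse_args(argv)


# "myvm" is a placeholder VM name
args = parse_args(["-r", "myvm", "-T", "90"])
print(args.resource, args.service, args.threshold, args.critical_threshold)
```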

•Query the Nova API to get the resource (VM) to be monitored:
RESOURCE=$(nova list | grep "$RESOURCE" | tail -2 | head -1 | awk -F '|' '{print $2}')
# strip surrounding whitespace
RESOURCE=$(echo $RESOURCE)
•Query the metric on that resource (multiple entries are possible, which requires an iterator):
ITERATOR=$(ceilometer meter-list -q "resource_id=$RESOURCE" | grep -w "$SERVICE" | awk 'END{print NR}')
•Initialize with return code 0 (no warning or error):
RETURNCODE=0

•Iterate through the metric entries:
for (( C=1; C<=$ITERATOR; C++ ))
do
  METER_NAME=$(ceilometer meter-list -q "resource_id=$RESOURCE" | grep -w $SERVICE | awk -F '|' -v var="$C" '{if (NR == var) {print $2}}')
  METER_UNIT=$(ceilometer meter-list -q "resource_id=$RESOURCE" | grep -w $SERVICE | awk -F '|' -v var="$C" '{if (NR == var) {print $4}}')
  RESOURCE_ID=$(ceilometer meter-list -q "resource_id=$RESOURCE" | grep -w $SERVICE | awk -F '|' -v var="$C" '{if (NR == var) {print $5}}')
  ACTUAL_VALUE=$(ceilometer sample-list -m $METER_NAME -q "resource_id=$RESOURCE" -l 1 | grep $RESOURCE_ID | head -4 | tail -1 | awk -F '|' '{print $5}')

•Update return code if value of one metric is above a threshold:
  if [ $(echo "$ACTUAL_VALUE > $THRESHOLD" | bc) -eq 1 ]
  then
    if (( "$RETURNCODE" < "1" ))
    then
      RETURNCODE=1
    fi
    if [ $(echo "$ACTUAL_VALUE > $CRITICAL_THRESHOLD" | bc) -eq 1 ]
    then
      if (( "$RETURNCODE" < "2" ))
      then
        RETURNCODE=2
      fi
    fi
  fi

•Output return code:
  STATUS="$METER_NAME on $RESOURCE_ID is: $ACTUAL_VALUE $METER_UNIT"
  echo $STATUS
done
echo $RETURNCODE
# Nagios reads the state from the plugin's exit status
exit $RETURNCODE
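The `awk -F '|'` calls in the script pick columns out of the bordered ASCII tables printed by the command-line clients. The Python sketch below shows the same idea. The sample table is made up for illustration (its column layout follows the column numbers used in the script: name, type, unit, resource ID), and `table_rows` is a hypothetical helper.

```python
def table_rows(text):
    """Yield the cells of each data row of a bordered '| a | b |' table."""
    header_seen = False
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # border lines such as +-----+-----+
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        if not header_seen:
            header_seen = True  # the first '|' line holds the column titles
            continue
        yield cells


# Illustrative output, not captured from a real deployment
SAMPLE = """\
+----------+------------+------+-------------+
| Name     | Type       | Unit | Resource ID |
+----------+------------+------+-------------+
| cpu_util | gauge      | %    | vm-0001     |
| cpu      | cumulative | ns   | vm-0001     |
+----------+------------+------+-------------+
"""

rows = list(table_rows(SAMPLE))
# Exact match on the meter name, comparable to `grep -w` in the script
matches = [row for row in rows if row[0] == "cpu_util"]
print(matches)
```

Parsing the table once and filtering in memory avoids calling the client several times per meter, as the shell loop does.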

•Plugin can be downloaded from GitHub:
• https://github.com/kobe6661/nagios_ceilometer_plugin.git
•Additionally:
• NRPE-Plugin: remote execution of Nagios calls to Ceilometer
• Install NRPE on Nagios Core server and server that hosts Ceilometer API
• Change nrpe.cfg to include call to VM metric
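An nrpe.cfg entry has the form `command[<name>]=<command line>`. As a rough sketch, such a line could be generated as shown below. The command name `check_vm_cpu` and the plugin path are assumptions chosen for illustration; the options are those of the plugin shown earlier.

```python
import shlex


def nrpe_command(name, plugin_path, resource, service, warning, critical):
    """Build one nrpe.cfg command definition for the Ceilometer plugin."""
    parts = [plugin_path, "-r", resource, "-s", service,
             "-t", str(warning), "-T", str(critical)]
    # Quote each part so that unusual VM names cannot break the command line
    command_line = " ".join(shlex.quote(part) for part in parts)
    return "command[{}]={}".format(name, command_line)


# Hypothetical command name, plugin location and VM name
line = nrpe_command("check_vm_cpu", "/usr/local/nagios/libexec/check_ceilometer",
                    "myvm", "cpu_util", 50.0, 80.0)
print(line)
```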

Alternative 1: Implementation
•OpenStack installed on 3 nodes:
• Management node: responsible for monitoring other OpenStack nodes
• Controller node: responsible for management and configuration of cloud resources (VMs, network)
• Compute node: provisions virtual resources

Alternative 2: Nagios OpenStack Plugins
•Nagios as a tool to monitor OpenStack services and VMs:
• Plugins to monitor health of OpenStack services
• As soon as new VMs are created, Nagios should monitor them
• Requires elastic reconfiguration of Nagios
Benefits:
• No data duplication, Nagios is the only monitoring tool required to monitor OpenStack
Drawbacks:
• Elastic reconfiguration
• Rather complex Nagios configuration

•Problem:
• Dynamic provisioning of resources (Virtual Machines)
• Dynamic configuration of hosts in Nagios Server required
[Diagram: OpenStack Controller Node, OpenStack Compute Node, Virtual Machine, VM Image and Nagios Server, with relations labeled PROVIDES and MONITORS]
•Problem:
• What happens if VM is terminated by end user?
• Nagios assumes a host failure and produces a critical warning
[Diagram: OpenStack Controller Node, OpenStack Compute Node, Virtual Machine, VM Image and Nagios Server, with relations labeled PROVIDES and MONITORS]
•Solution:
• Nova-API triggers reconfiguration of Nagios if VMs are created or terminated
[Diagram: OpenStack Controller Node, OpenStack Compute Node, Virtual Machine, VM Image and Nagios Server, with relations labeled PROVIDES and RECONFIGURES]
•Another problem:
• VMs must have Nagios plugins installed when they are created
•Solution:
• Use only VM Images that contain Nagios plugins for VM creation OR
• Use package management tools like Puppet, Chef…
[Diagram: OpenStack Controller Node, OpenStack Compute Node, Virtual Machine, VM Image, Nagios Server and NRPE Plugins (shown twice), with a relation labeled PROVIDES]

•Trigger for dynamic Nagios configuration:
• Find available resources via nova-api (requires name of host and IP address)
#!/bin/bash
NUMLINES=$(nova list | wc -l)
NUMLINES=$((NUMLINES-3))
# assumes `nova list` prints a bordered table: tail keeps the VM rows plus
# the bottom border, so the loop stops one line early
for (( C=1; C<NUMLINES; C++ ))
do
  VM_NAME=$(nova list | tail -$NUMLINES | awk -F'|' -v var="$C" '{if (NR==var){print $3}}')
  IP_ADDRESS=$(nova list | tail -$NUMLINES | awk -F'|' -v var="$C" '{if (NR==var){print $7}}' | sed 's/[a-zA-Z0-9]*[=|-]//g')

• Create a config file including VM name and IP address from a template (e.g. vm_template.cfg)
CONFIG_FILE=$(echo $VM_NAME).cfg
sed "s/<vm_name>/$VM_NAME/g" vm_template.cfg > named_template.cfg
sed "s/<ip_address>/$IP_ADDRESS/g" named_template.cfg > $CONFIG_FILE
• Set Nagios as owner of the file and move file to Nagios configuration directory
chown nagios:nagios $CONFIG_FILE
chmod 644 $CONFIG_FILE
mv $CONFIG_FILE /usr/local/nagios/etc/objects/$CONFIG_FILE

• Add config file to nagios.cfg
echo "cfg_file=/usr/local/nagios/etc/objects/$CONFIG_FILE" >> /usr/local/nagios/etc/nagios.cfg
done
• Restart Nagios
service nagios restart
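The template substitution step can also be sketched in Python. The template text below is an assumption: the slides do not show vm_template.cfg, so a minimal Nagios host definition is used here, and `linux-server` is the host template from the Nagios sample configuration.

```python
# Hypothetical stand-in for vm_template.cfg
TEMPLATE = """\
define host {
    use        linux-server
    host_name  <vm_name>
    address    <ip_address>
}
"""


def render_host_config(template, vm_name, ip_address):
    """Fill the <vm_name> and <ip_address> placeholders of a host template."""
    return template.replace("<vm_name>", vm_name).replace("<ip_address>", ip_address)


# Placeholder VM name and address
config = render_host_config(TEMPLATE, "myvm", "10.0.0.5")
print(config)
```

Plain string replacement sidesteps a weakness of the `sed` approach, where a VM name containing `/` or `&` would change the meaning of the substitution command.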

•Why restart Nagios?
• Nagios must know that a new VM is present or that an old VM has been terminated
• Reconfigure and restart Nagios (!)

• Add trigger to Nova-API:
• Nagios Event Broker module:
• Check_MK: http://mathias-kettner.de/checkmk_livestatus.html
• Reconfigure Nagios dynamically:
• Edit nagios.cfg and restart Nagios – bad idea (!!) in a cloud environment
• Autoconfiguration tools:
• NagioSQL: http://www.nagiosql.org/documentation.html

•What other ways exist to dynamically reconfigure Nagios?
• Puppet master that triggers:
• VMs to install Nagios NRPE plugins and
• Nagios Server to update its configuration
• Same can be done with Chef, Ansible…
• Drawback:
Puppet scalability if thousands of servers have to be (de-)commissioned dynamically

• Python fabric with Cuisine to trigger:
• Get list of VMs
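from novaclient.client import Client
nova = Client(VERSION, USERNAME, PASSWORD, PROJECT_ID, AUTH_URL)
servers = nova.servers.list()
• Write VM list to file
# one entry per line; using server.name as the host identifier is an assumption
with open('servers', 'w') as server_file:
    for server in servers:
        server_file.write(server.name + '\n')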

• Python fabric with Cuisine to trigger:
• Create fabfile.py and define which servers should be configured
from fabric.api import *
from . import vm_recipe, nagios_recipe
env.use_ssh_config = True
with open('servers') as servers:
    serverlist = [line.strip() for line in servers if line.strip()]
env.roledefs = {'vm': serverlist,
                'nagios_server': ['xx.xx.xx.xx']
}

• Assign recipes
@roles("vm")
def configure_vm():
    vm_recipe.ensure()
@roles("nagios_server")
def configure_nagios():
    nagios_recipe.ensure()

• Create vm_recipe.py and nagios_recipe.py
from fabric.api import *
import cuisine
def ensure():
    # is_installed() and install() belong to the recipe but are not shown here
    if not is_installed():
        puts("Installing NRPE...")
        install()
    else:
        puts("NRPE already installed")
def install_prerequisites():
    cuisine.package_ensure("nrpe")

Choice of Alternatives
Which option should we choose?
• Implementation advantages and drawbacks
•A1: Ceilometer collects data
• Advantages: very easy solution; scales well
• Drawbacks: data duplication; two monitoring systems working in parallel
•A2: Shell script
• Advantages: no data duplication; easy solution
• Drawbacks: difficult to maintain; possibly insecure; Nagios is forced to restart
•A2: Puppet
• Advantages: automatic VM and Nagios configuration; allows for elastic reconfiguration of Nagios
• Drawbacks: heavyweight; bad scalability for large IaaS clusters
•A2: Python fabric & cuisine
• Advantages: lightweight; automatic VM and Nagios configuration; allows for elastic reconfiguration of Nagios
• Drawbacks: bigger configuration effort for package management with strong dependencies between packages

Conclusion
What did you talk about?
•How to use Nagios to monitor an OpenStack cloud environment
• Cloud monitoring requirements:
• Elasticity, dynamic provisioning of virtual machines
•OpenStack monitoring tools Nagios and Ceilometer
• Nagios as extensible monitoring system
• Ceilometer captures data through Nova-API
•Nagios/OpenStack integration
• Alternative 1:
• Ceilometer monitors VMs with Nagios as graphical frontend
• Alternative 2:
• Nagios monitors VMs and is automatically reconfigured
•Discovered need for dynamic reloading of Nagios configuration
•Discussed advantages/drawbacks of different implementations

Questions?
Any questions?
Thanks!

The End
Konstantin Benz
benn@zhaw.ch
