Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

SENSE AND
SENSU-BILITY
Painless Metrics And Monitoring
In The Cloud with Sensu
Bethany Erskine
Velocity NYC 2013
http://github.com/skymob/sensu-tutorial

Monday, October 14, 13

BEFORE I BEGIN...
IF YOU DID NOT SET UP SENSU-TUTORIAL
BEFORE THE CLASS:
1. grab a USB key
2. follow the instructions on the README
If you don’t have a computer, no sweat!


DO YOU LOVE
YOUR
MONITORING
SETUP?

#MONITORINGLOVE


MY STORY

+

(╯︵╰,)


WHY SENSU
✓Ruby
✓Plugins can be written in any language
✓community
✓sensu-chef cookbook


WHY SENSU
✓re-use Nagios checks!
✓metrics and checks all collected by one system
✓easy to scale
✓Graphite integration


WHY SENSU

✓“Can I do X with Sensu?” probably!


WHY SENSU


WHY SENSU?
✓Sensu source is well-written and easy to parse
✓https://github.com/sensu

WHY SENSU?
✓sensu-community-plugins
✓80 contributors
✓over 600 plugins
✓https://github.com/sensu/sensu-community-plugins

TODAY at
PAPERLESS
Two Sensu environments (prod/testing)
~ 250 - 275 instances of sensu-client
4-6 Sensu-server instances
25k Metrics/Hour to Graphite
1 custom dashboard
1 custom CLI


RESOURCES
✓All of our Sensu infrastructure is virtualized.
✓We typically give a sensu-server box 1.5GB RAM and 2 processors, scaling up RAM for any box running more than one Sensu service on it.
✓4GB RAM for a monolithic Sensu install (Rabbit, Redis, all Sensu components on one)

AS WE GREW
Growing pains and lessons learned...


NEEDS MORE
SENSU
✓High load on Sensu server
✓Backed-up queues in RabbitMQ
✓TIP: set up a check to monitor the RabbitMQ ready queue size; you'll want an email when the queue grows above 10K and stays there
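
A minimal sketch of such a queue-size check, in Python (plugins can be written in any language). It assumes the queue depths have already been fetched as a list of dicts carrying `name` and `messages_ready`, the shape used by the RabbitMQ management API's queue listing. The function name, the 5000 warning threshold, and the sample data are illustrative assumptions, not taken from the talk. Exit codes follow the Nagios convention (0 OK, 1 WARNING, 2 CRITICAL).

```python
import json

# Nagios-style exit codes, which Sensu checks reuse.
OK, WARNING, CRITICAL = 0, 1, 2


def check_ready_queue(queues, warn=5000, crit=10000):
    """Return (exit_code, message) for the deepest ready queue.

    Only the `name` and `messages_ready` keys are read. The default
    thresholds are assumptions chosen for illustration.
    """
    if not queues:
        return OK, "CheckRabbitQueue OK: no queues found"
    worst = max(queues, key=lambda q: q.get("messages_ready", 0))
    depth = worst.get("messages_ready", 0)
    detail = "%s has %d ready messages" % (worst.get("name", "?"), depth)
    if depth >= crit:
        return CRITICAL, "CheckRabbitQueue CRITICAL: " + detail
    if depth >= warn:
        return WARNING, "CheckRabbitQueue WARNING: " + detail
    return OK, "CheckRabbitQueue OK: " + detail


# Made-up sample data standing in for a management API response.
sample = json.loads('[{"name": "results", "messages_ready": 12000},'
                    ' {"name": "keepalives", "messages_ready": 3}]')
code, message = check_ready_queue(sample)
print(message)
```

A real plugin would fetch the queue list itself, print the message, and exit with `code` so that Sensu can route the result to an email handler.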


HOW TO SCALE
✓Add more sensu-server instances
✓No special configuration needed
✓checks will be distributed in round-robin fashion to the sensu-servers
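
Round-robin distribution can be pictured with a toy model. This is only an illustration of the idea in Python, not Sensu's own code; the server and check names are made up.

```python
from itertools import cycle


def distribute(results, servers):
    """Assign each result to the next server in turn, wrapping around."""
    if not servers:
        raise ValueError("need at least one server")
    assignment = {server: [] for server in servers}
    # cycle() repeats the server list forever; zip() stops when results run out.
    for result, server in zip(results, cycle(servers)):
        assignment[server].append(result)
    return assignment


# Hypothetical names, for illustration only.
assignment = distribute(
    ["check-%d" % i for i in range(5)],
    ["sensu-server-1", "sensu-server-2"],
)
print(assignment)
```

Adding a third name to the server list spreads the same work over three buckets with no other change, which is the property that makes this scaling approach attractive.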


GRAPHITE PAINS
✓symptoms: backed up queues in RabbitMQ, spotty graphs
✓cluster couldn’t keep up with the large amount of metrics we were now serving it via AMQP


GRAPHITE PAINS
✓Solution: stop collecting metrics every 10 seconds (excessive!)
✓moved staging metrics to staging Graphite cluster
✓Moved prod Graphite cluster to SSD


THE MIGRATION
or, How To Quit Nagios in Ten Easy Steps


STEP 1: NUKE AND
PAVE


STEP 2: PLAN
METRICS AND MONITORING SURVEY


METRICS AND MONITORING SURVEY


STEP 3: DEFINE
GLOBALS
✓CHECKS: must be actionable!
✓METRICS: go nuts
✓HANDLERS: EMAIL for everything initially, added Pagerduty later.


OUR GLOBALS
✓CHECKS: disk usage, swap usage, zombie processes, RO filesystems
✓METRICS: vmstat, disk usage, cpu, memory, interface and disk perf
✓HANDLERS: Email, Campfire, Pagerduty


STEP 4: DEFINE
SPECIFICS
✓For each server role, define additional states to be checked and alerted on:

✓Process Checks
✓System Checks
✓Service Checks
✓Service Metrics

STEP 5: SET UP A
PLACE TO TEST
✓Set up a permanent testing Sensu stack using your CM tool of choice
✓we used sensu-chef cookbook

STEP 6: SET A
WORKFLOW
✓Develop and document a workflow for implementing, testing, deploying and signing off on checks
✓You’ll get the best coverage if anyone (developers or ops) can easily add checks and metrics to Sensu


EXAMPLE
WORKFLOW
✓add new sensu_check definitions to the appropriate cookbook in Chef
✓deploy new check to staging env using Chef

✓Pull Request with sample graphs or alerts
✓Code Review from colleague
✓Deploy to Prod

SENSU IN CHEF


STEP 7: EXECUTE
WORKFLOW
✓Starting with the low-hanging fruit (plugins that already existed in the sensu-community-plugins repository), configure and deploy each check in the worksheet to the testing Sensu server
✓deploy sensu-client to a few select machines

STEP 8: WATCH
THE WATCHER
✓Set up some bare-minimum 3rd party monitoring for the Sensu servers
✓We use Panopta’s agent to check for aliveness, disk usage and CPU usage.


MONITOR THE
MONITOR
✓Other ideas: have Testing Sensu monitor Prod Sensu
✓Sensu can collect metrics about itself


STEP 9: ROLLOUT
✓Deploy your Production server infrastructure
✓Roll out the client and checks to the rest of your prod environments.


STEP 10: TUNE
✓Expect to need to tune thresholds and alert occurrences.
✓Laissez le bon alertes roulent! (Let the good alerts roll!)
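
One way to think about tuning occurrences: an event only reaches a human once the check has failed some number of times in a row. The sketch below is a plain Python illustration of that idea, not Sensu's own implementation; `min_occurrences` is a hypothetical tuning knob.

```python
def should_alert(statuses, min_occurrences=3):
    """Return True when the latest `min_occurrences` results are all non-zero.

    `statuses` is a chronological list of check exit codes (0 means OK).
    """
    if min_occurrences < 1:
        raise ValueError("min_occurrences must be at least 1")
    if len(statuses) < min_occurrences:
        return False
    recent = statuses[-min_occurrences:]
    return all(status != 0 for status in recent)


print(should_alert([0, 2, 2]))     # only two failures in a row so far
print(should_alert([0, 2, 2, 2]))  # three failures in a row
```

Raising `min_occurrences` trades slower notification for fewer alerts on brief blips, which is the balance to expect to adjust after rollout.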


SENSU
ARCHITECTURE


OMNIBUS
INSTALLER
is awesome


LET’S PLAY WITH
SENSU
If you haven’t been able to get your
sandboxes up and running,
please pair with someone near you.


SANDBOX GOALS
✓Get familiar with Sensu configuration
✓Deploy a check
✓Trigger an alert on that check
✓Give you something to take home and hack on
✓Install a Handler


OOPS
If you mess anything up:
vagrant halt; vagrant up
Worst case:
vagrant destroy; vagrant up


TWO
VIRTUALBOXES
Sensu-Server and Sensu-Client
Vagrant/Chef
CentOS 6.4
Sensu Version 0.10.2


SENSU
CONFIGURATION
✓Please open up a terminal and SSH into both your sensu-server and sensu-client VMs
✓sudo su
✓cd /etc/sensu

SENSU
CONFIGURATION
✓/etc/sensu/config.json - config for redis, rabbitmq, api and dashboard
✓/etc/sensu/conf.d/ - checks go here
✓/etc/sensu/conf.d/client.json - client configuration, subscriptions
✓/etc/sensu/{extensions|handlers|mutators|plugins}


TRIGGER AN
ALERT!
On sensu-client:
service sensu-client stop


CHECK YOUR
DASHBOARD
✓Open a web browser and go to http://10.254.254.10:8080
✓username: admin / password: secret


HANDLERS
✓A HANDLER takes action on an event using a pipe, TCP, UDP, AMQP, or a set of other handlers
✓Examples: send an email, send event to Pagerduty, send metrics to Graphite
✓Default is “debug”
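
To make the pipe type concrete: a pipe handler is a command that receives the event as JSON on standard input. The Python sketch below shows the parsing half of such a handler. It assumes the event carries nested `client` and `check` objects plus an `action` field; the sample event and the `describe_event` name are illustrative, not from the talk.

```python
import json


def describe_event(raw_event):
    """Build a one-line summary from the event JSON a pipe handler reads."""
    event = json.loads(raw_event)
    client = event.get("client", {}).get("name", "unknown-client")
    check = event.get("check", {})
    # Assumed convention: action "resolve" marks a recovered check.
    action = "RESOLVED" if event.get("action") == "resolve" else "ALERT"
    return "%s - %s/%s: %s" % (
        action,
        client,
        check.get("name", "unknown-check"),
        check.get("output", "").strip(),
    )


# Made-up event, shaped per the assumptions above.
sample_event = json.dumps({
    "client": {"name": "sensu-client"},
    "check": {"name": "cornbread_process", "status": 1,
              "output": "CheckProcs WARNING: Found 0 matching processes\n"},
    "occurrences": 1,
    "action": "create",
})
summary = describe_event(sample_event)
print(summary)
```

A real handler script would call `describe_event(sys.stdin.read())` and then do something with the summary, such as using it as an email subject.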

HANDLER
EXAMPLES
✓BASIC: send an email to ops@
✓ADVANCED: attempt to remediate the alert (e.g. run a custom script that spins up additional EC2 instances)


HANDLERS
✓Let’s configure an EMAIL handler to send an informative email for an event.
✓/etc/sensu/handlers/mailer.rb plugin is installed for you, we just need to configure and install it


CONFIGURE THE
PLUGIN
ON SENSU SERVER:
vim /etc/sensu/conf.d/handlers/mailer.json
{
  "mailer": {
    "mail_from": "sensu@you.com",
    "mail_to": "you@yourdomain.com"
  }
}

CONFIGURE THE
HANDLER
cp /etc/sensu/conf.d/handlers/default.json /etc/sensu/conf.d/handlers/email.json
vim /etc/sensu/conf.d/handlers/email.json


EMAIL.JSON
{
  "handlers": {
    "email": {
      "type": "pipe",
      "command": "/etc/sensu/handlers/mailer.rb"
    }
  }
}


CHECK GEM
DEPENDENCIES
/opt/sensu/embedded/bin/gem list | grep mail


FIX PERMISSIONS

chown -R .sensu /etc/sensu/conf.d/


RESTART
SERVICES
service sensu-server restart
tail -100 /var/log/sensu/sensu-server.log | grep mail


CHECKS
✓Sensu-client runs CHECKS that are defined and scheduled either locally (standalone) or on the sensu-server (subscription).
✓A CHECK sends a RESULT as an EVENT to a HANDLER - this applies to anything - service checks, metrics, etc


CHECK
EXECUTION
✓Either scheduled by the server (subscription) or scheduled by the client (standalone)
✓Today we will configure a subscription-based check on the server that will run on our client


LET’S CONFIGURE
A CHECK
✓Use check-procs.rb to make sure at least one instance of cornbread is running


DETERMINE OUR
CHECK COMMAND
On your SENSU CLIENT:
/opt/sensu/embedded/bin/ruby /etc/sensu/plugins/check-procs.rb -p cornbread -W1
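
For readers curious what a process check does under the hood, here is a rough Python analogue. It is a sketch, not a port of check-procs.rb: it uses simple substring matching, and the function names and sample process list are assumptions made for illustration. It warns when fewer than `warn_under` processes match.

```python
import subprocess

# Nagios-style exit codes used by Sensu checks.
OK, WARNING = 0, 1


def count_matching(command_lines, pattern):
    """Count command lines containing `pattern` (plain substring match)."""
    return sum(1 for line in command_lines if pattern in line)


def check_procs(command_lines, pattern, warn_under=1):
    """Return (exit_code, message); WARNING if too few processes match."""
    found = count_matching(command_lines, pattern)
    status = WARNING if found < warn_under else OK
    label = "WARNING" if status == WARNING else "OK"
    return status, "CheckProcs %s: Found %d matching %s" % (label, found, pattern)


def running_command_lines():
    """List the command line of every process via `ps` (Unix-like systems)."""
    output = subprocess.check_output(["ps", "-eo", "args="])
    return output.decode("utf-8", "replace").splitlines()


# Made-up process list so the example runs anywhere.
sample = ["/sbin/init", "/usr/bin/cornbread --serve", "sshd: vagrant"]
print(check_procs(sample, "cornbread"))
print(check_procs(["/sbin/init"], "cornbread"))
```

On a live machine, `check_procs(running_command_lines(), "cornbread")` gives the real answer, and a plugin would exit with the returned status code.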


INSTALL OUR
CHECK
✓On your SENSU SERVER:
✓vim /etc/sensu/conf.d/checks/cornbread_process.json


CORNBREAD_PROCESS.JSON
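
The contents of this slide are not preserved in the text, so the following is only a guess at the general shape of a check definition. The Python sketch builds one and prints it as JSON. The command comes from the earlier slide and the `email` handler from the handler setup; the `subscribers` value and the 60-second `interval` are assumptions to adjust for your own sandbox.

```python
import json

check_definition = {
    "checks": {
        "cornbread_process": {
            # Command determined on the sensu-client in the previous step.
            "command": "/opt/sensu/embedded/bin/ruby "
                       "/etc/sensu/plugins/check-procs.rb -p cornbread -W1",
            # Assumption: must match a subscription listed in client.json.
            "subscribers": ["all"],
            # Assumption: run once a minute.
            "interval": 60,
            # Route events to the email handler configured earlier.
            "handlers": ["email"],
        }
    }
}

rendered = json.dumps(check_definition, indent=2, sort_keys=True)
print(rendered)
```

The printed JSON is what would go into the file being edited, after replacing the assumed values.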


RESTART
SERVICES
service sensu-server restart
tail -100 /var/log/sensu/sensu-server.log | grep cornbread


CHECK YOUR
DASHBOARD


CHECK YOUR
EMAIL


SENSU API
✓REST API
✓HTTP/4567
✓on SENSU SERVER try:
curl -l http://localhost:4567/events | python -mjson.tool
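
The same endpoint can be consumed from a script. The Python sketch below fetches `/events` and tallies events by severity. The exact response shape is not shown in the slides, so the helper accepts either a flat `status` field or a nested `check` object; both shapes, and the function names, are assumptions.

```python
import json
from collections import Counter
from urllib.request import urlopen

STATUS_NAMES = {0: "ok", 1: "warning", 2: "critical"}


def event_status(event):
    """Read the status from a nested `check` object or a flat `status` key."""
    check = event.get("check")
    if isinstance(check, dict):
        return check.get("status")
    return event.get("status")


def summarize_events(events):
    """Count events per severity name; unrecognized codes become 'unknown'."""
    counts = Counter()
    for event in events:
        counts[STATUS_NAMES.get(event_status(event), "unknown")] += 1
    return dict(counts)


def fetch_events(base_url="http://localhost:4567"):
    """GET /events from the Sensu API and decode the JSON body."""
    with urlopen(base_url + "/events") as response:
        return json.loads(response.read().decode("utf-8"))


# Made-up events covering both assumed shapes.
sample = [
    {"client": "sensu-client", "check": "keepalive", "status": 2},
    {"client": {"name": "sensu-client"},
     "check": {"name": "cornbread_process", "status": 1}},
]
print(summarize_events(sample))
```

On the sensu-server VM, `summarize_events(fetch_events())` gives a quick count of what is currently alerting.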


SENSU SERVICES
✓Sensu API
✓Sensu Server
✓Sensu Client
✓Sensu Dashboard

EVERYTHING OK?
✓/etc/init.d/sensu-service {client|server|api|dashboard} {start|stop|status|restart}
✓ps -ef | grep sensu
✓tail -f /var/log/sensu/*.log
✓curl -l localhost:4567/info

COOL SENSU
TRICKS


SEND DIRECTLY
TO SENSU
netcat to: 127.0.0.1:3030
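
The same trick works from application code instead of netcat. The Python sketch below builds a check result and writes it to the local sensu-client socket on port 3030 of the loopback address (127.0.0.1). The payload keys `name`, `output` and `status` are the minimal result fields; the function names and the `app_heartbeat` check name are hypothetical.

```python
import json
import socket


def build_result(name, output, status=0):
    """Serialize a check result; status uses the 0-3 Nagios-style codes."""
    if status not in (0, 1, 2, 3):
        raise ValueError("status must be 0, 1, 2 or 3")
    return json.dumps({"name": name, "output": output, "status": status})


def send_result(payload, host="127.0.0.1", port=3030, timeout=5):
    """Write the JSON payload to the local sensu-client socket."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload.encode("utf-8"))


payload = build_result("app_heartbeat", "heartbeat received", 0)
print(payload)
```

Calling `send_result(payload)` on a machine running sensu-client submits the result as if a scheduled check had produced it.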


AGGREGATE
ALERTS
✓Alert when X% of checks are not OK
✓Handy for preventing alert floods
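
The percentage logic behind an aggregate alert is simple arithmetic. The Python sketch below assumes per-severity counts are available as a dict with `ok`, `warning`, `critical`, `unknown` and optionally `total`; that shape, the 25% threshold, and the function names are assumptions for illustration.

```python
SEVERITIES = ("ok", "warning", "critical", "unknown")


def percent_not_ok(counts):
    """Return the share of results that are not OK, as a percentage."""
    total = counts.get("total") or sum(counts.get(k, 0) for k in SEVERITIES)
    if total == 0:
        return 0.0
    return 100.0 * (total - counts.get("ok", 0)) / total


def aggregate_alert(counts, max_percent_not_ok=25.0):
    """True when more than `max_percent_not_ok` percent of results are failing."""
    return percent_not_ok(counts) > max_percent_not_ok


# Made-up counts for a check that runs on 20 clients.
healthy = {"ok": 18, "warning": 1, "critical": 1, "unknown": 0, "total": 20}
degraded = {"ok": 10, "warning": 0, "critical": 10, "unknown": 0, "total": 20}
print(percent_not_ok(healthy), aggregate_alert(healthy))
print(percent_not_ok(degraded), aggregate_alert(degraded))
```

With this approach, two failing clients out of twenty produce no page, while half the fleet failing produces exactly one.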


MY SENSU TIPS
✓install the RabbitMQ management web interface and bookmark it (see http://10.254.254.10:15672/#/ )
✓lock your plugins’ gem dependency versions


TIPS TIPS TIPS
✓have alternate ways to access your Dashboard information
✓we integrated our command-line developer tools with Sensu API
✓we also created our own Ops dashboard that queries Sensu, Graphite and our app for data


MORE TIPS

✓Put NGINX in front of sensu-dashboard


HA SENSU
✓Redundancy is easy (bring up more sensu-servers)
✓Making Redis and RabbitMQ HA is more challenging
✓We’re still running one solitary Redis and RabbitMQ but are OK with this risk for now


WHERE TO GO
FOR HELP
✓IRC: #sensu - freenode
✓sensu-users mailing list
✓http://docs.sensuapp.org


QUESTIONS


THANK YOU
bethany@paperlesspost.com
@skymob - twitter
robotwitharose - #sensu on IRC (freenode)

