An in-depth look at why Paperless Post chose Sensu, and how they monitor their services and collect system metrics to send to Graphite. Subtopics include how they planned and executed the migration, the mistakes they made along the way, how they knew when to scale, and how they scaled.
An Introduction to Sensu by Bethany Erskine
1. SENSE AND
SENSU-BILITY
Painless Metrics And Monitoring
In The Cloud with Sensu
Bethany Erskine
nycdevops Meetup
http://github.com/skymob/sensu-tutorial
Thursday, November 14, 13
13. TODAY at
PAPERLESS
Two Sensu environments (prod/testing)
~ 250 - 275 instances of sensu-client
4-6 Sensu-server instances
25k Metrics/Hour to Graphite
1 custom dashboard
1 custom CLI
14. RESOURCES
✓ All of our Sensu infrastructure is virtualized.
✓ We typically give a sensu-server box 1.5GB RAM and 4 processors, scaling up RAM for any box running more than one Sensu service on it.
✓ 4GB RAM for a monolithic Sensu install (Rabbit, Redis, all Sensu components on one)
15. AS WE GREW
Growing pains and lessons learned...
16. NEEDS MORE
SENSU
✓ High load on Sensu server
✓ Backed-up queues in RabbitMQ
✓ TIP: set up a check to monitor the RabbitMQ ready queue size; you'll want an email when the queue grows to about 10K and stays there
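A minimal Python sketch of the queue-size tip. It assumes the RabbitMQ management plugin is enabled and that the JSON from its /api/queues endpoint has already been fetched. The threshold constant, the three-sample "stays there" rule, and the function names are illustrative assumptions, not something shown in the talk.

```python
import json

# Nagios-style exit codes that Sensu checks use.
OK, WARNING = 0, 1

# Assumption: the 10K figure from the tip is used as the alert threshold.
READY_THRESHOLD = 10_000


def total_ready(queues_json):
    """Sum the messages_ready field across all queues in an /api/queues payload."""
    queues = json.loads(queues_json)
    return sum(q.get("messages_ready", 0) for q in queues)


def queue_status(samples, threshold=READY_THRESHOLD, sustained=3):
    """Return WARNING only when the last `sustained` samples all exceed the threshold.

    `sustained` is a hypothetical knob that models "grows and stays there".
    """
    recent = samples[-sustained:]
    if len(recent) == sustained and all(s > threshold for s in recent):
        return WARNING
    return OK
```

A real check would poll the management API on each run, keep the recent samples somewhere, and exit with the returned status code so a handler can send the email.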
17. HOW TO SCALE
✓ Add more sensu-server instances
✓ No special configuration needed
✓ Checks will be distributed in round-robin fashion to the sensu-servers
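A toy Python model of round-robin distribution, only to illustrate why adding servers spreads the load evenly. This is not Sensu's code; the server and result names are made up for the example.

```python
from collections import Counter
from itertools import cycle


def distribute(results, servers):
    """Assign each check result to the next server in turn (round-robin)."""
    assignment = {}
    # cycle() repeats the server list forever; zip() stops when results run out.
    for result, server in zip(results, cycle(servers)):
        assignment[result] = server
    return assignment


if __name__ == "__main__":
    # Hypothetical names: six results spread over three servers.
    plan = distribute([f"result-{i}" for i in range(6)], ["sensu-1", "sensu-2", "sensu-3"])
    print(Counter(plan.values()))
```

With three servers each one handles a third of the results; adding a fourth drops that share to a quarter without any per-server configuration.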
18. GRAPHITE PAINS
✓ Symptoms: backed-up queues in RabbitMQ, spotty graphs
✓ Cluster couldn’t keep up with the large amount of metrics we were now serving it via AMQP
19. GRAPHITE PAINS
✓ Solution: stop collecting metrics every 10 seconds (excessive!)
✓ Moved staging metrics to staging Graphite cluster
✓ Moved prod Graphite cluster to SSD
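A rough Python sketch of why the collection interval matters so much. The metric count and the 60-second interval are hypothetical numbers chosen for the arithmetic; the talk only says that 10 seconds was excessive.

```python
def points_per_hour(metric_count, interval_seconds):
    """Data points written to Graphite per hour for a given collection interval."""
    return metric_count * 3600 / interval_seconds


if __name__ == "__main__":
    metrics = 100  # hypothetical number of metrics per host
    for interval in (10, 60):  # 60s is an assumed replacement interval
        print(interval, points_per_hour(metrics, interval))
```

Under these assumptions, moving from a 10-second to a 60-second interval cuts the write volume by a factor of six.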
24. STEP 3: DEFINE
GLOBALS
✓ CHECKS: must be actionable!
✓ METRICS: go nuts
✓ HANDLERS: EMAIL for everything initially, added PagerDuty later.
25. OUR GLOBALS
✓ CHECKS: disk usage, swap usage, zombie processes, RO filesystems
✓ METRICS: vmstat, disk usage, cpu, memory, interface and disk perf
✓ HANDLERS: Email, Campfire, PagerDuty
26. STEP 4: DEFINE
SPECIFICS
✓ For each server role, define additional states to be checked and alerted on:
✓Process Checks
✓System Checks
✓Service Checks
✓Service Metrics
27. STEP 5: SET UP A
PLACE TO TEST
✓ Set up a permanent testing Sensu stack using your CM tool of choice
✓ We used the sensu-chef cookbook
28. STEP 6: SET A
WORKFLOW
✓ Develop and document a workflow for implementing, testing, deploying and signing off on checks
✓ You’ll get the best coverage if anyone (developers or ops) can easily add checks and metrics to Sensu
29. EXAMPLE
WORKFLOW
✓ Add new sensu_check definitions to the appropriate cookbook in Chef
✓ Deploy new check to staging env using Chef
✓ Pull Request with sample graphs or alerts
✓ Code Review from colleague
✓ Deploy to Prod
31. STEP 7: EXECUTE
WORKFLOW
✓ Starting with the low-hanging fruit (plugins that already existed in the sensu-community-plugins repository), configure and deploy each check in the worksheet to the testing Sensu server
✓ Deploy sensu-client to a few select machines
32. STEP 8: WATCH
THE WATCHER
✓ Set up some bare-minimum 3rd party monitoring for the Sensu servers
34. MONITOR THE
MONITOR
✓ Other ideas: have Testing Sensu monitor Prod Sensu
✓ Sensu can collect metrics about itself
35. STEP 9: ROLLOUT
✓ Deploy your Production server infrastructure
✓ Roll out the client and checks to the rest of your prod environments.
36. STEP 10: TUNE
✓ Expect to need to tune thresholds and alert occurrences.
✓ Laissez le bon alertes roulent! (Let the good alerts roll!)
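A simplified Python model of what tuning "alert occurrences" means in practice: suppress the alert until a check has failed several times in a row, then remind periodically. The parameter names and default values here are assumptions for illustration and this is not Sensu's actual filtering code.

```python
def should_alert(event_occurrences, min_occurrences=1, refresh=1800, interval=60):
    """Decide whether to notify for a failing check.

    event_occurrences: consecutive failures seen so far
    min_occurrences:   failures required before the first alert (the tunable)
    refresh/interval:  seconds between reminders / seconds between check runs
    """
    if event_occurrences < min_occurrences:
        return False  # not failing long enough yet
    if event_occurrences == min_occurrences:
        return True  # first alert
    every = max(refresh // interval, 1)
    return event_occurrences % every == 0  # periodic reminder
```

Raising `min_occurrences` trades alert latency for fewer false alarms from checks that flap once and recover.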
40. LET’S PLAY WITH
SENSU
If you haven’t been able to get your
sandboxes up and running,
please pair with someone near you.
41. SANDBOX GOALS
✓ Get familiar with Sensu configuration
✓ Deploy a check
✓ Trigger an alert on that check
✓ Install a Handler
✓ Give you something to take home and hack on
42. OOPS
If you mess anything up:
vagrant halt; vagrant up
Worst case:
vagrant destroy; vagrant up
44. SENSU
CONFIGURATION
✓ Please open up a terminal and SSH into both your sensu-server and sensu-client VMs
✓ sudo su
✓ cd /etc/sensu
45. SENSU
CONFIGURATION
✓ /etc/sensu/config.json - config for redis, rabbitmq, api and dashboard
✓ /etc/sensu/conf.d/ - checks go here
✓ /etc/sensu/conf.d/client.json - client configuration, subscriptions
✓ /etc/sensu/{extensions|handlers|mutators|plugins}
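A small Python sketch of what a client.json might contain, generated as a dict so the assumptions can be commented. The client name, address, and subscription name are hypothetical; check the file in your own sandbox for the real values.

```python
import json


def client_config(name, address, subscriptions):
    """Build the structure that would be written to /etc/sensu/conf.d/client.json."""
    return {
        "client": {
            "name": name,
            "address": address,
            # Subscriptions decide which server-scheduled checks this client runs.
            "subscriptions": list(subscriptions),
        }
    }


if __name__ == "__main__":
    # Hypothetical values, not taken from the tutorial repository.
    print(json.dumps(client_config("sensu-client", "10.254.254.11", ["sandbox"]), indent=2))
```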
47. CHECK YOUR
DASHBOARD
✓ Open a web browser and go to http://10.254.254.10:8080
✓ username: admin / password: secret
48. HANDLERS
✓ A HANDLER takes action on an event using a pipe, TCP, UDP, AMQP, or a set of other handlers
✓ Examples: send an email, send event to PagerDuty, send metrics to Graphite
✓ Default is “debug”
49. HANDLER
EXAMPLES
✓ BASIC: send an email to ops@
✓ ADVANCED: attempt to remediate the alert (e.g. run a custom script that spins up additional EC2 instances)
50. HANDLERS
✓ Let’s configure an EMAIL handler to send an informative email for an event.
✓ The /etc/sensu/handlers/mailer.rb plugin is installed for you; we just need to configure and install it
51. CONFIGURE THE
PLUGIN
ON SENSU SERVER:
vim /etc/sensu/conf.d/handlers/mailer.json
{
  "mailer": {
    "mail_from": "sensu@you.com",
    "mail_to": "you@yourdomain.com"
  }
}
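The plugin settings above still need a handler definition that tells Sensu to pipe events into mailer.rb. A hedged Python sketch of that definition follows; the handler name "mailer" and the idea of keeping it in its own JSON file under conf.d are assumptions, since the corresponding slide is not part of this transcript.

```python
import json


def pipe_handler(name, command):
    """Build a pipe-type handler definition: Sensu writes the event JSON to the command's stdin."""
    return {"handlers": {name: {"type": "pipe", "command": command}}}


if __name__ == "__main__":
    # The command path comes from the slides; the handler name is an assumption.
    print(json.dumps(pipe_handler("mailer", "/etc/sensu/handlers/mailer.rb"), indent=2))
```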
57. CHECKS
✓ Sensu-client runs CHECKS that are defined and scheduled either locally (standalone) or on the sensu-server (subscription).
✓ A CHECK sends a RESULT as an EVENT to a HANDLER - this applies to anything - service checks, metrics, etc.
58. CHECK
EXECUTION
✓ Either scheduled by the server (subscription) or scheduled by the client (standalone)
✓ Today we will configure a subscription-based check on the server that will run on our client
59. LET’S CONFIGURE
A CHECK
✓ Use check-procs.rb to make sure at least one instance of cornbread is running
60. DETERMINE OUR
CHECK COMMAND
On your SENSU CLIENT:
/opt/sensu/embedded/bin/ruby /etc/sensu/plugins/check-procs.rb -p cornbread -W1
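A Python sketch of the logic this command relies on: count processes matching a pattern and warn when there are fewer than the `-W` value. It is a simplification (plain substring match, illustrative messages) and not the plugin's real implementation.

```python
# Nagios-style exit codes that Sensu checks use.
OK, WARNING = 0, 1


def check_procs(command_lines, pattern, warn_under=1):
    """Return (exit_code, message) for a list of process command lines.

    Simplification: matches with a substring test rather than a regular expression.
    """
    count = sum(1 for cmd in command_lines if pattern in cmd)
    if count < warn_under:
        return WARNING, f"WARNING: found {count} processes matching {pattern}"
    return OK, f"OK: found {count} processes matching {pattern}"


if __name__ == "__main__":
    # Hypothetical process table for the sandbox client.
    print(check_procs(["/usr/sbin/sshd -D", "/usr/local/bin/cornbread"], "cornbread"))
```

Stopping the cornbread process and re-running the command should flip the status from OK to WARNING, which is what triggers the event.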
61. INSTALL OUR
CHECK
✓ On your SENSU SERVER:
✓ vim /etc/sensu/conf.d/checks/cornbread_process.json
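A hedged Python sketch of what a subscription check definition for cornbread_process.json could look like. The command is the one determined on the client; the subscription name, the 60-second interval, and the "mailer" handler name are assumptions, because the slide with the file contents is not part of this transcript.

```python
import json


def subscription_check(name, command, subscribers, interval, handlers):
    """Build a server-side check definition that is scheduled for the given subscriptions."""
    return {
        "checks": {
            name: {
                "command": command,
                "subscribers": list(subscribers),  # must match the client's subscriptions
                "interval": interval,  # seconds between runs
                "handlers": list(handlers),
            }
        }
    }


if __name__ == "__main__":
    check = subscription_check(
        "cornbread_process",
        "/opt/sensu/embedded/bin/ruby /etc/sensu/plugins/check-procs.rb -p cornbread -W1",
        subscribers=["sandbox"],  # hypothetical subscription name
        interval=60,  # hypothetical interval
        handlers=["mailer"],  # assumes a handler with this name exists
    )
    print(json.dumps(check, indent=2))
```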
72. MY SENSU TIPS
✓ Install the RabbitMQ management web interface and bookmark it (see http://10.254.254.10:15672/#/ )
✓ Lock your plugins’ gem dependency versions
73. TIPS TIPS TIPS
✓ Have alternate ways to access your Dashboard information
✓ We integrated our command-line developer tools with Sensu API
✓ We also created our own Ops dashboard that queries Sensu, Graphite and our app for data
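A minimal Python sketch of the kind of command-line integration described above: fetch current events from the Sensu API and summarize them by severity. The API base URL is a placeholder you pass in, any authentication is left out, and the assumption that each event carries a top-level "status" field may not hold for every Sensu version. This is not Paperless Post's actual tool.

```python
import json
from collections import Counter
from urllib.request import urlopen

# Nagios-style status codes mapped to labels.
SEVERITY = {0: "ok", 1: "warning", 2: "critical"}


def fetch_events(api_url):
    """GET <api_url>/events and return the parsed JSON list (no auth handling)."""
    with urlopen(f"{api_url}/events") as resp:
        return json.load(resp)


def summarize(events):
    """Count events per severity label; unrecognized or missing statuses count as 'unknown'."""
    counts = Counter(SEVERITY.get(e.get("status"), "unknown") for e in events)
    return dict(counts)
```

A CLI wrapper could call `summarize(fetch_events(url))` and print the counts, giving developers a quick view of open alerts without opening the dashboard.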
75. HA SENSU
✓ Redundancy is easy (bring up more sensu-servers)
✓ Making Redis and RabbitMQ HA is more challenging
✓ We’re still running one solitary Redis and RabbitMQ but are OK with this risk for now
76. WHERE TO GO
FOR HELP
✓ IRC: #sensu - freenode
✓ sensu-users mailing list
✓ http://docs.sensuapp.org