Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

Are you unhappy with the state of monitoring in your organization? Are you successfully automating “all the things” except your monitoring checks? Are you tired of looking at monitoring dashboards that hail from another era? Do you long to access your monitoring system via a REST API?

Paperless Post recently solved these problems by replacing Nagios with Sensu, a new and awesome free monitoring and metrics router that is designed with configuration management and cloud deployments in mind.

In my presentation we’ll take an in-depth look into why we chose Sensu and how we monitor our services and collect system metrics to send to Graphite. Subtopics will include how we planned for and executed the migration, mistakes we made along the way, how we knew when to scale and how we did it. I’ll also cover how we’re making our Sensu setup redundant and highly available, how we’re monitoring and collecting metrics about Sensu, and how we’ve integrated our internal tools with Sensu.

  • I’m curious: how many Nagios users do we have here? Anyone here running Sensu in production?
    How many people here are happy with their monitoring system?
    And how many of you are here today because you think monitoring sucks?
  • So, the reason I’m here today is because I can safely say that I LOVE monitoring. But it hasn’t always been this way...
  • In 2011 I joined Paperless Post as the second-ever member of the Operations team. Our new small team faced many challenges, one of which was fixing our monitoring infrastructure’s sad state. We had one monolithic Nagios server with no version control and no configuration management. Every time we needed to add hosts, services, etc., we manually edited the files and restarted the Nagios server. Even though I had years of experience living with Nagios and could write configs in my sleep, I dreaded ever having to add new hosts or checks.
  • Our metrics collection setup was in even worse shape. Munin was deployed for a handful of servers, but was so awkward to work with it’d all but been abandoned. We were sending data directly from our Rails app to Graphite, but no server system metrics were making it there at all. This was no way to be. But I don’t want to spend all morning telling you how much Nagios sucks. Let me tell you a little monitoring love story about Paperless Post and Sensu.
  • In the fall of 2012 we’d outgrown our old managed hosting service, found a new provider, and were making preparations to move to a new datacenter. At this point our entire infrastructure, with the exception of Nagios, was managed by Chef, and our plan was to bring up the new datacenter infrastructure entirely using Chef. We saw an opportunity to start fresh, explored our options, and quickly fell in love with Sensu.
  • Sensu is Ruby, which we know and love at Paperless. Although the Sensu components are written in Ruby, checks and plugins can be written in any language.
    There was already a fully-featured Chef recipe for Sensu - in fact, Sensu was designed with configuration management in mind.
    We saw an opportunity to get involved with a young project that we could potentially contribute back to. The sensu-community-plugins were my first real open-source contributions, and after nearly two years with it, I feel strongly enough about the project to keep supporting it in any way I can, which is why I’m here today.
  • Because we were on a tight deadline to deploy Sensu, the prospect of re-using existing Nagios checks appealed to me, with the option of re-writing them in Ruby using the Sensu plugin libraries later on down the line.
    Metrics and Checks are all handled by one system. We were fully sold on being able to gather metrics using the same client that ran our health checks, and were excited about the prospect of seeing our system-level metrics on the same system as our application-level metrics via Graphite/Graphiti.
    Sensu had the potential to scale easily, something we’d end up needing to call on later.
  • Sensu is an incredibly flexible tool. I’ve yet to come up with a device or situation that couldn’t somehow be handled by Sensu. It’s sometimes referred to as a monitoring “router”, which is a very accurate description. It can handle any input and pass it off to any other script, system, or handler that you want.
  • Sensu LOVES the cloud and deals beautifully with ephemeral machine environments. We simply added an API call to our devtools so that deleting a node is as simple as saying `pp sensu delete_client foo`. This command can also be run from Jenkins or even theoretically from a client node itself before shutting down.
    We’re able to silence entire environments at a time using one simple command: `pp sensu silence production` collects all production nodes from Chef and then silences them using the Sensu API.
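The devtools commands in the note above wrap calls to the Sensu REST API. Here is a minimal Python sketch of what such wrappers might build. It assumes the API listens on port 4567, that a client is removed with `DELETE /clients/<name>`, and that silencing means creating a stash under `silence/<name>`. The host name, function names, and stash payload are hypothetical and are not taken from Paperless Post’s tooling. The requests are only constructed, not sent.

```python
import json
import urllib.request

# Assumption: base URL of the Sensu API; the host name is hypothetical.
SENSU_API = "http://sensu-api.example.com:4567"


def delete_client_request(name):
    """Build a request that removes a client from Sensu's registry."""
    return urllib.request.Request(f"{SENSU_API}/clients/{name}", method="DELETE")


def silence_client_request(name, reason="maintenance"):
    """Build a request that creates a silence stash for one client."""
    body = json.dumps({"reason": reason}).encode("utf-8")
    return urllib.request.Request(
        f"{SENSU_API}/stashes/silence/{name}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def silence_environment_requests(node_names):
    """One silence request per node; the node list would come from Chef."""
    return [silence_client_request(name) for name in node_names]
```

Passing any of these request objects to `urllib.request.urlopen` would actually send it. Keeping construction separate from sending makes the wrappers easy to test without a running Sensu API.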
  • Most use the sensu-plugin gem and are written in Ruby, but all languages are welcome.
  • A little about our Sensu setup at Paperless Post.
    We have two Sensu environments: production and testing. Production runs 3, sometimes 4 instances of sensu-server, and testing 1, sometimes 2. We do not have this elasticity automated, but I’ll touch a little later on how we know when to scale out by adding another sensu-server to the cluster.
    We’re pushing 25K metrics per hour through Sensu to our Graphite cluster using Sensu’s AMQP handler.
  • Overall our transition from Nagios to Sensu was incredibly smooth. But as we grew there were of course problems here and there...
  • Initially we’d deployed a single Sensu server to handle all of production, but it became obvious it was time to scale when we saw some of these symptoms: high load on the Sensu server and backed-up queues in RabbitMQ. We have a Sensu check set up to alert us if the RabbitMQ queue size grows over 10K messages and stays there for longer than five minutes.
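The "over 10K and stays there for five minutes" rule can be sketched as a small decision function. This is an illustration only, not the check Paperless Post runs: the function name, the sample format, and the choice to require a full five minutes of history are assumptions. The exit codes follow the Nagios convention that Sensu checks also use.

```python
OK, CRITICAL = 0, 2  # Nagios-style exit codes


def queue_status(samples, threshold=10_000, sustain_seconds=300):
    """Decide whether the ready queue has been too deep for too long.

    samples: list of (unix_timestamp, ready_messages) tuples, oldest first.
    Returns CRITICAL only if the history covers the whole sustain window
    and every sample inside that window is above the threshold.
    """
    if not samples:
        return OK
    newest = samples[-1][0]
    covers_window = newest - samples[0][0] >= sustain_seconds
    window = [n for t, n in samples if newest - t <= sustain_seconds]
    if covers_window and all(n > threshold for n in window):
        return CRITICAL
    return OK
```

A real check would collect the samples from RabbitMQ, print a one-line message, and exit with the returned code. Requiring every sample in the window to be high keeps short bursts from paging anyone.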
  • How do you scale? It’s as simple as bootstrapping another sensu-server. In our case, that is Chef role[sensu-server] (which brings up a box running just sensu-server - no API or Dashboard).
    No other special configuration is needed. Just use the same config as the rest of the environment, and checks will be distributed in a round-robin fashion to your Sensu servers.
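To make the round-robin idea concrete, here is a toy Python model. It assumes every sensu-server consumes from one shared queue, which is the situation in which RabbitMQ hands messages to consumers in turn. The function and server names are hypothetical, and this models the distribution only, not Sensu’s actual code.

```python
from collections import defaultdict
from itertools import cycle


def distribute(results, servers):
    """Assign each check result to the next server in turn."""
    assignment = defaultdict(list)
    for result, server in zip(results, cycle(servers)):
        assignment[server].append(result)
    return dict(assignment)


two = distribute(list(range(12)), ["sensu-1", "sensu-2"])
three = distribute(list(range(12)), ["sensu-1", "sensu-2", "sensu-3"])
print(len(two["sensu-1"]), len(three["sensu-1"]))
```

With twelve results, each of two servers handles six and each of three servers handles four, which is why adding a server relieves the others without any extra configuration.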
  • The only major pains we experienced with Sensu have been related to Graphite. We started seeing backed-up queues in RabbitMQ and spotty graphs.
    Throwing more sensu-servers at the problem didn’t help in this case, and it turns out that our Graphite cluster just couldn’t keep up with the large number of metrics we were now serving it via AMQP.
    AMQP works, but in some ways isn’t ideal - in our case, AMQP bypasses carbon-relay and thus the replication schema, and sends every metric to every cluster node, which is overkill for a six-node cluster.
  • We experimented with writing our own consumer, but ended up with the following solution: we stopped collecting metrics every 10 seconds (which was overkill anyway), and moved our staging metrics off of the production Graphite cluster and onto their own staging Graphite cluster. We then moved the production Graphite cluster’s VMs onto SSDs. In fact, I spent most of last week writing scripts to migrate Whisper files off of a six-node VMware Graphite cluster onto a 2-node dedicated hardware cluster with SSDs.
  • Now, I want to tell you our tried and true method for a successful and happy transition from Nagios, or your monitoring system of choice, to Sensu.
  • There is a lot of talk in the Ops community about Alert Fatigue, and moving to a new monitoring system is a golden opportunity to clear your slate, clean up your alerts and determine what you REALLY care about.
    Also, because of differences in the way each monitoring system implements checks, it usually makes sense to just start from scratch rather than try to port existing check schemas over to a new system.
    This is a great opportunity to stop sending emails for things that don’t matter - do you really need an email every time your CPU is pegged? Probably not.
  • The Metrics and Monitoring planning spreadsheet is a tool we used to survey all of our servers and determine what needed to be gathered and monitored.
  • I’ve shared this document with you on my Github in the “sensu-tutorial” repository. This spreadsheet contains a column for ... Example:
  • DETERMINE YOUR BASELINE - For the ‘base’ role we made a list of things we wanted to know about every single machine.
    Our criterion for a CHECK is that it must be actionable.
    If it’s something we want to know but don’t necessarily need to act on, make it a METRIC.
  • For CHECKS, we monitor disk usage, swap usage, zombie processes, and RO filesystems.
    For METRICS, we gather vmstat, disk usage, cpu, memory, interface and disk performance metrics on every machine.
    For HANDLERS, we chose email for everything initially, then added Pagerduty later for only the most critical, must-wake-up-at-3am type alerts. We have a dedicated room in Campfire for receiving Sensu alerts.
  • DEFINE SPECIFICS - For each role (in our case, Chef roles, but could be any machine, device or server role), we gathered the following:
    Process Checks (at least 4 Unicorn workers should be running but no more than 20)
    System Checks (anything beyond our baseline system checks - say maybe we want to check for RO mounts only on servers that actually mount something)
    Service Checks (database locks, database connections, HTTP response)
    Service Metrics (haproxy bytes in/out)
    Other
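The Unicorn example above (at least 4 workers, no more than 20) boils down to counting matching processes and comparing against bounds. Here is a minimal Python sketch of that logic. The function names and the sample command lines are hypothetical. It stands in for what a process-count plugin does and is not the `check-procs.rb` source.

```python
import re

OK, CRITICAL = 0, 2  # Nagios-style exit codes


def count_matching(command_lines, pattern):
    """Count process command lines that match a regular expression."""
    regex = re.compile(pattern)
    return sum(1 for line in command_lines if regex.search(line))


def process_check(command_lines, pattern, min_procs, max_procs):
    """Return (status, message) for a bounded process-count check."""
    found = count_matching(command_lines, pattern)
    if found < min_procs or found > max_procs:
        return CRITICAL, f"CRITICAL: found {found} processes matching {pattern!r}"
    return OK, f"OK: found {found} processes matching {pattern!r}"


# Hypothetical process list with three Unicorn workers and one unrelated process.
procs = [f"unicorn worker[{i}]" for i in range(3)] + ["nginx: master process"]
print(process_check(procs, r"unicorn worker", min_procs=4, max_procs=20)[1])
```

In a real plugin the command lines would come from the process table, and the status would become the script’s exit code.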
  • SET UP A TESTING ENVIRONMENT: This will get you familiar with deploying and administering Sensu.
    I strongly recommend having a permanent place to test all of your Sensu checks and configuration changes using your CM tool of choice. It can be dual purpose and serve your staging environments, and is a good place to test things like Sensu package upgrades.
    We set up a Testing Sensu infrastructure in the old datacenter, deploying it with the sensu-chef cookbook, which we customized as needed.
  • Develop a workflow for implementing, testing, deploying and signing off on checks.
    You’ll get the best check coverage if anyone on your team (developers, ops) can easily add checks or metrics to Sensu.
  • Our workflow at Paperless Post: using Chef (which we’re deploying using our devtools with the help of Jenkins), we develop and deploy our checks to the testing environment. We then open a pull request, including any notes about how we tested, plus sample metrics graphs or outputs. We have a colleague do a quick code review and approve that pull request, then we deploy to prod.
  • Now the fun part: START DEPLOYING CHECKS! Starting with the low-hanging fruit (checks that use plugins that already exist in the sensu-community-plugins repository), start deploying each check that you defined in the worksheet to the testing Sensu server.
    If a suitable plugin didn’t already exist in sensu-community-plugins, we had two choices: 1) re-use a Nagios check or 2) write our own in Ruby or Bash.
  • Monitor your monitoring system! This should be self-explanatory. Set up some bare-minimum 3rd party monitoring for the Sensu servers themselves so you’ll know if the VM goes completely down (this has not yet happened to us!) or runs out of disk space.
  • We use Panopta’s agent-based monitor to check for aliveness, disk usage and CPU usage.
  • Other ideas: have your Testing Sensu setup monitor Production Sensu.
    Sensu can collect metrics about itself so there’s no need for a 3rd party system there.
  • This step is simple: Deploy your now well-tested server infrastructure using your now well-tested Configuration Management recipes. This should go smoothly because you’ve had plenty of practice rolling out and administering your testing setup as well as all of your checks.
    First you’ll want to stand up the production Sensu server stack, then you’ll roll out sensu-client to the rest of your production servers or VMs.
  • Let the alerts roll in! You’ll likely need to tune thresholds, alert occurrences, etc. once you have your checks running against actual production traffic.
  • Quick overview of the Sensu architecture and how it’s deployed on your VirtualBoxes. Sensu uses RabbitMQ for all communication between the client and the server. RabbitMQ and Redis are both running on your sensu-server VM, as well as the Dashboard (not pictured here), the API, and the Server. Redis is used to persist data for use by the API.
  • The Sensu package contains all of its dependencies in an "omnibus" installer, meaning it embeds everything it needs into /opt/sensu. This is great because you don’t need to worry about whether your system ruby is going to work with it, and you don’t even need to install system-wide ruby if you don’t need it.
  • BREAK HERE if needed :)
  • A little background on the Sandbox. I used Vagrant and Chef to bring up these boxes. The original Vagrantfile will be available online for you. I didn’t want to spend too much time showing you how to deploy Sensu with Chef because I didn’t want to give the impression that Chef is your only option for deploying Sensu. However, if you are already familiar with Chef, you can check my sensu-tutorial github to see (and use) the recipes used to build these boxes.
    Today we’re going to do some hand-configuration, just for you to get familiar with how Checks and Handlers work, but in reality, you’d be using your configuration management system of choice to deploy all of these.
  • If you open config.json on both the sensu-server and sensu-client VMs, you’ll see they are exactly the same.
  • Let’s jump right in and trigger an alert! By default, a Keepalive warning alert will be raised if the server doesn’t hear from the client for 120 seconds; the critical threshold is 180 seconds. This is tunable on a per-client basis.
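The keepalive thresholds map onto the usual check statuses. Here is a tiny Python sketch of that mapping, using the default values quoted above. The function name is hypothetical, and treating the boundary values as inclusive is an assumption.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def keepalive_status(seconds_since_last, warning=120, critical=180):
    """Map time since the last client keepalive to a check status."""
    if seconds_since_last >= critical:
        return CRITICAL
    if seconds_since_last >= warning:
        return WARNING
    return OK


for elapsed in (30, 150, 200):
    print(elapsed, keepalive_status(elapsed))
```

Stopping sensu-client in the sandbox therefore produces a warning first and a critical alert about a minute later.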
  • A handler is what takes action on an event, basically how the alert reaches a human.
    All events are displayed on the Dashboard, regardless of handler.
    A handler can send an event through a pipe, TCP, UDP, AMQP, or on to a set of other handlers.
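A pipe handler is simply a command that receives the event as JSON on standard input. Here is a minimal Python sketch of one. It assumes the event carries a `client` object with a `name` and a `check` object with `name`, `status` and `output`. The function names are hypothetical, and the exact event layout should be checked against the Sensu version in use.

```python
import json

STATUS_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL"}


def format_event(event):
    """Turn a Sensu event into a one-line, human-readable summary."""
    client = event["client"]["name"]
    check = event["check"]
    status = STATUS_NAMES.get(check["status"], "UNKNOWN")
    return f"{status}: {client}/{check['name']} - {check['output'].strip()}"


def handle(stream):
    """Read one JSON event from a file-like object and summarize it."""
    return format_event(json.load(stream))
```

Installed as a handler command, the script would end with `print(handle(sys.stdin))` after an `import sys`. From there the summary could be mailed, posted to chat, or forwarded to a pager.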
  • So let’s configure a handler to send an email notification out for an event. I went ahead and installed the `mailer.rb` plugin and gem deps for you. Make sure you are on the server for all of the following config steps.
  • Now let’s install the handler. Let’s use the ‘default’ handler config as a template, and copy it over to email.json.
  • I’ve actually already installed the `mail` gem dependency for you, which you can see by issuing the above command.
  • Now we need to set up a check to use the handler we just set up.
  • If you want to try this on your Sensu sandbox, you’ll need to `yum install nc`. Please don’t all try this right now :)
  • guest/guest
  • Put Nginx in front of sensu-dashboard.
    Sensu dashboard runs on port 8080 and requires authentication, neither of which is configurable yet. We resolved this minor annoyance by running Nginx in front of the dashboard, proxying to 8080 and injecting authentication headers into Sensu so we don’t need to log in when viewing Sensu on our VPN.
  • Making sensu-server redundant is easy - all you need to do is bring up more instances of sensu-server - but scaling out and making Redis and RabbitMQ highly available can be more challenging from an operational perspective. At Paperless, we are still running one solitary Redis instance for Sensu, but are comfortable with this because a) bootstrapping a new one with Chef would be trivial, b) the data it contains is not mission critical and could be easily re-generated, and c) we’ve had zero performance or stability issues with it thus far.
    Because RabbitMQ is a mission-critical piece of Sensu, we would like to, at some point, separate out Rabbit into a cluster with one disk node and one RAM node with HAProxy in front. However, I’ve never quite been able to get HAProxy tuned to Sensu’s liking. When and if I do, expect a blog post. If anyone here has experience running RabbitMQ clusters, I’d love to hear from you!

Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu Presentation Transcript

  • 1. SENSE AND SENSU-BILITY Painless Metrics And Monitoring In The Cloud with Sensu Bethany Erskine Velocity NYC 2013 http://github.com/skymob/sensu-tutorial Monday, October 14, 13
  • 2. BEFORE I BEGIN... IF YOU DID NOT SET UP SENSU-TUTORIAL BEFORE THE CLASS: 1. grab a USB key 2. follow the instructions on the README If you don’t have a computer, no sweat!
  • 3. DO YOU LOVE YOUR MONITORING SETUP?
  • 4. #MONITORINGLOVE
  • 5. MY STORY + (╯︵╰,)
  • 6. (image-only slide)
  • 7. (image-only slide)
  • 8. +
  • 9. WHY SENSU ✓ Ruby ✓ Plugins can be written in any language ✓ sensu-chef cookbook ✓ community
  • 10. WHY SENSU ✓ re-use Nagios checks! ✓ metrics and checks all collected by one system ✓ Graphite integration ✓ easy to scale
  • 11. WHY SENSU ✓ “Can I do X with Sensu?” probably!
  • 12. WHY SENSU
  • 13. WHY SENSU? ✓ Sensu source is well-written and easy to parse ✓ https://github.com/sensu
  • 14. WHY SENSU? ✓ sensu-community-plugins ✓ 80 contributors ✓ over 600 plugins ✓ https://github.com/sensu/sensu-community-plugins
  • 15. TODAY at PAPERLESS Two Sensu environments (prod/testing) ~ 250 - 275 instances of sensu-client 4-6 Sensu-server instances 25k Metrics/Hour to Graphite 1 custom dashboard 1 custom CLI
  • 16. RESOURCES ✓ All of our Sensu infrastructure is virtualized. ✓ We typically give a sensu-server box 1.5GB RAM and 2 processors, scaling up RAM for any box running more than one Sensu service on it. ✓ 4GB RAM for a monolithic Sensu install (Rabbit, Redis, all Sensu components on one)
  • 17. AS WE GREW Growing pains and lessons learned...
  • 18. NEEDS MORE SENSU ✓ High load on Sensu server ✓ Backed-up queues in RabbitMQ ✓ TIP: set up a check to monitor the RabbitMQ ready queue size, you'll want an email when the queue grows above 10K and stays there
  • 19. HOW TO SCALE ✓ Add more sensu-server instances ✓ No special configuration needed ✓ checks will be distributed in round-robin fashion to the sensu-servers
  • 20. GRAPHITE PAINS ✓ symptoms: backed up queues in RabbitMQ, spotty graphs ✓ cluster couldn’t keep up with the large amount of metrics we were now serving it via AMQP
  • 21. GRAPHITE PAINS ✓ Solution: stop collecting metrics every 10 seconds (excessive!) ✓ moved staging metrics to staging Graphite cluster ✓ Moved prod Graphite cluster to SSD
  • 22. THE MIGRATION or, How To Quit Nagios in Ten Easy Steps
  • 23. STEP 1: NUKE AND PAVE
  • 24. STEP 2: PLAN METRICS AND MONITORING SURVEY
  • 25. METRICS AND MONITORING SURVEY
  • 26. STEP 3: DEFINE GLOBALS ✓ CHECKS: must be actionable! ✓ METRICS: go nuts ✓ HANDLERS: EMAIL for everything initially, added Pagerduty later.
  • 27. OUR GLOBALS ✓ CHECKS: disk usage, swap usage, zombie processes, RO filesystems ✓ METRICS: vmstat, disk usage, cpu, memory, interface and disk perf ✓ HANDLERS: Email, Campfire, Pagerduty
  • 28. STEP 4: DEFINE SPECIFICS ✓ For each server role, define additional states to be checked and alerted on: ✓ Process Checks ✓ System Checks ✓ Service Checks ✓ Service Metrics
  • 29. STEP 5: SET UP A PLACE TO TEST ✓ Set up a permanent testing Sensu stack using your CM tool of choice ✓ we used sensu-chef cookbook
  • 30. STEP 6: SET A WORKFLOW ✓ Develop and document a workflow for implementing, testing, deploying and signing off on checks ✓ You’ll get the best coverage if anyone (developers or ops) can easily add checks and metrics to Sensu
  • 31. EXAMPLE WORKFLOW ✓ add new sensu_check definitions to the appropriate cookbook in Chef ✓ deploy new check to staging env using Chef ✓ Pull Request with sample graphs or alerts ✓ Code Review from colleague ✓ Deploy to Prod
  • 32. SENSU IN CHEF
  • 33. STEP 7: EXECUTE WORKFLOW ✓ Starting with the low-hanging fruit (plugins that already existed in sensu-community-plugins repository), configure and deploy each check in the worksheet to the testing Sensu server ✓ deploy sensu-client to a few select machines
  • 34. STEP 8: WATCH THE WATCHER ✓ Set up some bare-minimum 3rd party monitoring for the Sensu servers ✓ We use Panopta’s agent to check for aliveness, disk usage and CPU usage.
  • 35. (image-only slide)
  • 36. MONITOR THE MONITOR ✓ Other ideas: have Testing Sensu monitor Prod Sensu ✓ Sensu can collect metrics about itself
  • 37. STEP 9: ROLLOUT ✓ Deploy your Production server infrastructure ✓ Roll out the client and checks to the rest of your prod environments.
  • 38. STEP 10: TUNE ✓ Expect to need to tune thresholds and alert occurrences. ✓ Laissez le bon alertes roulent!
  • 39. SENSU ARCHITECTURE
  • 40. SENSU ARCHITECTURE
  • 41. OMNIBUS INSTALLER is awesome
  • 42. LET’S PLAY WITH SENSU If you haven’t been able to get your sandboxes up and running, please pair with someone near you.
  • 43. SANDBOX GOALS ✓ Get familiar with Sensu configuration ✓ Deploy a check ✓ Trigger an alert on that check ✓ Install a Handler ✓ Give you something to take home and hack on
  • 44. OOPS If you mess anything up: vagrant halt; vagrant up Worst case: vagrant destroy; vagrant up
  • 45. TWO VIRTUALBOXES Sensu-Server and Sensu-Client Vagrant/Chef Centos 6.4 Sensu Version 0.10.2
  • 46. SENSU CONFIGURATION ✓ Please open up a terminal and SSH into both your sensu-server and sensu-client VMs ✓ sudo su ✓ cd /etc/sensu
  • 47. SENSU CONFIGURATION ✓ /etc/sensu/config.json - config for redis, rabbitmq, api and dashboard ✓ /etc/sensu/conf.d/ - checks go here ✓ /etc/sensu/conf.d/client.json - client configuration, subscriptions ✓ /etc/sensu/{extensions|handlers|mutators|plugins}
  • 48. TRIGGER AN ALERT! On sensu-client: service sensu-client stop
  • 49. CHECK YOUR DASHBOARD ✓ Open a web browser and go to http://10.254.254.10:8080 ✓ username: admin / password: secret
  • 50. HANDLERS ✓ A HANDLER takes action on an event using a pipe, TCP, UDP, AMQP, or a set of other handlers ✓ Examples: send an email, send event to Pagerduty, send metrics to Graphite ✓ Default is “debug”
  • 51. HANDLER EXAMPLES ✓ BASIC: send an email to ops@ ✓ ADVANCED: attempt to remediate the alert (e.g. run a custom script that spins up additional ec2 instances)
  • 52. HANDLERS ✓ Let’s configure an EMAIL handler to send an informative email for an event. ✓ /etc/sensu/handlers/mailer.rb plugin is installed for you, we just need to configure and install it
  • 53. CONFIGURE THE PLUGIN ON SENSU SERVER: vim /etc/sensu/conf.d/handlers/mailer.json { "mailer": { "mail_from": "sensu@you.com", "mail_to": "you@yourdomain.com" } }
  • 54. CONFIGURE THE HANDLER cp /etc/sensu/conf.d/handlers/default.json /etc/sensu/conf.d/handlers/email.json vim /etc/sensu/conf.d/handlers/email.json
  • 55. EMAIL.JSON { "handlers": { "email": { "type": "pipe", "command": "/etc/sensu/handlers/mailer.rb" } } }
  • 56. CHECK GEM DEPENDENCIES /opt/sensu/embedded/bin/gem list | grep mail
  • 57. FIX PERMISSIONS chown -R .sensu /etc/sensu/conf.d/
  • 58. RESTART SERVICES service sensu-server restart tail -100 /var/log/sensu/sensu-server.log | grep mail
  • 59. CHECKS ✓ Sensu-client runs CHECKS that are defined and scheduled either locally (standalone) or on the sensu-server (subscription). ✓ A CHECK sends a RESULT as an EVENT to a HANDLER - this applies to anything - service checks, metrics, etc
  • 60. CHECK EXECUTION ✓ Either scheduled by the server (subscription) or scheduled by the client (standalone) ✓ Today we will configure a subscription-based check on the server that will run on our client
  • 61. LET’S CONFIGURE A CHECK ✓ Use check-procs.rb to make sure at least one instance of cornbread is running
  • 62. DETERMINE OUR CHECK COMMAND On your SENSU CLIENT: /opt/sensu/embedded/bin/ruby /etc/sensu/plugins/check-procs.rb -p cornbread -W1
  • 63. INSTALL OUR CHECK ✓ On your SENSU SERVER: ✓ vim /etc/sensu/conf.d/checks/cornbread_process.json
  • 64. CORNBREAD_PROCESS.JSON
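The contents of this slide did not survive in the transcript. As a hedged illustration only, the Python sketch below generates what a subscription-based check definition commonly looks like in a Sensu configuration of this era. The `subscribers` value and the `interval` are assumptions, the helper name is hypothetical, and the command is copied from slide 62. It is not a reconstruction of the author’s file.

```python
import json


def subscription_check(name, command, subscribers, handlers, interval):
    """Build a check definition in the layout Sensu reads from conf.d."""
    return {
        "checks": {
            name: {
                "command": command,
                "subscribers": subscribers,
                "handlers": handlers,
                "interval": interval,
            }
        }
    }


definition = subscription_check(
    name="cornbread_process",
    command="/opt/sensu/embedded/bin/ruby /etc/sensu/plugins/check-procs.rb -p cornbread -W1",
    subscribers=["all"],   # assumption: the subscription name is hypothetical
    handlers=["email"],    # the handler configured on slides 54 and 55
    interval=60,           # assumption: run once a minute
)
print(json.dumps(definition, indent=2))
```

Generating the JSON from code rather than typing it by hand mirrors the talk’s advice to let configuration management write these files.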
  • 65. RESTART SERVICES service sensu-server restart tail -100 /var/log/sensu/sensu-server.log | grep cornbread
  • 66. CHECK YOUR DASHBOARD
  • 67. CHECK YOUR EMAIL
  • 68. SENSU API ✓ REST API ✓ HTTP/4567 on SENSU SERVER ✓ try: curl -l http://localhost:4567/events | python -mjson.tool
  • 69. SENSU SERVICES ✓ Sensu API ✓ Sensu Server ✓ Sensu Client ✓ Sensu Dashboard
  • 70. EVERYTHING OK? ✓ /etc/init.d/sensu-service {client|server|api|dashboard} {start|stop|status|restart} ✓ ps -ef | grep sensu ✓ tail -f /var/log/sensu/*.log ✓ curl -l localhost:4567/info
  • 71. COOL SENSU TRICKS
  • 72. SEND DIRECTLY TO SENSU netcat to: 127.0.0.1:3030
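The same trick works from any language that can open a TCP socket. Here is a minimal Python sketch, assuming the local sensu-client accepts a JSON check result with `name`, `output` and `status` fields on port 3030. The function names and the default host are assumptions.

```python
import json
import socket


def build_result(name, output, status=0, handlers=None):
    """Serialize an ad-hoc check result for the local client socket."""
    result = {"name": name, "output": output, "status": status}
    if handlers:
        result["handlers"] = handlers
    return json.dumps(result)


def send_result(payload, host="127.0.0.1", port=3030):
    """Write one result to the client socket (needs a running sensu-client)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("utf-8"))


payload = build_result("nightly_backup", "backup failed: disk full", status=2)
print(payload)
```

Calling `send_result(payload)` at the end of a cron job or deploy script lets that job report its own outcome to Sensu without defining a scheduled check.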
  • 73. AGGREGATE ALERTS ✓ Alert when X% of checks are not OK ✓ Handy for preventing alert floods
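The aggregate rule is a percentage over the latest result from each client. Here is a minimal Python sketch of the arithmetic. The function names and the threshold value are hypothetical, and gathering the statuses (for example, from the Sensu API) is left out.

```python
def percent_not_ok(statuses):
    """Percentage of check results whose status is anything other than 0."""
    if not statuses:
        return 0.0
    bad = sum(1 for status in statuses if status != 0)
    return 100.0 * bad / len(statuses)


def should_alert(statuses, max_percent_not_ok):
    """Alert once the share of failing results reaches the threshold."""
    return percent_not_ok(statuses) >= max_percent_not_ok


# One web server out of four failing is 25%, below a 40% threshold.
print(should_alert([0, 0, 0, 2], max_percent_not_ok=40))
```

One alert for "half the web tier is failing" is far easier to act on than a separate alert from every affected host.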
  • 74. MY SENSU TIPS ✓ install the RabbitMQ management web interface and bookmark it (see http://10.254.254.10:15672/#/ ) ✓ lock your plugins’ gem dependency versions
  • 75. TIPS TIPS TIPS ✓ have alternate ways to access your Dashboard information ✓ we integrated our command-line developer tools with Sensu API ✓ we also created our own Ops dashboard that queries Sensu, Graphite and our app for data
  • 76. MORE TIPS ✓ Put NGINX in front of sensu-dashboard
  • 77. HA SENSU ✓ Redundancy is easy (bring up more sensu-servers) ✓ Making Redis and RabbitMQ HA more challenging ✓ We’re still running one solitary Redis and RabbitMQ but are OK with this risk for now
  • 78. WHERE TO GO FOR HELP ✓ IRC: #sensu - freenode ✓ sensu-users mailing list ✓ http://docs.sensuapp.org
  • 79. QUESTIONS
  • 80. THANK YOU bethany@paperlesspost.com @skymob - twitter robotwitharose - #sensu on IRC (freenode)