Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

Are you unhappy with the state of monitoring in your organization? Are you successfully automating “all the things” except your monitoring checks? Are you tired of looking at monitoring dashboards that hark back to another era? Do you long to access your monitoring system via a REST API?

Paperless Post recently solved these problems by replacing Nagios with Sensu, a new and awesome free monitoring and metrics router that is designed with configuration management and cloud deployments in mind.

In my presentation we’ll take an in-depth look into why we chose Sensu and how we monitor our services and collect system metrics to send to Graphite. Subtopics will include how we planned for and executed the migration, mistakes we made along the way, how we knew when to scale and how we did it. I’ll also cover how we’re making our Sensu setup redundant and highly available, how we’re monitoring and collecting metrics about Sensu, and how we’ve integrated our internal tools with Sensu.

  1. SENSE AND SENSU-BILITY: Painless Metrics And Monitoring In The Cloud with Sensu. Bethany Erskine, Velocity NYC 2013. http://github.com/skymob/sensu-tutorial Monday, October 14, 13
  2. BEFORE I BEGIN... If you did not set up sensu-tutorial before the class: 1. grab a USB key; 2. follow the instructions in the README. If you don’t have a computer, no sweat!
  3. DO YOU LOVE YOUR MONITORING SETUP?
  4. #MONITORINGLOVE
  5. MY STORY + (╯︵╰,)
  6. (image-only slide)
  7. (image-only slide)
  8. +
  9. WHY SENSU ✓ Ruby ✓ Plugins can be written in any language ✓ Community ✓ sensu-chef cookbook
  10. WHY SENSU ✓ Re-use Nagios checks! ✓ Metrics and checks all collected by one system ✓ Easy to scale ✓ Graphite integration
  11. WHY SENSU ✓ “Can I do X with Sensu?” Probably!
  12. WHY SENSU
  13. WHY SENSU? ✓ Sensu source is well-written and easy to parse: https://github.com/sensu
  14. WHY SENSU? ✓ sensu-community-plugins: 80 contributors ✓ Over 600 plugins: https://github.com/sensu/sensu-community-plugins
  15. TODAY AT PAPERLESS ✓ Two Sensu environments (prod/testing) ✓ ~250-275 instances of sensu-client ✓ 4-6 sensu-server instances ✓ 25k metrics/hour to Graphite ✓ 1 custom dashboard ✓ 1 custom CLI
  16. RESOURCES ✓ All of our Sensu infrastructure is virtualized. ✓ We typically give a sensu-server box 1.5GB RAM and 2 processors, scaling up RAM for any box running more than one Sensu service on it. ✓ 4GB RAM for a monolithic Sensu install (Rabbit, Redis, all Sensu components on one box)
  17. AS WE GREW: Growing pains and lessons learned...
  18. NEEDS MORE SENSU ✓ High load on Sensu server ✓ Backed-up queues in RabbitMQ ✓ TIP: set up a check to monitor the RabbitMQ ready queue size; you'll want an email when the queue grows above 10K and stays there
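The ready-queue tip can be sketched as a small check script. This is a minimal sketch with several assumptions:

- It relies on the RabbitMQ management plugin's HTTP API. `GET /api/queues/<vhost>/<queue>` returns JSON that includes a `messages_ready` field.
- It uses the Nagios-style exit codes Sensu checks follow: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
- The vhost `/sensu`, the queue `results`, and the guest credentials are hypothetical defaults.
- "Stays there" is approximated by a choice of ours: every one of a few samples taken inside one run must be above the threshold.

```python
import base64
import json
import sys
import time
import urllib.parse
import urllib.request

# Nagios-style exit codes, which Sensu checks also use.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def fetch_ready_count(host, vhost, queue, user="guest", password="guest", port=15672):
    """Return messages_ready for one queue from the RabbitMQ management HTTP API."""
    url = "http://%s:%d/api/queues/%s/%s" % (
        host, port,
        urllib.parse.quote(vhost, safe=""),   # "/sensu" becomes "%2Fsensu"
        urllib.parse.quote(queue, safe=""),
    )
    request = urllib.request.Request(url)
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["messages_ready"]

def evaluate(samples, threshold=10000):
    """CRITICAL only if every sample is above threshold, i.e. the queue grew and stayed there."""
    if not samples:
        return UNKNOWN, "no queue samples"
    if all(count > threshold for count in samples):
        return CRITICAL, "ready queue above %d in all %d samples (latest %d)" % (
            threshold, len(samples), samples[-1])
    return OK, "ready queue at %d" % samples[-1]

def main(host, vhost="/sensu", queue="results", count=3, pause=10):
    """Take `count` samples `pause` seconds apart, print one line, exit with the status."""
    try:
        samples = []
        for i in range(count):
            samples.append(fetch_ready_count(host, vhost, queue))
            if i < count - 1:
                time.sleep(pause)
    except Exception as exc:  # network errors, auth failures, missing queue
        print("UNKNOWN: %s" % exc)
        sys.exit(UNKNOWN)
    status, message = evaluate(samples)
    print(message)
    sys.exit(status)

# Only run the check when given a host, e.g. `check_rabbit_ready.py localhost`.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Routing this check to the email handler gives you the "email when the queue grows and stays there" behaviour.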
  19. HOW TO SCALE ✓ Add more sensu-server instances ✓ No special configuration needed ✓ Checks will be distributed in round-robin fashion to the sensu-servers
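Scaling out needs no special configuration. As we understand it, every sensu-server consumes from the same RabbitMQ queues, and RabbitMQ hands each message on a shared queue to the next consumer in turn.

The toy model below shows the effect on load: each added server takes an equal share. It is plain Python with no RabbitMQ involved, and the server and result names are hypothetical. Real distribution also depends on prefetch settings and acknowledgement timing.

```python
from collections import Counter
from itertools import cycle

def dispatch_round_robin(results, servers):
    """Assign each queued result to the next server in turn, like one queue with several consumers."""
    turn = cycle(servers)
    return {result: next(turn) for result in results}

results = ["result-%d" % i for i in range(12)]
for n in (1, 2, 3):
    servers = ["sensu-server-%d" % i for i in range(1, n + 1)]
    load = Counter(dispatch_round_robin(results, servers).values())
    print(n, "server(s):", dict(load))
```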
  20. GRAPHITE PAINS ✓ Symptoms: backed-up queues in RabbitMQ, spotty graphs ✓ The cluster couldn’t keep up with the large amount of metrics we were now serving it via AMQP
  21. GRAPHITE PAINS ✓ Solution: stop collecting metrics every 10 seconds (excessive!) ✓ Moved staging metrics to a staging Graphite cluster ✓ Moved prod Graphite cluster to SSD
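The arithmetic shows why a 10-second interval is excessive. Each metric series sends 3600 / interval datapoints per hour. Moving from a 10-second to a 60-second interval therefore cuts Graphite's write load by a factor of six. The series count below is a hypothetical number chosen only for illustration, not Paperless Post's real inventory.

```python
def datapoints_per_hour(series, interval_seconds):
    """Datapoints Graphite receives per hour from `series` metrics, each reported every `interval_seconds`."""
    return series * 3600 // interval_seconds

series = 420  # hypothetical number of metric series across the fleet
for interval in (10, 30, 60):
    print("every %2ds: %6d datapoints/hour" % (interval, datapoints_per_hour(series, interval)))
```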
  22. THE MIGRATION, or How To Quit Nagios in Ten Easy Steps
  23. STEP 1: NUKE AND PAVE
  24. STEP 2: PLAN METRICS AND MONITORING SURVEY
  25. METRICS AND MONITORING SURVEY
  26. STEP 3: DEFINE GLOBALS ✓ CHECKS: must be actionable! ✓ METRICS: go nuts ✓ HANDLERS: email for everything initially, added PagerDuty later.
  27. OUR GLOBALS ✓ CHECKS: disk usage, swap usage, zombie processes, RO filesystems ✓ METRICS: vmstat, disk usage, CPU, memory, interface and disk perf ✓ HANDLERS: Email, Campfire, PagerDuty
  28. STEP 4: DEFINE SPECIFICS ✓ For each server role, define additional states to be checked and alerted on: ✓ Process checks ✓ System checks ✓ Service checks ✓ Service metrics
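Globals and role specifics can be wired together through client subscriptions. Every client subscribes to one common subscription for the global checks, plus one subscription per server role. In the talk this was driven by Chef, not by a script.

The sketch below builds a client.json in the `{"client": {"name", "address", "subscriptions"}}` shape used by Sensu 0.10-era clients. The `all` subscription name, role names, and host details are hypothetical.

```python
import json

GLOBAL_SUBSCRIPTION = "all"  # hypothetical: every client gets the global checks

def client_config(name, address, roles):
    """Build a Sensu client definition subscribed to the global checks plus one subscription per role."""
    subscriptions = [GLOBAL_SUBSCRIPTION] + sorted(set(roles))
    return {"client": {"name": name, "address": address, "subscriptions": subscriptions}}

print(json.dumps(client_config("web01", "10.0.0.21", ["web", "memcached"]), indent=2))
```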
  29. STEP 5: SET UP A PLACE TO TEST ✓ Set up a permanent testing Sensu stack using your CM tool of choice ✓ We used the sensu-chef cookbook
  30. STEP 6: SET A WORKFLOW ✓ Develop and document a workflow for implementing, testing, deploying and signing off on checks ✓ You’ll get the best coverage if anyone (developers or ops) can easily add checks and metrics to Sensu
  31. EXAMPLE WORKFLOW ✓ Add new sensu_check definitions to the appropriate cookbook in Chef ✓ Deploy the new check to the staging env using Chef ✓ Pull Request with sample graphs or alerts ✓ Code review from a colleague ✓ Deploy to prod
  32. SENSU IN CHEF
  33. STEP 7: EXECUTE WORKFLOW ✓ Starting with the low-hanging fruit (plugins that already existed in the sensu-community-plugins repository), configure and deploy each check in the worksheet to the testing Sensu server ✓ Deploy sensu-client to a few select machines
  34. STEP 8: WATCH THE WATCHER ✓ Set up some bare-minimum 3rd-party monitoring for the Sensu servers ✓ We use Panopta’s agent to check for aliveness, disk usage and CPU usage.
  35. (image-only slide)
  36. MONITOR THE MONITOR ✓ Other ideas: have testing Sensu monitor prod Sensu ✓ Sensu can collect metrics about itself
  37. STEP 9: ROLLOUT ✓ Deploy your production server infrastructure ✓ Roll out the client and checks to the rest of your prod environments.
  38. STEP 10: TUNE ✓ Expect to need to tune thresholds and alert occurrences. Laissez le bon alertes roulent! (Let the good alerts roll!)
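Tuning alert occurrences usually means not paging on the first failure. The Ruby sensu-plugin handler library filters on `occurrences` and `refresh` settings in the check definition.

The function below restates that idea in simplified form and is not a port of the library. Our arithmetic differs slightly from the library's. The rule it implements:

- Alert once a check has failed `min_occurrences` times in a row.
- After that, remind roughly once per `refresh` seconds.
- The parameter values in the example are hypothetical.

```python
def should_alert(occurrences, interval, min_occurrences=1, refresh=1800):
    """Decide whether the Nth consecutive failure of a check (run every `interval` s) should notify."""
    if occurrences < min_occurrences:
        return False  # not failing long enough yet
    reminder_every = max(refresh // interval, 1)  # failures between reminders
    return (occurrences - min_occurrences) % reminder_every == 0

# A check run every 60s, alerting after 3 failures, reminding every 30 minutes.
print([n for n in range(1, 70) if should_alert(n, 60, 3, 1800)])
```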
  39. SENSU ARCHITECTURE
  40. SENSU ARCHITECTURE
  41. OMNIBUS INSTALLER is awesome
  42. LET’S PLAY WITH SENSU If you haven’t been able to get your sandboxes up and running, please pair with someone near you.
  43. SANDBOX GOALS ✓ Get familiar with Sensu configuration ✓ Deploy a check ✓ Trigger an alert on that check ✓ Install a handler ✓ Give you something to take home and hack on
  44. OOPS If you mess anything up: vagrant halt; vagrant up. Worst case: vagrant destroy; vagrant up
  45. TWO VIRTUALBOXES ✓ Sensu-Server and Sensu-Client ✓ Vagrant/Chef ✓ CentOS 6.4 ✓ Sensu version 0.10.2
  46. SENSU CONFIGURATION ✓ Please open up a terminal and SSH into both your sensu-server and sensu-client VMs ✓ sudo su ✓ cd /etc/sensu
  47. SENSU CONFIGURATION ✓ /etc/sensu/config.json: config for Redis, RabbitMQ, API and dashboard ✓ /etc/sensu/conf.d/: checks go here ✓ /etc/sensu/conf.d/client.json: client configuration, subscriptions ✓ /etc/sensu/{extensions|handlers|mutators|plugins}
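Splitting configuration across config.json and conf.d works because Sensu loads every JSON file and merges them into one settings hash. As we understand it, nested objects are merged key by key, so a file in conf.d/checks can add a check without repeating the rest.

Below is a minimal Python sketch of that loading order and a recursive merge. It approximates Sensu's Ruby loader; the exact precedence rules, and how arrays are combined, may differ. This sketch simply lets later array values win.

```python
import glob
import json
import os

def deep_merge(base, extra):
    """Recursively merge dict `extra` into a copy of `base`; later values win for non-dict keys."""
    merged = dict(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

def load_settings(root="/etc/sensu"):
    """Load config.json, then every *.json under conf.d (recursively, sorted), merging as we go."""
    paths = [os.path.join(root, "config.json")]
    paths += sorted(glob.glob(os.path.join(root, "conf.d", "**", "*.json"), recursive=True))
    settings = {}
    for path in paths:
        if os.path.isfile(path):
            with open(path) as handle:
                settings = deep_merge(settings, json.load(handle))
    return settings
```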
  48. TRIGGER AN ALERT! On sensu-client: service sensu-client stop
  49. CHECK YOUR DASHBOARD ✓ Open a web browser and go to http://10.254.254.10:8080 ✓ username: admin / password: secret
  50. HANDLERS ✓ A HANDLER takes action on an event using a pipe, TCP, UDP, AMQP, or a set of other handlers ✓ Examples: send an email, send an event to PagerDuty, send metrics to Graphite ✓ Default is “debug”
  51. HANDLER EXAMPLES ✓ BASIC: send an email to ops@ ✓ ADVANCED: attempt to remediate the alert (e.g. run a custom script that spins up additional EC2 instances)
  52. HANDLERS ✓ Let’s configure an EMAIL handler to send an informative email for an event. ✓ The /etc/sensu/handlers/mailer.rb plugin is installed for you; we just need to configure and install it
  53. CONFIGURE THE PLUGIN ON SENSU SERVER: vim /etc/sensu/conf.d/handlers/mailer.json { "mailer": { "mail_from": "sensu@you.com", "mail_to": "you@yourdomain.com" } }
  54. CONFIGURE THE HANDLER: cp /etc/sensu/conf.d/handlers/default.json /etc/sensu/conf.d/handlers/email.json; vim /etc/sensu/conf.d/handlers/email.json
  55. EMAIL.JSON { "handlers": { "email": { "type": "pipe", "command": "/etc/sensu/handlers/mailer.rb" } } }
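A pipe handler is just a program that Sensu runs with the event JSON on its standard input. mailer.rb is Ruby, but handlers can be written in any language. Here is a hedged Python sketch of the first half of what such a handler does: parse the event and compose a subject and body.

The event fields used (`client.name`, `check.name`, `check.output`, `check.status`, `occurrences`, `action`) reflect our understanding of the Sensu 0.10 event format. Actually sending the mail, for example with smtplib, is left out.

```python
import json
import sys

# Names for the exit statuses a check can report; anything else is UNKNOWN.
STATUS_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL"}

def compose_email(event):
    """Return (subject, body) for one Sensu event dict."""
    client = event["client"]["name"]
    check = event["check"]
    status = STATUS_NAMES.get(check.get("status"), "UNKNOWN")
    action = "RESOLVED" if event.get("action") == "resolve" else "ALERT"
    subject = "%s - %s/%s: %s" % (action, client, check["name"], status)
    body = "%s\nOccurrences: %s\n" % (check.get("output", "").strip(), event.get("occurrences", 1))
    return subject, body

def handle(stream=sys.stdin):
    """Read one event from `stream` (Sensu pipes it to stdin) and print the email we would send."""
    subject, body = compose_email(json.load(stream))
    print("Subject: " + subject)
    print()
    print(body)
```

Used as the `command` of a pipe handler, the script would end by calling `handle()`.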
  56. CHECK GEM DEPENDENCIES: /opt/sensu/embedded/bin/gem list | grep mail
  57. FIX PERMISSIONS: chown -R .sensu /etc/sensu/conf.d/
  58. RESTART SERVICES: service sensu-server restart; tail -100 /var/log/sensu/sensu-server.log | grep mail
  59. CHECKS ✓ Sensu-client runs CHECKS that are defined and scheduled either locally (standalone) or on the sensu-server (subscription). ✓ A CHECK sends a RESULT as an EVENT to a HANDLER; this applies to anything: service checks, metrics, etc.
  60. CHECK EXECUTION ✓ Either scheduled by the server (subscription) or scheduled by the client (standalone) ✓ Today we will configure a subscription-based check on the server that will run on our client
  61. LET’S CONFIGURE A CHECK ✓ Use check-procs.rb to make sure at least one instance of cornbread is running
  62. DETERMINE OUR CHECK COMMAND On your SENSU CLIENT: /opt/sensu/embedded/bin/ruby /etc/sensu/plugins/check-procs.rb -p cornbread -W1
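check-procs.rb follows the same convention as Nagios plugins. It prints one line of output and exits 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). As we read its options, `-p` is the pattern to match and `-W1` means "warn if fewer than 1 matching process".

The Python sketch below re-creates only that decision so you can see what the plugin is checking. The `ps` invocation, function names, and output wording are our own, not the plugin's code.

```python
import os
import re
import subprocess
import sys

# Nagios-style exit codes used by Sensu checks.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def evaluate_procs(commands, pattern, warn_under=1):
    """Count command lines matching `pattern`; WARNING if fewer than `warn_under` match."""
    regex = re.compile(pattern)
    count = sum(1 for command in commands if regex.search(command))
    if count < warn_under:
        return WARNING, "WARNING: found %d processes matching %r" % (count, pattern)
    return OK, "OK: found %d processes matching %r" % (count, pattern)

def running_commands():
    """Full command lines of running processes, excluding this script's own process."""
    output = subprocess.run(["ps", "-eo", "pid=,args="],
                            capture_output=True, text=True, check=True).stdout
    me = os.getpid()
    commands = []
    for line in output.splitlines():
        pid, _, args = line.strip().partition(" ")
        if pid.isdigit() and int(pid) != me:
            commands.append(args)
    return commands

# Usage: check_procs.py cornbread
if __name__ == "__main__" and len(sys.argv) > 1:
    status, message = evaluate_procs(running_commands(), sys.argv[1])
    print(message)
    sys.exit(status)
```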
  63. INSTALL OUR CHECK ✓ On your SENSU SERVER: vim /etc/sensu/conf.d/checks/cornbread_process.json
  64. CORNBREAD_PROCESS.JSON
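The contents of cornbread_process.json were on the slide image and did not survive extraction. A plausible shape for a Sensu 0.10 subscription check is sketched below, generated from Python so the output is guaranteed to be valid JSON.

The command comes from the "DETERMINE OUR CHECK COMMAND" slide. The following are assumptions, not the tutorial's actual file:

- the subscriber name `cornbread`
- the 60-second interval
- the `email` handler

```python
import json

def check_definition(name, command, subscribers, interval=60, handlers=("default",)):
    """Build a subscription check in the {"checks": {name: {...}}} layout."""
    return {
        "checks": {
            name: {
                "command": command,
                "subscribers": list(subscribers),  # clients with a matching subscription run it
                "interval": interval,              # seconds between scheduled runs
                "handlers": list(handlers),
            }
        }
    }

cornbread = check_definition(
    "cornbread_process",
    "/opt/sensu/embedded/bin/ruby /etc/sensu/plugins/check-procs.rb -p cornbread -W1",
    subscribers=["cornbread"],  # hypothetical subscription name
    handlers=["email"],         # assumed: the handler configured earlier
)
print(json.dumps(cornbread, indent=2))
```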
  65. RESTART SERVICES: service sensu-server restart; tail -100 /var/log/sensu/sensu-server.log | grep cornbread
  66. CHECK YOUR DASHBOARD
  67. CHECK YOUR EMAIL
  68. SENSU API ✓ REST API ✓ HTTP/4567 on SENSU SERVER ✓ Try: curl -l http://localhost:4567/events | python -mjson.tool
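Anything that speaks HTTP can use the API, which is how command-line tools can be integrated with Sensu. Below is a minimal Python sketch that fetches the events and counts them by status.

It assumes `GET /events` returns a JSON array of event objects, each carrying a numeric `status`. That matches our understanding of the 0.10-era API, but field names may differ between versions.

```python
import json
import urllib.request
from collections import Counter

STATUS_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL"}

def fetch_events(base_url="http://localhost:4567"):
    """GET /events from the Sensu API and decode the JSON array."""
    with urllib.request.urlopen(base_url + "/events", timeout=10) as response:
        return json.load(response)

def summarize(events):
    """Count current events by status name; unrecognised statuses count as UNKNOWN."""
    return Counter(STATUS_NAMES.get(event.get("status"), "UNKNOWN") for event in events)
```

On the sensu-server VM, `print(summarize(fetch_events()))` gives a one-line overview, which is handy for a custom CLI.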
  69. SENSU SERVICES ✓ Sensu API ✓ Sensu Server ✓ Sensu Client ✓ Sensu Dashboard
  70. EVERYTHING OK? ✓ /etc/init.d/sensu-service {client|server|api|dashboard} {start|stop|status|restart} ✓ ps -ef | grep sensu ✓ tail -f /var/log/sensu/*.log ✓ curl -l localhost:4567/info
  71. COOL SENSU TRICKS
  72. SEND DIRECTLY TO SENSU: netcat to 127.0.0.1:3030
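The client socket accepts a JSON check result over TCP. This lets any script or cron job report into Sensu without a check definition. Below is a hedged Python equivalent of the netcat trick.

It assumes a sensu-client listening on 127.0.0.1:3030 and the minimal `name`/`output`/`status` payload we understand the socket to accept. The check name in the usage line is hypothetical.

```python
import json
import socket

def build_result(name, output, status):
    """Serialize a check result the way the client socket expects it (name, output, status)."""
    return json.dumps({"name": name, "output": output, "status": status})

def send_result(payload, host="127.0.0.1", port=3030, timeout=5):
    """Open a TCP connection to the local sensu-client socket and write the JSON payload."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload.encode("utf-8"))
```

For example, a nightly job could call `send_result(build_result("nightly_backup", "backup failed", 2))` when it fails.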
  73. AGGREGATE ALERTS ✓ Alert when X% of checks are not OK ✓ Handy for preventing alert floods
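Aggregate alerting boils down to a percentage computed over many clients' results for the same check. One alert fires instead of fifty. As we understand it, Sensu records such results when a check definition sets `"aggregate": true` and exposes the counts through its API.

The sketch below shows only the decision itself. The 30% threshold and the input list are hypothetical.

```python
def percent_not_ok(statuses):
    """Share of results (exit statuses) that are not OK, as a percentage."""
    if not statuses:
        return 0.0
    failing = sum(1 for status in statuses if status != 0)
    return 100.0 * failing / len(statuses)

def aggregate_status(statuses, critical_percent=30.0):
    """Return 2 (CRITICAL) if more than `critical_percent` of results are non-OK, else 0 (OK)."""
    return 2 if percent_not_ok(statuses) > critical_percent else 0

# Ten web servers run the same check; four of them are failing.
print(aggregate_status([0] * 6 + [2] * 4))
```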
  74. MY SENSU TIPS ✓ Install the RabbitMQ management web interface and bookmark it (see http://10.254.254.10:15672/#/ ) ✓ Lock your plugins’ gem dependency versions
  75. TIPS TIPS TIPS ✓ Have alternate ways to access your dashboard information ✓ We integrated our command-line developer tools with the Sensu API ✓ We also created our own Ops dashboard that queries Sensu, Graphite and our app for data
  76. MORE TIPS ✓ Put NGINX in front of sensu-dashboard
  77. HA SENSU ✓ Redundancy is easy (bring up more sensu-servers) ✓ Making Redis and RabbitMQ HA is more challenging ✓ We’re still running one solitary Redis and RabbitMQ but are OK with this risk for now
  78. WHERE TO GO FOR HELP ✓ IRC: #sensu on freenode ✓ sensu-users mailing list ✓ http://docs.sensuapp.org
  79. QUESTIONS
  80. THANK YOU bethany@paperlesspost.com ✓ @skymob on Twitter ✓ robotwitharose on #sensu, IRC (freenode)
