
Order from chaos: automating monitoring configuration



In a high-performance computing shop with over 3,000 nodes, Harvard FAS Research Computing can't afford chaos around its monitoring checks. In this Sensu Summit 2019 talk, Harvard SRE Molly Duggan explains how the team uses CI/CD pipelines and the Sensu Go API to ensure that every change to the monitoring system is validated, reproducible, and version controlled.
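As a rough illustration of what "validated" could mean inside a CI job, here is a minimal sketch that lints a Sensu Go check definition before it is pushed to the API. The function name `validate_check` and the specific rules are assumptions chosen for illustration; they are not taken from the talk and cover only a small part of what the Sensu backend itself enforces.

```python
def validate_check(definition):
    """Return a list of problems found in a check definition (empty if none).

    Hypothetical lint rules for illustration only; the Sensu backend
    performs its own, more complete validation.
    """
    errors = []
    if definition.get("type") != "CheckConfig":
        errors.append("type must be CheckConfig")

    metadata = definition.get("metadata") or {}
    spec = definition.get("spec") or {}

    if not metadata.get("name"):
        errors.append("metadata.name is required")
    if not spec.get("command"):
        errors.append("spec.command is required")
    if not spec.get("subscriptions"):
        errors.append("spec.subscriptions must list at least one subscription")
    # A scheduled check needs either a fixed interval or a cron expression.
    if not (spec.get("interval") or spec.get("cron")):
        errors.append("spec.interval or spec.cron is required")
    return errors


if __name__ == "__main__":
    check = {
        "type": "CheckConfig",
        "api_version": "core/v2",
        "metadata": {"name": "check-disk", "namespace": "default"},
        "spec": {
            "command": "check-disk-usage.rb",
            "subscriptions": ["linux"],
            "interval": 60,
        },
    }
    problems = validate_check(check)
    print("ok" if not problems else "\n".join(problems))
```

A CI step could run a check like this over every definition in the repository and fail the pipeline when any list of problems is non-empty.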



  1. Order from Chaos: Automating Monitoring Configuration
     Molly Duggan, Harvard – FAS Research Computing
  2. A Little Context
     ● 100,000 CPU cores on 3,000 nodes running 29 million jobs/year
     ● 40PB of storage on a variety of different systems
     ● 2 data centers
     ● 500+ lab groups with over 5500 users
     ● Cloud and VM infrastructure
       ○ Connected VMs for applications showing research data
       ○ Assorted DBs (researchers, museums)
       ○ Internal services (Puppet, GitLab, etc.)
     Everything that's not compute is a snowflake!
  3. What We Monitor
  4. Options
     ● Manual configuration through a dashboard with backups
       ○ Too much of a free-for-all
       ○ Not easy to see changes
       ○ Too hard to roll back updates
     ● Config Management
       ○ Old Puppet version
       ○ Too much in one place
     ● Script against sensuctl
       ○ Not everything we wanted was implemented at the time we began this project (during the beta)
       ○ Asset packaging needs
  5. Hinoki
     ● A tiny command-line tool to manage Sensu configuration
     ● Advantages for our shop:
       ○ Discrete repo with audit trail
       ○ Easy contribution for everyone on the team - just a git push
       ○ Flexibility moving forward
     ● Equal parts code and convention with CI/CD integration
     ● Import definitions via Sensu API
     ● Quick-start provisioning of an empty cluster
     ● Ship and package assets, add hash to configs
     ● pip install hinoki
  6. Demo #1: Updating Settings
  7. Demo #2: Initialize New Cluster
  8. Outstanding Issues
     ● Add more features!
     ● We still repeat ourselves
     ● It can still be confusing to understand what you might be touching if you alter a check
  9. Thank You!
     Molly Duggan, Harvard – FAS Research Computing
     GitHub: @exitquote
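The "ship and package assets, add hash to configs" step on the Hinoki slide can be sketched as follows. Sensu Go asset definitions carry a `url` and a `sha512` checksum of the tarball; the helper names, the example URL, and the file paths below are hypothetical and are not taken from Hinoki's source.

```python
import hashlib
import json


def sha512_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def asset_definition(name, namespace, url, tarball_path):
    """Build a Sensu Go asset definition whose sha512 matches the tarball.

    Hypothetical helper for illustration; the caller is responsible for
    uploading the tarball to the given url.
    """
    return {
        "type": "Asset",
        "api_version": "core/v2",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"url": url, "sha512": sha512_of_file(tarball_path)},
    }


if __name__ == "__main__":
    import sys

    # Usage (paths and URL are placeholders):
    #   python make_asset.py check-disk.tar.gz https://assets.example.org/check-disk.tar.gz
    tarball, url = sys.argv[1], sys.argv[2]
    print(json.dumps(asset_definition("check-disk", "default", url, tarball), indent=2))
```

Regenerating the checksum from the packaged tarball on every build keeps the definition and the shipped artifact in step, which is the kind of repetition a CI/CD pipeline is well placed to take over.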