Monitoring a Vault and Consul cluster - 24th May 2018


Vault is an open source solution for identity and secrets management. Vault is well suited for both public cloud and private datacenter usage, but a common challenge is running Vault securely and accessing secrets in the public cloud. This talk will show how to run Vault securely in the cloud and how to access those secrets securely from multiple different cloud platforms. Additionally, the Vault 0.10 release is right around the corner and includes some major changes to improve the lives of both beginners and advanced users of Vault. We’ll spend some time looking at the latest features in Vault, and use these throughout the talk.

Published in: Technology
  1. Monitoring a Vault and Consul Cluster. May 23, 2018. Copyright © 2018 HashiCorp
  2. Introductions - Who is this person? Peter Souter, Technical Account Manager at HashiCorp. Based in: London, UK. Been using: the HashiCorp stack for about 7 years (Vagrant FTW!). Worn a lot of hats in my time: Developer, Consultant, Pre-Sales, TAM. Interested in: making people’s operational life easier and more secure. DEVOPS ALL THE THINGS
  3. Vault and Consul - What a team! ▪ Consul is the main recommended backend for Vault ▪ It allows Vault to have a proper HA and DR story ▪ More info: es/operations/ ml
  4. Maturing of Products ▪ Consul hit 1.0 last year! ▪ Vault is at 0.10… 1.0 is coming “Sooner rather than later” - Mitchell ▪ Other products “Soon”™ ▪ Also, cool stuff is coming, come to HashiDays Amsterdam and HashiConf!
  5. With maturing comes operationalisation ▪ Architecture diagrams ▪ Scaling ▪ Performance ▪ Deployment Guides ▪ Monitoring
  6. Come help us with Consul scaling research! Our research team is currently soak-testing and measuring Consul at massive scale, so if you’re hitting edge cases or have information for us, we’d like to hear from you!
  7. Today we’re going to focus on... ▪ Architecture diagrams ▪ Scaling ▪ Performance ▪ Deployment Guides ▪ Monitoring
  8. Time-series Telemetry Data ▪ This involves capturing metrics from the application, storing them in a special database designed for that purpose, and analyzing trends in the data over time. ▪ Examples: Grafana, CloudWatch, DataDog, Circonus.
  9. Log Analytics ▪ This means capturing log files from the system and the application, extracting useful signals from the text, and then analyzing that data. ▪ Examples: Splunk, ELK, SumoLogic.
  10. Active health checks ▪ This involves active methods of connecting to the application and interacting with it to ensure it is responding properly. ▪ Examples: Nagios, Sensu, Keynote.
  11. How do we get those metrics? ▪ Vault and Consul use the go-metrics library to export telemetry. ▪ Currently they support the following options: • Circonus • DataDog's DogStatsd • Statsite • Statsd ▪ Note that DataDog's agent and Statsite are implementations of statsd, so the last 3 options are nearly the same thing.
  12. Where do they go? ▪ Once the metrics reach your statsd-compatible agent, they need to be forwarded somewhere so they can be stored and displayed. There are many options... ▪ For this demo we’re sticking to a TIGK stack: • Telegraf, InfluxDB, Grafana, Kapacitor • (Normally that would be TICK, but Chronograf’s dashboards are not as good as Grafana’s IMO)
  13. Where do they go? - Architecture
  14. Consul Telemetry - How? ➔ Two entries: ◆ dogstatsd_addr: the hostname and port of the statsd daemon. ○ Using DogStatsd format instead of plain statsd tells Consul to send tags with each metric. Tags can be used by Grafana to filter data on your dashboards. ◆ disable_hostname: true tells Consul not to insert the hostname in the names of the metrics it sends to statsd, since the hostnames will be sent as tags. ○ Without this option, the single metric consul.raft.apply would become multiple metrics, one per host.
      {
        "telemetry": {
          "dogstatsd_addr": "localhost:8125",
          "disable_hostname": true
        }
      }
  15. Vault Telemetry - How? Pretty much the same!
      telemetry {
        dogstatsd_addr = "localhost:8125"
        disable_hostname = true
      }
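To make the difference between plain statsd and DogStatsd concrete, here is a minimal Python sketch of the two wire formats. It is illustrative only, not how Consul or Vault build their datagrams internally: the helper names `format_dogstatsd` and `parse_dogstatsd`, the `host` tag key, and the server name `consul-server-1` are all assumptions made for the example. The only part taken as given is the datagram layout, `name:value|type` for plain statsd, with an optional `|#key:value,...` tag section added by DogStatsd.

```python
def format_dogstatsd(name, value, metric_type, tags=None):
    """Build one datagram: name:value|type, plus |#k:v,k:v when tags are given."""
    line = f"{name}:{value}|{metric_type}"
    if tags:
        # Sort the tags so the output is deterministic.
        tag_text = ",".join(f"{key}:{val}" for key, val in sorted(tags.items()))
        line += f"|#{tag_text}"
    return line


def parse_dogstatsd(line):
    """Split a datagram back into (name, value, type, tags)."""
    name, _, rest = line.partition(":")
    parts = rest.split("|")
    value, metric_type = float(parts[0]), parts[1]
    tags = {}
    for part in parts[2:]:
        if part.startswith("#"):
            for tag in part[1:].split(","):
                key, _, val = tag.partition(":")
                tags[key] = val
    return name, value, metric_type, tags


# Plain statsd: no tags, so the host would have to live in the metric name.
plain = format_dogstatsd("consul.raft.apply", 3, "c")

# DogStatsd: one metric name, with the (hypothetical) host carried as a tag.
tagged = format_dogstatsd("consul.raft.apply", 3, "c", {"host": "consul-server-1"})
```

With the tag carried separately, the metric name stays `consul.raft.apply` for every server, which is what lets a dashboard filter or group by host.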
  16. Consul Telemetry - What? ▪ Consul has 86 different metrics ▪ That’s good but… which do I need to look at? ▪ And what’s the threshold before I should get worried? ▪ Halp
  17. Consul Telemetry - Transaction Timing
      Metric Name: Description
      ▪ consul.kvs.apply: This measures the time it takes to complete an update to the KV store.
      ▪ consul.txn.apply: This measures the time spent applying a transaction operation.
      ▪ consul.raft.apply: This counts the number of Raft transactions occurring over the interval.
      ▪ consul.raft.commitTime: This measures the time it takes to commit a new entry to the Raft log on the leader.
      Why they're important: Taken together, these metrics indicate how long it takes to complete write operations in various parts of the Consul cluster. Generally these should all be fairly consistent and no more than a few milliseconds. Sudden changes in any of the timing values could be due to unexpected load on the Consul servers, or due to problems on the servers themselves.
      What to look for: Deviations (in any of these metrics) of more than 50% from baseline over the previous hour.
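The "more than 50% from baseline over the previous hour" rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the alert used in the demo: it assumes the baseline is the mean of the samples collected during the previous hour, and the function name `deviates_from_baseline` and its `threshold` parameter are hypothetical. In the TIGK stack this check would more likely live in a Kapacitor or Grafana alert than in application code.

```python
from statistics import mean


def deviates_from_baseline(baseline_samples, current, threshold=0.5):
    """Return True if `current` differs from the baseline mean by more than `threshold`.

    baseline_samples: values of one metric over the previous hour (assumption).
    current: the latest value of the same metric.
    threshold: allowed relative deviation; 0.5 means 50%.
    """
    if not baseline_samples:
        raise ValueError("need at least one baseline sample")
    baseline = mean(baseline_samples)
    if baseline == 0:
        # Relative deviation is undefined for a zero baseline; flag any change.
        return current != 0
    return abs(current - baseline) / baseline > threshold


# Example: commit times (in ms) hovering around 2 ms over the previous hour.
last_hour = [2.0, 2.1, 1.9, 2.0]
spike = deviates_from_baseline(last_hour, 3.5)   # 75% above baseline
steady = deviates_from_baseline(last_hour, 2.4)  # 20% above baseline
```

The check is symmetric on purpose: a sudden drop is as much a deviation from baseline as a sudden rise.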
  18. Vault Telemetry - Seal Status
      Metric Name: Description
      ▪ consul_health_checks[check_name="Vault Sealed Status"].passing: Value of 1 indicates Vault is unsealed; 0 means sealed.
      Why it's important: By default, Vault is sealed on startup, so if this value changes to 0 during the day, Vault has restarted for some reason. And until it's unsealed, it won't answer requests from clients.
      What to look for: A value of 0 being reported by any host.
      NOTE: This metric is actually reported by the Consul plugin to Telegraf.
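The "value of 0 being reported by any host" rule amounts to a simple filter over the latest `passing` value per host. The Python sketch below assumes those values have already been fetched from the time-series store into a dictionary; the function name `sealed_hosts` and the host names are hypothetical.

```python
def sealed_hosts(passing_by_host):
    """Return the hosts whose latest 'Vault Sealed Status' check reports 0 (sealed)."""
    return sorted(host for host, passing in passing_by_host.items() if passing == 0)


# Hypothetical latest values of the `passing` field, keyed by host.
latest = {"vault-1": 1, "vault-2": 0, "vault-3": 1}
needs_unseal = sealed_hosts(latest)
```

Any non-empty result means at least one Vault server has restarted and will not answer client requests until it is unsealed.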
  19. We’re working on guide-ifying this!
  20. Demo
  21. 😞
  22. Copyright © 2018 HashiCorp
  23. Q&A