Monitoring a Vault and Consul cluster - 24th May 2018

Peter Souter
Senior Professional Services Engineer at Puppet
Copyright © 2018 HashiCorp
May 23, 2018
Monitoring a Vault and Consul Cluster
Introductions - Who is this person?
Peter Souter, Technical Account Manager at HashiCorp
Based in: London, UK
Been using: the HashiCorp stack for about 7 years (Vagrant FTW!)
Worn a lot of hats in my time: Developer, Consultant, Pre-Sales, TAM
Interested in: making people’s operational life easier and more secure
DEVOPS ALL THE THINGS
Vault and Consul - What a team!
▪ Consul is the main recommended backend for Vault
▪ It allows Vault to have a proper HA and DR story
▪ More info: https://www.vaultproject.io/guides/operations/vault-ha-consul.html
Maturing of Products
▪ Consul hit 1.0 last year!
▪ Vault is at 0.10… 1.0 is coming “Sooner rather than later” - Mitchell
▪ Other products “Soon”™
▪ Also, cool stuff is coming, come to HashiDays Amsterdam and HashiConf!
http://bit.do/barrels_image
With maturing comes operationalisation
▪ Architecture diagrams
▪ Scaling
▪ Performance
▪ Deployment Guides
▪ Monitoring
Come help us with Consul scaling research!
Our research team is right now working on Consul soaking and measuring at massive scale, so if you’re hitting edge cases or have information for us, we’d like to hear from you!
Today we’re going to focus on...
▪ Architecture diagrams
▪ Scaling
▪ Performance
▪ Deployment Guides
▪ Monitoring
Time-series Telemetry Data
▪ This involves capturing metrics from the application, storing them in a special database designed for that purpose, and analyzing trends in the data over time.
▪ Examples: Grafana, CloudWatch, DataDog, Circonus.
Log Analytics
▪ This means capturing log files from the system and the application, extracting useful signals from the text, and then analyzing that data.
▪ Examples: Splunk, ELK, SumoLogic.
Active health checks
▪ This involves active methods of connecting to the application and interacting with it to ensure it is responding properly.
▪ Examples: Nagios, Sensu, Keynote.
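As a rough illustration of an active check against Vault, the sketch below classifies a node from the HTTP status code that Vault's health endpoint (GET /v1/sys/health) returns by default. The helper names are hypothetical and not part of the talk; actually making the HTTP request is left to whatever check tool you use.

```python
# Hypothetical helper, not from the talk: map the default status codes of
# Vault's GET /v1/sys/health endpoint to a human-readable node state.
VAULT_HEALTH_CODES = {
    200: "active",         # initialized, unsealed, active node
    429: "standby",        # unsealed standby node
    501: "uninitialized",  # Vault has not been initialized
    503: "sealed",         # Vault is sealed
}


def classify_vault_health(status_code: int) -> str:
    """Return the node state for a sys/health status code, or 'unknown'."""
    return VAULT_HEALTH_CODES.get(status_code, "unknown")


def needs_attention(status_code: int) -> bool:
    """Active and standby nodes are both healthy members of an HA cluster."""
    return classify_vault_health(status_code) not in ("active", "standby")
```

A Nagios- or Sensu-style check would call the endpoint on every Vault node and raise an alert when needs_attention returns True.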
How do we get those metrics?
▪ Vault and Consul use the go-metrics library to export telemetry.
▪ Currently they support the following options:
• Circonus
• DataDog's DogStatsd
• Statsite
• Statsd
▪ Note that DataDog's agent and Statsite are implementations of statsd, so the last 3 options are nearly the same thing.
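To make "nearly the same thing" concrete: statsd-style agents all accept short text datagrams, usually over UDP. The sketch below is a minimal illustration of that wire format, not what go-metrics does internally. The function names are made up for this example, and the agent address is assumed to be the localhost:8125 used later in the deck.

```python
import socket


def statsd_line(name: str, value, metric_type: str) -> bytes:
    """Build a plain statsd datagram: <name>:<value>|<type>.

    Common types are "c" (counter), "g" (gauge) and "ms" (timer).
    """
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(name: str, value, metric_type: str,
                addr=("127.0.0.1", 8125)) -> None:
    """Fire-and-forget one metric to a statsd-compatible agent over UDP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(statsd_line(name, value, metric_type), addr)
    finally:
        sock.close()
```

Calling send_metric("consul.raft.apply", 1, "c") would emit one counter increment. Because UDP is connectionless, nothing fails if no agent is listening.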
Where do they go?
▪ Once the metrics reach your statsd-compatible agent, they need to be forwarded somewhere so they can be stored and displayed. There are many options...
▪ For this demo we’re sticking to a TIGK Stack:
• Telegraf, InfluxDB, Grafana, Kapacitor
• (Normally that would be TICK, but Chronograf’s dashboards are not as good as Grafana’s IMO)
Where do they go? - Architecture
Consul Telemetry - How?
https://www.consul.io/docs/agent/telemetry.html
➔ Two entries:
◆ dogstatsd_addr: hostname and port of the statsd daemon.
○ Using the DogStatsd format instead of plain statsd tells Consul to send tags with each metric. Tags can be used by Grafana to filter data on your dashboards.
◆ disable_hostname: true
○ Tells Consul not to insert the hostname in the names of the metrics it sends to statsd, since the hostnames will be sent as tags.
○ Without this option, the single metric consul.raft.apply would become multiple metrics, one per host.
{
  "telemetry": {
    "dogstatsd_addr": "localhost:8125",
    "disable_hostname": true
  }
}
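A small sketch of why disable_hostname matters. The exact position of the hostname inside the metric name and the "host" tag key are assumptions made for illustration, as are the server names; the tag suffix follows the DogStatsD convention of appending |#key:value pairs to the datagram.

```python
def metric_with_hostname(prefix: str, hostname: str, key: str) -> str:
    """Assumed naming when the hostname is embedded in the metric name."""
    return f"{prefix}.{hostname}.{key}"


def dogstatsd_line(name: str, value, metric_type: str, tags: dict) -> str:
    """DogStatsD datagram: <name>:<value>|<type>|#<key>:<value>,..."""
    tag_str = ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
    return f"{name}:{value}|{metric_type}|#{tag_str}"


hosts = ["server1", "server2", "server3"]  # hypothetical Consul servers

# Hostname in the name: three distinct metric names to dashboard and alert on.
embedded = {metric_with_hostname("consul", h, "raft.apply") for h in hosts}

# Hostname as a tag: one metric name, filterable by host.
tagged = [dogstatsd_line("consul.raft.apply", 1, "c", {"host": h}) for h in hosts]
```

With tags, a single Grafana panel on consul.raft.apply can be grouped or filtered by host instead of needing one query per server.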
Vault Telemetry - How?
https://www.vaultproject.io/docs/configuration/telemetry.html
Pretty much the same!
telemetry {
  dogstatsd_addr = "localhost:8125"
  disable_hostname = true
}
Consul Telemetry - What?
▪ Consul has 86 different metrics
▪ That’s good but… which do I need to look at?
▪ And what’s the threshold before I should get worried?
▪ Halp
https://www.consul.io/docs/agent/telemetry.html
Consul Telemetry - Transaction Timing
Metric Name             Description
consul.kvs.apply        This measures the time it takes to complete an update to the KV store.
consul.txn.apply        This measures the time spent applying a transaction operation.
consul.raft.apply       This counts the number of Raft transactions occurring over the interval.
consul.raft.commitTime  This measures the time it takes to commit a new entry to the Raft log on the leader.
Why they're important: Taken together, these metrics indicate how long it takes to complete write operations in various parts of the Consul cluster. Generally these should all be fairly consistent and no more than a few milliseconds. Sudden changes in any of the timing values could be due to unexpected load on the Consul servers, or due to problems on the servers themselves.
What to look for: Deviations (in any of these metrics) of more than 50% from baseline over the previous hour.
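One way to read the "more than 50% from baseline" rule is sketched below. It assumes the baseline is the plain mean of the previous hour's samples, which the slide does not specify, and the function names are hypothetical. In the demo stack this kind of rule would normally live in Kapacitor or a Grafana alert rather than in a script.

```python
from statistics import mean


def deviation_from_baseline(baseline_samples, current: float) -> float:
    """Relative deviation of the current value from the baseline mean."""
    baseline = mean(baseline_samples)
    if baseline == 0:
        raise ValueError("baseline is zero; relative deviation is undefined")
    return abs(current - baseline) / baseline


def should_alert(baseline_samples, current: float, threshold: float = 0.5) -> bool:
    """True when the current value is more than `threshold` away from baseline."""
    return deviation_from_baseline(baseline_samples, current) > threshold


# Hypothetical consul.kvs.apply timings (ms) from the previous hour.
last_hour = [2.0, 1.5, 2.5, 2.0]
```

With these samples the baseline is 2.0 ms, so a reading of 3.5 ms (75% above) would alert, while 2.4 ms (20% above) would not. Taking the absolute difference means sudden drops trigger the rule too.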
Vault Telemetry - Seal Status
Metric Name: consul_health_checks[check_name="Vault Sealed Status"].passing
Description: Value of 1 indicates Vault is unsealed; 0 means sealed.
Why it's important: By default, Vault is sealed on startup, so if this value changes to 0 during the day, Vault has restarted for some reason. And until it's unsealed, it won't answer requests from clients.
What to look for: A value of 0 being reported by any host.
NOTE: This metric is actually reported by the Consul plugin to Telegraf.
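The "value of 0 from any host" rule can be sketched as a simple filter. The input shape here (a mapping of hostname to the latest passing value) and the hostnames are assumptions for illustration, not the format Telegraf actually emits.

```python
def sealed_hosts(passing_by_host: dict) -> list:
    """Return the hosts whose 'Vault Sealed Status' check reports 0 (sealed)."""
    return sorted(host for host, passing in passing_by_host.items() if passing == 0)


# Hypothetical latest values, one per Vault host.
latest = {"vault1": 1, "vault2": 0, "vault3": 1}
```

Here sealed_hosts(latest) would single out vault2 as the node that needs unsealing; an empty list means every host is unsealed.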
We’re working on guide-ifying this!
Demo
😞
Q&A