Observing the HashiCorp Ecosystem From Prometheus

Julien Pivotto, Open Source Consultant at Inuits
Kris Buytaert & Julien Pivotto
June 21, 2022
O11y
Who are we?
Kris Buytaert
• I used to be a developer
• Then I became an Ops person
• Chief Trolling/Travel/Technical Officer @ Inuits.eu
• Chief Yak Shaver @ o11y.eu
• Organiser of #devopsdays, #cfgmgmtcamp, #loadays, ...
• Cofounder of all of the above
• Everything is a Freaking DNS Problem
• DNS : devops needs sushi
• @krisbuytaert on twitter/github
Julien Pivotto
• Prometheus maintainer
• Open Source Observability Expert
• Principal Software Architect & CoFounder @ o11y.eu
• DevOps believer
• @roidelapluie on twitter/github
O11y
• Inuits.eu Spinoff
• Open Source Observability
• Currently supporting the Prometheus Ecosystem
• Professional Services & Support (now)
• Long Term Enterprise Support (next month)
• Prometheus Distribution (soon)
Introduction, a brief history of Open Source Monitoring
July 2008 Ottawa Linux Symposium Paper
• Bloated Java Tools
• Dysfunctional Open Core Software
• DBA Required
• Nagios was king in the Open Source world
June 2011 #monitoringsucks
• John Vincent (@lusis), June 2011
• A #devops sub-movement
• (manual configuration, not in sync with reality, hosts only, services sometimes, applications never)
October 2011 #monitoringlove
• Ulf Mansson, #devopsdays Rome 2011
• A newfound love for monitoring
• Triggered by { New Open Source Tools * Automation }
November 2012 Prometheus
What is monitoring?
• High level overview of the state of a service/component
• Availability
• Technical components
• Performance?
What is going on?
Pitfalls of traditional monitoring
• Drift from reality
• Total lack of automation
• Total lack of automation
• Total lack of automation
• Total lack of automation
• Partial automation
• Lots of work to maintain
• Binary states: it works - it does not work
• Alert fatigue
• Alert fatigue
• Alert fatigue
• Alert fatigue
What is observability?
• Understand how your services behave
• As if you were in their place
• Without incident-specific code
Why is this going on?
How do monitoring and observability connect?
• Monitoring is required
• If lucky, monitoring is enough
• Observability is removing luck <- @roidelapluie
What is observability in practice?
Three pillars:
• Metrics
• Logs
• Traces
Metrics
https://play.grafana.org/
Logs
https://play.grafana.org/
Traces
https://www.jaegertracing.io/
Prometheus
• Prometheus is an Open Source CNCF Project
• Collects and stores metrics
• Pull-based
• Service discovery (including Consul)
• Alerting
The Prometheus ecosystem
• Exporters for every piece of the infra
• Maintained by multiple companies
• Long-Term Support release coming Q3 2022
Prometheus data model
• Metrics have labels
• Labels differentiate metrics, e.g.:
  • HTTP response code
  • Datacenter name
PromQL
• Prometheus Query Language
• Powerful yet simple query language
rate(http_requests_total[5m])
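As a hedged illustration of how labels and PromQL combine, the query above could be aggregated per HTTP response code and stored as a recording rule. This is only a sketch: the `code` label, the group name, and the recorded series name are assumptions, not taken from the slides.

```yaml
groups:
  - name: http-example                      # hypothetical group name
    rules:
      # Per-second request rate over 5 minutes, summed per response code.
      # Assumes http_requests_total carries a "code" label.
      - record: code:http_requests:rate5m   # hypothetical series name
        expr: sum by (code) (rate(http_requests_total[5m]))
```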
Prometheus + Consul
Observing your services
• consul_sd_configs
  • Streams the Consul service list to Prometheus
  • Up-to-date service list
• Use the flexibility of labels
  • Add relevant labels
  • Filter targets
consul_sd_configs labels
• __meta_consul_service
• __meta_consul_tags
• __meta_consul_node
• __meta_consul_service_metadata_<key>
• __meta_consul_dc
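A minimal sketch of how these labels might be used in a scrape configuration, both to filter targets and to add labels. The job name, the Consul address, and the `metrics` tag are assumptions for illustration only.

```yaml
scrape_configs:
  - job_name: consul-services          # hypothetical job name
    consul_sd_configs:
      - server: 'localhost:8500'       # assumed local Consul agent
    relabel_configs:
      # Filter targets: keep only services carrying the (hypothetical) "metrics" tag.
      # __meta_consul_tags is the tag list joined by the tag separator (a comma
      # by default) and wrapped in it, so ",metrics," matches a whole tag.
      - source_labels: [__meta_consul_tags]
        regex: '.*,metrics,.*'
        action: keep
      # Add relevant labels: copy discovery metadata into regular target labels.
      - source_labels: [__meta_consul_service]
        target_label: service
      - source_labels: [__meta_consul_dc]
        target_label: datacenter
      - source_labels: [__meta_consul_node]
        target_label: node
```

Labels starting with `__` are dropped after relabeling, which is why the interesting ones are copied to regular labels here.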
Alerting philosophy
• Page on actionable critical failure
• Avoid paging on Consul Health Check failure
• Keep non-paging “ambiance” alerts to gauge the overall state and find causes quickly
Consul
consul_exporter
• Exporter maintained by the Prometheus team
• Exposes Consul cluster health
• Optionally exposes key/values
  • e.g. store desired state in KV for graphing
• Connects to a single Consul instance
Consul telemetry
• Built-in
• Runtime metrics (memory, CPU, ...)
• Autopilot, raft metrics
• Calls (rate, errors, latency)
Configure Consul telemetry
Consul configuration:
telemetry {
  disable_hostname = true
  prometheus_retention_time = "1h"
}
Configure Consul telemetry
Prometheus configuration:
scrape_configs:
- job_name: consul
  static_configs:
  - targets:
    - <consulserver1>:8500
    - <consulserver2>:8500
  metrics_path: '/v1/agent/metrics'
  params:
    format: ["prometheus"]
Consul alerts (consul_exporter)
Is consul running?
up{job="consul_exporter"} == 0
consul_up{job="consul_exporter"} == 0
Is there a leader?
consul_raft_leader != 1
Are peers in raft?
sum(consul_raft_peers) != count(up{job="consul"})
Consul alerts (Consul telemetry)
Is consul running?
up{job="consul"} == 0
Is my cluster healthy?
consul_autopilot_healthy == 0
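These expressions could be wrapped in a Prometheus rule file roughly as follows. This is a sketch under assumptions: the alert names, `for` durations, severity labels, and annotation texts are illustrative and not from the slides.

```yaml
groups:
  - name: consul                        # hypothetical group name
    rules:
      - alert: ConsulDown               # hypothetical alert name
        expr: up{job="consul"} == 0
        for: 5m                         # assumed delay before firing
        labels:
          severity: critical            # assumed severity scheme
        annotations:
          summary: "Consul agent {{ $labels.instance }} cannot be scraped"
      - alert: ConsulAutopilotUnhealthy # hypothetical alert name
        expr: consul_autopilot_healthy == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Consul autopilot reports an unhealthy cluster"
```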
Vault
Configure Vault telemetry
Vault configuration:
telemetry {
  disable_hostname = true
  prometheus_retention_time = "1h"
}
Configure Vault telemetry
Prometheus configuration:
scrape_configs:
- job_name: vault
  static_configs:
  - targets:
    - <vaultserver1>:8200
    - <vaultserver2>:8200
  metrics_path: '/v1/sys/metrics'
  params:
    format: ["prometheus"]
Vault alerting
Is Vault up?
up{job="vault"} == 0
Is Vault sealed?
vault_core_unsealed == 0
Is audit log working?
rate(vault_audit_log_request_failure[5m]) > 0
rate(vault_audit_log_response_failure[5m]) > 0
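A possible rule file built from these expressions is sketched below. The name VaultIsSealed matches the alert referenced in the inhibition slide; the other alert names, the `for` durations, and the severity labels are assumptions.

```yaml
groups:
  - name: vault                         # hypothetical group name
    rules:
      - alert: VaultDown                # hypothetical alert name
        expr: up{job="vault"} == 0
        for: 5m                         # assumed delay before firing
        labels:
          severity: critical            # assumed severity scheme
      - alert: VaultIsSealed
        expr: vault_core_unsealed == 0
        for: 1m
        labels:
          severity: critical
      - alert: VaultAuditLogFailure     # hypothetical alert name
        # Fires when either audit request or audit response logging fails.
        expr: >
          rate(vault_audit_log_request_failure[5m]) > 0
          or
          rate(vault_audit_log_response_failure[5m]) > 0
        labels:
          severity: warning
```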
Alertmanager
Alert inhibition
• Suppresses notifications from some alerts if other alerts are firing.
• Reduces alert noise, e.g. when Vault is sealed.
Configuring inhibition
Alertmanager configuration:
inhibit_rules:
- source_match:
    alertname: VaultIsSealed
  target_match:
    alertname: ErrorRateTooHigh
  equal: [ datacenter ]
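Recent Alertmanager releases also accept a matcher-based syntax for inhibition rules. The sketch below expresses the same rule that way, assuming an Alertmanager version that supports `source_matchers` and `target_matchers`; the alert names and the `datacenter` label are the ones from the slide.

```yaml
inhibit_rules:
  # While VaultIsSealed fires, mute ErrorRateTooHigh notifications
  # for alerts that share the same datacenter label value.
  - source_matchers:
      - alertname = VaultIsSealed
    target_matchers:
      - alertname = ErrorRateTooHigh
    equal: [datacenter]
```

The inhibition only applies when both alerts carry the same `datacenter` value, so a sealed Vault in one datacenter does not hide error-rate alerts from another.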
Conclusion
• Alerting should come from your end services
• Consul & Vault focused alerts will pinpoint causes
• Specific Vault & Consul alerts can page you (e.g. sealed)
• Draft dashboards based on your needs (response times, errors, etc.)
Contact
O11y
https://o11y.eu
info@o11y.eu