SlideShare a Scribd company logo
CONTINUOUS DELIVERY.
CONTINUOUS DEVOPS.
Professional conference on DevOps practices
6APRIL 2019
KYIV,
th
6APRIL 2019 KYIV,
Björn “Beorn” Rabenstein
Applied Alerting Philosophy
th
My Philosophy on Alerting
based my observations while I was a Site Reliability Engineer at Google
Author: Rob Ewaschuk <rob@infinitepigeons.org>
Introduction
Vernacular
Monitor for your users
Cause-based alerts are bad (but sometimes necessary)
Alerting from the spout (or beyond!)
Causes are still useful
Tickets, Reports and Email
Playbooks
Tracking & Accountability
You're being naïve!
Summary
Summary
When you are auditing or writing alerting rules, consider these things to keep your oncall rotation happier:
goo.gl/2vrpSO
“What” versus “why” is one of the most important
distinctions in writing good monitoring with maximum
signal and minimum noise.
Chapter 6: Monitoring Distributed Systems
Symptoms vs. causes
Source: Betsy Beyer et al. “Site Reliability Engineering – How Google Runs Production Systems”
Expected response SRE book SoundCloud lingo Delivered to Based on
Act immediately Alerts Pages
severity="critical"
Pager Symptoms
Act eventually Tickets Tickets / “email alerts”
severity="warning"
Issue tracker /
Chat / email :-(
Symptoms or
causes
None (for diagnostics
only)
Logs Informational alerts
severity="info"
Nowhere /
dashboards
Causes
“Alerts” according to Prometheus:
Pages vs. tickets
“One person’s symptom is another person’s cause.”
“Not-yet-occurring but imminent problems.”
“Zero-redundancy (N + 0) situations count as imminent,
as do ‘nearly full’ parts of your service!”
What also counts as “symptoms”…
white-box
(needs instrumentation)
black-box
(no changes required)
host-based
“traditional”
service-based
“modern”
Black-box
monitoring
FTW?!?
Prometheus
Blackbox
Exporter
Black-box vs. white-box
Probing with real user traffic in multi-tiered services
Frontend service
(instrumented)
Backend service
A
Backend service
B
User
traffic
Measures
A’s and B’s
latency,
rps,
errors…
Alerts owners
of A or B
Alerting on SLOs at
https://developers.soundcloud.com/blog/how-soundclo
ud-uses-haproxy-with-kubernetes-for-user-facing-traffic
AMPELMANN GmbH CC BY-SA 4.0
- record: backend:http_errors_per_response:ratio_rate5m
expr: |2
sum by (backend)(rate(
haproxy_backend_http_responses_total{job="ampelmann", code="5xx"}[5m]
))
/
sum by (backend)(rate(
haproxy_backend_http_responses_total{job="ampelmann"}[5m]
))
- record: backend:error_slo:percent
labels:
backend: "api-v4"
expr: 0.1
- record: backend:error_slo:percent
labels:
backend: "api-v3"
expr: 0.2
# ... Many more backends.
SLO budget consumption Time window Burn rate Notification
2% 1 hour 14.4 Page
5% 6 hours 6 Page
10% 3 days 1 Ticket
expr: (
job:slo_errors_per_request:ratio_rate1h{job="myjob"} > (14.4*0.001)
or
job:slo_errors_per_request:ratio_rate6h{job="myjob"} > (6*0.001)
)
severity: page
expr: job:slo_errors_per_request:ratio_rate3d{job="myjob"} > 0.001
severity: ticket
- alert: AmpelmannErrorBudgetBurn
expr: |2
(
100 * backend:http_errors_per_response:ratio_rate1h
> on (backend)
14.4 * backend:error_slo:percent
)
and
(
100 * backend:http_errors_per_response:ratio_rate5m
> on (backend)
14.4 * backend:error_slo:percent
)
for: 2m
labels:
system: "{{$labels.backend}}"
severity: "critical"
window: "1h"
annotations:
summary: "a backend burns its error budget very fast"
description: "Backend {{$labels.backend}} has returned {{ $value | printf `%.2f` }}% 5xx
runbook: "http://runbooks.soundcloud.com/runbooks/ampelmann/#ampelmannerrorbudgetburn"
Alert Long window Short window for duration Burn rate
factor
Error budget
consumed
Page 1h 5m 2m 14.4 2%
Page 6h 30m 15m 6 5%
Ticket 1d 2h 1h 3 10%
Ticket 3d 6h 1h 1 10%
Who gets the page?
- alert: AmpelmannErrorBudgetBurn
expr: |2
(
100 * backend:http_errors_per_response:ratio_rate1h
> on (backend)
14.4 * backend:error_slo:percent
)
and
(
100 * backend:http_errors_per_response:ratio_rate5m
> on (backend)
14.4 * backend:error_slo:percent
)
for: 2m
labels:
system: "{{$labels.backend}}"
severity: "critical"
window: "1h"
annotations:
summary: "a backend burns its error budget very fast"
description: "Backend {{$labels.backend}} has returned {{ $value | printf `%.2f` }}% 5xx
runbook: "http://runbooks.soundcloud.com/runbooks/ampelmann/#ampelmannerrorbudgetburn"
route:
receiver: prodeng-warn
group_by:
- alertname
- zone
- system
routes:
- receiver: api-team-warn
match:
system: api-v4
routes:
- receiver: api-team-crit
match:
severity: critical
group_wait: 20s
group_interval: 5m
repeat_interval: 3h
- receiver: api-team-info
match:
severity: info
https://prometheus.io
https://github.com/beorn7/talks
https://developers.soundcloud.com/blog

More Related Content

Similar to DevOps Fest 2019. Björn Rabenstein. Applied Alerting Philosophy

Modern Web Security, Lazy but Mindful Like a Fox
Modern Web Security, Lazy but Mindful Like a FoxModern Web Security, Lazy but Mindful Like a Fox
Modern Web Security, Lazy but Mindful Like a Fox
C4Media
 
Web Ex2 28 Jan09
Web Ex2 28 Jan09Web Ex2 28 Jan09
Web Ex2 28 Jan09
Lawrence Bernstein
 
FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...
FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...
FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...
Vadym Kazulkin
 
Operations: Production Readiness
Operations: Production ReadinessOperations: Production Readiness
Operations: Production Readiness
Amazon Web Services
 
Dependency check
Dependency checkDependency check
Dependency check
David Karlsen
 
FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...
FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...
FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...
Vadym Kazulkin
 
Tastypie: Easy APIs to Make Your Work Easier
Tastypie: Easy APIs to Make Your Work EasierTastypie: Easy APIs to Make Your Work Easier
Tastypie: Easy APIs to Make Your Work Easier
Harvard Web Working Group
 
FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...
FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...
FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...
Vadym Kazulkin
 
Product! - The road to production deployment
Product! - The road to production deploymentProduct! - The road to production deployment
Product! - The road to production deployment
Filippo Zanella
 
Google App Engine for Java
Google App Engine for JavaGoogle App Engine for Java
Google App Engine for Java
Lars Vogel
 
App checker
App checkerApp checker
App checker
Startupvillage2015
 
CSG 2012
CSG 2012CSG 2012
CSG 2012
Scotty Logan
 
Continuous integration php
Continuous integration phpContinuous integration php
Continuous integration php
Lai Hieu
 
Php day 20 11 e xo continuousintegration php
Php day 20 11 e xo continuousintegration phpPhp day 20 11 e xo continuousintegration php
Php day 20 11 e xo continuousintegration phpQuang Anh Le
 
Operations: Production Readiness Review – How to stop bad things from Happening
Operations: Production Readiness Review – How to stop bad things from HappeningOperations: Production Readiness Review – How to stop bad things from Happening
Operations: Production Readiness Review – How to stop bad things from Happening
Amazon Web Services
 
WoMakersCode 2016 - Shit Happens
WoMakersCode 2016 -  Shit HappensWoMakersCode 2016 -  Shit Happens
WoMakersCode 2016 - Shit Happens
Jackson F. de A. Mafra
 
Start Up Austin 2017: Production Preview - How to Stop Bad Things From Happening
Start Up Austin 2017: Production Preview - How to Stop Bad Things From HappeningStart Up Austin 2017: Production Preview - How to Stop Bad Things From Happening
Start Up Austin 2017: Production Preview - How to Stop Bad Things From Happening
Amazon Web Services
 
Cloud Economics
Cloud EconomicsCloud Economics
Cloud Economics
Chris Bailey
 
Starting Your DevOps Journey – Practical Tips for Ops
Starting Your DevOps Journey – Practical Tips for OpsStarting Your DevOps Journey – Practical Tips for Ops
Starting Your DevOps Journey – Practical Tips for Ops
Dynatrace
 
Blue Teamin' on a Budget [of zero]
Blue Teamin' on a Budget [of zero]Blue Teamin' on a Budget [of zero]
Blue Teamin' on a Budget [of zero]
Kyle Bubp
 

Similar to DevOps Fest 2019. Björn Rabenstein. Applied Alerting Philosophy (20)

Modern Web Security, Lazy but Mindful Like a Fox
Modern Web Security, Lazy but Mindful Like a FoxModern Web Security, Lazy but Mindful Like a Fox
Modern Web Security, Lazy but Mindful Like a Fox
 
Web Ex2 28 Jan09
Web Ex2 28 Jan09Web Ex2 28 Jan09
Web Ex2 28 Jan09
 
FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...
FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...
FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...
 
Operations: Production Readiness
Operations: Production ReadinessOperations: Production Readiness
Operations: Production Readiness
 
Dependency check
Dependency checkDependency check
Dependency check
 
FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...
FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...
FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...
 
Tastypie: Easy APIs to Make Your Work Easier
Tastypie: Easy APIs to Make Your Work EasierTastypie: Easy APIs to Make Your Work Easier
Tastypie: Easy APIs to Make Your Work Easier
 
FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...
FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...
FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...
 
Product! - The road to production deployment
Product! - The road to production deploymentProduct! - The road to production deployment
Product! - The road to production deployment
 
Google App Engine for Java
Google App Engine for JavaGoogle App Engine for Java
Google App Engine for Java
 
App checker
App checkerApp checker
App checker
 
CSG 2012
CSG 2012CSG 2012
CSG 2012
 
Continuous integration php
Continuous integration phpContinuous integration php
Continuous integration php
 
Php day 20 11 e xo continuousintegration php
Php day 20 11 e xo continuousintegration phpPhp day 20 11 e xo continuousintegration php
Php day 20 11 e xo continuousintegration php
 
Operations: Production Readiness Review – How to stop bad things from Happening
Operations: Production Readiness Review – How to stop bad things from HappeningOperations: Production Readiness Review – How to stop bad things from Happening
Operations: Production Readiness Review – How to stop bad things from Happening
 
WoMakersCode 2016 - Shit Happens
WoMakersCode 2016 -  Shit HappensWoMakersCode 2016 -  Shit Happens
WoMakersCode 2016 - Shit Happens
 
Start Up Austin 2017: Production Preview - How to Stop Bad Things From Happening
Start Up Austin 2017: Production Preview - How to Stop Bad Things From HappeningStart Up Austin 2017: Production Preview - How to Stop Bad Things From Happening
Start Up Austin 2017: Production Preview - How to Stop Bad Things From Happening
 
Cloud Economics
Cloud EconomicsCloud Economics
Cloud Economics
 
Starting Your DevOps Journey – Practical Tips for Ops
Starting Your DevOps Journey – Practical Tips for OpsStarting Your DevOps Journey – Practical Tips for Ops
Starting Your DevOps Journey – Practical Tips for Ops
 
Blue Teamin' on a Budget [of zero]
Blue Teamin' on a Budget [of zero]Blue Teamin' on a Budget [of zero]
Blue Teamin' on a Budget [of zero]
 

More from DevOps_Fest

DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps_Fest
 
DevOps Fest 2020. Kohsuke Kawaguchi. GitOps, Jenkins X & the Future of CI/CD
DevOps Fest 2020. Kohsuke Kawaguchi. GitOps, Jenkins X & the Future of CI/CDDevOps Fest 2020. Kohsuke Kawaguchi. GitOps, Jenkins X & the Future of CI/CD
DevOps Fest 2020. Kohsuke Kawaguchi. GitOps, Jenkins X & the Future of CI/CD
DevOps_Fest
 
DevOps Fest 2020. Барух Садогурский и Леонид Игольник. Устраиваем DevOps без ...
DevOps Fest 2020. Барух Садогурский и Леонид Игольник. Устраиваем DevOps без ...DevOps Fest 2020. Барух Садогурский и Леонид Игольник. Устраиваем DevOps без ...
DevOps Fest 2020. Барух Садогурский и Леонид Игольник. Устраиваем DevOps без ...
DevOps_Fest
 
DevOps Fest 2020. James Spiteri. Advanced Security Operations with Elastic Se...
DevOps Fest 2020. James Spiteri. Advanced Security Operations with Elastic Se...DevOps Fest 2020. James Spiteri. Advanced Security Operations with Elastic Se...
DevOps Fest 2020. James Spiteri. Advanced Security Operations with Elastic Se...
DevOps_Fest
 
DevOps Fest 2020. Pavlo Repalo. Edge Computing: Appliance and Challanges
DevOps Fest 2020. Pavlo Repalo. Edge Computing: Appliance and ChallangesDevOps Fest 2020. Pavlo Repalo. Edge Computing: Appliance and Challanges
DevOps Fest 2020. Pavlo Repalo. Edge Computing: Appliance and Challanges
DevOps_Fest
 
DevOps Fest 2020. Максим Безуглый. DevOps - как архитектура в процессе. Две к...
DevOps Fest 2020. Максим Безуглый. DevOps - как архитектура в процессе. Две к...DevOps Fest 2020. Максим Безуглый. DevOps - как архитектура в процессе. Две к...
DevOps Fest 2020. Максим Безуглый. DevOps - как архитектура в процессе. Две к...
DevOps_Fest
 
DevOps Fest 2020. Павел Жданов та Никора Никита. Построение процесса CI\CD дл...
DevOps Fest 2020. Павел Жданов та Никора Никита. Построение процесса CI\CD дл...DevOps Fest 2020. Павел Жданов та Никора Никита. Построение процесса CI\CD дл...
DevOps Fest 2020. Павел Жданов та Никора Никита. Построение процесса CI\CD дл...
DevOps_Fest
 
DevOps Fest 2020. Станислав Коленкин. How to connect non-connectible: tips, t...
DevOps Fest 2020. Станислав Коленкин. How to connect non-connectible: tips, t...DevOps Fest 2020. Станислав Коленкин. How to connect non-connectible: tips, t...
DevOps Fest 2020. Станислав Коленкин. How to connect non-connectible: tips, t...
DevOps_Fest
 
DevOps Fest 2020. Андрій Шабалін. Distributed Tracing for microservices with ...
DevOps Fest 2020. Андрій Шабалін. Distributed Tracing for microservices with ...DevOps Fest 2020. Андрій Шабалін. Distributed Tracing for microservices with ...
DevOps Fest 2020. Андрій Шабалін. Distributed Tracing for microservices with ...
DevOps_Fest
 
DevOps Fest 2020. Дмитрий Кудрявцев. Реализация GitOps на Kubernetes. ArgoCD
DevOps Fest 2020. Дмитрий Кудрявцев. Реализация GitOps на Kubernetes. ArgoCDDevOps Fest 2020. Дмитрий Кудрявцев. Реализация GitOps на Kubernetes. ArgoCD
DevOps Fest 2020. Дмитрий Кудрявцев. Реализация GitOps на Kubernetes. ArgoCD
DevOps_Fest
 
DevOps Fest 2020. Роман Орлов. Инфраструктура тестирования в Kubernetes
DevOps Fest 2020. Роман Орлов. Инфраструктура тестирования в KubernetesDevOps Fest 2020. Роман Орлов. Инфраструктура тестирования в Kubernetes
DevOps Fest 2020. Роман Орлов. Инфраструктура тестирования в Kubernetes
DevOps_Fest
 
DevOps Fest 2020. Андрей Шишенко. CI/CD for AWS Lambdas with Serverless frame...
DevOps Fest 2020. Андрей Шишенко. CI/CD for AWS Lambdas with Serverless frame...DevOps Fest 2020. Андрей Шишенко. CI/CD for AWS Lambdas with Serverless frame...
DevOps Fest 2020. Андрей Шишенко. CI/CD for AWS Lambdas with Serverless frame...
DevOps_Fest
 
DevOps Fest 2020. Александр Глущенко. Modern Enterprise Network Architecture ...
DevOps Fest 2020. Александр Глущенко. Modern Enterprise Network Architecture ...DevOps Fest 2020. Александр Глущенко. Modern Enterprise Network Architecture ...
DevOps Fest 2020. Александр Глущенко. Modern Enterprise Network Architecture ...
DevOps_Fest
 
DevOps Fest 2020. Виталий Складчиков. Сквозь монолитный enterprise к микросер...
DevOps Fest 2020. Виталий Складчиков. Сквозь монолитный enterprise к микросер...DevOps Fest 2020. Виталий Складчиков. Сквозь монолитный enterprise к микросер...
DevOps Fest 2020. Виталий Складчиков. Сквозь монолитный enterprise к микросер...
DevOps_Fest
 
DevOps Fest 2020. Денис Медведенко. Управление сложными многокомпонентными ин...
DevOps Fest 2020. Денис Медведенко. Управление сложными многокомпонентными ин...DevOps Fest 2020. Денис Медведенко. Управление сложными многокомпонентными ин...
DevOps Fest 2020. Денис Медведенко. Управление сложными многокомпонентными ин...
DevOps_Fest
 
DevOps Fest 2020. Павел Галушко. Что делать devops'у если у вас захотели mach...
DevOps Fest 2020. Павел Галушко. Что делать devops'у если у вас захотели mach...DevOps Fest 2020. Павел Галушко. Что делать devops'у если у вас захотели mach...
DevOps Fest 2020. Павел Галушко. Что делать devops'у если у вас захотели mach...
DevOps_Fest
 
DevOps Fest 2020. Сергей Абаничев. Modern CI\CD pipeline with Azure DevOps
DevOps Fest 2020. Сергей Абаничев. Modern CI\CD pipeline with Azure DevOpsDevOps Fest 2020. Сергей Абаничев. Modern CI\CD pipeline with Azure DevOps
DevOps Fest 2020. Сергей Абаничев. Modern CI\CD pipeline with Azure DevOps
DevOps_Fest
 
DevOps Fest 2020. Philipp Krenn. Scale Your Auditing Events
DevOps Fest 2020. Philipp Krenn. Scale Your Auditing EventsDevOps Fest 2020. Philipp Krenn. Scale Your Auditing Events
DevOps Fest 2020. Philipp Krenn. Scale Your Auditing Events
DevOps_Fest
 
DevOps Fest 2020. Володимир Мельник. TuchaKube - перша українська DevOps/Host...
DevOps Fest 2020. Володимир Мельник. TuchaKube - перша українська DevOps/Host...DevOps Fest 2020. Володимир Мельник. TuchaKube - перша українська DevOps/Host...
DevOps Fest 2020. Володимир Мельник. TuchaKube - перша українська DevOps/Host...
DevOps_Fest
 
DevOps Fest 2020. Денис Васильев. Let's make it KUL! Kubernetes Ultra Light
DevOps Fest 2020. Денис Васильев. Let's make it KUL! Kubernetes Ultra LightDevOps Fest 2020. Денис Васильев. Let's make it KUL! Kubernetes Ultra Light
DevOps Fest 2020. Денис Васильев. Let's make it KUL! Kubernetes Ultra Light
DevOps_Fest
 

More from DevOps_Fest (20)

DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
 
DevOps Fest 2020. Kohsuke Kawaguchi. GitOps, Jenkins X & the Future of CI/CD
DevOps Fest 2020. Kohsuke Kawaguchi. GitOps, Jenkins X & the Future of CI/CDDevOps Fest 2020. Kohsuke Kawaguchi. GitOps, Jenkins X & the Future of CI/CD
DevOps Fest 2020. Kohsuke Kawaguchi. GitOps, Jenkins X & the Future of CI/CD
 
DevOps Fest 2020. Барух Садогурский и Леонид Игольник. Устраиваем DevOps без ...
DevOps Fest 2020. Барух Садогурский и Леонид Игольник. Устраиваем DevOps без ...DevOps Fest 2020. Барух Садогурский и Леонид Игольник. Устраиваем DevOps без ...
DevOps Fest 2020. Барух Садогурский и Леонид Игольник. Устраиваем DevOps без ...
 
DevOps Fest 2020. James Spiteri. Advanced Security Operations with Elastic Se...
DevOps Fest 2020. James Spiteri. Advanced Security Operations with Elastic Se...DevOps Fest 2020. James Spiteri. Advanced Security Operations with Elastic Se...
DevOps Fest 2020. James Spiteri. Advanced Security Operations with Elastic Se...
 
DevOps Fest 2020. Pavlo Repalo. Edge Computing: Appliance and Challanges
DevOps Fest 2020. Pavlo Repalo. Edge Computing: Appliance and ChallangesDevOps Fest 2020. Pavlo Repalo. Edge Computing: Appliance and Challanges
DevOps Fest 2020. Pavlo Repalo. Edge Computing: Appliance and Challanges
 
DevOps Fest 2020. Максим Безуглый. DevOps - как архитектура в процессе. Две к...
DevOps Fest 2020. Максим Безуглый. DevOps - как архитектура в процессе. Две к...DevOps Fest 2020. Максим Безуглый. DevOps - как архитектура в процессе. Две к...
DevOps Fest 2020. Максим Безуглый. DevOps - как архитектура в процессе. Две к...
 
DevOps Fest 2020. Павел Жданов та Никора Никита. Построение процесса CI\CD дл...
DevOps Fest 2020. Павел Жданов та Никора Никита. Построение процесса CI\CD дл...DevOps Fest 2020. Павел Жданов та Никора Никита. Построение процесса CI\CD дл...
DevOps Fest 2020. Павел Жданов та Никора Никита. Построение процесса CI\CD дл...
 
DevOps Fest 2020. Станислав Коленкин. How to connect non-connectible: tips, t...
DevOps Fest 2020. Станислав Коленкин. How to connect non-connectible: tips, t...DevOps Fest 2020. Станислав Коленкин. How to connect non-connectible: tips, t...
DevOps Fest 2020. Станислав Коленкин. How to connect non-connectible: tips, t...
 
DevOps Fest 2020. Андрій Шабалін. Distributed Tracing for microservices with ...
DevOps Fest 2020. Андрій Шабалін. Distributed Tracing for microservices with ...DevOps Fest 2020. Андрій Шабалін. Distributed Tracing for microservices with ...
DevOps Fest 2020. Андрій Шабалін. Distributed Tracing for microservices with ...
 
DevOps Fest 2020. Дмитрий Кудрявцев. Реализация GitOps на Kubernetes. ArgoCD
DevOps Fest 2020. Дмитрий Кудрявцев. Реализация GitOps на Kubernetes. ArgoCDDevOps Fest 2020. Дмитрий Кудрявцев. Реализация GitOps на Kubernetes. ArgoCD
DevOps Fest 2020. Дмитрий Кудрявцев. Реализация GitOps на Kubernetes. ArgoCD
 
DevOps Fest 2020. Роман Орлов. Инфраструктура тестирования в Kubernetes
DevOps Fest 2020. Роман Орлов. Инфраструктура тестирования в KubernetesDevOps Fest 2020. Роман Орлов. Инфраструктура тестирования в Kubernetes
DevOps Fest 2020. Роман Орлов. Инфраструктура тестирования в Kubernetes
 
DevOps Fest 2020. Андрей Шишенко. CI/CD for AWS Lambdas with Serverless frame...
DevOps Fest 2020. Андрей Шишенко. CI/CD for AWS Lambdas with Serverless frame...DevOps Fest 2020. Андрей Шишенко. CI/CD for AWS Lambdas with Serverless frame...
DevOps Fest 2020. Андрей Шишенко. CI/CD for AWS Lambdas with Serverless frame...
 
DevOps Fest 2020. Александр Глущенко. Modern Enterprise Network Architecture ...
DevOps Fest 2020. Александр Глущенко. Modern Enterprise Network Architecture ...DevOps Fest 2020. Александр Глущенко. Modern Enterprise Network Architecture ...
DevOps Fest 2020. Александр Глущенко. Modern Enterprise Network Architecture ...
 
DevOps Fest 2020. Виталий Складчиков. Сквозь монолитный enterprise к микросер...
DevOps Fest 2020. Виталий Складчиков. Сквозь монолитный enterprise к микросер...DevOps Fest 2020. Виталий Складчиков. Сквозь монолитный enterprise к микросер...
DevOps Fest 2020. Виталий Складчиков. Сквозь монолитный enterprise к микросер...
 
DevOps Fest 2020. Денис Медведенко. Управление сложными многокомпонентными ин...
DevOps Fest 2020. Денис Медведенко. Управление сложными многокомпонентными ин...DevOps Fest 2020. Денис Медведенко. Управление сложными многокомпонентными ин...
DevOps Fest 2020. Денис Медведенко. Управление сложными многокомпонентными ин...
 
DevOps Fest 2020. Павел Галушко. Что делать devops'у если у вас захотели mach...
DevOps Fest 2020. Павел Галушко. Что делать devops'у если у вас захотели mach...DevOps Fest 2020. Павел Галушко. Что делать devops'у если у вас захотели mach...
DevOps Fest 2020. Павел Галушко. Что делать devops'у если у вас захотели mach...
 
DevOps Fest 2020. Сергей Абаничев. Modern CI\CD pipeline with Azure DevOps
DevOps Fest 2020. Сергей Абаничев. Modern CI\CD pipeline with Azure DevOpsDevOps Fest 2020. Сергей Абаничев. Modern CI\CD pipeline with Azure DevOps
DevOps Fest 2020. Сергей Абаничев. Modern CI\CD pipeline with Azure DevOps
 
DevOps Fest 2020. Philipp Krenn. Scale Your Auditing Events
DevOps Fest 2020. Philipp Krenn. Scale Your Auditing EventsDevOps Fest 2020. Philipp Krenn. Scale Your Auditing Events
DevOps Fest 2020. Philipp Krenn. Scale Your Auditing Events
 
DevOps Fest 2020. Володимир Мельник. TuchaKube - перша українська DevOps/Host...
DevOps Fest 2020. Володимир Мельник. TuchaKube - перша українська DevOps/Host...DevOps Fest 2020. Володимир Мельник. TuchaKube - перша українська DevOps/Host...
DevOps Fest 2020. Володимир Мельник. TuchaKube - перша українська DevOps/Host...
 
DevOps Fest 2020. Денис Васильев. Let's make it KUL! Kubernetes Ultra Light
DevOps Fest 2020. Денис Васильев. Let's make it KUL! Kubernetes Ultra LightDevOps Fest 2020. Денис Васильев. Let's make it KUL! Kubernetes Ultra Light
DevOps Fest 2020. Денис Васильев. Let's make it KUL! Kubernetes Ultra Light
 

Recently uploaded

Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
Vikramjit Singh
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
 
Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)
rosedainty
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
Anna Sz.
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
Thiyagu K
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
Excellence Foundation for South Sudan
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
Vivekanand Anglo Vedic Academy
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
DeeptiGupta154
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
PedroFerreira53928
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
Sandy Millin
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve Thomason
Steve Thomason
 
Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
TechSoup
 
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
bennyroshan06
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
RaedMohamed3
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Thiyagu K
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Tamralipta Mahavidyalaya
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
Jisc
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
Jisc
 

Recently uploaded (20)

Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
 
Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 
The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve Thomason
 
Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
 

DevOps Fest 2019. Björn Rabenstein. Applied Alerting Philosophy

  • 1. CONTINUOUS DELIVERY. CONTINUOUS DEVOPS. Professional conference on DevOps practices 6APRIL 2019 KYIV, th
  • 2. 6APRIL 2019 KYIV, Björn “Beorn” Rabenstein Applied Alerting Philosophy th
  • 3.
  • 4.
  • 5.
  • 6.
  • 7. My Philosophy on Alerting based my observations while I was a Site Reliability Engineer at Google Author: Rob Ewaschuk <rob@infinitepigeons.org> Introduction Vernacular Monitor for your users Cause-based alerts are bad (but sometimes necessary) Alerting from the spout (or beyond!) Causes are still useful Tickets, Reports and Email Playbooks Tracking & Accountability You're being naïve! Summary Summary When you are auditing or writing alerting rules, consider these things to keep your oncall rotation happier: goo.gl/2vrpSO
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14. “What” versus “why” is one of the most important distinctions in writing good monitoring with maximum signal and minimum noise. Chapter 6: Monitoring Distributed Systems Symptoms vs. causes Source: Betsy Beyer et al. “Site Reliability Engineering – How Google Runs Production Systems”
  • 15. Expected response SRE book SoundCloud lingo Delivered to Based on Act immediately Alerts Pages severity="critical" Pager Symptoms Act eventually Tickets Tickets / “email alerts” severity="warning" Issue tracker / Chat / email :-( Symptoms or causes None (for diagnostics only) Logs Informational alerts severity="info" Nowhere / dashboards Causes “Alerts” according to Prometheus: Pages vs. tickets
  • 16. “One person’s symptom is another person’s cause.” “Not-yet-occurring but imminent problems.” “Zero-redundancy (N + 0) situations count as imminent, as do ‘nearly full’ parts of your service!” What also counts as “symptoms”…
  • 17. white-box (needs instrumentation) black-box (no changes required) host-based “traditional” service-based “modern” Black-box monitoring FTW?!? Prometheus Blackbox Exporter Black-box vs. white-box
  • 18. Probing with real user traffic in multi-tiered services Frontend service (instrumented) Backend service A Backend service B User traffic Measures A’s and B’s latency, rps, errors… Alerts owners of A or B
  • 21.
  • 22. - record: backend:http_errors_per_response:ratio_rate5m expr: |2 sum by (backend)(rate( haproxy_backend_http_responses_total{job="ampelmann", code="5xx"}[5m] )) / sum by (backend)(rate( haproxy_backend_http_responses_total{job="ampelmann"}[5m] ))
  • 23. - record: backend:error_slo:percent labels: backend: "api-v4" expr: 0.1 - record: backend:error_slo:percent labels: backend: "api-v3" expr: 0.2 # ... Many more backends.
  • 24.
  • 25.
  • 26. SLO budget consumption Time window Burn rate Notification 2% 1 hour 14.4 Page 5% 6 hours 6 Page 10% 3 days 1 Ticket
  • 27. expr: ( job:slo_errors_per_request:ratio_rate1h{job="myjob"} > (14.4*0.001) or job:slo_errors_per_request:ratio_rate6h{job="myjob"} > (6*0.001) ) severity: page expr: job:slo_errors_per_request:ratio_rate3d{job="myjob"} > 0.001 severity: ticket
  • 28.
  • 29.
  • 30. - alert: AmpelmannErrorBudgetBurn expr: |2 ( 100 * backend:http_errors_per_response:ratio_rate1h > on (backend) 14.4 * backend:error_slo:percent ) and ( 100 * backend:http_errors_per_response:ratio_rate5m > on (backend) 14.4 * backend:error_slo:percent ) for: 2m labels: system: "{{$labels.backend}}" severity: "critical" window: "1h" annotations: summary: "a backend burns its error budget very fast" description: "Backend {{$labels.backend}} has returned {{ $value | printf `%.2f` }}% 5xx runbook: "http://runbooks.soundcloud.com/runbooks/ampelmann/#ampelmannerrorbudgetburn"
  • 31. Alert Long window Short window for duration Burn rate factor Error budget consumed Page 1h 5m 2m 14.4 2% Page 6h 30m 15m 6 5% Ticket 1d 2h 1h 3 10% Ticket 3d 6h 1h 1 10%
  • 32. Who gets the page?
  • 33. - alert: AmpelmannErrorBudgetBurn expr: |2 ( 100 * backend:http_errors_per_response:ratio_rate1h > on (backend) 14.4 * backend:error_slo:percent ) and ( 100 * backend:http_errors_per_response:ratio_rate5m > on (backend) 14.4 * backend:error_slo:percent ) for: 2m labels: system: "{{$labels.backend}}" severity: "critical" window: "1h" annotations: summary: "a backend burns its error budget very fast" description: "Backend {{$labels.backend}} has returned {{ $value | printf `%.2f` }}% 5xx runbook: "http://runbooks.soundcloud.com/runbooks/ampelmann/#ampelmannerrorbudgetburn"
  • 34. route: receiver: prodeng-warn group_by: - alertname - zone - system routes: - receiver: api-team-warn match: system: api-v4 routes: - receiver: api-team-crit match: severity: critical group_wait: 20s group_interval: 5m repeat_interval: 3h - receiver: api-team-info match: severity: info