SlideShare a Scribd company logo
1 of 29
Download to read offline
Monitoring a Kubernetes-backed microservice
architecture with Prometheus
Björn “Beorn” Rabenstein — SoundCloud
Fabian Reinartz — CoreOS
SoundCloud 2012 – from monolith …
… to microservices
Orchestration needed
Run containers in a cluster…
In-house innovation: Bazooka – PaaS, Heroko style.
Problems:
- Only 12-factor apps (stateless etc.).
- Limited resource isolation.
- No sidecars.
- Maturity.
Meanwhile, the open-source world has evolved…
K U B E R N E T E S
Kubernetes
- inspired by Google’s Borg
- not Borg
Today:
Tomorrow:
X
Labels Labels are the
new hierarchies!
name = MyAPI
app = MyAPI
version = 0.4
app = MyAPI
version = 0.5
app = MyAPI
version = 0.4
app = MyDB
cluster = auth
app = MyDB
cluster = misc
select by [ app = MyAPI ]
service
pod
pod
pod
pod
pod
Graphite
Monitoring at SC 2012
Service A
Service B
Service C
Monitoring challenges
‐ A lot of traffic to monitor
- Monitoring traffic should not be proportional to user traffic
‐ Way more targets to monitor
- One host can run many containers
‐ And they constantly change
- Deploys, scaling, rescheduling unhealthy instances …
‐ Need a fleet-wide view.
- What’s my overall 99th percentile latency?
‐ Still need to be able to drill down for troubleshooting
- Which instance/endpoint/version/... causes those errors I’m seeing?
‐ Meaningful alerting
- Symptom-based alerting for pages, cause-based alerting for warnings
- See Rob Ewaschuk’s “My philosophy on alerting" https://goo.gl/2vrpSO
Monitor everything, all levels, with the same system
Level What to monitor (examples) What exposes metrics (example)
Network Routers, switches SNMP exporter
Host (OS, hardware) Hardware failure, provisioning, host resources Node exporter
Container Resource usage, performance characteristics cAdvisor
Application Latency, errors, QPS, internal state Your own code
Orchestration Cluster resources, scheduling Kubernetes components
P R O M E T H E U S
“Obviously the solution to all our problems with everything forever, right?”
Benjamin Staffin, Fitbit Site Operations
Prometheus
- inspired by Google’s Borgmon
- not Borgmon
- initially developed at SoundCloud, open-source from the beginning
- public announcement early 2015
- collects metrics at scale via HTTP (think: yet another client of your microservice)
- thousands of targets, millions of time series, 800k samples/s, no dependencies
- easy to scale
Features – multi-dimensional data model
http_requests_total{instance="web-1", path="/index", status="500", method="GET"}
http_requests_total{instance="web-1", path="/index", status="404", method="POST"}
http_requests_total{instance="web-3", path="/index", status="200", method="GET"}
#metrics x #values(instance) x #values(path) x #values(status) x #values(method)
▶ millions of time series
Labels are the
new hierarchies!
Features – powerful query language
The 3 path-method combinations with the highest number of failing requests?
topk(3, sum by(path, method) (
rate(http_requests_total{status=~"5.."}[5m])
))
The 99th percentile request latency by request path?
histogram_quantile(0.99, sum by(le, path) (
rate(http_requests_duration_seconds_bucket[5m])
))
The questions to ask are often not known beforehand.
Features – powerful query language
topk(3, sum by(path, method) (
rate(http_requests_total{status=~"5.."}[5m])
))
{path="/api/comments", method="POST"} 105.4
{path="/api/user/:id", method="GET"} 34.122
{path="/api/comment/:id/edit", method="POST"} 29.31
Features – easy instrumentation
from prometheus_client import start_http_server, Histogram
# Create a metric to track time spent and requests made.
REQUEST_TIME = Histogram('request_processing_seconds', 'Time spent processing request')
# Decorate function with metric.
@REQUEST_TIME.time()
def process_request(t):
# do work …
return
start_http_server(8000)
Integrations (selection)
K U B E R N E T E S
P R O M E T H E U S
Three questions
How to monitor
services running
on Kubernetes
with Prometheus?
X
?
?
How to monitor
Kubernetes and
containers with
Prometheus?
How to run
Prometheus on
Kubernetes?
Monitoring Services
svc = service-X
namespace = production
version = 22.4
version = 22.5
Monitoring Services via Exporters
svc = mysql
namespace = production
MySQL
exporter
MySQL
Monitoring Kubernetes
namespace = kube-system
apiserver
kubelet
kubelet
etcd
kubelet
kubelet
kubelet
kubelet
Running Prometheus on Kubernetes
- So far: Prometheus ran outside of cluster
- Pod IPs must be routable
- Conventional deployment (Chef, Puppet, …)
- Service discovery needs authentication
- To run Prometheus inside of cluster:
kubectl run --image="quay.io/prometheus/prometheus:0.18.0" prometheus
Monitoring Services
svc = service-X
namespace = production
version = 22.4
version = 22.5
app = prometheus
namespace = kube-system
apiserver
kubelet
kubelet
etcd
kubelet
kubelet
kubelet
kubelet
app = prometheus
namespace = infra
Monitoring Kubernetes
What about storage?
A) None
B) Network/Cloud volumes
C) Host volumes
DEMO
CC BY 2.0
Author: carstingaxion / Carsten Bach
The end

More Related Content

What's hot

promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...
promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...
promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...Tokuhiro Matsuno
 
Breaking Prometheus (Promcon Berlin '16)
Breaking Prometheus (Promcon Berlin '16)Breaking Prometheus (Promcon Berlin '16)
Breaking Prometheus (Promcon Berlin '16)Matthew Campbell
 
Service Discovery in Prometheus
Service Discovery in PrometheusService Discovery in Prometheus
Service Discovery in PrometheusOliver Moser
 
Monitoring infrastructure with prometheus
Monitoring infrastructure with prometheusMonitoring infrastructure with prometheus
Monitoring infrastructure with prometheusShahnawaz Saifi
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusGrafana Labs
 
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)Brian Brazil
 
Getting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and GrafanaGetting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and GrafanaSyah Dwi Prihatmoko
 
Using Grails to power your electric car
Using Grails to power your electric carUsing Grails to power your electric car
Using Grails to power your electric carMarco Pas
 
Monitoring with Prometheus
Monitoring with PrometheusMonitoring with Prometheus
Monitoring with PrometheusShiao-An Yuan
 
Explore your prometheus data in grafana - Promcon 2018
Explore your prometheus data in grafana - Promcon 2018Explore your prometheus data in grafana - Promcon 2018
Explore your prometheus data in grafana - Promcon 2018Grafana Labs
 
Server monitoring using grafana and prometheus
Server monitoring using grafana and prometheusServer monitoring using grafana and prometheus
Server monitoring using grafana and prometheusCeline George
 
An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)Brian Brazil
 
The history of Prometheus at SoundCloud
The history of Prometheus at SoundCloudThe history of Prometheus at SoundCloud
The history of Prometheus at SoundCloudTobias Schmidt
 
Kafka monitoring and metrics
Kafka monitoring and metricsKafka monitoring and metrics
Kafka monitoring and metricsTouraj Ebrahimi
 
Prometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,GrafanaPrometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,GrafanaSridhar Kumar N
 
The RED Method: How to monitoring your microservices.
The RED Method: How to monitoring your microservices.The RED Method: How to monitoring your microservices.
The RED Method: How to monitoring your microservices.Grafana Labs
 
What is your application doing right now? An introduction to Prometheus
What is your application doing right now? An introduction to PrometheusWhat is your application doing right now? An introduction to Prometheus
What is your application doing right now? An introduction to PrometheusMatthias Grüter
 
20171027 モニタリング勉強会
20171027 モニタリング勉強会20171027 モニタリング勉強会
20171027 モニタリング勉強会Paul Traylor
 

What's hot (20)

promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...
promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...
promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...
 
Breaking Prometheus (Promcon Berlin '16)
Breaking Prometheus (Promcon Berlin '16)Breaking Prometheus (Promcon Berlin '16)
Breaking Prometheus (Promcon Berlin '16)
 
Service Discovery in Prometheus
Service Discovery in PrometheusService Discovery in Prometheus
Service Discovery in Prometheus
 
Monitoring infrastructure with prometheus
Monitoring infrastructure with prometheusMonitoring infrastructure with prometheus
Monitoring infrastructure with prometheus
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with Prometheus
 
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
 
Getting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and GrafanaGetting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and Grafana
 
Using Grails to power your electric car
Using Grails to power your electric carUsing Grails to power your electric car
Using Grails to power your electric car
 
Monitoring with Prometheus
Monitoring with PrometheusMonitoring with Prometheus
Monitoring with Prometheus
 
Explore your prometheus data in grafana - Promcon 2018
Explore your prometheus data in grafana - Promcon 2018Explore your prometheus data in grafana - Promcon 2018
Explore your prometheus data in grafana - Promcon 2018
 
Server monitoring using grafana and prometheus
Server monitoring using grafana and prometheusServer monitoring using grafana and prometheus
Server monitoring using grafana and prometheus
 
An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)
 
The history of Prometheus at SoundCloud
The history of Prometheus at SoundCloudThe history of Prometheus at SoundCloud
The history of Prometheus at SoundCloud
 
Kafka monitoring and metrics
Kafka monitoring and metricsKafka monitoring and metrics
Kafka monitoring and metrics
 
Prometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,GrafanaPrometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,Grafana
 
The RED Method: How to monitoring your microservices.
The RED Method: How to monitoring your microservices.The RED Method: How to monitoring your microservices.
The RED Method: How to monitoring your microservices.
 
Prometheus with Grafana - AddWeb Solution
Prometheus with Grafana - AddWeb SolutionPrometheus with Grafana - AddWeb Solution
Prometheus with Grafana - AddWeb Solution
 
Cloud Monitoring tool Grafana
Cloud Monitoring  tool Grafana Cloud Monitoring  tool Grafana
Cloud Monitoring tool Grafana
 
What is your application doing right now? An introduction to Prometheus
What is your application doing right now? An introduction to PrometheusWhat is your application doing right now? An introduction to Prometheus
What is your application doing right now? An introduction to Prometheus
 
20171027 モニタリング勉強会
20171027 モニタリング勉強会20171027 モニタリング勉強会
20171027 モニタリング勉強会
 

Similar to Monitoring a Kubernetes-backed microservice architecture with Prometheus

Google app-engine-cloudcamplagos2011
Google app-engine-cloudcamplagos2011Google app-engine-cloudcamplagos2011
Google app-engine-cloudcamplagos2011Opevel
 
MuleSoft Meetup Roma - Processi di Automazione su CloudHub
MuleSoft Meetup Roma - Processi di Automazione su CloudHubMuleSoft Meetup Roma - Processi di Automazione su CloudHub
MuleSoft Meetup Roma - Processi di Automazione su CloudHubAlfonso Martino
 
Deep Dive into SpaceONE
Deep Dive into SpaceONEDeep Dive into SpaceONE
Deep Dive into SpaceONEChoonho Son
 
mu.semte.ch - A journey from TenForce's perspective - SEMANTICS2016
mu.semte.ch - A journey from TenForce's perspective - SEMANTICS2016mu.semte.ch - A journey from TenForce's perspective - SEMANTICS2016
mu.semte.ch - A journey from TenForce's perspective - SEMANTICS2016Aad Versteden
 
Webinar september 2013
Webinar september 2013Webinar september 2013
Webinar september 2013Marc Gille
 
StrongLoop Overview
StrongLoop OverviewStrongLoop Overview
StrongLoop OverviewShubhra Kar
 
What's New in Docker - February 2017
What's New in Docker - February 2017What's New in Docker - February 2017
What's New in Docker - February 2017Patrick Chanezon
 
Plone FSR
Plone FSRPlone FSR
Plone FSRfulv
 
In cluster open source testing framework - Microservices Meetup
In cluster open source testing framework - Microservices MeetupIn cluster open source testing framework - Microservices Meetup
In cluster open source testing framework - Microservices MeetupNeil Gehani
 
Serverless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleServerless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleJim Dowling
 
IBM Bluemix OpenWhisk: Serverless Conference 2016, London, UK: The Future of ...
IBM Bluemix OpenWhisk: Serverless Conference 2016, London, UK: The Future of ...IBM Bluemix OpenWhisk: Serverless Conference 2016, London, UK: The Future of ...
IBM Bluemix OpenWhisk: Serverless Conference 2016, London, UK: The Future of ...OpenWhisk
 
Semantic technologies in practice - KULeuven 2016
Semantic technologies in practice - KULeuven 2016Semantic technologies in practice - KULeuven 2016
Semantic technologies in practice - KULeuven 2016Aad Versteden
 
Timely Year Two: Lessons Learned Building a Scalable Metrics Analytic System
Timely Year Two: Lessons Learned Building a Scalable Metrics Analytic SystemTimely Year Two: Lessons Learned Building a Scalable Metrics Analytic System
Timely Year Two: Lessons Learned Building a Scalable Metrics Analytic SystemAccumulo Summit
 
Appscale at CLOUDCOMP '09
Appscale at CLOUDCOMP '09Appscale at CLOUDCOMP '09
Appscale at CLOUDCOMP '09Chris Bunch
 
Social Connections 13 - Troubleshooting Connections Pink
Social Connections 13 - Troubleshooting Connections PinkSocial Connections 13 - Troubleshooting Connections Pink
Social Connections 13 - Troubleshooting Connections PinkNico Meisenzahl
 
Apache OpenWhisk Serverless Computing
Apache OpenWhisk Serverless ComputingApache OpenWhisk Serverless Computing
Apache OpenWhisk Serverless ComputingUpkar Lidder
 
Cytoscape: Now and Future
Cytoscape: Now and FutureCytoscape: Now and Future
Cytoscape: Now and FutureKeiichiro Ono
 

Similar to Monitoring a Kubernetes-backed microservice architecture with Prometheus (20)

Google app-engine-cloudcamplagos2011
Google app-engine-cloudcamplagos2011Google app-engine-cloudcamplagos2011
Google app-engine-cloudcamplagos2011
 
Google App Engine
Google App EngineGoogle App Engine
Google App Engine
 
MuleSoft Meetup Roma - Processi di Automazione su CloudHub
MuleSoft Meetup Roma - Processi di Automazione su CloudHubMuleSoft Meetup Roma - Processi di Automazione su CloudHub
MuleSoft Meetup Roma - Processi di Automazione su CloudHub
 
Deep Dive into SpaceONE
Deep Dive into SpaceONEDeep Dive into SpaceONE
Deep Dive into SpaceONE
 
Databasecentricapisonthecloudusingplsqlandnodejscon3153oow2016 160922021655
Databasecentricapisonthecloudusingplsqlandnodejscon3153oow2016 160922021655Databasecentricapisonthecloudusingplsqlandnodejscon3153oow2016 160922021655
Databasecentricapisonthecloudusingplsqlandnodejscon3153oow2016 160922021655
 
mu.semte.ch - A journey from TenForce's perspective - SEMANTICS2016
mu.semte.ch - A journey from TenForce's perspective - SEMANTICS2016mu.semte.ch - A journey from TenForce's perspective - SEMANTICS2016
mu.semte.ch - A journey from TenForce's perspective - SEMANTICS2016
 
Webinar september 2013
Webinar september 2013Webinar september 2013
Webinar september 2013
 
StrongLoop Overview
StrongLoop OverviewStrongLoop Overview
StrongLoop Overview
 
What's New in Docker - February 2017
What's New in Docker - February 2017What's New in Docker - February 2017
What's New in Docker - February 2017
 
Plone FSR
Plone FSRPlone FSR
Plone FSR
 
In cluster open source testing framework - Microservices Meetup
In cluster open source testing framework - Microservices MeetupIn cluster open source testing framework - Microservices Meetup
In cluster open source testing framework - Microservices Meetup
 
Serverless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleServerless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData Seattle
 
IBM Bluemix OpenWhisk: Serverless Conference 2016, London, UK: The Future of ...
IBM Bluemix OpenWhisk: Serverless Conference 2016, London, UK: The Future of ...IBM Bluemix OpenWhisk: Serverless Conference 2016, London, UK: The Future of ...
IBM Bluemix OpenWhisk: Serverless Conference 2016, London, UK: The Future of ...
 
Semantic technologies in practice - KULeuven 2016
Semantic technologies in practice - KULeuven 2016Semantic technologies in practice - KULeuven 2016
Semantic technologies in practice - KULeuven 2016
 
Timely Year Two: Lessons Learned Building a Scalable Metrics Analytic System
Timely Year Two: Lessons Learned Building a Scalable Metrics Analytic SystemTimely Year Two: Lessons Learned Building a Scalable Metrics Analytic System
Timely Year Two: Lessons Learned Building a Scalable Metrics Analytic System
 
Appscale at CLOUDCOMP '09
Appscale at CLOUDCOMP '09Appscale at CLOUDCOMP '09
Appscale at CLOUDCOMP '09
 
Social Connections 13 - Troubleshooting Connections Pink
Social Connections 13 - Troubleshooting Connections PinkSocial Connections 13 - Troubleshooting Connections Pink
Social Connections 13 - Troubleshooting Connections Pink
 
Apache OpenWhisk Serverless Computing
Apache OpenWhisk Serverless ComputingApache OpenWhisk Serverless Computing
Apache OpenWhisk Serverless Computing
 
Cytoscape: Now and Future
Cytoscape: Now and FutureCytoscape: Now and Future
Cytoscape: Now and Future
 
Apache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real TimeApache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real Time
 

Recently uploaded

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 

Recently uploaded (20)

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 

Monitoring a Kubernetes-backed microservice architecture with Prometheus

  • 1. Monitoring a Kubernetes-backed microservice architecture with Prometheus Björn “Beorn” Rabenstein — SoundCloud Fabian Reinartz — CoreOS
  • 2. SoundCloud 2012 – from monolith …
  • 4. Orchestration needed Run containers in a cluster… In-house innovation: Bazooka – PaaS, Heroko style. Problems: - Only 12-factor apps (stateless etc.). - Limited resource isolation. - No sidecars. - Maturity. Meanwhile, the open-source world has evolved…
  • 5. K U B E R N E T E S
  • 6. Kubernetes - inspired by Google’s Borg - not Borg Today: Tomorrow: X
  • 7. Labels Labels are the new hierarchies! name = MyAPI app = MyAPI version = 0.4 app = MyAPI version = 0.5 app = MyAPI version = 0.4 app = MyDB cluster = auth app = MyDB cluster = misc select by [ app = MyAPI ] service pod pod pod pod pod
  • 8. Graphite Monitoring at SC 2012 Service A Service B Service C
  • 9. Monitoring challenges ‐ A lot of traffic to monitor - Monitoring traffic should not be proportional to user traffic ‐ Way more targets to monitor - One host can run many containers ‐ And they constantly change - Deploys, scaling, rescheduling unhealthy instances … ‐ Need a fleet-wide view. - What’s my overall 99th percentile latency? ‐ Still need to be able to drill down for troubleshooting - Which instance/endpoint/version/... causes those errors I’m seeing? ‐ Meaningful alerting - Symptom-based alerting for pages, cause-based alerting for warnings - See Rob Ewaschuk’s “My philosophy on alerting" https://goo.gl/2vrpSO
  • 10. Monitor everything, all levels, with the same system Level What to monitor (examples) What exposes metrics (example) Network Routers, switches SNMP exporter Host (OS, hardware) Hardware failure, provisioning, host resources Node exporter Container Resource usage, performance characteristics cAdvisor Application Latency, errors, QPS, internal state Your own code Orchestration Cluster resources, scheduling Kubernetes components
  • 11. P R O M E T H E U S “Obviously the solution to all our problems with everything forever, right?” Benjamin Staffin, Fitbit Site Operations
  • 12. Prometheus - inspired by Google’s Borgmon - not Borgmon - initially developed at SoundCloud, open-source from the beginning - public announcement early 2015 - collects metrics at scale via HTTP (think: yet another client of your microservice) - thousands of targets, millions of time series, 800k samples/s, no dependencies - easy to scale
  • 13. Features – multi-dimensional data model http_requests_total{instance="web-1", path="/index", status="500", method="GET"} http_requests_total{instance="web-1", path="/index", status="404", method="POST"} http_requests_total{instance="web-3", path="/index", status="200", method="GET"} #metrics x #values(instance) x #values(path) x #values(status) x #values(method) ▶ millions of time series Labels are the new hierarchies!
  • 14. Features – powerful query language The 3 path-method combinations with the highest number of failing requests? topk(3, sum by(path, method) ( rate(http_requests_total{status=~"5.."}[5m]) )) The 99th percentile request latency by request path? histogram_quantile(0.99, sum by(le, path) ( rate(http_requests_duration_seconds_bucket[5m]) )) The questions to ask are often not known beforehand.
  • 15. Features – powerful query language topk(3, sum by(path, method) ( rate(http_requests_total{status=~"5.."}[5m]) )) {path="/api/comments", method="POST"} 105.4 {path="/api/user/:id", method="GET"} 34.122 {path="/api/comment/:id/edit", method="POST"} 29.31
  • 16. Features – easy instrumentation from prometheus_client import start_http_server, Histogram # Create a metric to track time spent and requests made. REQUEST_TIME = Histogram('request_processing_seconds', 'Time spent processing request') # Decorate function with metric. @REQUEST_TIME.time() def process_request(t): # do work … return start_http_server(8000)
  • 18. K U B E R N E T E S P R O M E T H E U S
  • 19.
  • 20. Three questions How to monitor services running on Kubernetes with Prometheus? X ? ? How to monitor Kubernetes and containers with Prometheus? How to run Prometheus on Kubernetes?
  • 21. Monitoring Services svc = service-X namespace = production version = 22.4 version = 22.5
  • 22. Monitoring Services via Exporters svc = mysql namespace = production MySQL exporter MySQL
  • 23. Monitoring Kubernetes namespace = kube-system apiserver kubelet kubelet etcd kubelet kubelet kubelet kubelet
  • 24. Running Prometheus on Kubernetes - So far: Prometheus ran outside of cluster - Pod IPs must be routable - Conventional deployment (Chef, Puppet, …) - Service discovery needs authentication - To run Prometheus inside of cluster: kubectl run --image="quay.io/prometheus/prometheus:0.18.0" prometheus
  • 25. Monitoring Services svc = service-X namespace = production version = 22.4 version = 22.5 app = prometheus
  • 27. What about storage? A) None B) Network/Cloud volumes C) Host volumes
  • 28. DEMO CC BY 2.0 Author: carstingaxion / Carsten Bach