End-to-end Monitoring
with the Prometheus Operator
By @mxinden
Max Inden
Test-Engineer at CoreOS
@mxinden
Max.Inden@CoreOS.com
Secure, simplify and automate container
infrastructure
Why Monitoring?
Alerting
Long-term trends
What is Prometheus?
● Open Source Monitoring
● Built by SoundCloud
● Inspired by Borgmon
● Pull-based
● Multi-Dimensional
● Metrics, not logging, not tracing
● No magic!
[Diagram: Prometheus scrapes the /metrics endpoint of three targets every 15s]
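The pull model in the diagram can be sketched with the Python standard library alone. This is a minimal illustration, not how a real target is instrumented: the `REQUESTS` counter store and the helper names `render_metrics`, `start_target` and `scrape` are assumptions made for this example, and a production service would normally use a Prometheus client library instead.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter store: (code, path) -> request count.
REQUESTS = {("200", "/status"): 8}


def render_metrics() -> str:
    """Render the counters in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total number of HTTP requests made.",
        "# TYPE http_requests_total counter",
    ]
    for (code, path), value in sorted(REQUESTS.items()):
        lines.append(f'http_requests_total{{code="{code}",path="{path}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the current counter values on GET /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def start_target() -> HTTPServer:
    """Start a target on an ephemeral localhost port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def scrape(url: str) -> str:
    """Pull the target's current metrics once; Prometheus repeats this per interval."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")
```

Calling `start_target()` and then `scrape()` against `http://127.0.0.1:<port>/metrics` returns the rendered text. The target only exposes its current state; deciding when to collect it (every 15s in the diagram) is left entirely to the scraper.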
Target /metrics
# HELP http_requests_total Total number of HTTP requests made.
# TYPE http_requests_total counter
http_requests_total{code="200",path="/status"} 8
Metric name: http_requests_total
Labels: code="200", path="/status"
Value: 8
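To make the three parts of a sample line concrete, here is a rough Python sketch that splits a line into metric name, labels and value. It is a simplification under stated assumptions: the function names are made up for this example, and the regular expressions ignore optional timestamps and do not unescape label values, both of which the real exposition format allows.

```python
import re

# Simplified patterns; the real format also permits a trailing timestamp.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line: str):
    """Split one sample line into (metric name, labels dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


def parse_exposition(text: str):
    """Parse every sample in a /metrics body, skipping comments and blank lines."""
    return [
        parse_sample(line)
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
```

For the sample shown on the slide, `parse_sample` yields the name `http_requests_total`, the labels `code` and `path`, and the value `8.0`.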
PromQL
Current percentage of HTTP errors across all service instances?
sum by(path) (rate(http_requests_total{code="500"}[5m]))
/ sum by(path) (rate(http_requests_total[5m]))
{path="/status"} 0.0039
{path="/"} 0.0011
{path="/api/v1/topics/:topic"} 0.087
{path="/api/v1/topics"} 0.0342
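What the query computes can be approximated in a few lines of Python. This is a sketch of the idea only, not of the PromQL engine: each series is reduced to its first and last counter value in the 5-minute window, `rate` here ignores counter resets and extrapolation, the error label is assumed to be `code` as in the /metrics example, and the function names and input shape are invented for illustration.

```python
from collections import defaultdict


def rate(first: float, last: float, window_seconds: float) -> float:
    """Per-second increase of a counter over the window (no reset handling)."""
    return (last - first) / window_seconds


def error_ratio_by_path(series, window_seconds: float = 300.0):
    """series: iterable of (labels, (first_value, last_value)) per time series.

    Sums the per-series rates by path, once for code="500" and once for all
    codes, then divides. As with PromQL vector matching, only paths present
    on both sides of the division appear in the result.
    """
    errors = defaultdict(float)
    total = defaultdict(float)
    for labels, (first, last) in series:
        per_second = rate(first, last, window_seconds)
        total[labels["path"]] += per_second
        if labels["code"] == "500":
            errors[labels["path"]] += per_second
    # Paths without any traffic are skipped here to avoid dividing by zero.
    return {path: errors[path] / total[path] for path in errors if total[path] > 0}
```

Summing over all instances before dividing is what makes the result a property of the service as a whole rather than of a single instance.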
[Diagram: Prometheus scrapes three targets' /metrics; the Web UI and Dashboard query it with PromQL]
Alert Definition
ALERT DiskWillFillIn4Hours
IF predict_linear(node_filesystem_free[1h], 4*3600) < 0
Is any disk about to run full within 4 hours?
[Plot: free disk space from now-1h, extrapolated linearly to +4h, against the 0 line]
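The extrapolation behind `predict_linear` is a least-squares line fit. The Python sketch below shows that idea under simplifying assumptions: it predicts relative to the last sample rather than the rule evaluation time, and the function signature is chosen for this example rather than taken from Prometheus.

```python
def predict_linear(samples, seconds_ahead: float) -> float:
    """Fit a least-squares line through (timestamp, value) samples and
    return the value it predicts `seconds_ahead` after the last sample."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to fit a line")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    if variance == 0:
        raise ValueError("samples must have distinct timestamps")
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = covariance / variance
    intercept = mean_v - slope * mean_t
    last_timestamp = samples[-1][0]
    return intercept + slope * (last_timestamp + seconds_ahead)


def disk_will_fill(samples, horizon_seconds: float = 4 * 3600) -> bool:
    """Mirror of the alert condition: predicted free bytes below zero."""
    return predict_linear(samples, horizon_seconds) < 0
```

A disk losing free space at a steady rate over the last hour trips the condition as soon as the fitted line crosses zero within the next four hours, well before the disk is actually full.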
[Diagram: Prometheus scrapes three targets' /metrics, evaluates the alert definition (1m) and forwards alerts to the Alertmanager]
Alertmanager
● Deduplicates
● Groups
● Routes (Team A, Team B, Team C)
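The three Alertmanager steps can be sketched as a small pipeline. Everything specific here is an assumption for illustration: alerts are modelled as plain label dictionaries, routing is keyed on a hypothetical `team` label, and the `HighErrorRate` alert name in the usage note is made up. The real Alertmanager is configured declaratively and does considerably more (timing, silences, inhibition).

```python
from collections import defaultdict

# Hypothetical routing table: value of the "team" label -> receiver.
ROUTES = {"a": "Team A", "b": "Team B"}
DEFAULT_RECEIVER = "Team C"


def deduplicate(alerts):
    """Drop alerts whose label set was already seen (first one wins)."""
    unique = {}
    for labels in alerts:
        unique.setdefault(frozenset(labels.items()), labels)
    return list(unique.values())


def group(alerts, by=("alertname", "team")):
    """Bundle alerts that share the same values for the `by` labels."""
    groups = defaultdict(list)
    for labels in alerts:
        groups[tuple(labels.get(name, "") for name in by)].append(labels)
    return dict(groups)


def route(group_key, by=("alertname", "team")):
    """Pick the receiver for a group from its "team" label."""
    team = dict(zip(by, group_key)).get("team", "")
    return ROUTES.get(team, DEFAULT_RECEIVER)


def process(alerts):
    """Deduplicate, group and route; returns receiver -> list of alert groups."""
    notifications = defaultdict(list)
    for key, members in group(deduplicate(alerts)).items():
        notifications[route(key)].append(members)
    return dict(notifications)
```

Feeding `process` two identical `DiskWillFillIn4Hours` alerts for one node, a third for another node (all with `team="a"`) and a `HighErrorRate` alert without a `team` label produces one grouped notification of two alerts for Team A and a fallback notification for Team C.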
Monitoring
● Application
● Cluster
Cluster Monitoring
What is Kubernetes?
Platform for running containerized applications
Announced 2014 by Google
Influenced by Borg & Omega
v1.0 in July 2015
Kubernetes joins the CNCF
Master
API-Server
etcd
Controller-Manager
Scheduler
Kube-DNS
...
Worker
Kubelet
Kube-Proxy
...
Application Monitoring
[Diagram: Location, User and AppX instances grouped behind three Services. Prometheus finds its scrape targets through the K8s-API-Server.]
Service Discovery
● Static target list
● DNS discovery
● Kubernetes discovery
● ...
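Whatever the mechanism, service discovery boils down to turning some source of truth into a list of scrape targets and refreshing it as things change. The sketch below assumes a heavily simplified, hypothetical snapshot of endpoints such as a discovery source might provide; the dictionary layout and the `discover_targets` name are invented for this example and are not the Kubernetes API or Prometheus configuration.

```python
def discover_targets(endpoints, metrics_path: str = "/metrics"):
    """Turn a snapshot of service endpoints into scrape targets.

    endpoints: list of dicts with "namespace", "service", "port" and
    "addresses" keys (a hypothetical, simplified shape).
    """
    targets = []
    for endpoint in endpoints:
        for address in endpoint["addresses"]:
            targets.append({
                "url": f"http://{address}:{endpoint['port']}{metrics_path}",
                # Labels attached to every sample scraped from this target.
                "labels": {
                    "namespace": endpoint["namespace"],
                    "service": endpoint["service"],
                },
            })
    return targets
```

A static target list is the degenerate case where the snapshot never changes; with Kubernetes discovery the snapshot is refreshed from the API server whenever instances come and go, so nobody has to edit the target list by hand.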
[Diagram: Cluster-Monitoring (Master: API-Server, etcd, Controller-Manager, Scheduler, Kube-DNS, ...; Worker: Kubelet, Kube-Proxy, ...) next to Application-Monitoring (Location, User, AppX, Services), with Prometheus and the K8s-API-Server]
Problem
Prometheus is stateful and difficult to
configure!
Introducing the
Prometheus Operator
What is a K8s Operator?
Application-specific operational knowledge, encoded in software (</>): the Operator
Prometheus Operator
● Kubernetes-native configuration
● Automated management and upgrades of Prometheus & Alertmanager
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: my-app
spec:
  ...

apiVersion: monitoring.coreos.com/v1alpha1
kind: Prometheus
metadata:
  name: prometheus-k8s
spec:
  ...
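At its core an operator compares the declared resources with what is actually running and acts on the difference. The following Python sketch shows that reconciliation idea only. The `PrometheusSpec` fields (`replicas`, `version`) and the action tuples are hypothetical, and nothing here reflects the Prometheus Operator's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrometheusSpec:
    """Hypothetical, heavily reduced stand-in for a Prometheus resource."""
    name: str
    replicas: int
    version: str


def reconcile(desired, actual):
    """Return the actions that move the actual state toward the desired one.

    desired / actual: dicts mapping resource name -> PrometheusSpec.
    """
    actions = []
    for name, spec in desired.items():
        current = actual.get(name)
        if current is None:
            actions.append(("create", name))
        elif current != spec:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions
```

Running `reconcile` repeatedly, or on every change notification, is what makes the configuration declarative: the user edits the resource, and the operator works out whether that means creating, upgrading or removing an instance.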
Kube-Prometheus
Single command to install:
● Prometheus & Alertmanager Cluster
● Alerting rules
● Dashboarding
Demo
Recap
What is Prometheus?
● Pull-based
● Multi-Dimensional
● Metrics, not logging, not tracing
● No magic!
[Diagram: Prometheus scrapes three targets' /metrics every 15s, evaluates the alert definition (1m) and forwards alerts to the Alertmanager]
Prometheus-Operator & Kube-Prometheus
Where to go from here?
Prometheus.io
/coreos/prometheus-operator
San Francisco, New York & Berlin
We are hiring!
