SlideShare a Scribd company logo

Counting with Prometheus (CloudNativeCon+Kubecon Europe 2017)

Counters are one of the two core metric types in Prometheus, allowing for tracking of request rates, error ratios and other key measurements. Learn why are they designed the way they are, how client libraries implement them and how rate() works. If you'd like more information about Prometheus, contact us at prometheus@robustperception.io

1 of 43
Download to read offline
Brian Brazil
Founder
Counting with Prometheus
Who am I?
Engineer passionate about running software reliably in production.
Core developer of Prometheus
Studied Computer Science in Trinity College Dublin.
Google SRE for 7 years, working on high-scale reliable systems.
Contributor to many open source projects, including Prometheus, Ansible,
Python, Aurora and Zookeeper.
Founder of Robust Perception, provider of commercial support and consulting
What is Prometheus?
Prometheus is a metrics-based time series database, designed for whitebox
monitoring.
It supports labels (dimensions/tags).
Alerting and graphing are unified, using the same language.
Architecture
Counting
Counting is easy right?
Just 1, 2, 3?
Counting… to the Extreme!!!
What if we're counting monitoring-related events though for a metrics system.
We're usually sampling data over the network => potential data loss.
What happens with the data we transfer at the other end?
Ad

Recommended

Monitoring With Prometheus
Monitoring With PrometheusMonitoring With Prometheus
Monitoring With PrometheusKnoldus Inc.
 
Infrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using PrometheusInfrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using PrometheusMarco Pas
 
Prometheus design and philosophy
Prometheus design and philosophy   Prometheus design and philosophy
Prometheus design and philosophy Docker, Inc.
 
Monitoring_with_Prometheus_Grafana_Tutorial
Monitoring_with_Prometheus_Grafana_TutorialMonitoring_with_Prometheus_Grafana_Tutorial
Monitoring_with_Prometheus_Grafana_TutorialTim Vaillancourt
 
Monitoring using Prometheus and Grafana
Monitoring using Prometheus and GrafanaMonitoring using Prometheus and Grafana
Monitoring using Prometheus and GrafanaArvind Kumar G.S
 
A GitOps Kubernetes Native CICD Solution with Argo Events, Workflows, and CD
A GitOps Kubernetes Native CICD Solution with Argo Events, Workflows, and CDA GitOps Kubernetes Native CICD Solution with Argo Events, Workflows, and CD
A GitOps Kubernetes Native CICD Solution with Argo Events, Workflows, and CDJulian Mazzitelli
 

More Related Content

What's hot

Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)Brian Brazil
 
Prometheus – a next-gen Monitoring System
Prometheus – a next-gen Monitoring SystemPrometheus – a next-gen Monitoring System
Prometheus – a next-gen Monitoring SystemFabian Reinartz
 
An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)Brian Brazil
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusGrafana Labs
 
Scaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/DayScaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/DayScyllaDB
 
Grafana optimization for Prometheus
Grafana optimization for PrometheusGrafana optimization for Prometheus
Grafana optimization for PrometheusMitsuhiro Tanda
 
Real Time Test Data with Grafana
Real Time Test Data with GrafanaReal Time Test Data with Grafana
Real Time Test Data with GrafanaIoannis Papadakis
 
Monitoring kubernetes with prometheus
Monitoring kubernetes with prometheusMonitoring kubernetes with prometheus
Monitoring kubernetes with prometheusBrice Fernandes
 
Prometheus - basics
Prometheus - basicsPrometheus - basics
Prometheus - basicsJuraj Hantak
 
Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus Docker, Inc.
 
Getting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and GrafanaGetting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and GrafanaSyah Dwi Prihatmoko
 
Monitoring with prometheus
Monitoring with prometheusMonitoring with prometheus
Monitoring with prometheusKasper Nissen
 
Introduction to Prometheus
Introduction to PrometheusIntroduction to Prometheus
Introduction to PrometheusJulien Pivotto
 
Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and ThenSATOSHI TAGOMORI
 
Beautiful Monitoring With Grafana and InfluxDB
Beautiful Monitoring With Grafana and InfluxDBBeautiful Monitoring With Grafana and InfluxDB
Beautiful Monitoring With Grafana and InfluxDBleesjensen
 
Server monitoring using grafana and prometheus
Server monitoring using grafana and prometheusServer monitoring using grafana and prometheus
Server monitoring using grafana and prometheusCeline George
 

What's hot (20)

Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)
 
Prometheus – a next-gen Monitoring System
Prometheus – a next-gen Monitoring SystemPrometheus – a next-gen Monitoring System
Prometheus – a next-gen Monitoring System
 
Effective terraform
Effective terraformEffective terraform
Effective terraform
 
An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with Prometheus
 
Prometheus and Grafana
Prometheus and GrafanaPrometheus and Grafana
Prometheus and Grafana
 
Scaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/DayScaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/Day
 
Grafana optimization for Prometheus
Grafana optimization for PrometheusGrafana optimization for Prometheus
Grafana optimization for Prometheus
 
Real Time Test Data with Grafana
Real Time Test Data with GrafanaReal Time Test Data with Grafana
Real Time Test Data with Grafana
 
Monitoring kubernetes with prometheus
Monitoring kubernetes with prometheusMonitoring kubernetes with prometheus
Monitoring kubernetes with prometheus
 
Prometheus - basics
Prometheus - basicsPrometheus - basics
Prometheus - basics
 
Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus
 
Prometheus monitoring
Prometheus monitoringPrometheus monitoring
Prometheus monitoring
 
Getting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and GrafanaGetting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and Grafana
 
Monitoring with prometheus
Monitoring with prometheusMonitoring with prometheus
Monitoring with prometheus
 
Introduction to Prometheus
Introduction to PrometheusIntroduction to Prometheus
Introduction to Prometheus
 
Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and Then
 
Beautiful Monitoring With Grafana and InfluxDB
Beautiful Monitoring With Grafana and InfluxDBBeautiful Monitoring With Grafana and InfluxDB
Beautiful Monitoring With Grafana and InfluxDB
 
Grafana.pptx
Grafana.pptxGrafana.pptx
Grafana.pptx
 
Server monitoring using grafana and prometheus
Server monitoring using grafana and prometheusServer monitoring using grafana and prometheus
Server monitoring using grafana and prometheus
 

Similar to Counting with Prometheus (CloudNativeCon+Kubecon Europe 2017)

Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)Brian Brazil
 
My Postdoctoral Research
My Postdoctoral ResearchMy Postdoctoral Research
My Postdoctoral ResearchPo-Ting Wu
 
Algorithm Analysis.pdf
Algorithm Analysis.pdfAlgorithm Analysis.pdf
Algorithm Analysis.pdfMemMem25
 
Stevens-Benchmarking Sorting Algorithms
Stevens-Benchmarking Sorting AlgorithmsStevens-Benchmarking Sorting Algorithms
Stevens-Benchmarking Sorting AlgorithmsJames Stevens
 
QConSF 2014 talk on Netflix Mantis, a stream processing system
QConSF 2014 talk on Netflix Mantis, a stream processing systemQConSF 2014 talk on Netflix Mantis, a stream processing system
QConSF 2014 talk on Netflix Mantis, a stream processing systemDanny Yuan
 
Mantis: Netflix's Event Stream Processing System
Mantis: Netflix's Event Stream Processing SystemMantis: Netflix's Event Stream Processing System
Mantis: Netflix's Event Stream Processing SystemC4Media
 
Introduction to Algorithms
Introduction to AlgorithmsIntroduction to Algorithms
Introduction to AlgorithmsVenkatesh Iyer
 
3.1 AnalysisUse the copy-and-paste method to give an average.docx
3.1 AnalysisUse the copy-and-paste method to give an average.docx3.1 AnalysisUse the copy-and-paste method to give an average.docx
3.1 AnalysisUse the copy-and-paste method to give an average.docxrhetttrevannion
 
Monitoring your Python with Prometheus (Python Ireland April 2015)
Monitoring your Python with Prometheus (Python Ireland April 2015)Monitoring your Python with Prometheus (Python Ireland April 2015)
Monitoring your Python with Prometheus (Python Ireland April 2015)Brian Brazil
 
Microservices and Prometheus (Microservices NYC 2016)
Microservices and Prometheus (Microservices NYC 2016)Microservices and Prometheus (Microservices NYC 2016)
Microservices and Prometheus (Microservices NYC 2016)Brian Brazil
 
TIME EXECUTION OF DIFFERENT SORTED ALGORITHMS
TIME EXECUTION   OF  DIFFERENT SORTED ALGORITHMSTIME EXECUTION   OF  DIFFERENT SORTED ALGORITHMS
TIME EXECUTION OF DIFFERENT SORTED ALGORITHMSTanya Makkar
 
Six Sigma Dfss Application In Data Accarucy
Six Sigma Dfss Application In Data AccarucySix Sigma Dfss Application In Data Accarucy
Six Sigma Dfss Application In Data Accarucyxyhfun
 
Prometheus - Open Source Forum Japan
Prometheus  - Open Source Forum JapanPrometheus  - Open Source Forum Japan
Prometheus - Open Source Forum JapanBrian Brazil
 
Using Apache Pulsar to Provide Real-Time IoT Analytics on the Edge
Using Apache Pulsar to Provide Real-Time IoT Analytics on the EdgeUsing Apache Pulsar to Provide Real-Time IoT Analytics on the Edge
Using Apache Pulsar to Provide Real-Time IoT Analytics on the EdgeDataWorks Summit
 
Process Synchronization -1.ppt
Process Synchronization -1.pptProcess Synchronization -1.ppt
Process Synchronization -1.pptjayverma27
 
Ch7 OS
Ch7 OSCh7 OS
Ch7 OSC.U
 

Similar to Counting with Prometheus (CloudNativeCon+Kubecon Europe 2017) (20)

Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
 
My Postdoctoral Research
My Postdoctoral ResearchMy Postdoctoral Research
My Postdoctoral Research
 
Algorithm Analysis.pdf
Algorithm Analysis.pdfAlgorithm Analysis.pdf
Algorithm Analysis.pdf
 
Stevens-Benchmarking Sorting Algorithms
Stevens-Benchmarking Sorting AlgorithmsStevens-Benchmarking Sorting Algorithms
Stevens-Benchmarking Sorting Algorithms
 
QConSF 2014 talk on Netflix Mantis, a stream processing system
QConSF 2014 talk on Netflix Mantis, a stream processing systemQConSF 2014 talk on Netflix Mantis, a stream processing system
QConSF 2014 talk on Netflix Mantis, a stream processing system
 
Mantis: Netflix's Event Stream Processing System
Mantis: Netflix's Event Stream Processing SystemMantis: Netflix's Event Stream Processing System
Mantis: Netflix's Event Stream Processing System
 
Introduction to Algorithms
Introduction to AlgorithmsIntroduction to Algorithms
Introduction to Algorithms
 
3.1 AnalysisUse the copy-and-paste method to give an average.docx
3.1 AnalysisUse the copy-and-paste method to give an average.docx3.1 AnalysisUse the copy-and-paste method to give an average.docx
3.1 AnalysisUse the copy-and-paste method to give an average.docx
 
Monitoring your Python with Prometheus (Python Ireland April 2015)
Monitoring your Python with Prometheus (Python Ireland April 2015)Monitoring your Python with Prometheus (Python Ireland April 2015)
Monitoring your Python with Prometheus (Python Ireland April 2015)
 
Algo and flowchart
Algo and flowchartAlgo and flowchart
Algo and flowchart
 
Analyzing algorithms
Analyzing algorithmsAnalyzing algorithms
Analyzing algorithms
 
Microservices and Prometheus (Microservices NYC 2016)
Microservices and Prometheus (Microservices NYC 2016)Microservices and Prometheus (Microservices NYC 2016)
Microservices and Prometheus (Microservices NYC 2016)
 
TIME EXECUTION OF DIFFERENT SORTED ALGORITHMS
TIME EXECUTION   OF  DIFFERENT SORTED ALGORITHMSTIME EXECUTION   OF  DIFFERENT SORTED ALGORITHMS
TIME EXECUTION OF DIFFERENT SORTED ALGORITHMS
 
Six Sigma Dfss Application In Data Accarucy
Six Sigma Dfss Application In Data AccarucySix Sigma Dfss Application In Data Accarucy
Six Sigma Dfss Application In Data Accarucy
 
Prometheus - Open Source Forum Japan
Prometheus  - Open Source Forum JapanPrometheus  - Open Source Forum Japan
Prometheus - Open Source Forum Japan
 
Using Apache Pulsar to Provide Real-Time IoT Analytics on the Edge
Using Apache Pulsar to Provide Real-Time IoT Analytics on the EdgeUsing Apache Pulsar to Provide Real-Time IoT Analytics on the Edge
Using Apache Pulsar to Provide Real-Time IoT Analytics on the Edge
 
Is this normal?
Is this normal?Is this normal?
Is this normal?
 
Process Synchronization -1.ppt
Process Synchronization -1.pptProcess Synchronization -1.ppt
Process Synchronization -1.ppt
 
Major ppt
Major pptMajor ppt
Major ppt
 
Ch7 OS
Ch7 OSCh7 OS
Ch7 OS
 

More from Brian Brazil

OpenMetrics: What Does It Mean for You (PromCon 2019, Munich)
OpenMetrics: What Does It Mean for You (PromCon 2019, Munich)OpenMetrics: What Does It Mean for You (PromCon 2019, Munich)
OpenMetrics: What Does It Mean for You (PromCon 2019, Munich)Brian Brazil
 
Evolution of Monitoring and Prometheus (Dublin 2018)
Evolution of Monitoring and Prometheus (Dublin 2018)Evolution of Monitoring and Prometheus (Dublin 2018)
Evolution of Monitoring and Prometheus (Dublin 2018)Brian Brazil
 
Evaluating Prometheus Knowledge in Interviews (PromCon 2018)
Evaluating Prometheus Knowledge in Interviews (PromCon 2018)Evaluating Prometheus Knowledge in Interviews (PromCon 2018)
Evaluating Prometheus Knowledge in Interviews (PromCon 2018)Brian Brazil
 
Anatomy of a Prometheus Client Library (PromCon 2018)
Anatomy of a Prometheus Client Library (PromCon 2018)Anatomy of a Prometheus Client Library (PromCon 2018)
Anatomy of a Prometheus Client Library (PromCon 2018)Brian Brazil
 
Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)Brian Brazil
 
Evolving Prometheus for the Cloud Native World (FOSDEM 2018)
Evolving Prometheus for the Cloud Native World (FOSDEM 2018)Evolving Prometheus for the Cloud Native World (FOSDEM 2018)
Evolving Prometheus for the Cloud Native World (FOSDEM 2018)Brian Brazil
 
Prometheus for Monitoring Metrics (Percona Live Europe 2017)
Prometheus for Monitoring Metrics (Percona Live Europe 2017)Prometheus for Monitoring Metrics (Percona Live Europe 2017)
Prometheus for Monitoring Metrics (Percona Live Europe 2017)Brian Brazil
 
Evolution of the Prometheus TSDB (Percona Live Europe 2017)
Evolution of the Prometheus TSDB  (Percona Live Europe 2017)Evolution of the Prometheus TSDB  (Percona Live Europe 2017)
Evolution of the Prometheus TSDB (Percona Live Europe 2017)Brian Brazil
 
Staleness and Isolation in Prometheus 2.0 (PromCon 2017)
Staleness and Isolation in Prometheus 2.0 (PromCon 2017)Staleness and Isolation in Prometheus 2.0 (PromCon 2017)
Staleness and Isolation in Prometheus 2.0 (PromCon 2017)Brian Brazil
 
Rule 110 for Prometheus (PromCon 2017)
Rule 110 for Prometheus (PromCon 2017)Rule 110 for Prometheus (PromCon 2017)
Rule 110 for Prometheus (PromCon 2017)Brian Brazil
 
Prometheus: From Berlin to Bonanza (Keynote CloudNativeCon+Kubecon Europe 2017)
Prometheus:  From Berlin to Bonanza (Keynote CloudNativeCon+Kubecon Europe 2017)Prometheus:  From Berlin to Bonanza (Keynote CloudNativeCon+Kubecon Europe 2017)
Prometheus: From Berlin to Bonanza (Keynote CloudNativeCon+Kubecon Europe 2017)Brian Brazil
 
What does "monitoring" mean? (FOSDEM 2017)
What does "monitoring" mean? (FOSDEM 2017)What does "monitoring" mean? (FOSDEM 2017)
What does "monitoring" mean? (FOSDEM 2017)Brian Brazil
 
Provisioning and Capacity Planning (Travel Meets Big Data)
Provisioning and Capacity Planning (Travel Meets Big Data)Provisioning and Capacity Planning (Travel Meets Big Data)
Provisioning and Capacity Planning (Travel Meets Big Data)Brian Brazil
 
Monitoring What Matters: The Prometheus Approach to Whitebox Monitoring (Berl...
Monitoring What Matters: The Prometheus Approach to Whitebox Monitoring (Berl...Monitoring What Matters: The Prometheus Approach to Whitebox Monitoring (Berl...
Monitoring What Matters: The Prometheus Approach to Whitebox Monitoring (Berl...Brian Brazil
 
So You Want to Write an Exporter
So You Want to Write an ExporterSo You Want to Write an Exporter
So You Want to Write an ExporterBrian Brazil
 
An Exploration of the Formal Properties of PromQL
An Exploration of the Formal Properties of PromQLAn Exploration of the Formal Properties of PromQL
An Exploration of the Formal Properties of PromQLBrian Brazil
 
Life of a Label (PromCon2016, Berlin)
Life of a Label (PromCon2016, Berlin)Life of a Label (PromCon2016, Berlin)
Life of a Label (PromCon2016, Berlin)Brian Brazil
 
Prometheus (Monitorama 2016)
Prometheus (Monitorama 2016)Prometheus (Monitorama 2016)
Prometheus (Monitorama 2016)Brian Brazil
 
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)Brian Brazil
 
Ansible at FOSDEM (Ansible Dublin, 2016)
Ansible at FOSDEM (Ansible Dublin, 2016)Ansible at FOSDEM (Ansible Dublin, 2016)
Ansible at FOSDEM (Ansible Dublin, 2016)Brian Brazil
 

More from Brian Brazil (20)

OpenMetrics: What Does It Mean for You (PromCon 2019, Munich)
OpenMetrics: What Does It Mean for You (PromCon 2019, Munich)OpenMetrics: What Does It Mean for You (PromCon 2019, Munich)
OpenMetrics: What Does It Mean for You (PromCon 2019, Munich)
 
Evolution of Monitoring and Prometheus (Dublin 2018)
Evolution of Monitoring and Prometheus (Dublin 2018)Evolution of Monitoring and Prometheus (Dublin 2018)
Evolution of Monitoring and Prometheus (Dublin 2018)
 
Evaluating Prometheus Knowledge in Interviews (PromCon 2018)
Evaluating Prometheus Knowledge in Interviews (PromCon 2018)Evaluating Prometheus Knowledge in Interviews (PromCon 2018)
Evaluating Prometheus Knowledge in Interviews (PromCon 2018)
 
Anatomy of a Prometheus Client Library (PromCon 2018)
Anatomy of a Prometheus Client Library (PromCon 2018)Anatomy of a Prometheus Client Library (PromCon 2018)
Anatomy of a Prometheus Client Library (PromCon 2018)
 
Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)
 
Evolving Prometheus for the Cloud Native World (FOSDEM 2018)
Evolving Prometheus for the Cloud Native World (FOSDEM 2018)Evolving Prometheus for the Cloud Native World (FOSDEM 2018)
Evolving Prometheus for the Cloud Native World (FOSDEM 2018)
 
Prometheus for Monitoring Metrics (Percona Live Europe 2017)
Prometheus for Monitoring Metrics (Percona Live Europe 2017)Prometheus for Monitoring Metrics (Percona Live Europe 2017)
Prometheus for Monitoring Metrics (Percona Live Europe 2017)
 
Evolution of the Prometheus TSDB (Percona Live Europe 2017)
Evolution of the Prometheus TSDB  (Percona Live Europe 2017)Evolution of the Prometheus TSDB  (Percona Live Europe 2017)
Evolution of the Prometheus TSDB (Percona Live Europe 2017)
 
Staleness and Isolation in Prometheus 2.0 (PromCon 2017)
Staleness and Isolation in Prometheus 2.0 (PromCon 2017)Staleness and Isolation in Prometheus 2.0 (PromCon 2017)
Staleness and Isolation in Prometheus 2.0 (PromCon 2017)
 
Rule 110 for Prometheus (PromCon 2017)
Rule 110 for Prometheus (PromCon 2017)Rule 110 for Prometheus (PromCon 2017)
Rule 110 for Prometheus (PromCon 2017)
 
Prometheus: From Berlin to Bonanza (Keynote CloudNativeCon+Kubecon Europe 2017)
Prometheus:  From Berlin to Bonanza (Keynote CloudNativeCon+Kubecon Europe 2017)Prometheus:  From Berlin to Bonanza (Keynote CloudNativeCon+Kubecon Europe 2017)
Prometheus: From Berlin to Bonanza (Keynote CloudNativeCon+Kubecon Europe 2017)
 
What does "monitoring" mean? (FOSDEM 2017)
What does "monitoring" mean? (FOSDEM 2017)What does "monitoring" mean? (FOSDEM 2017)
What does "monitoring" mean? (FOSDEM 2017)
 
Provisioning and Capacity Planning (Travel Meets Big Data)
Provisioning and Capacity Planning (Travel Meets Big Data)Provisioning and Capacity Planning (Travel Meets Big Data)
Provisioning and Capacity Planning (Travel Meets Big Data)
 
Monitoring What Matters: The Prometheus Approach to Whitebox Monitoring (Berl...
Monitoring What Matters: The Prometheus Approach to Whitebox Monitoring (Berl...Monitoring What Matters: The Prometheus Approach to Whitebox Monitoring (Berl...
Monitoring What Matters: The Prometheus Approach to Whitebox Monitoring (Berl...
 
So You Want to Write an Exporter
So You Want to Write an ExporterSo You Want to Write an Exporter
So You Want to Write an Exporter
 
An Exploration of the Formal Properties of PromQL
An Exploration of the Formal Properties of PromQLAn Exploration of the Formal Properties of PromQL
An Exploration of the Formal Properties of PromQL
 
Life of a Label (PromCon2016, Berlin)
Life of a Label (PromCon2016, Berlin)Life of a Label (PromCon2016, Berlin)
Life of a Label (PromCon2016, Berlin)
 
Prometheus (Monitorama 2016)
Prometheus (Monitorama 2016)Prometheus (Monitorama 2016)
Prometheus (Monitorama 2016)
 
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
 
Ansible at FOSDEM (Ansible Dublin, 2016)
Ansible at FOSDEM (Ansible Dublin, 2016)Ansible at FOSDEM (Ansible Dublin, 2016)
Ansible at FOSDEM (Ansible Dublin, 2016)
 

Recently uploaded

Augmented and Mixed Reality Solutions for Frontline Medical Professionals
Augmented and Mixed Reality Solutions for Frontline Medical ProfessionalsAugmented and Mixed Reality Solutions for Frontline Medical Professionals
Augmented and Mixed Reality Solutions for Frontline Medical Professionalsthirdeyegen65
 
Model Jaringan network jaringan komputer.pdf
Model Jaringan network jaringan komputer.pdfModel Jaringan network jaringan komputer.pdf
Model Jaringan network jaringan komputer.pdfgalfinprihardiputra0
 
history of tau gamma architect.1968.....
history of tau gamma architect.1968.....history of tau gamma architect.1968.....
history of tau gamma architect.1968.....josephiigo
 
Regulation is Coming - Trusted Media Summit 2023
Regulation is Coming - Trusted Media Summit 2023Regulation is Coming - Trusted Media Summit 2023
Regulation is Coming - Trusted Media Summit 2023Damar Juniarto
 
Modern Red Teaming - subverting mature defenses on a budget
Modern Red Teaming - subverting mature defenses on a budgetModern Red Teaming - subverting mature defenses on a budget
Modern Red Teaming - subverting mature defenses on a budgetmatt806068
 
Red shadows ringing in Japan's Cyberspace
Red shadows ringing in Japan's CyberspaceRed shadows ringing in Japan's Cyberspace
Red shadows ringing in Japan's Cyberspacesttyk
 
Obstructive jaundice is a medical condition characterized by the yellowing of...
Obstructive jaundice is a medical condition characterized by the yellowing of...Obstructive jaundice is a medical condition characterized by the yellowing of...
Obstructive jaundice is a medical condition characterized by the yellowing of...ssuser7b7f4e
 
UGB INTERNETBANKING FACILITY LAUNCHED.pptx
UGB INTERNETBANKING FACILITY LAUNCHED.pptxUGB INTERNETBANKING FACILITY LAUNCHED.pptx
UGB INTERNETBANKING FACILITY LAUNCHED.pptxRitesh Sahu
 
AWS Overview of AWS Clarify, Feature Store, Hyper parameter Tuning
AWS Overview of AWS  Clarify, Feature Store, Hyper parameter TuningAWS Overview of AWS  Clarify, Feature Store, Hyper parameter Tuning
AWS Overview of AWS Clarify, Feature Store, Hyper parameter TuningVarun Garg
 
Augmented and Mixed Reality Solutions for Aerospace & Defense
Augmented and Mixed Reality Solutions for Aerospace & DefenseAugmented and Mixed Reality Solutions for Aerospace & Defense
Augmented and Mixed Reality Solutions for Aerospace & Defensethirdeyegen65
 

Recently uploaded (10)

Augmented and Mixed Reality Solutions for Frontline Medical Professionals
Augmented and Mixed Reality Solutions for Frontline Medical ProfessionalsAugmented and Mixed Reality Solutions for Frontline Medical Professionals
Augmented and Mixed Reality Solutions for Frontline Medical Professionals
 
Model Jaringan network jaringan komputer.pdf
Model Jaringan network jaringan komputer.pdfModel Jaringan network jaringan komputer.pdf
Model Jaringan network jaringan komputer.pdf
 
history of tau gamma architect.1968.....
history of tau gamma architect.1968.....history of tau gamma architect.1968.....
history of tau gamma architect.1968.....
 
Regulation is Coming - Trusted Media Summit 2023
Regulation is Coming - Trusted Media Summit 2023Regulation is Coming - Trusted Media Summit 2023
Regulation is Coming - Trusted Media Summit 2023
 
Modern Red Teaming - subverting mature defenses on a budget
Modern Red Teaming - subverting mature defenses on a budgetModern Red Teaming - subverting mature defenses on a budget
Modern Red Teaming - subverting mature defenses on a budget
 
Red shadows ringing in Japan's Cyberspace
Red shadows ringing in Japan's CyberspaceRed shadows ringing in Japan's Cyberspace
Red shadows ringing in Japan's Cyberspace
 
Obstructive jaundice is a medical condition characterized by the yellowing of...
Obstructive jaundice is a medical condition characterized by the yellowing of...Obstructive jaundice is a medical condition characterized by the yellowing of...
Obstructive jaundice is a medical condition characterized by the yellowing of...
 
UGB INTERNETBANKING FACILITY LAUNCHED.pptx
UGB INTERNETBANKING FACILITY LAUNCHED.pptxUGB INTERNETBANKING FACILITY LAUNCHED.pptx
UGB INTERNETBANKING FACILITY LAUNCHED.pptx
 
AWS Overview of AWS Clarify, Feature Store, Hyper parameter Tuning
AWS Overview of AWS  Clarify, Feature Store, Hyper parameter TuningAWS Overview of AWS  Clarify, Feature Store, Hyper parameter Tuning
AWS Overview of AWS Clarify, Feature Store, Hyper parameter Tuning
 
Augmented and Mixed Reality Solutions for Aerospace & Defense
Augmented and Mixed Reality Solutions for Aerospace & DefenseAugmented and Mixed Reality Solutions for Aerospace & Defense
Augmented and Mixed Reality Solutions for Aerospace & Defense
 

Counting with Prometheus (CloudNativeCon+Kubecon Europe 2017)

  • 2. Who am I? Engineer passionate about running software reliably in production. Core developer of Prometheus Studied Computer Science in Trinity College Dublin. Google SRE for 7 years, working on high-scale reliable systems. Contributor to many open source projects, including Prometheus, Ansible, Python, Aurora and Zookeeper. Founder of Robust Perception, provider of commercial support and consulting
  • 3. What is Prometheus? Prometheus is a metrics-based time series database, designed for whitebox monitoring. It supports labels (dimensions/tags). Alerting and graphing are unified, using the same language.
  • 5. Counting Counting is easy right? Just 1, 2, 3?
  • 6. Counting… to the Extreme!!! What if we're counting monitoring-related events though for a metrics system. We're usually sampling data over the network => potential data loss. What happens with the data we transfer at the other end?
  • 7. Counters and Gauges There are two base metric types. Gauges are a snapshot of current state, such as memory usage or temperature. They can go up and down. Counters are the other base type. To explain them, we need to go on a "small" detour.
  • 8. Events are key Events are the thing we want to track with Counters. An event might be a HTTP request hitting our server, a function call being made or an error being returned. An event logging system would record each event individually. A metrics-based system like Prometheus or Graphite has events aggregated across time before they get to the TSDB. Therein lies the rub.
  • 9. Approach #1: Resetting Count There's a few common approaches to providing this aggregate over time. The first and simplest is the resetting count. You start at 0, and every time there's an event you increment the count. On a regular interval you transfer the current count, and reset to 0.
  • 11. Approach #1: Resetting Count Problems If a transfer fails, you've lost that data. Can't presume this effect will be random and unbiased, e.g. if a big spike in traffic also saturates network links used for monitoring. Doesn't work if you want to transfer data to more than one place for redundancy. Each would get 1/n of the data.
  • 13. Approach #2: Exponential moving average A number of instrumentation libraries offer this, such as DropWizard's Meter. Basically the same way Unix load averages work: result(t) = (1 - w) * result(t-1) + (w) * events_this_period Where t is the tick, and w is a weighting factor. The weighting factor determines how quickly new data is incorporated. Dropwizard evaluates the above every 5 seconds.
  • 14. Exponential Moving Average, Normal Operation
  • 15. Approach #2: Exponential moving average Problems Events aren't uniformly considered. If you're transferring data every 10s, then the most recent 5s matter more. Thus reconstructing what happened is hard for debugging, unless you get every 5s update. You're bound to the 1m, 5m and 15m weightings that the implementation has chosen. Also means that it's not particularly resilient to missing a scrape.
  • 16. Aside: Graphite's summarize() function Summarize() returns events during e.g. the last hour. Some have a belief that summarize is accurate. It isn't. Problem is that with say 15m granularity, data point at 13:02 will include data from 13 minutes before 13:00-14:00 and similarly at the end. If you want this accurately, need to use logs. No metrics based system can report this accurately in the general case.
  • 17. Graphite's summarize() and non-aligned data
  • 18. Aliasing Depending on the exact time offsets between the process start, metric initialisation, data transfers and when the user makes a query you can get different results. A second in either direction can make a big difference to the patterns you see in your graphs. This is an expected signal processing effect, be aware of it.
  • 19. Expressive Power Both previous solutions are reasonable if the monitoring system is a fairly dumb data store, often with little math beyond addition (if even that). Losing data or having no redundancy are better than having nothing at all. What if you have the option for your monitoring system to do math? What if you control both ends?
  • 20. Approach #3: Prometheus Counters Like Approach #1, we have a counter that starts at 0 increments at each event. This is transferred regularly to Prometheus. It's not reset on every transfer though, keeps on increasing. Rate() function in Prometheus takes this in, and calculates how quickly it increased over the given time period.
  • 22. Approach #3: Prometheus Counters Resilient to failed transfers (lose resolution, not data) Can handle multiple places to transfer to Can choose the time period you want to calculate over in monitoring system, thus choose your level of smoothing e.g. rate(my_metric[5m]) or rate(my_metric[10m]) Uniform consideration of data Easy to implement on client side
  • 24. Prometheus Counters: Rate() There's many details to getting the rate() function right. Processes might restart Scrapes might be missed Time periods rarely align perfectly Time series stop and start
  • 25. Prometheus Counters: Resets Counters can be reset to 0, such as if a process restarts or a network switch gets rebooted. If we see the value decreasing, then that's a counter reset so presume it went to 0. So seeing 0 -> 10 -> 5 is 10 + 5 = 15 of an increase. Graphite/InfluxDB's nonNegativeDerivative() function would ignore the drop, report based on just the increase 10.
  • 26. Prometheus Counters: Missed scrapes If we miss a scrape in the middle of a time period, no problem as we still have the data points around it. Little more complicated around the edges of the time period we're looking at though.
  • 27. Prometheus Counters: Alignment It is rare that the user will request data exactly on the same cycle as the scrapes. Especially when you're monitoring multiple servers with staggered scrapes. Or given that timestamps are millisecond resolution, and the endpoint graphs use accepts only second-granularity input. Thus we need to extrapolate out to the end of the rate()'s range.
  • 28. increase() can return non-integers on integral data This is why one of the more surprising behaviours of increase() happens. So if we have data which is: t= 1 10 t= 6 12 t=11 13 Request a increase() from t=0 for 15s, you'll get an increase of 3 over 10s. Extrapolating over the 15s, that's a result of 4.5. This is the correct result on average. If you want exact answers, use logs.
  • 29. Non-integral increase due to extrapolation
  • 30. Prometheus Counters: Time series lifecycle Time series are created and destroyed. If we always extrapolated out to the edge of the rate() range we'd get much bigger results than we should. So we detect that. We calculate the average interval between data points. If the first/last data point start/end of the range is within 110% of the average interval, then we extrapolate to the start/end. Allows for failed scrapes. Otherwise we extrapolate 50% of the average interval. We also know counters can't go negative, so don't extrapolate before the point they'd be 0 at.
  • 32. Problem: Timeseries not always existing The previous logic handles all the edge cases around counters resets, process restarts and rolling restarts, on average. What if a counter appears with the value 1 though long after the process has started and doesn't increase again? No increase in the history, so rate() doesn't see it. Can't tell when the increase happened. Prometheus is designed to be available, not catch 100% of events. Solution: Logs, or make sure all your counters are being initialised on process start so it goes 0->1. Will only miss it prior to the first scrape then.
  • 33. Problem: Lag All these solutions produce results that lag the actual data - already seen with summarize(). A 5m Prometheus rate() at a given time, is really the average from 5 minutes ago to now. Similarly with resetting counters. Exponential moving averages more complicated to explain, same issue though. Always compare like with like, stick to one range for your rate()s.
  • 34. Client Implications The Prometheus Counter is very easy to implement, only need to increment a single number. Concurrency handling varies by language. Mutexes are the slowest, then atomics, then per-processor values - which the Prometheus Java client approximates. Dropwizard Meter has to increment 4 numbers and do the decay logic, so about 6x slower per benchmarks. Dropwizard Counter (which is really a Gauge, as it can go down) is as fast as Prometheus Counter.
  • 35. Other performance considerations Values for each label value (called a "Child") are in map in each metric. That map lookup can be relatively expensive (~100ns), keep a pointer to the Child if that could matter. Need to know the labels you'll be using in advance though. Similarly, don't create a map from metric names to metric objects. Store metric objects as pointers in simple variables after you create them.
  • 36. Best Practices Use seconds for timing. Prometheus values are all floats, so developers don't need to choose and deal with a mix of ns, us, ms, s and minutes. increase() function handy for display, but similarly for consistency only use it for display. Use rate() for recording rules. increase() is only syntactic sugar for rate().
  • 37. irate(): The other rate function Prometheus also has irate(). This looks at the last two points in the given range, and returns how fast they're increasing per second. Great for seeing very up to date data. Can be hard to read if data is very spiky. Need to be careful you're not missing data.
  • 40. Steps and rate durations The query_range HTTP endpoint has a step parameter, this is the resolution of the data returned. If you have a 10m step and 5m rate, you're going to be ignoring half your data. To avoid this, make sure your rate range is at least your step for graphs. For irate(), your step should be no more than your sample resolution.
  • 41. Compound Types: Summary How to track average latency? With two counters! One for total requests (_count), one for total latency of those requests (_sum). Take the rates, divide and you have average latency. This is how the compound Summary metric works. It's a more convenient API over doing the above by hand. Some clients also offer quantiles. Beware, slow and unaggregatable.
  • 42. Compound Types: Histogram Histogram also includes the _count and _sum. The main purpose is calculating quantiles in Prometheus. The histogram has buckets, which are counters. You can take the rate() of these, aggregate them and then use histogram_quantile() to calculate arbitrary quantiles. Be wary of cardinality explosion, use sparingly.
  • 43. Resources Official Project Website: prometheus.io User Mailing List: prometheus-users@googlegroups.com Developer Mailing List: prometheus-developers@googlegroups.com Source code: https://github.com/prometheus/prometheus/blob/master/promql/functions.go Robust Perception Blog: www.robustperception.io/blog