Prometheus
Multi-dimensional metrics
Clint Checketts
Prometheus - That Fire Guy
Baseball
Stats
Times at bat
Batting Average
Pitching speed
Assists
Bases run
Home runs
Wins and Losses
Application Stats
Deploy speed
Endpoint latency
Database performance
Error counts
Cache hits
Garbage collection
Outages
Home Run Stats
Scouts vs Stat-heads
Multi-dimensional Metrics
#gifee
#gifee - Google Infrastructure for Everyone Else
• Kubernetes (Borg)
• Prometheus (Borgmon)
• OpenTracing (Dapper)
Why Prometheus? Ownership
• To ensure that engineers
• have confidence in where the metrics are coming from,
• can minimize friction on creating dashboards they need,
• and improve alerts that affect them
• Types of Metrics
• Infrastructure
• Application Performance
• Feature Usage
Prometheus Pedigree
• Open Source (Apache 2)
• Cloud Native Computing Foundation
• Inspired by Google metrics ‘borgmon’
• Created Nov 2012
Prometheus Feature Set
• Metrics gathering
• Infrastructure exporters
• Application instrumentation
• Query language
• Alerting
• Graphing
Technical Summary
• Self contained, very easy to run
• Doesn’t use external DB
• Can run even if everything else is on fire
• Very efficient memory/disk usage
• Pull model for monitoring
• Service discovery model to determine what to monitor (AWS, DNS, Kubernetes, etc.)
• Keeps active series and queries in memory
• Memory usage dictates scaling model
Metrics Collection
[Diagram: Grafana, Prometheus, AWS, Kubernetes, and a Java application, connected by arrows labeled "Graph Data", "Metric Data", and "Service Discovery"]
Alerting
[Diagram: Prometheus (alerting) feeding Alert Manager (routing)]
Exporters
Allows Prometheus to scrape services that aren’t Prometheus aware
Examples
Node Exporter
SNMP Exporter
MySQL Exporter
Jenkins Exporter
RabbitMQ Exporter
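A rough sketch of what an exporter does, assuming a made-up `demo_queue_depth` metric and a hypothetical `collect_queue_depth()` helper (neither comes from a real exporter): it reads a value from a service that knows nothing about Prometheus and serves it as plain text on `/metrics` for Prometheus to scrape. Only the Python standard library is used here; real exporters normally build on a Prometheus client library.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def collect_queue_depth():
    # Hypothetical stand-in: a real exporter would query the service
    # it fronts (a database, a message queue, a host, ...).
    return 7


def render_metrics():
    """Render one gauge in the Prometheus text exposition format."""
    lines = [
        "# HELP demo_queue_depth Number of items waiting in the demo queue.",
        "# TYPE demo_queue_depth gauge",
        f"demo_queue_depth {collect_queue_depth()}",
    ]
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def build_server(port=0):
    """Create (but do not start) the HTTP server; port 0 picks a free port."""
    return HTTPServer(("127.0.0.1", port), MetricsHandler)
```

Calling `build_server(9100).serve_forever()` would start serving; the port number is only an example.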
Pull Architecture
[Diagram: Grafana graphs from Prometheus; Prometheus collects from the Application and discovers from Kubernetes]
Grafana
• Graphing/Dashboarding service
• Templates
• Multiple query overlays (offset queries)
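Metric Types:
• counter - a value that only goes up, e.g. total requests: inc()
• gauge - a value measured at a given time: inc() dec() set()
• histogram - observations counted into buckets, from which quantiles can be estimated: observe()
• summary - count and total of observations, optionally with client-side quantiles: startTimer() observe()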
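To make the differences between the types concrete, here is a toy sketch in plain Python. These classes are illustrative stand-ins written for this example, not the API of any Prometheus client library; they only mimic the `inc()`, `dec()`, `set()`, and `observe()` semantics listed above.

```python
import bisect


class Counter:
    """Only ever goes up (e.g. total requests served)."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


class Gauge:
    """A value that can go up or down (e.g. requests in flight)."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        self.value += amount

    def dec(self, amount=1.0):
        self.value -= amount

    def set(self, value):
        self.value = float(value)


class Histogram:
    """Counts observations into buckets and tracks a running sum and count."""

    def __init__(self, buckets):
        self.bounds = sorted(buckets)
        self.bucket_counts = [0] * (len(self.bounds) + 1)  # last slot is +Inf
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        # Index of the first bucket whose upper bound is >= value.
        idx = bisect.bisect_left(self.bounds, value)
        self.bucket_counts[idx] += 1
        self.sum += value
        self.count += 1

    def cumulative(self):
        """Return (upper_bound, cumulative_count) pairs, ending with +Inf."""
        total, out = 0, []
        for bound, n in zip(self.bounds + [float("inf")], self.bucket_counts):
            total += n
            out.append((bound, total))
        return out
```

A summary, in this simplified view, would keep just the `sum` and `count` parts (plus optional quantiles computed in the client).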
How do I get this goodness?
• Client Libraries
• Exporters
• Baked into tools
• Docker
• Kubernetes
Metric names and labels
• Names: explain what is being measured; include the unit (or ‘count’)
• Labels (examples): application name, quartile, endpoint
http_request_duration_sec{app="api", method="get", quartile="0.5", handler="/users", statusCode="200"} 0.2
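A small sketch of how a name, a set of labels, and a value combine into one sample line like the one above. The `render_sample` and `escape_label_value` helpers are hypothetical names for this example; the escaping shown (backslash, double quote, newline) follows the Prometheus text exposition format.

```python
def escape_label_value(value):
    # Label values escape backslash, double quote, and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_sample(name, labels, value):
    """Render one sample as `name{label="value",...} value`."""
    pairs = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels.items())
    return f"{name}{{{pairs}}} {value}" if pairs else f"{name} {value}"


line = render_sample(
    "http_request_duration_sec",
    {"app": "api", "method": "get", "quartile": "0.5",
     "handler": "/users", "statusCode": "200"},
    0.2,
)
print(line)
```

Each distinct combination of label values is its own time series, which is what makes the metrics multi-dimensional.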
PromQL examples
• All current request durations
http_request_duration{app="apiContent"}
• How many envs in a pool are available?
env_pool_count{envtype="domo/brief/master"} - env_initing_count{envtype="domo/brief/master"}
• What are my non-success request rates?
sum(irate(http_request_duration_milliseconds_count{app="apiContent", statusCode!="200"}[1m])) by (statusCode) * 60
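For intuition, the non-success rate query can be approximated in plain Python. This is a simplified sketch, not how Prometheus is implemented: `irate` here takes the last two samples of a counter and divides the increase by the time between them, a drop in value is treated as a counter reset, and the per-second rates are summed by `statusCode` and multiplied by 60 to get requests per minute. The function names and the sample data are made up for illustration.

```python
def irate(samples):
    """Per-second rate from the last two (timestamp_seconds, value) samples."""
    if len(samples) < 2:
        return None
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    interval = t_last - t_prev
    if interval <= 0:
        return None
    # If the counter dropped, assume it was reset and restarted from zero.
    increase = v_last if v_last < v_prev else v_last - v_prev
    return increase / interval


def non_success_per_minute(series):
    """series: list of (labels, samples); returns {statusCode: requests/min}."""
    totals = {}
    for labels, samples in series:
        status = labels["statusCode"]
        if status == "200":  # mirrors statusCode!="200"
            continue
        rate = irate(samples)
        if rate is not None:
            totals[status] = totals.get(status, 0.0) + rate  # sum ... by (statusCode)
    return {status: rate * 60 for status, rate in totals.items()}


# Made-up counter samples, 15 seconds apart.
series = [
    ({"statusCode": "200", "handler": "/users"}, [(0, 100), (15, 130)]),
    ({"statusCode": "500", "handler": "/users"}, [(0, 10), (15, 13)]),
    ({"statusCode": "500", "handler": "/orders"}, [(0, 4), (15, 7)]),
    ({"statusCode": "404", "handler": "/users"}, [(0, 50), (15, 2)]),  # reset
]
result = non_success_per_minute(series)
```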
Summary
• Multi-dimensional metrics are powerful
• Library support is ready
• Go!

Prometheus - Utah Software Architecture Meetup - Clint Checketts
