Brian Brazil
Founder
Prometheus for Monitoring Metrics
Who am I?
● One of the core developers of Prometheus
● Founder of Robust Perception
● Contributor to many open source projects
● Ex-Googler, after 7 years in the Dublin office
Why monitor?
● Know when things go wrong
○ To call in a human to prevent a business-level issue
● Be able to debug and gain insight
● Trending to see changes over time, and drive
technical/business decisions
● To feed into other systems/processes
Services have Internals
Monitor the Internals
Monitor as a Service, not as Machines
What is Prometheus?
Metrics monitoring system (not logs).
A time series database. A query language.
Client libraries. An Ecosystem.
A modern approach to monitoring services.
Architecture
Client Libraries
Instrument your code to capture the metrics that
matter to you.
If upstream libraries are instrumented, you get that
for free!
Also many exporters: cAdvisor, MySQL, MongoDB,
SNMP, JMX, HAProxy, Minecraft, Factorio...
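To make "instrument your code" concrete, here is a minimal sketch of what a counter metric does, written with the Python standard library only. It is not the official client library: the Counter class, the metric name http_requests_total and its labels are all hypothetical, chosen for illustration. The render() output follows the Prometheus text exposition format that a scrape of /metrics returns.

```python
from collections import defaultdict


class Counter:
    """Toy counter with labels (illustrative only, not the real client library)."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self._values = defaultdict(float)  # sorted label tuple -> running total

    def inc(self, amount=1.0, **labels):
        # Counters only ever go up; a decrease would break rate calculations.
        if amount < 0:
            raise ValueError("counters can only go up")
        self._values[tuple(sorted(labels.items()))] += amount

    def render(self):
        """Render all label combinations in the text exposition format."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self._values.items()):
            label_str = ",".join(f'{k}="{v}"' for k, v in key)
            suffix = "{" + label_str + "}" if label_str else ""
            lines.append(f"{self.name}{suffix} {value}")
        return "\n".join(lines) + "\n"


# Hypothetical metric: count requests by method and status code.
requests_total = Counter("http_requests_total", "Total HTTP requests handled.")
requests_total.inc(method="GET", code="200")
requests_total.inc(method="GET", code="200")
requests_total.inc(method="POST", code="500")
print(requests_total.render())
```

In practice you would use one of the official client libraries, which also handle thread safety and serving the metrics over HTTP.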
The PromQL Query Language
Arbitrary aggregation, joins and slicing are all possible.
Can calculate how close you'll be to your quota in 4
hours, or the 95th percentile latency across an entire
datacenter.
If you can graph it, you can alert on it!
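The quota example can be sketched as a linear extrapolation. PromQL offers predict_linear() for this; the Python below is a simplified stand-in, assuming a least-squares fit extrapolated from the last sample rather than from the query evaluation time. The usage samples and the 50 GB quota are made-up values.

```python
def predict_linear(samples, seconds_ahead):
    """Fit a least-squares line through (timestamp, value) samples and
    extrapolate it seconds_ahead past the last sample."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must have distinct timestamps")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = cov / var_t
    intercept = mean_v - slope * mean_t
    last_t = samples[-1][0]
    return intercept + slope * (last_t + seconds_ahead)


# Hypothetical disk usage in GB, sampled once per hour.
usage = [(0, 40.0), (3600, 42.0), (7200, 44.0), (10800, 46.0)]
quota_gb = 50.0  # assumed quota

predicted = predict_linear(usage, 4 * 3600)
will_exceed_quota = predicted > quota_gb
print(f"predicted usage in 4h: {predicted:.1f} GB, over quota: {will_exceed_quota}")
```

Because the prediction is an ordinary expression, the same comparison against the quota can drive both a graph and an alert.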
Analytics: Top 5 Docker images by CPU
topk(5,
  sum by (image)(
    rate(container_cpu_usage_seconds_total{
      id=~"/system.slice/docker.*"}[5m]
    )
  )
)
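Read inside out, the query computes a per-second rate for each container, sums the rates per image, and keeps the largest five. A rough Python sketch of those three steps follows. It is a simplification: the real rate() also handles counter resets and extrapolates to the window edges, and the series data here is invented.

```python
from collections import defaultdict
import heapq


def per_second_rate(first, last):
    """Average per-second increase between two (timestamp, value) samples."""
    (t0, v0), (t1, v1) = first, last
    return (v1 - v0) / (t1 - t0)


def top_images_by_cpu(series, k):
    """Sum per-container rates by image label, then keep the k largest."""
    totals = defaultdict(float)
    for s in series:
        # rate(...[5m]): first and last sample inside the window
        rate = per_second_rate(s["samples"][0], s["samples"][-1])
        # sum by (image)
        totals[s["image"]] += rate
    # topk(k, ...)
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1])


# Hypothetical CPU-seconds counters, one entry per container, 5 minute window.
series = [
    {"image": "mysql", "samples": [(0, 100.0), (300, 250.0)]},
    {"image": "mysql", "samples": [(0, 10.0), (300, 70.0)]},
    {"image": "nginx", "samples": [(0, 0.0), (300, 30.0)]},
    {"image": "redis", "samples": [(0, 5.0), (300, 95.0)]},
]

top = top_images_by_cpu(series, 2)
for image, cpu in top:
    print(f"{image}: {cpu:.2f} CPU cores")
```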
Alert management
Not every alert results in a page.
Group similar alerts together, route them to the right
team and throttle notifications.
Designed to work reliably during network partitions.
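Grouping and routing can be sketched in a few lines. This is an illustration of the idea only, not how the Alertmanager is implemented or configured; the alert labels, receiver names and the first-match routing rule are assumptions made for the example.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts that share the same values for the group_by labels."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


def route(labels, routes, default_receiver):
    """Return the receiver of the first route whose matchers all match."""
    for matchers, receiver in routes:
        if all(labels.get(k) == v for k, v in matchers.items()):
            return receiver
    return default_receiver


# Hypothetical firing alerts.
alerts = [
    {"labels": {"alertname": "HighLatency", "team": "db", "instance": "db1"}},
    {"labels": {"alertname": "HighLatency", "team": "db", "instance": "db2"}},
    {"labels": {"alertname": "DiskFull", "team": "infra", "instance": "web1"}},
]
# Hypothetical routing table: (label matchers, receiver name).
routes = [({"team": "db"}, "db-pager"), ({"team": "infra"}, "infra-email")]

groups = group_alerts(alerts, ["alertname", "team"])
for key, members in groups.items():
    receiver = route(members[0]["labels"], routes, "default")
    print(f"{key}: {len(members)} alert(s) -> one notification to {receiver}")
```

The two HighLatency alerts collapse into a single notification for the database team instead of one page per instance.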
Monitoring Approach
Service management went from manual to Chef to
Kubernetes. Need to do the same for monitoring.
Care about what matters to end users, such as
latency and error rates.
Distracting a human with alerts for everything that's
vaguely off only leads to burnout.
A Rich Community
Today there are 300+ contributors to the core
repositories, and 200+ third-party integrations.
There are 800+ subscribers on our mailing lists,
600+ people in IRC and an estimated 500+
companies using Prometheus in production.
Many companies fund Prometheus development.
Does Anyone I Know Use It?
You might have heard of a company called
"Percona".
The Percona Monitoring and Management tool is
based on Prometheus.
Also Cloudflare, Google, DigitalOcean, HBC, etc.
What is Prometheus?
Metrics monitoring system (not logs).
A time series database. A query language.
Client libraries. An Ecosystem.
A modern approach to monitoring services.
Resources
Official Project Website: prometheus.io
User Mailing List: prometheus-users@googlegroups.com
Dev Mailing List: prometheus-developers@googlegroups.com
IRC: #prometheus on chat.freenode.net
Robust Perception Blog: www.robustperception.io/blog

Prometheus for Monitoring Metrics (Percona Live Europe 2017)