Prometheus Training

Prometheus Monitoring – Docker Enterprise Edition
Tim Tyler – Docker Captain
January 03, 2017
This is a training deck I originally developed in December 2016 and presented as part of a company training plan for the Docker Enterprise Edition platform.
As of this edit it is 2019, and the deck is substantially dated: the whole tool stack has moved forward significantly, as have some of the nuts and bolts described here (for instance, it’s now easy to do without HAProxy as described, in favor of Interlock). I’ve decided to share it anyway, because it still has some intrinsic value and can form the basis for an updated, modernized version and a potential Meetup talk.
I’ve removed about 10% of the original content that was company specific or proprietary, leaving only publicly available detail, and obscured some data. Many of the images worked better on a white background; rather than fiddle too much with them, I’ve just applied some quick picture styles.
It would be very easy to base an updated tech stack on this document and install a portable
training system on a Raspberry Pi. I am currently building a Prometheus and Grafana based
system for monitoring, alerting, and visualizing my Samsung SmartThings home automation
on a spare iMac.
@timotyler
ttyler

Who’s Keeping An Eye On Your Containers?

Agenda
• Monitoring Stack Overview
• Prometheus
• Exporters
• Alertmanager
• Queries
• Alerts
• Nuts and Bolts
• Questions

Monitoring Stack Overview
I'm still passionately interested in what my fellow humans are up to. For me, a day
spent monitoring the passing parade is a day well-spent. - Garry Trudeau

What’s the Problem?
• Monitoring containerized and microservices environments presents new challenges
• Containers can be highly ephemeral
• Microservices are able to scale up and down to meet design and performance criteria
• Microservices may exist for seconds, or persist indefinitely
• Microservices are generally a single process
• Containers live on hosts, but hosts are just pooled resources
• Generally we don’t think about which host an application microservice is running on
• Instances of a microservice may live on multiple hosts in a Docker Swarm
• Instances of a microservice may move to different hosts within a Docker Swarm
• The Swarm is a pool, and the microservices just swim in it
• Monitoring, like the microservice architecture, needs to be elastic

Can We Solve This?
• We have options, and several are readily available
  ◦ Prometheus: dimensional time series data model with Docker-aware agents
  ◦ Dynatrace: specialist in application performance monitoring, with Docker support
  ◦ SignalFx: newer offering with native Docker support
  ◦ Sysdig: Swiss army knife for infrastructure and microservices monitoring

PROMETHEUS!
What’s Prometheus?
• Prometheus is a Pitts S-2A RC muscle biplane
• Prometheus is a prequel and the fifth installment in the Alien franchise
• Prometheus is a Greek Titan who gave us fire and suffered an unfortunate fate involving a hungry eagle and his liver
• Prometheus is a leading Open Source monitoring solution
  ◦ Prometheus is straightforward to implement as a primary cluster monitoring stack
  ◦ A complete stack can also include the Open Source data visualization tool Grafana

PROMETHEUS!
But what does it do?
• Dimensional Data Metric Collector
• Interactive Query Engine
• Calculator for discrete multidimensional data streams
• Great Visualization
• Efficient Storage
• Simple Operation
• Alerting
• Many Client Libraries
• Many Integrations

A Typical Prometheus Monitoring Stack
• Prometheus Server
  ◦ Scrapes and stores time series data
• Alertmanager
  ◦ Handles alerts generated by Prometheus Server, deduplicating, grouping, and routing them to configured receivers
• AM-Exporter
  ◦ Receiver that forwards alerts from Alertmanager to a custom intake process
• Exporters
  ◦ Agents with specific duties that collect metrics and present them to Prometheus Server
  ◦ cAdvisor, node-exporter, blackbox-exporter
• Grafana
  ◦ Data visualization
• HAProxy
  ◦ Routes calls to Prometheus Server, Alertmanager, and Grafana within the Docker overlay network

Stack Installation
• Infrastructure as Code
  ◦ IaC means treating the configuration of systems the same way software code is treated
  ◦ We’re all devs now
  ◦ Automate and modularize
  ◦ Apply the test pyramid
  ◦ Version control changes, patches, and releases
  ◦ Share work! (Because DevOps)
• Installed via Docker orchestration and some basic automation
  ◦ Makefile driven
  ◦ Apply environment-specific customizations (hostnames, passwords, alerts, etc.) to config files
  ◦ Deploy configs across the cluster

Why should the thirst for knowledge be aroused, only to be disappointed and punished? Yet, like a second Prometheus, I will endure this and worse.
- Edwin Abbott in Flatland: A Romance of Many Dimensions (1884)

Prometheus Server
• Open Source systems monitoring and alerting tool originally built at SoundCloud
• Very active developer and user community
• Documentation: https://prometheus.io/docs/introduction/overview/

What Can Prometheus Server Do?
• Collect and store time series data
• Scrape defined targets for functionally specific data
• Discover targets statically or dynamically
• Evaluate rulesets
• Perform vector arithmetic
• Send alerts
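
Scraping is essentially an HTTP GET against each target’s metrics endpoint, followed by parsing Prometheus’s plain-text exposition format. The Python sketch below shows only the parsing half, under simplifying assumptions: one sample per line, no timestamps, and no escaped quotes inside label values. The function name parse_exposition and the myapp_jobs_total metric are hypothetical; this is not Prometheus’s own parser.

```python
import re

# Simplified sample line: `name{label="value",...} value` or `name value` (no timestamp)
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text):
    """Return (metric_name, labels_dict, value) tuples from scraped text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE metadata
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # a line this simplified parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


scraped = """# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
# HELP myapp_jobs_total Jobs processed (hypothetical metric).
# TYPE myapp_jobs_total counter
myapp_jobs_total{queue="default",status="done"} 1027
"""
samples = parse_exposition(scraped)
```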

Prometheus Has a Really Boring UI
We’ll go poke around for a minute

What Are Exporters?
Prometheus exporters are basically agents responsible for collecting application-specific time series metrics and exposing them on an HTTP endpoint (conventionally /metrics) for Prometheus to scrape.
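
Because that endpoint is just plain text over HTTP, a custom exporter can be very small. Here is a stdlib-only Python sketch; the metric myapp_queue_depth, its queue label, the read_queue_depth() lookup, and port 9999 are all hypothetical, and a production exporter would more likely use an official Prometheus client library.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def read_queue_depth():
    # Hypothetical application-specific lookup; replace with a real probe
    return 42


def render_metrics():
    """Render current values in the Prometheus text exposition format."""
    lines = [
        "# HELP myapp_queue_depth Jobs waiting in the queue.",
        "# TYPE myapp_queue_depth gauge",
        f'myapp_queue_depth{{queue="default"}} {read_queue_depth()}',
    ]
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve render_metrics() on GET /metrics and 404 for anything else."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9999):
    # Blocks forever; Prometheus would then scrape http://<host>:<port>/metrics
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() starts the exporter. Everything Prometheus needs is in the response body, which is why exporters are so easy to write in any language.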

A Bunch of Exporters
Prometheus supports dozens of exporters, either directly or via third parties. Some tools, such as etcd, cAdvisor, Kubernetes, and Docker, have been instrumented to provide a Prometheus endpoint directly.
Custom, business-specific exporters can easily be written in any language, though Go seems to be a popular choice.

What Is a Minimal Set of Exporters?
A basic Docker monitoring stack implements 3 exporters:
• cAdvisor
  ◦ Provides metrics on the Docker and container environment
• node-exporter
  ◦ Exporter for hardware and OS metrics exposed by the kernel
• blackbox-exporter
  ◦ Allows blackbox probing of endpoints over HTTP, HTTPS, DNS, TCP, and ICMP
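
Conceptually, a blackbox probe tries to reach an endpoint from the outside and reports whether it succeeded and how long it took. blackbox-exporter exposes these results as probe_success and probe_duration_seconds. The Python sketch below only mimics a TCP probe using those two metric names; it is an illustration, not how blackbox-exporter is implemented, and probe_tcp is a hypothetical name.

```python
import socket
import time


def probe_tcp(host, port, timeout=2.0):
    """Try a TCP connect and report the outcome as exposition-format text."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            success = 1  # handshake completed; the socket is closed right away
    except OSError:  # refused, unreachable, timed out, ...
        success = 0
    duration = time.monotonic() - start
    return (
        f"probe_success {success}\n"
        f"probe_duration_seconds {duration:.6f}\n"
    )
```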

What Are Some Other Exporters?
• Pushgateway
  ◦ Allows ephemeral and batch jobs to expose their metrics
• HAProxy-exporter
  ◦ Periodically scrapes HAProxy stats and exports them via HTTP/JSON for Prometheus
• JMX-exporter
  ◦ Configurably scrapes and exposes MBeans of a JMX target
• MongoDB-exporter
• RabbitMQ-exporter
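
Batch jobs push to the Pushgateway rather than being scraped. They send text-format metrics in an HTTP request to /metrics/job/<job>, optionally followed by more label name/value pairs that form the grouping key, and Prometheus then scrapes the Pushgateway. The Python sketch below only builds the request. The gateway address, job name, and metric are hypothetical; 9091 is the Pushgateway’s default port.

```python
import urllib.parse
import urllib.request


def build_push_request(gateway, job, metrics_text, instance=None):
    """Build (but do not send) a Pushgateway push for one grouping key."""
    url = f"{gateway.rstrip('/')}/metrics/job/{urllib.parse.quote(job, safe='')}"
    if instance is not None:
        # Optional extra grouping label, e.g. to keep pushes from different hosts apart
        url += f"/instance/{urllib.parse.quote(instance, safe='')}"
    return urllib.request.Request(
        url,
        data=metrics_text.encode("utf-8"),
        method="PUT",  # PUT replaces every metric in this group; POST replaces only same-named ones
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


# Hypothetical nightly batch job reporting when it last succeeded
request = build_push_request(
    "http://pushgateway:9091",
    "nightly_backup",
    "backup_last_success_timestamp_seconds 1484000000\n",
)
# urllib.request.urlopen(request) would actually send it
```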

The Basics
Prometheus provides a functional query language, PromQL, that lets the user select and aggregate time series data in real time. Results can be:
• Displayed in a graph
• Viewed as tabular data
• Consumed by external systems, Grafana for instance
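
External consumers such as Grafana retrieve results through Prometheus’s HTTP API. An instant query is a GET on /api/v1/query, and the JSON response wraps each series as a label map plus a [timestamp, "value"] pair. The Python sketch below builds the request URL and flattens such a response. The server address is hypothetical and no request is actually sent.

```python
import json
import urllib.parse


def instant_query_url(base_url, promql):
    """URL for an instant query against the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def vector_to_rows(response_body):
    """Flatten an instant-vector API response into (labels, value) rows."""
    payload = json.loads(response_body)
    if payload["status"] != "success":
        raise ValueError(payload.get("error", "query failed"))
    if payload["data"]["resultType"] != "vector":
        raise ValueError("expected an instant vector result")
    rows = []
    for series in payload["data"]["result"]:
        _timestamp, value = series["value"]  # sample values arrive as strings
        rows.append((series["metric"], float(value)))
    return rows
```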

Data Types
Prometheus has 3 basic data types (the query language also defines a string type, which is currently unused):
• Instant vector
  ◦ A set of time series containing a single sample for each series, all sharing the same timestamp
• Range vector
  ◦ A set of time series containing a range of data points over time for each series
• Scalar
  ◦ A simple numeric floating point value
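
The difference between the two vector types is easiest to see side by side. Below is a Python sketch over a tiny in-memory store. It assumes each series is identified by a frozenset of label pairs and holds time-sorted (timestamp, value) samples; a scalar would simply be a plain float. The function names are hypothetical, and boundary handling is simplified.

```python
# A series is identified by its label set; samples are (timestamp, value)
# pairs sorted by time. The dict below stands in for Prometheus's storage.


def range_vector(store, t, window):
    """Like `metric[window]` at time t: every sample in [t - window, t] per series."""
    result = {}
    for labels, samples in store.items():
        kept = [(ts, v) for ts, v in samples if t - window <= ts <= t]
        if kept:
            result[labels] = kept  # series with no samples in range are dropped
    return result


def instant_vector(store, t, lookback=300.0):
    """Like `metric` at time t: the newest sample per series.

    Series with no sample in the lookback window (5 minutes by default in
    Prometheus) are absent from the result.
    """
    return {
        labels: samples[-1][1]
        for labels, samples in range_vector(store, t, lookback).items()
    }


node_a = frozenset({("__name__", "up"), ("instance", "a")})
store = {node_a: [(0.0, 1.0), (15.0, 1.0), (30.0, 0.0)]}
```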

Operators
Prometheus supports arithmetic, comparison, logical/set (and, or, unless), and aggregation operators
Arithmetic operators    Comparison operators     Aggregation operators
+ (addition)            == (equal)               sum
- (subtraction)         != (not equal)           min
* (multiplication)      > (greater than)         max
/ (division)            < (less than)            avg
% (modulo)              >= (greater or equal)    count
^ (exponentiation)      <= (less or equal)       topk

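Functions
Prometheus supports about 40 built-in functions
• sum
• count
• irate
• sort
• topk
• time
Strictly speaking, sum, count, and topk are aggregation operators; irate, sort, and time are functions.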
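
irate, used in the Top 5 Docker Images query, is worth unpacking. It computes a per-second rate from only the last two samples in the range and treats any decrease as a counter reset. The Python sketch below shows that calculation, assuming the samples of one series arrive sorted as (timestamp, value) pairs. It returns None where Prometheus would simply produce no output for the series.

```python
def irate(samples):
    """Per-second instant rate of a counter from its last two samples.

    samples: (timestamp_seconds, value) pairs sorted by time, i.e. one
    series of a range vector such as counter[5m].
    """
    if len(samples) < 2:
        return None  # irate needs at least two samples
    (t1, v1), (t2, v2) = samples[-2], samples[-1]
    if t2 == t1:
        return None
    # A drop in value means the counter was reset (e.g. a container restart);
    # the new value itself is then the increase since the reset.
    delta = v2 - v1 if v2 >= v1 else v2
    return delta / (t2 - t1)
```

Because only the last two points matter, irate reacts quickly to spikes, which suits a "what is busy right now" dashboard query.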

Fancy Query
Top 5 Docker Images by CPU
sort_desc(
  topk(5,
    sum by (image) (
      irate(container_cpu_usage_seconds_total{id=~"/docker/.*"}[5m])
    )
  )
)
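
To make the query’s pipeline concrete, the Python sketch below applies the same steps to per-container rates that have already been computed (for example by irate). It keeps only series whose id fully matches /docker/.* (PromQL regex matchers are anchored), sums by image, keeps the top k, and sorts descending. The function name and sample data are hypothetical.

```python
import heapq
import re
from collections import defaultdict

DOCKER_ID = re.compile(r"/docker/.*")  # PromQL regex matchers are fully anchored


def top_images_by_cpu(per_container_rates, k=5):
    """Mimic sort_desc(topk(k, sum by (image) (rates))) on precomputed rates.

    per_container_rates: (labels_dict, cpu_seconds_per_second) pairs, i.e.
    what irate(container_cpu_usage_seconds_total[5m]) yields per container.
    """
    by_image = defaultdict(float)
    for labels, rate in per_container_rates:
        if not DOCKER_ID.fullmatch(labels.get("id", "")):
            continue  # id=~"/docker/.*"
        by_image[labels.get("image", "")] += rate  # sum by (image)
    top = heapq.nlargest(k, by_image.items(), key=lambda kv: kv[1])  # topk
    return sorted(top, key=lambda kv: kv[1], reverse=True)  # sort_desc


rates = [
    ({"id": "/docker/a1", "image": "nginx"}, 0.25),
    ({"id": "/docker/b2", "image": "nginx"}, 0.5),
    ({"id": "/docker/c3", "image": "redis"}, 1.0),
    ({"id": "/system.slice/sshd.service", "image": ""}, 9.0),
]
```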

Let’s Build an Alert!
Big things have small beginnings
- David, from the movie Prometheus (2012)

First Things First
• Alerts are just queries with comparison operators
• Alerts are written in a simple format in a plain text file
• Alerts can be decorated with interesting metadata
• Alert metadata can be templated
• Alerts can be sent to an external service
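
An alert’s query, such as up{job="node"} == 0, keeps only the series for which the comparison holds. Each returned series becomes an active alert, and a FOR clause makes it wait in a pending state until the condition has held long enough. The Python sketch below replays that state machine for a single series. The function name and the 15-second evaluation interval used in the usage example are assumptions.

```python
def alert_states(evaluations, for_seconds):
    """Replay one series' alert condition and apply the rule's FOR duration.

    evaluations: (timestamp_seconds, condition_true) pairs in evaluation
    order, where condition_true means e.g. `up{job="node"} == 0` returned
    this series. Returns the alert state after each evaluation.
    """
    states = []
    active_since = None
    for t, condition_true in evaluations:
        if not condition_true:
            active_since = None  # condition cleared: the FOR timer resets
            states.append("inactive")
            continue
        if active_since is None:
            active_since = t  # first evaluation where the condition held
        held_for = t - active_since
        states.append("firing" if held_for >= for_seconds else "pending")
    return states


# A node that goes down at t=15s and comes back at t=90s, with FOR 1m
evals = [(0, False), (15, True), (30, True), (45, True), (60, True), (75, True), (90, False)]
```

A brief blip shorter than the FOR duration never leaves the pending state, which is what keeps flapping targets from paging anyone.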

The Anatomy of an Alert
An alert starts with a query, like `up`

This is more info than we want, though

What we really want is to count how many we have

Or change how we count them

And do some math

Check out a quick chart

This is more fun
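
The exact queries from these slides are not shown here. The Python sketch below assumes typical examples of each step: filtering with up == 0, counting with count by (job) (up), and doing some math with sum(up) / count(up). It models an instant vector as a list of (labels, value) pairs. All function names and data are hypothetical.

```python
from collections import Counter

# Each element of an instant vector: (labels_dict, value)


def down_targets(vector):
    """Like `up == 0`: a comparison keeps only the series where it holds."""
    return [labels for labels, value in vector if value == 0]


def count_by(vector, label):
    """Like `count by (label) (vector)`: number of series per label value."""
    return dict(Counter(labels.get(label, "") for labels, _ in vector))


def fraction_up(vector):
    """Like `sum(up) / count(up)`: share of targets that answered their scrape."""
    if not vector:
        return None  # PromQL would return an empty result here
    return sum(value for _, value in vector) / len(vector)


up = [
    ({"job": "node", "instance": "a"}, 1.0),
    ({"job": "node", "instance": "b"}, 0.0),
    ({"job": "cadvisor", "instance": "a"}, 1.0),
    ({"job": "cadvisor", "instance": "b"}, 1.0),
]
```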

And go back and turn our earlier query into an alert
ALERT NodeDown
  IF up{job="node"} == 0
  FOR 1m
  LABELS {prdcode="0000", host="Shared_Infra", severity="critical", support="Prometheus_Critical"}
  ANNOTATIONS {
    description="{{$labels.instance}} of job {{$labels.job}} has been down for more than 1 minute.",
    rosguide="Please see Application ROS guide",
    summary="Instance {{$labels.instance}} down"
  }
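
Annotation values such as description are templates. When the alert is sent, {{$labels.instance}} is replaced with the firing series’ instance label. The Python sketch below is a simplified stand-in for that substitution, since Prometheus itself evaluates full Go templates. render_annotation is a hypothetical name, and it handles only the $labels form used in the rule.

```python
import re

# Matches {{$labels.name}}, optionally with spaces inside the braces
LABEL_REF = re.compile(r"\{\{\s*\$labels\.([a-zA-Z_][a-zA-Z0-9_]*)\s*\}\}")


def render_annotation(template, labels):
    """Expand {{$labels.<name>}} references using an alert's label values.

    Simplification: only the $labels form is handled, and unknown labels
    render as an empty string.
    """
    return LABEL_REF.sub(lambda m: labels.get(m.group(1), ""), template)
```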

A NodeDown Alert Sent To Chat
Fate rarely calls on us at a moment of our choosing – Optimus Prime
OptimusPrime (bot) 3:32 PM
AlertManager message: [FIRING:1] NodeDown (0000 prod node.metrics Shared_Infra node app critical Prometheus_Critical). Learn more at https://somewhere.dockeralerts.company.com:8443/#/alerts?receiver=ChatBot
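
The chat message is the end of a chain. Alertmanager groups the alert and posts a JSON notification to a receiver, here the custom AM-Exporter intake. The Python sketch below shows a webhook-style formatter. It assumes Alertmanager’s webhook payload fields (status, receiver, externalURL, groupLabels, commonLabels, alerts). The layout loosely follows Alertmanager’s default notification title and alerts link; it is not the AM-Exporter’s actual code, and the example URL is hypothetical.

```python
import json
from urllib.parse import quote


def chat_message(webhook_body):
    """Turn an Alertmanager webhook notification into a one-line chat message.

    Roughly: [STATUS:firing_count] <group label values> (<other common label values>)
    with label values ordered by label name.
    """
    payload = json.loads(webhook_body)
    status = payload["status"].upper()
    if payload["status"] == "firing":
        firing = [a for a in payload["alerts"] if a["status"] == "firing"]
        status += f":{len(firing)}"
    group = payload.get("groupLabels", {})
    common = payload.get("commonLabels", {})
    extra = {k: v for k, v in common.items() if k not in group}
    title = f"[{status}] " + " ".join(group[k] for k in sorted(group))
    if extra:
        title += " (" + " ".join(extra[k] for k in sorted(extra)) + ")"
    link = f"{payload['externalURL']}/#/alerts?receiver={quote(payload['receiver'])}"
    return f"AlertManager message: {title}. Learn more at {link}"
```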

How Are Rules/Alerts Categorized?
Rules/Alerts are segregated into functionally specific rule files:
• alert.rules
  ◦ Basic alert, installed with 1 rule: ‘IF up{job="node"} == 0’
• alert.infra.logging.rules
  ◦ Logging ruleset
• alert.infra.monitoring.rules
  ◦ Monitoring stack rules
• alert.infra.rules
  ◦ Basic infrastructure rules such as file systems, memory, and thinpool
• alert.service.app.prod.rules
  ◦ Service level rules such as Redis, MongoDB, RabbitMQ, etc.
• alert.docker.rules
  ◦ Rules for Docker itself
• alert.0000.app.rules
  ◦ Application-specific rules

What is Grafana?
• Grafana is a leading Open Source data visualization tool
  ◦ Create and share intuitive dashboards
  ◦ Rich graphing and charting
  ◦ Mixed styling within a dashboard
  ◦ Dashboard templates
  ◦ Lots of additional features

What’s in Your Stack?
The Infrastructure Monitoring Stack is currently considered v1.0:
• Prometheus v1.3.1
• Grafana v3.1.1
• Alertmanager v0.4.2 custom-v2
• HAProxy v1.6.9
• cAdvisor v0.24.1
• node-exporter v0.12.0
• blackbox-exporter v0.2.0

Tech Stack SOA
We use Git, a distributed version control system, to manage configurations and changes to the tech stack.
• Simple to use
• Enables code collaboration
• Eases deployments
• https://somewhere.company.com/git/projects/PRJ0000/repos/infra-prom-stack/browse

Basic Stack Deployment
The Monitoring Stack is deployed and configured from one location in each Docker Swarm, typically the first Docker Master Node.
• Configuration files
  ◦ /company/compose/infra-prom-stack
• Prometheus configuration
  ◦ /company/compose/infra-prom-stack/infra/prometheus/config/prometheus.yml
• Alertmanager configuration
  ◦ /company/compose/infra-prom-stack/infra/prometheus/config/alertmanager.conf
• Alert files
  ◦ /company/compose/infra-prom-stack/infra/prometheus/alerts

Controlling the Stack
The Makefile simplifies stack management by reducing error-prone commands to simple make targets. It is used both to configure and install the Monitoring Stack and to manage the stack at runtime. Some examples:
• make pushconfigs-all
  ◦ Distributes configuration to all Swarm nodes
• make hup-prometheus
  ◦ Gently restarts Prometheus Server after a configuration change
• make start
  ◦ Equivalent to a `docker-compose up` with cluster-specific information
• make start-all
  ◦ Starts the stack and scales all required services

Fixing Some Basic Problems
These commands are run from /company/compose/infra-prom-stack on the first Master Node.
• One or more cAdvisor containers are down
  ◦ Restart via UCP
  ◦ If that fails, remove the stopped containers
  ◦ Run `make scale-cadvisor` from /company/compose/infra-prom-stack
• One or more node-exporter containers are down
  ◦ Restart via UCP
  ◦ If that fails, remove the stopped containers
  ◦ Run `make scale-node-exporter` from /company/compose/infra-prom-stack
• Cannot connect to Prometheus Server, Grafana, or Alertmanager
  ◦ Validate they are up via UCP
  ◦ Occasionally HAProxy seems to get confused and needs a simple restart via UCP

Network Overlay and Shared Services
Infrastructure monitoring and logging services are currently deployed as shared infrastructure services in a Docker overlay network.
• Overlay name: infra_netmon
  ◦ Monitoring stack
  ◦ Logging stack

Federation
Who monitors the monitors?
Prometheus is federated, so Prometheus Servers can monitor other Prometheus Servers:
• north-nonprod monitors both east and west
• east monitors west
• west monitors east
• Basic synthetic monitoring
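
Under the hood, federation is an ordinary scrape of another server’s /federate endpoint, with one or more match[] parameters selecting which series to pull. The Python sketch below builds such a URL. The server address and selectors are hypothetical, and federate_url is an illustrative name.

```python
import urllib.parse


def federate_url(base_url, selectors):
    """Build a /federate URL; each selector becomes one match[] parameter.

    The federated server returns every series that matches any selector.
    """
    query = urllib.parse.urlencode([("match[]", s) for s in selectors])
    return f"{base_url.rstrip('/')}/federate?{query}"


url = federate_url("http://east-prometheus:9090", ["up", '{job="node"}'])
```

In Prometheus’s own configuration, this corresponds to a scrape job with metrics_path set to /federate, honor_labels set to true, and a params entry listing the match[] selectors.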

Future Work
If we stick with Prometheus, several improvements will need exploration and engineering:
• Integrate configuration and deployment via a CI/CD pipeline
• Improve and refine Rules/Alerts
• Update Prometheus Server to the latest version
  ◦ Not much to gain here at the moment
• Update Grafana to the latest version
  ◦ Some interesting new features, including built-in alerts
• Back Grafana with a relational database
  ◦ Enables persistent annotations
• Engineer HA Prometheus and Alertmanager within a cluster
• Figure out a better persistent storage strategy
  ◦ This is bigger than Prometheus/Monitoring

Want to Learn More?
Since this is an Open Source solution, we face new tradeoffs compared with a fully vendored solution. The following resources are suggested for those wanting to dive deeper into this technology stack.
• See the Prometheus docs, GitHub repo, YouTube videos, and Robust Perception blog
  ◦ https://prometheus.io/docs/introduction/overview/
  ◦ https://github.com/prometheus/prometheus
  ◦ https://www.youtube.com/watch?v=gNmWzkGViAY&t
  ◦ https://www.robustperception.io/blog/
• See the Grafana docs, GitHub repo, and Screencasts
  ◦ http://docs.grafana.org/
  ◦ https://github.com/grafana/grafana
  ◦ https://www.youtube.com/playlist?list=PLDGkOdUX1Ujo3wHw9-z5Vo12YLqXRjzg2
• See the cAdvisor GitHub repo
  ◦ https://github.com/google/cadvisor

Key Points
• Microservices are (intended to be) ephemeral
• We need to monitor potentially transient services and act accordingly
• This is an Open Source solution all the way down the stack
• Prometheus is targeted to replace existing on-prem roles
  ◦ Capable of very basic synthetics
  ◦ Can set up service level monitoring for MongoDB, RabbitMQ, etc(d).
• Interface with 3rd party connectors
• Alerts are easy to create and manage
• Deployed as Infrastructure as Code
  ◦ Embrace DevOps

I Hope This Isn’t You Right Now
