Monitoring containerized apps in a dynamic cloud environment presents a unique set of challenges that are not easily solved with traditional monitoring systems. Prometheus, a powerful multi-dimensional monitoring tool with over 10 million Docker pulls, is gaining huge traction within the Kubernetes community.
This talk covers:
• An introduction to Kubernetes, followed by a discussion of the benefits of using Prometheus monitoring.
• An overview of a big announcement at DockerCon: how the Weaveworks team worked with Docker to make Prometheus work with Docker Swarm. Luke will discuss the how and why of the process and what you need to know.
• An overview of the different types of whitebox/blackbox monitoring, their pros and cons, and why the Prometheus pull model is beneficial.
• A discussion of the Prometheus data model and how PromQL (the Prometheus Query Language) can help you monitor your app in a dynamic system.
• We'll turn theory into practice by digging into a real performance problem in our sample microservices app, the Sock Shop.
Visit Weave Cloud: https://www.weave.works/product/cloud/
For more free talks, join our Weave Online User Group: https://www.meetup.com/Weave-User-Group/
Monitoring your Application in Kubernetes with Prometheus
1. Monitoring your App in
Kubernetes with Prometheus
Jeff Hoffer, Developer Experience
github.com/eudaimos
2. What does Weave do?
Weave helps devops
iterate faster with:
• observability &
monitoring
• continuous delivery
• container networks &
firewalls
We use Prometheus to
power our monitoring
solution
4. Agenda
1. Prometheus concepts: data model & PromQL
2. Prometheus architecture & pull model
3. Why Prometheus & Kubernetes are a good fit
4. What is Cortex?
5. Kubernetes recap
6. Training on real app
7. What’s next?
6. Data model & PromQL
• Prometheus is a labelled time-series database
• Labels are key-value pairs
• A time-series is [(timestamp, value), …]
• lists of timestamp, value tuples
• values are just floats – PromQL lets you make sense of them
• So the data type of Prometheus is
• {key1=A, key2=B} —> [(t0, v0), (t1, v1), …]
• …
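The data model above can be sketched in a few lines of Python. This is a toy illustration only, not how Prometheus stores data internally: the class name `TinyTSDB` and its methods are hypothetical, and the label values are the placeholders from the slide.

```python
from typing import Dict, FrozenSet, List, Tuple

Labels = FrozenSet[Tuple[str, str]]  # a set of (key, value) label pairs
Sample = Tuple[float, float]         # (timestamp, value); values are just floats


class TinyTSDB:
    """Hypothetical in-memory store: one list of samples per unique label set."""

    def __init__(self) -> None:
        self.series: Dict[Labels, List[Sample]] = {}

    def append(self, labels: Dict[str, str], timestamp: float, value: float) -> None:
        # The label set, not the insertion order of its keys, identifies a series.
        key = frozenset(labels.items())
        self.series.setdefault(key, []).append((timestamp, float(value)))

    def get(self, labels: Dict[str, str]) -> List[Sample]:
        return self.series.get(frozenset(labels.items()), [])


db = TinyTSDB()
db.append({"key1": "A", "key2": "B"}, 0, 1)
db.append({"key1": "A", "key2": "B"}, 1, 2)
db.append({"key1": "A", "key2": "C"}, 0, 7)  # different label value, so a new series
```

Changing any label value creates a separate time-series, which is why labels are enough to tell apart many instances of the same metric.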
7. Metrics Types
• counter - single numeric metric that only goes up
• gauge - single numeric metric that can arbitrarily go up or down
• histogram - samples observations and counts them in
configurable buckets
• histogram_quantile() calculates quantiles from histograms
or aggregations of histograms
• summary - samples observations and counts them
• configurable quantiles over sliding time windows,
e.g. {quantile="0.5"}
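As a rough illustration of how the first three metric types behave, here is a minimal Python sketch. The classes are hypothetical stand-ins, not the API of any Prometheus client library; the one detail taken from Prometheus itself is that histogram buckets are exposed cumulatively, each counting observations less than or equal to its upper bound.

```python
import bisect
from typing import Dict, List


class Counter:
    """Single number that only goes up."""

    def __init__(self) -> None:
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("a counter can only increase")
        self.value += amount


class Gauge:
    """Single number that can go up or down."""

    def __init__(self) -> None:
        self.value = 0.0

    def set(self, value: float) -> None:
        self.value = float(value)


class Histogram:
    """Counts observations in configurable buckets, plus a catch-all +Inf bucket."""

    def __init__(self, upper_bounds: List[float]) -> None:
        self.upper_bounds = sorted(upper_bounds) + [float("inf")]
        self.counts = [0] * len(self.upper_bounds)
        self.total = 0.0

    def observe(self, value: float) -> None:
        # Find the first bucket whose upper bound is >= the observed value.
        index = bisect.bisect_left(self.upper_bounds, value)
        self.counts[index] += 1
        self.total += value

    def cumulative(self) -> Dict[float, int]:
        # Running totals: each bucket includes everything in the smaller buckets.
        result: Dict[float, int] = {}
        running = 0
        for bound, count in zip(self.upper_bounds, self.counts):
            running += count
            result[bound] = running
        return result


latency = Histogram([0.1, 0.5, 1.0])
for seconds in (0.05, 0.3, 0.5, 2.0):
    latency.observe(seconds)
```

Because the buckets are plain counters, they can be summed across instances before estimating a quantile, which is not possible with the precomputed quantiles of a summary.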
8. Data model & PromQL
• __name__ is a magic label; you can
shorten the query syntax from
{__name__="requests"}
to:
requests
9. Data model & PromQL
• Example: counter requests over a spike in traffic:
• 1, 2, 3, 13, 23, 33, 34, 35, 36
[Figure: requests (y-axis) plotted against time (x-axis)]
t1  t2  t3  t4  t5  t6  t7  t8  t9
1   2   3   13  23  33  34  35  36
10. Data model & PromQL
• What Prom is storing
• {__name__="requests"} —>
[(t1, 1), (t2, 2), (t3, 3), (t4, 13),
(t5, 23), (t6, 33), (t7, 34), (t8, 35),
(t9, 36), (t10, 37)]
or
t1  t2  t3  t4  t5  t6  t7  t8  t9
1   2   3   13  23  33  34  35  36
11. Data model & PromQL
• the [P] (period) syntax after a selector turns an
instant vector into a range vector
• for each point in time, it collects all the
values from the last period P, up to and
including the value at that time
• Example P: 5s, 1m, 2h…
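14. Data model & PromQL
• Recall our time-series requests
• What is requests[3s]? Range vector query (one sample per second):
t1  t2  t3  t4  t5  t6  t7  t8  t9
1   2   3   13  23  33  34  35  36
• Each window holds the last 3 seconds of values:
t1-3: 1, 2, 3
t2-4: 2, 3, 13
t3-5: 3, 13, 23
t4-6: 13, 23, 33
t5-7: 23, 33, 34
t6-8: 33, 34, 35
t7-9: 34, 35, 36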
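The windows for requests[3s] can be reproduced with a short Python sketch. It assumes the samples arrive one second apart at t = 1..9, and the function name `range_vector` is made up for illustration. The window boundary (start excluded, end included) is chosen to match the three-value windows on the slide.

```python
from typing import List, Tuple

Sample = Tuple[int, float]  # (timestamp in seconds, value)

# The example counter from the slides, assumed to be sampled once per second.
requests: List[Sample] = list(zip(range(1, 10), [1, 2, 3, 13, 23, 33, 34, 35, 36]))


def range_vector(samples: List[Sample], end: int, period: int) -> List[Sample]:
    """Return the samples in the window (end - period, end]."""
    return [(t, v) for (t, v) in samples if end - period < t <= end]


# One window per evaluation time, from t3 (first full window) to t9.
windows = {end: [v for _, v in range_vector(requests, end, 3)] for end in range(3, 10)}
```

Here `windows[3]` is the t1-3 window and `windows[9]` is the t7-9 window.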
16. Data model & PromQL
• rate() finds the per-second rate of change
over a range vector query
• for each range vector, rate() essentially computes (last_value
- first_value) / (last_time - first_time)
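Applying the simplified formula above to the example counter gives a feel for what rate(requests[3s]) returns. This Python sketch assumes one sample per second at t = 1..9 and uses a hypothetical helper `simple_rate`. The real rate() also handles counter resets and extrapolates to the window edges, which this sketch deliberately leaves out.

```python
from typing import List, Optional, Tuple

Sample = Tuple[int, float]  # (timestamp in seconds, value)

requests: List[Sample] = list(zip(range(1, 10), [1, 2, 3, 13, 23, 33, 34, 35, 36]))


def window(samples: List[Sample], end: int, period: int) -> List[Sample]:
    """Samples in the window (end - period, end]."""
    return [(t, v) for (t, v) in samples if end - period < t <= end]


def simple_rate(samples: List[Sample]) -> Optional[float]:
    """(last_value - first_value) / (last_time - first_time); needs two samples."""
    if len(samples) < 2:
        return None
    (first_time, first_value), (last_time, last_value) = samples[0], samples[-1]
    return (last_value - first_value) / (last_time - first_time)


# Rate at each evaluation time from t3 to t9, using a 3-second window.
rates = [simple_rate(window(requests, end, 3)) for end in range(3, 10)]
```

The rates rise from 1.0 to 10.0 requests per second during the spike and fall back to 1.0 afterwards; a longer period would smooth the spike out further.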
39. Labels
• Recall that requests is just shorthand for
{__name__="requests"}
• We can have more labels: {__name__="requests",
job="frontend"}
• Shortens to requests{job="frontend"}
• And so we could query
rate(requests{job="frontend"}[1m])
40. Label Operators
• = -> exact match string
• != -> exact match string negated
• =~ -> regex match label
• !~ -> regex match negated
• Regex matching is slower because Prometheus
can't use indexes
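The four operators can be sketched as a small matcher in Python. The function names and the sample series are hypothetical. Python's `re` module stands in for the RE2 syntax Prometheus uses, and `re.fullmatch` mirrors the fact that Prometheus regex matchers are fully anchored. A missing label is treated as an empty string.

```python
import re
from typing import Dict, List, Tuple

Matcher = Tuple[str, str, str]  # (label name, operator, value or pattern)


def matches(labels: Dict[str, str], matcher: Matcher) -> bool:
    name, op, pattern = matcher
    value = labels.get(name, "")  # a missing label behaves like an empty value
    if op == "=":
        return value == pattern
    if op == "!=":
        return value != pattern
    if op == "=~":
        return re.fullmatch(pattern, value) is not None
    if op == "!~":
        return re.fullmatch(pattern, value) is None
    raise ValueError(f"unknown operator: {op}")


def select(series: List[Dict[str, str]], matchers: List[Matcher]) -> List[Dict[str, str]]:
    """Keep only the series whose labels satisfy every matcher."""
    return [labels for labels in series if all(matches(labels, m) for m in matchers)]


# Hypothetical label sets for illustration.
all_series = [
    {"__name__": "requests", "job": "frontend"},
    {"__name__": "requests", "job": "frontend-canary"},
    {"__name__": "requests", "job": "orders"},
]

exact = select(all_series, [("__name__", "=", "requests"), ("job", "=", "frontend")])
by_regex = select(all_series, [("job", "=~", "frontend.*")])
not_frontend = select(all_series, [("job", "!~", "frontend.*")])
```

An exact match can be answered with a lookup on the label value, while a regex has to be tested against each candidate value, which is the intuition behind the performance note above.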
43. Alerts
• You can define PromQL queries that trigger alerts when
the result of a query meets a criterion. Example:
# Alert for any instance that has a median request latency >1s.
ALERT APIHighRequestLatency
IF api_http_request_latencies_second{quantile="0.5"} > 1
FOR 1m
ANNOTATIONS {
summary = "High request latency on {{ $labels.instance }}",
description = "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)",
}
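The effect of the FOR clause can be sketched in Python: the alert only fires once the condition has held continuously for the given duration, and any sample below the threshold resets the clock. The function name, the 15-second evaluation interval, and the latency values are all made up for illustration; this is a simplification of how Prometheus evaluates rules.

```python
from typing import List, Optional, Tuple

Sample = Tuple[int, float]  # (evaluation time in seconds, query result)


def firing_times(samples: List[Sample], threshold: float, for_seconds: int) -> List[int]:
    """Return the evaluation times at which the alert would be firing."""
    firing: List[int] = []
    active_since: Optional[int] = None  # when the condition first became true
    for timestamp, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = timestamp  # condition just became true: pending
            if timestamp - active_since >= for_seconds:
                firing.append(timestamp)  # held long enough: firing
        else:
            active_since = None  # condition broke: start over
    return firing


# Hypothetical median latencies, evaluated every 15 seconds.
median_latency = [
    (0, 0.4), (15, 1.2), (30, 1.5), (45, 1.3),
    (60, 1.1), (75, 1.4), (90, 0.8), (105, 1.6),
]

fired_at = firing_times(median_latency, threshold=1.0, for_seconds=60)
```

In this run the latency exceeds 1s from t=15 onwards, so the alert first fires at t=75; the dip at t=90 clears it, and the spike at t=105 only starts a new pending period.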
44. Cortex
• Distributed, multi-tenant version of
Prometheus
• Prometheus architecture is single-server
• We wanted to build something scalable
46. Cortex
• We run it for you
• Long term storage for your metrics
• We open sourced it
• https://github.com/weaveworks/cortex
47. Recap: all you need to know (Kube)
[Diagram: Pods, containers, Services, Deployments]
• Container Image - Docker container image, contains your application code in an isolated environment.
• Pod - A set of containers, sharing network namespace and local volumes, co-scheduled on one machine. Mortal. Has pod IP. Has labels.
• Deployment - Specify how many replicas of a pod should run in a cluster. Then ensures that many are running across the cluster. Has labels.
• Service - Names things in DNS. Gets virtual IP. Two types: ClusterIP for internal services, NodePort for publishing to outside. Routes based on labels.
49. Why Kubernetes <3 Prometheus
• Prom discovers what to scrape by asking Kube
• Prom’s pull model matches Kube dynamic
scheduling
• Allows Prom to identify the thing it's pulling from
• Prom label/value pairs mirror Kube labels
• Pods were made for exporters
51. Join the Weave user group!
meetup.com/pro/Weave/
weave.works/help
52. Other topics
Kubernetes 101
Continuous delivery: hooking up my CI/CD pipeline
to Kubernetes
Network policy for security
We have talks on all these topics in the Weave
user group!
53. Thanks! Questions?
We are hiring!
DX in San Francisco
Engineers in London & SF
weave.works/weave-company/hiring
Editor's Notes
it’s like numerical differentiation
you can think of [3s] like a “smoothing factor”
allow you to miss traces
note that prometheus pulls metrics from jobs/exporters
this is your app/cluster/network/nodes (anything that can be instrumented)
allows Prom to know identity of thing it’s pulling from