We are going to talk about Prometheus and how to use it to monitor microservice-based "cloud-native" applications. We will dive deep into the Prometheus monitoring model, look at the components behind the system and see how they integrate with each other to provide an efficient, modern monitoring system. We will also glance at Prometheus' native integrations for cloud-native environments such as Kubernetes.
3. ~ ./stuff_I_poke_around_with
- Linux
- Kubernetes (clusters lifecycles and workloads scheduling in general)
- The Cloud™
(VMs and Containers + other people's computers)
- golang
- More devops toys FTW! (CI/CD, Ansible, etc.)
5. Cloud-Native is NOT The Cloud™
At its root, Cloud Native is structuring teams, culture and
technology to utilize automation and architectures to manage
complexity and unlock velocity.
Joe Beda
6. There’s a Copernican revolution happening in
infrastructure
A fundamental shift:
From VM-based Mutable
to Highly Dynamic and Immutable
infrastructures
12. Overview: What is Prometheus?
Community Driven Open-source
Monitoring and Alerting framework.
- Time series database for instrumentation,
metrics collection, storage and querying
- Alerting entity
- Integrated tools for metrics exposure
13. Overview: A bit of context around Prometheus
Started in 2012 as a SoundCloud
internal project
Second project to join CNCF after
Kubernetes
15. Core features
● Powerful no-sql query language, PromQL
● Time series data model
● Optimized to be efficient
● Operational & Architectural simplicity
21. Exporters & SDKs
Formatting metrics so they are exposed
in the format Prometheus
expects
- Either exporters (Node, RabbitMQ,
MySQL, etc.)
- Or SDKs (client libraries) to expose application
metrics
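To make "the expected format" concrete, here is a minimal sketch of what an exporter or SDK ends up producing. The helper `render_counter` and the help text are made up for illustration; real client libraries also escape label values and support the other metric types.

```python
def render_counter(name, help_text, samples):
    """Render one counter family in the Prometheus text exposition format.

    `samples` maps a tuple of (label, value) pairs to the counter value.
    Hypothetical helper: label values are NOT escaped here.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        selector = f"{{{label_str}}}" if label_str else ""
        lines.append(f"{name}{selector} {value}")
    return "\n".join(lines) + "\n"


text = render_counter(
    "api_http_requests_total",
    "Total HTTP requests handled.",
    {(("method", "POST"), ("handler", "/messages")): 3},
)
print(text)
```

Serving this text over HTTP on /metrics is all a target needs to do to be scrapeable.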
23. Prom Server configuration
- CLI flags for the daemon's immutable
settings (fixed at startup)
- Config file defines scraping
targets, instances and jobs
global:
  scrape_interval: 1m
  scrape_timeout: 30s
  external_labels:
    cluster: "test-cluster"
rule_files:
  - rules/rules.yml
# Scraping targets
scrape_configs:
  - job_name: 'some-service'
    static_configs:
      - targets:
          - <host> or <dns>
        labels:
          app: "some-service"
prometheus.yml
25. /metrics
# HELP hash_seconds Time taken to create hashes
# TYPE hash_seconds histogram
hash_seconds_bucket{code="200",le="1"} 2
hash_seconds_bucket{code="200",le="2.5"} 2
hash_seconds_bucket{code="200",le="5"} 2
hash_seconds_bucket{code="200",le="10"} 2
hash_seconds_bucket{code="200",le="+Inf"} 2
hash_seconds_sum{code="200"} 9.370800000000002e-05
hash_seconds_count{code="200"} 2
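A rough sketch of how a scraper could read such a payload, using only the standard library. The function `parse_samples` and both regexes are illustrative assumptions, not Prometheus' actual parser: they ignore optional timestamps and escaped quotes inside label values.

```python
import re

# Simplified patterns (assumptions): no timestamps, no escaped quotes.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Yield (metric_name, labels_dict, float_value) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        yield name, labels, float(value)


payload = '''
# HELP hash_seconds Time taken to create hashes
# TYPE hash_seconds histogram
hash_seconds_bucket{code="200",le="1"} 2
hash_seconds_bucket{code="200",le="+Inf"} 2
hash_seconds_sum{code="200"} 9.370800000000002e-05
hash_seconds_count{code="200"} 2
'''

samples = list(parse_samples(payload))
by_name = {name: value for name, _, value in samples}
# Average observed duration = sum of observations / number of observations.
average = by_name["hash_seconds_sum"] / by_name["hash_seconds_count"]
```

Note how a single histogram expands into several series: one per bucket, plus _sum and _count.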
26. Data model & querying
api_http_requests_total{method="POST", handler="/messages"}
- Label-based data model
- Each label, and each combination of labels, is a dimension along which
we can filter and aggregate exported data
- Changing, adding or removing a label creates a new time
series
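The last point is easy to see with a toy store. This is only a sketch: `series_key`, `store` and `append` are hypothetical names, and the real storage engine is far more involved. The idea it shows is that a series is identified by the metric name plus the full label set.

```python
def series_key(metric, labels):
    """Identity of a series: metric name plus the complete set of labels."""
    return (metric, frozenset(labels.items()))


store = {}  # series key -> list of (timestamp, value) samples


def append(metric, labels, timestamp, value):
    store.setdefault(series_key(metric, labels), []).append((timestamp, value))


# Same labels (order does not matter): both samples land in one series.
append("api_http_requests_total", {"method": "POST", "handler": "/messages"}, 0, 1.0)
append("api_http_requests_total", {"handler": "/messages", "method": "POST"}, 15, 2.0)
# One label value changes: this is a brand new series.
append("api_http_requests_total", {"method": "GET", "handler": "/messages"}, 15, 1.0)

print(len(store))
```

This is also why labels with many possible values (user IDs, request IDs) get expensive: every distinct value is another series.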
27. PromQL & Label based queries
http_requests_total
    All time series for the metric http_requests_total
http_requests_total{code="200",method="get"}
    Time series for successful requests with method get for the metric http_requests_total
http_requests_total{code="200",method="get"}[5m]
    Returns a range vector (the samples from the last 5 minutes)
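A small sketch of what label matchers and range selectors do, written in plain Python. The functions `matches` and `window` are illustrative; Python's `re` is used in place of Prometheus' RE2 engine, and the exact handling of the window boundary is an assumption.

```python
import re


def matches(labels, matchers):
    """Return True if a series' labels satisfy every (label, op, value) matcher."""
    for label, op, pattern in matchers:
        value = labels.get(label, "")  # a missing label behaves like an empty one
        if op == "=":
            ok = value == pattern
        elif op == "!=":
            ok = value != pattern
        elif op == "=~":
            ok = re.fullmatch(pattern, value) is not None  # regexes are fully anchored
        elif op == "!~":
            ok = re.fullmatch(pattern, value) is None
        else:
            raise ValueError(f"unknown matcher operator: {op}")
        if not ok:
            return False
    return True


def window(samples, now, seconds):
    """Range vector for one series: samples from the last `seconds` seconds.

    Boundary handling here (both ends inclusive) is an assumption.
    """
    return [(t, v) for t, v in samples if now - seconds <= t <= now]


series = [
    {"__name__": "http_requests_total", "code": "200", "method": "get"},
    {"__name__": "http_requests_total", "code": "200", "method": "post"},
    {"__name__": "http_requests_total", "code": "404", "method": "get"},
]

selected = [s for s in series if matches(s, [("code", "=", "200"), ("method", "=", "get")])]
not_4xx = [s for s in series if matches(s, [("code", "!~", "4..")])]
recent = window([(0, 1.0), (200, 2.0), (400, 3.0), (600, 4.0)], now=600, seconds=300)
```

An instant selector yields one value per matching series; adding [5m] yields a list of samples per series instead.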
28. PromQL & Label based queries
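http_requests_total{status!~"^4..$"}
    Selects all time series except those with a 4xx status, using a negative regex match
    (regexes are fully anchored, so ^ and $ are optional)
sum(rate(http_requests_total[5m])) by (job)
    Applying functions: rate() computes the per-second rate over a 5-minute range vector,
    then sum ... by (job) aggregates the results per job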
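Here is roughly what sum(rate(...)) by (job) computes, as a simplified sketch. `simple_rate` and `sum_by` are hypothetical helpers and the sample data is invented; unlike the real rate(), this version does not extrapolate to the window edges, it only handles counter resets.

```python
from collections import defaultdict


def simple_rate(samples):
    """Per-second increase of a counter over a window of (timestamp, value) samples."""
    if len(samples) < 2:
        return None
    increase = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        # A drop means the counter was reset: count the new value from zero.
        increase += value if value < prev else value - prev
        prev = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


def sum_by(rates, label):
    """Aggregate (labels, rate) pairs by the value of one label."""
    totals = defaultdict(float)
    for labels, rate in rates:
        totals[labels.get(label, "")] += rate
    return dict(totals)


# Invented 5-minute windows, keyed by (job, instance).
windows = {
    ("api", "a"): [(0, 100.0), (150, 250.0), (300, 400.0)],
    ("api", "b"): [(0, 50.0), (150, 10.0), (300, 40.0)],  # counter reset at t=150
    ("web", "c"): [(0, 0.0), (300, 600.0)],
}

rates = [
    ({"job": job, "instance": instance}, simple_rate(samples))
    for (job, instance), samples in windows.items()
]
per_job = sum_by(rates, "job")
```

The order matters: take the rate of each series first, then sum, otherwise counter resets on a single instance distort the aggregate.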
31. Alerting
Rules
- Evaluated by the Prometheus
server at a regular interval
- If a rule's query keeps returning results for the
configured duration, the alert fires
ALERT InstanceDown
  IF up == 0
  FOR 5m
  LABELS { severity = "critical" }
  ANNOTATIONS {
    summary = "Instance {{ $labels.instance }} down",
    description = "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.",
  }
Syntax used up to Prometheus 1.8
Starting with Prometheus v2, rules are written in standard
YAML (the structure stays the same)
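The FOR clause is the part people trip over, so here is a small sketch of its effect. `alert_states` is a hypothetical function, and the evaluation timestamps are invented; it only models the inactive, pending and firing states for a single alert.

```python
def alert_states(evaluations, for_seconds):
    """Map (timestamp, condition_true) evaluations to alert states.

    The alert is 'pending' while the condition holds but has not yet held
    for `for_seconds`, and 'firing' once it has. Any false evaluation resets it.
    """
    states = []
    active_since = None
    for timestamp, condition in evaluations:
        if not condition:
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        held = timestamp - active_since
        states.append("firing" if held >= for_seconds else "pending")
    return states


# One evaluation per minute; `up == 0` briefly recovers at t=180.
evals = [(0, False), (60, True), (120, True), (180, False), (240, True),
         (300, True), (360, True), (420, True), (480, True), (540, True)]
states = alert_states(evals, for_seconds=300)
print(states[-1])
```

A short blip (t=60 to t=120 above) never gets past pending, which is exactly why FOR exists: it filters out flapping targets.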
32. Alert Dispatching
The Alertmanager's job is to dispatch
alerts to the right channel according to
their labels (for example, severity)
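As a rough mental model, dispatching boils down to matching alert labels against a list of routes. The route table and receiver names below ("pager", "chat", "email") are invented for illustration; the real Alertmanager uses a routing tree and also handles grouping, silencing and inhibition.

```python
# Hypothetical routes: first match wins, otherwise fall back to the default.
ROUTES = [
    ({"severity": "critical"}, "pager"),
    ({"severity": "warning"}, "chat"),
]
DEFAULT_RECEIVER = "email"


def pick_receiver(alert_labels):
    """Return the receiver of the first route whose labels all match the alert."""
    for match, receiver in ROUTES:
        if all(alert_labels.get(key) == value for key, value in match.items()):
            return receiver
    return DEFAULT_RECEIVER


alert = {"alertname": "InstanceDown", "severity": "critical", "job": "some-service"}
print(pick_receiver(alert))
```

Because routing is driven by labels, the severity label set in the alerting rule is what decides who gets woken up.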
34. Service discovery
Scraping statically defined targets is not very useful in a dynamic environment
kubernetes_sd_configs
Native integration for Kubernetes environments
- Prometheus is aware that it is running in a Kubernetes cluster
- It automatically retrieves scraping targets such as nodes, pods and containers from the
k8s API
36. Re-labeling
- Relabeling is a very powerful mechanism that allows us to further manipulate the labels of targets.
- It’s a very effective way to take targets discovered from an API and apply sophisticated targeting strategies (e.g.
manipulating addresses or ports, filtering a subset of targets, etc.)
A quick configuration example:
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
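What the keep action in that example does can be sketched in a few lines. The function `keep` and the discovered targets are made up for illustration, and Python's `re` stands in for Prometheus' regex engine: the source label values are joined with a separator (";" by default), and a target survives only if the regex matches the whole joined string.

```python
import re


def keep(targets, source_labels, regex, separator=";"):
    """Keep only targets whose joined source label values fully match `regex`."""
    kept = []
    for labels in targets:
        joined = separator.join(labels.get(name, "") for name in source_labels)
        if re.fullmatch(regex, joined):
            kept.append(labels)
    return kept


SCRAPE_ANNOTATION = "__meta_kubernetes_pod_annotation_prometheus_io_scrape"

# Invented discovery results: one annotated pod, one opted out, one unannotated.
discovered = [
    {"__address__": "10.0.0.1:8080", SCRAPE_ANNOTATION: "true"},
    {"__address__": "10.0.0.2:8080", SCRAPE_ANNOTATION: "false"},
    {"__address__": "10.0.0.3:8080"},
]

targets = keep(discovered, [SCRAPE_ANNOTATION], "true")
print([t["__address__"] for t in targets])
```

In other words, only pods annotated with prometheus.io/scrape: "true" are scraped; everything else discovered from the API is dropped before the first scrape.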