Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Codemotion Milan 2017

Monitoring Cloud-Native applications with Prometheus
Jacopo Nardiello
CODEMOTION MILAN - SPECIAL EDITION
10 – 11 NOVEMBER 2017

Jacopo Nardiello
SIGHUP Founder & DevOps Engineer
@jnardiello
~ whoami

~ ./stuff_I_poke_around_with
- Linux
- Kubernetes (cluster lifecycle and workload scheduling in general)
- The Cloud™
(VMs and containers + other people's computers)
- golang
- More DevOps toys FTW! (CI/CD, Ansible, etc.)

What exactly is “Cloud-Native”?

Cloud-Native is NOT The Cloud™
At its root, Cloud Native is structuring teams, culture and
technology to utilize automation and architectures to manage
complexity and unlock velocity.
Joe Beda

There’s a Copernican revolution happening in
infrastructure.
A fundamental shift:
from mutable, VM-based infrastructures
to highly dynamic, immutable ones.

The path to Cloud-Native Architectures

Why Containers
- A new infrastructural unit
- Atomic deployments
- Very small footprint, super-fast scaling

Why Orchestrators
- Sandboxed environment
- Computers take over the scheduling
- Automatic health checks and self-healing

Cloud-Native monitoring with Prometheus

Overview: What is Prometheus?
A community-driven, open-source
monitoring and alerting framework:
- Time series database for instrumentation, metrics collection, storage and querying
- Alerting engine
- Integrated tools for exposing metrics

Overview: A bit of context around Prometheus
Started in 2012 as an internal project at SoundCloud
Second project to join the CNCF, after Kubernetes

Overview: Focus
Operational systems monitoring
Dynamic cloud environments

Core features
● Powerful query language (PromQL), not SQL-based
● Multi-dimensional time series data model (metric name + labels)
● Optimized to be efficient
● Operational & Architectural simplicity

Monitoring model: Pull
Prometheus pulls (scrapes) metrics from the /metrics endpoints exposed by its targets.
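To make the pull model concrete, here is a minimal sketch of what a target has to offer: an HTTP endpoint that renders its current metrics as text every time Prometheus scrapes it. It uses only the Go standard library; a real service would normally use an official client library (SDK) instead. The port 8080 and the metric name app_requests_total are assumptions for illustration.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"sync/atomic"
)

// requestsTotal is a hypothetical counter the application increments.
var requestsTotal int64

// metricsText renders one counter in the Prometheus text exposition format.
func metricsText(n int64) string {
	return "# HELP app_requests_total Total HTTP requests handled.\n" +
		"# TYPE app_requests_total counter\n" +
		fmt.Sprintf("app_requests_total %d\n", n)
}

func main() {
	// Application endpoint: every call bumps the counter.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		atomic.AddInt64(&requestsTotal, 1)
		fmt.Fprintln(w, "hello")
	})
	// Metrics endpoint: Prometheus pulls this page at every scrape_interval.
	http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		fmt.Fprint(w, metricsText(atomic.LoadInt64(&requestsTotal)))
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Note that the target never knows where Prometheus is: it only keeps its counters up to date, and the server decides when to scrape.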

The Architecture behind Prometheus

Prometheus core
- Service discovery and target definition
- Metrics scraping
- Time series database
- Alerting and recording rules
- Alert evaluation
- Metrics querying

Alertmanager
- Alerting & silencing
- Dispatching notifications to different channels

Exporters & SDKs
Format metrics for export in the format Prometheus expects, using either:
- Exporters for third-party systems (Node, RabbitMQ, MySQL, etc.)
- SDKs (client libraries) to export application metrics

Prom Server configuration
- CLI flags configure the daemon's immutable system parameters (e.g. storage location, retention)
- The config file defines scraping targets, instances and jobs

(prometheus.yml)
global:
  scrape_interval: 1m
  scrape_timeout: 30s
  external_labels:
    cluster: "test-cluster"
rule_files:
  - rules/rules.yml
# Scraping targets
scrape_configs:
  - job_name: 'some-service'
    static_configs:
      - targets: ['<host or dns>:<port>']
        labels:
          app: "some-service"

/metrics
# HELP hash_seconds Time taken to create hashes
# TYPE hash_seconds histogram
hash_seconds_bucket{code="200",le="1"} 2
hash_seconds_bucket{code="200",le="2.5"} 2
hash_seconds_bucket{code="200",le="+Inf"} 2
hash_seconds_sum{code="200"} 9.370800000000002e-05
hash_seconds_count{code="200"} 2

Data model & querying
api_http_requests_total{method="POST", handler="/messages"}
- Label-based data model
- Each label, and each combination of labels, is a dimension along which exported data can be filtered and aggregated
- Changing a label value, or adding or removing a label, creates a new time series
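A way to see why a label change creates a new time series: a series is identified by its metric name plus its full label set. The sketch below builds such an identity as a canonical string (labels sorted by name); the `seriesID` helper is hypothetical and only illustrates the idea, it is not how the Prometheus storage encodes series.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesID builds a canonical identity for a time series: the metric name plus
// its label pairs sorted by label name. Samples belong to the same series only
// if this identity is identical.
func seriesID(name string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	pairs := make([]string, len(keys))
	for i, k := range keys {
		pairs[i] = fmt.Sprintf("%s=%q", k, labels[k])
	}
	return name + "{" + strings.Join(pairs, ",") + "}"
}

func main() {
	a := seriesID("api_http_requests_total", map[string]string{"method": "POST", "handler": "/messages"})
	b := seriesID("api_http_requests_total", map[string]string{"method": "GET", "handler": "/messages"})
	fmt.Println(a)
	fmt.Println(b)
	fmt.Println(a == b) // false: a different label value means a different series
}
```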

PromQL & Label based queries
http_requests_total
  All time series for the metric http_requests_total
http_requests_total{code="200",method="get"}
  Only the time series for successful (code="200") requests with method="get"
http_requests_total{code="200",method="get"}[5m]
  A range vector: the samples of those series over the last 5 minutes

PromQL & Label based queries
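http_requests_total{status!~"^4..$"}
  Negative regex match: all time series whose status is NOT a 4xx code
  (label regexes are fully anchored, so "4.." is equivalent to "^4..$")
sum(rate(http_requests_total[5m])) by (job)
  Applying functions: rate() computes the per-second increase of each series over the 5-minute range vector, then sum ... by (job) adds the results up per job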

Visualization
Dashboards and graphing are outside Prometheus' core
scope (beyond its basic expression browser).
Use Grafana.

Alerting
Rules
- Evaluated by the Prometheus server at a regular interval
- If the rule's expression returns results, the alert becomes pending; if it keeps returning them for the FOR duration, the alert fires
ALERT InstanceDown
  IF up == 0
  FOR 5m
  LABELS { severity = "critical" }
  ANNOTATIONS {
    summary = "Instance {{ $labels.instance }} down",
    description = "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.",
  }
Syntax valid up to Prometheus 1.8.
Starting with Prometheus 2.0, rules are written in standard YAML (the structure stays the same).

Alert Dispatching
The Alertmanager's job is to dispatch alerts
to the right channel, for example according to
their severity.

Service discovery
Scraping statically defined targets is of limited use in dynamic environments.
kubernetes_sd_config
Native integration for Kubernetes environments
- Prometheus is aware that it runs in a Kubernetes cluster
- Scraping targets such as nodes, pods and containers are retrieved automatically from the Kubernetes API

More integrations (many more…)
- ec2_sd_config
- azure_sd_config
- openstack_sd_config
- gce_sd_config
- kubernetes_sd_config
- consul_sd_config
- dns_sd_config
- file_sd_config
- marathon_sd_config
- nerve_sd_config
- triton_sd_config
- static_config

Re-labeling
- Relabeling is a very powerful mechanism that allows us to manipulate the labels of discovered targets before they are scraped.
- It is a very effective way to take the targets returned by an API and apply sophisticated targeting strategies (e.g. rewriting addresses or ports, filtering a subset of targets, etc.)
A quick configuration example:
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    # Keep only pods annotated with prometheus.io/scrape: "true"
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true

Thank you,
Questions?
We are hiring!
jacopo@sighup.io
@jnardiello
