DevOpsDays Phoenix 2018: Using Prometheus and Grafana for Effective Service Dashboards

Using Prometheus and Grafana together for Effective Service Dashboards

audience_experience gauge
– Prometheus Query
– Grafana dashboard
– Grafana dashboard variables

Client library
Exporter daemon
Blackbox probe

● Who all has been counting?
● How high have they gotten?
● How fast have they been counting up over time?
● Who do we have time series for, and over what time span?

accesses_total{who="prometheus", page="home"} 1
accesses_total{who="bob", page="home"} 1
accesses_total{who="prometheus", page="about"} 2
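These sample lines are in Prometheus's text exposition format, which is what a client library or exporter serves for Prometheus to scrape. Below is a minimal stdlib-only sketch of a labeled counter that produces exactly these lines. `LabeledCounter` is a hypothetical stand-in for a real client library, which would also handle concurrency, help text and serving over HTTP.

```python
from collections import defaultdict


def escape_label_value(value: str) -> str:
    # The text format requires escaping backslashes, double quotes and newlines.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


class LabeledCounter:
    """Hypothetical minimal counter; real client libraries do much more."""

    def __init__(self, name, label_names):
        self.name = name
        self.label_names = tuple(label_names)
        self.values = defaultdict(float)  # tuple of label values -> running total

    def inc(self, amount=1.0, **labels):
        if amount < 0:
            raise ValueError("counters can only go up")
        key = tuple(labels[n] for n in self.label_names)
        self.values[key] += amount

    def expose(self) -> str:
        # One sample line per label combination, like a /metrics endpoint.
        lines = [f"# TYPE {self.name} counter"]
        for key, value in self.values.items():
            pairs = ",".join(
                f'{n}="{escape_label_value(v)}"' for n, v in zip(self.label_names, key)
            )
            lines.append(f"{self.name}{{{pairs}}} {value:g}")
        return "\n".join(lines)


accesses = LabeledCounter("accesses_total", ["who", "page"])
accesses.inc(who="prometheus", page="home")
accesses.inc(who="bob", page="home")
accesses.inc(who="prometheus", page="about")
accesses.inc(who="prometheus", page="about")
print(accesses.expose())
```

Each call to `inc` only bumps an in-memory number. Prometheus sees the values when it scrapes, which is why the counter's history lives in Prometheus and not in the application.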

● sum(accesses_total) → 4
● sum(accesses_total{who="prometheus"}) → 3
● accesses_total{page="about"} → 2
● accesses_total{page="about"} / ignoring(page) sum without(page) (accesses_total) → 0.667 (2 of the 3 accesses by "prometheus")
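The following in-memory sketch makes the arithmetic on this slide concrete by showing what these queries compute over the three sample series. The helper names (`select`, `sum_without`, `divide_ignoring`) are hypothetical. They cover only equality matchers and one-to-one matching, and Prometheus's real evaluation engine handles far more.

The ratio sums away `page` on the right-hand side first. Without that step, `ignoring(page)` would leave two right-hand series in the same `{who="prometheus"}` match group, and Prometheus rejects that.

```python
SERIES = [
    ({"who": "prometheus", "page": "home"}, 1.0),
    ({"who": "bob", "page": "home"}, 1.0),
    ({"who": "prometheus", "page": "about"}, 2.0),
]


def select(series, **matchers):
    # Equality matchers only, e.g. accesses_total{who="prometheus"}.
    return [(l, v) for l, v in series if all(l.get(k) == m for k, m in matchers.items())]


def group_key(labels, drop):
    # Label set with the given names removed, as a hashable key.
    return tuple(sorted((k, v) for k, v in labels.items() if k not in drop))


def sum_without(series, *drop):
    # sum without(<drop>) (...): add up series that agree on the remaining labels.
    out = {}
    for labels, value in series:
        key = group_key(labels, drop)
        out[key] = out.get(key, 0.0) + value
    return out


def divide_ignoring(lhs, rhs, *ignored):
    # lhs / ignoring(<ignored>) rhs, one-to-one; rhs must already be unique
    # per match group (as returned by sum_without).
    result = []
    for labels, value in lhs:
        key = group_key(labels, ignored)
        if key in rhs:
            result.append((dict(key), value / rhs[key]))
    return result


total = sum(v for _, v in SERIES)  # sum(accesses_total)
by_prometheus = sum(v for _, v in select(SERIES, who="prometheus"))
about = select(SERIES, page="about")
share = divide_ignoring(about, sum_without(SERIES, "page"), "page")
print(total, by_prometheus, share)
```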

Why not just Prometheus?
When we want to combine our queries and interact with them visually.
You can query the API and put the results in a table or, better, a chart using one of the many charting libraries available.
Grafana is a great place to start, even if only to tide you over while you work on your custom solutions.
https://github.com/grafana/grafana/issues/7795
https://stackoverflow.com/questions/50033085/how-to-draw-a-network-diagram-in-grafana
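If you take the "query the API yourself" route, a minimal standard-library sketch can build an instant-query URL and turn the JSON response into table rows. `PROMETHEUS_URL` assumes a local server on the default port 9090. The sample response is invented, but it follows the documented `/api/v1/query` response shape.

```python
import json
from urllib.parse import urlencode

PROMETHEUS_URL = "http://localhost:9090"  # assumption: local Prometheus, default port


def instant_query_url(query, base=PROMETHEUS_URL):
    return f"{base}/api/v1/query?{urlencode({'query': query})}"


def vector_to_rows(body):
    """Turn an instant-vector API response into (labels, value) rows."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample's value is a [unix_timestamp, "string value"] pair.
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


# Invented response in the documented shape; a real one could be fetched with
# urllib.request.urlopen(instant_query_url(...)).read().
sample = json.dumps({
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"who": "prometheus"}, "value": [1530000000, "3"]},
        {"metric": {"who": "bob"}, "value": [1530000000, "1"]},
    ]},
})

print(instant_query_url("sum by (who) (accesses_total)"))
for labels, value in vector_to_rows(sample):
    print(f"{labels['who']:<12}{value:>6}")
```

The table part is the easy bit. Interactivity, linked charts and variables are what Grafana adds on top.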

Grafana
● Grafana likes to query things too
– but not periodically, like Prometheus does
– it only queries to render your charts
● Grafana doesn't record the data; instead it saves the dashboards that query the data
● Multiple charts coming together to make dashboards
● Interactive, coordinated charts
● Variables to make queries dynamic

Not all metrics are equal
How high something is right now is different from how fast it has risen over time.
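The difference shows up in how you read the numbers. For a gauge, the latest sample answers "how high right now". For a counter, you usually want the per-second rate of increase over a window, which PromQL computes with `rate()`. Here is a simplified sketch of that idea. It handles counter resets but, unlike the real `rate()`, does not extrapolate to the edges of the window.

```python
def simple_rate(samples):
    """Per-second increase of a counter from (unix_seconds, value) samples.

    Simplified relative to PromQL rate(): resets are handled, but there is no
    extrapolation to the start and end of the range window.
    """
    if len(samples) < 2:
        return None  # rate() likewise needs at least two samples in the window
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A decrease means the counter reset (e.g. a process restart), so the
        # new value is all increase since the reset.
        increase += cur - prev if cur >= prev else cur
    return increase / (samples[-1][0] - samples[0][0])


# Counter scraped every 15s; the process restarted between t=30 and t=45.
samples = [(0, 10), (15, 25), (30, 40), (45, 5), (60, 20)]
print("latest value:", samples[-1][1])           # "how high right now"
print("per-second rate:", simple_rate(samples))  # "how fast it rose"
```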

Instant vs Ranges
● Prometheus represents this as instant queries versus range queries.
● Grafana represents this as the "Instant" checkbox on a Prometheus query in a chart on a dashboard.
● This can affect certain charts: a table is likely to use Instant values and a graph is likely to use Ranges, but for more esoteric charts the right choice may be less intuitive.
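In the HTTP API, the two kinds correspond to `/api/v1/query`, which is evaluated at one `time`, and `/api/v1/query_range`, which is evaluated at every `step` between `start` and `end`. The sketch below only builds the URLs and assumes a local Prometheus on its default port 9090.

```python
from urllib.parse import urlencode

BASE = "http://localhost:9090"  # assumption: local Prometheus, default port


def instant_query(query, at):
    # One value per series, evaluated at a single timestamp (suits tables).
    return f"{BASE}/api/v1/query?" + urlencode({"query": query, "time": at})


def range_query(query, start, end, step):
    # One value per series at every step from start to end (suits graphs).
    return f"{BASE}/api/v1/query_range?" + urlencode(
        {"query": query, "start": start, "end": end, "step": step}
    )


def points_per_series(start, end, step):
    # Evaluation timestamps are start, start+step, ... up to and including end.
    return int((end - start) // step) + 1


end = 1530003600
print(instant_query("accesses_total", end))
print(range_query("rate(accesses_total[1m])", end - 3600, end, 15))
print(points_per_series(end - 3600, end, 15))  # 241 points per series: 1h at 15s
```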

How fast?
● Nyquist Rate
● https://en.wikipedia.org/wiki/Nyquist–Shannon_sampling_theorem
[Diagram: Interval vs. Scrape]
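Applied to Prometheus, Nyquist-style reasoning suggests that the range in something like `rate(x[...])` should span at least two scrape intervals. Otherwise a single missed scrape can leave too few samples. The tiny helper sketch below uses a factor of 2 as the theoretical floor. The larger factor of 4 used for the PromQL range is a common rule of thumb, not something Prometheus enforces.

```python
def min_range_seconds(scrape_interval_s, factor=2):
    # At least `factor` scrape intervals per range window.
    return scrape_interval_s * factor


def range_is_safe(range_s, scrape_interval_s, factor=2):
    return range_s >= min_range_seconds(scrape_interval_s, factor)


def promql_range(scrape_interval_s, factor=4):
    # e.g. rate(accesses_total[60s]) for a 15s scrape interval with factor 4.
    return f"[{int(min_range_seconds(scrape_interval_s, factor))}s]"


print(range_is_safe(20, 15))  # False: less than two scrapes
print(range_is_safe(60, 15))  # True
print("rate(accesses_total" + promql_range(15) + ")")
```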

Types of Numbers
● Counters
● Gauges
● Histograms
● Summaries
● https://prometheus.io/docs/concepts/metric_types/

● Min/Max are usually safe, and often the most helpful values to look at
● Averages can be tricky.
– https://prometheus.io/docs/practices/histograms/#errors-of-quantile-estimation
– http://highscalability.com/blog/2015/10/5/your-load-generator-is-probably-lying-to-you-take-the-red-pi.html
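Quantiles computed from histograms are estimates. To see why, here is a simplified sketch of the linear interpolation that `histogram_quantile()` performs over cumulative buckets. It is followed by a reminder of how an average can hide a slow outlier that the max exposes. The bucket boundaries and latencies are invented, and edge cases such as negative bucket bounds are not handled.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q (0 < q <= 1) from cumulative (upper_bound, count) buckets.

    Simplified sketch of PromQL histogram_quantile(): it assumes observations
    are spread evenly inside a bucket, which is where estimation error comes
    from. The highest bucket must be +Inf.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("need a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower, prev_count = 0.0, 0.0
    for upper, count in buckets:
        if count >= rank:
            if upper == math.inf:
                return lower  # cannot interpolate into +Inf: use highest finite bound
            return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)
        lower, prev_count = upper, count
    return lower


# Invented latency buckets (seconds): 90 fast requests, a few slow ones.
buckets = [(0.1, 90), (0.5, 98), (1.0, 99), (math.inf, 100)]
print(histogram_quantile(0.5, buckets))
print(histogram_quantile(0.99, buckets))

# Averages hide outliers: 99 fast requests and one very slow one.
latencies = [0.05] * 99 + [5.0]
print(sum(latencies) / len(latencies), max(latencies))
```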

Effective Grafana
● Handles multiple Datasources
● Adapts query to given Datasource
● Offers visually intuitive alerting

Logs -> Metrics References
● https://github.com/braedon/prometheus-es-exporter
● https://github.com/google/mtail
● https://github.com/fluent/fluent-plugin-prometheus
– https://github.com/fluent/fluent-plugin-prometheus/issues/16
● Discussion of fluent-plugin vs mtail
● https://github.com/fstab/grok_exporter
● https://github.com/influxdata/telegraf
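These tools share the same basic idea: match log lines and increment labeled counters. Here is a toy sketch of that idea. It assumes a hypothetical access-log format and produces the `accesses_total` series used earlier. It does not reflect how any of the linked tools is actually configured.

```python
import re
from collections import Counter

# Hypothetical access-log format: "<who> GET /<page> <status>".
LOG_LINE = re.compile(r"(?P<who>\S+) GET /(?P<page>\S*) \d{3}")


def count_accesses(lines):
    """Roughly what mtail or grok_exporter do: match lines, bump labeled counters."""
    counts = Counter()
    unmatched = 0
    for line in lines:
        m = LOG_LINE.fullmatch(line.strip())
        if m is None:
            unmatched += 1  # worth exporting too, so parse failures stay visible
            continue
        counts[(m["who"], m["page"] or "home")] += 1
    return counts, unmatched


log = [
    "prometheus GET /home 200",
    "bob GET / 200",
    "prometheus GET /about 200",
    "prometheus GET /about 304",
    "not an access log line",
]
counts, unmatched = count_accesses(log)
for (who, page), n in counts.items():
    print(f'accesses_total{{who="{who}",page="{page}"}} {n}')
print("unmatched lines:", unmatched)
```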

Effective Grafana
● Combines Datasources
● Links Dashboards
● Provides Dashboard Variables

Variables
● Filtering
● Repeating Charts and Repeating Rows
● Can be hidden
● Can be defined as a static value, or as the results of complex queries
● Can reflect only what exists in the currently selected time range, or be global lists

Noisy Dashboards
● Try to remove, or aggregate away, the highest-cardinality, least important dimensions
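To find candidates, it helps to count distinct values per label. The sketch below does that over hypothetical in-memory label sets. It also shows how many series would remain after aggregating away `instance`, as `sum without(instance) (...)` would. Deciding which dimension is least important is still a judgment call.

```python
from collections import defaultdict


def label_cardinality(series):
    """Distinct values per label name, highest first."""
    values = defaultdict(set)
    for labels in series:
        for name, value in labels.items():
            values[name].add(value)
    return sorted(((name, len(v)) for name, v in values.items()), key=lambda x: -x[1])


def series_without(series, *labels):
    """How many series remain after e.g. sum without(instance) (...)."""
    return len({tuple(sorted((k, v) for k, v in s.items() if k not in labels)) for s in series})


# Hypothetical: 20 instances x 2 status codes for one job.
series = [
    {"job": "api", "instance": f"10.0.0.{i}:8080", "code": code}
    for i in range(20)
    for code in ("200", "500")
]
print(label_cardinality(series))
print(len(series), "->", series_without(series, "instance"))
```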

Effective Variables
● Enable compaction
● Extra linking points between dashboards
– Requires both dashboards to use the same Variable with the same name, but can drive a powerful user experience
● Queries can reference other Variables
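Grafana carries dashboard variables in the URL as `var-<name>` query parameters, alongside `from` and `to` for the time range. That is why a link only carries a selection across when both dashboards use the same variable name. The following sketch builds such a link. The host, dashboard UID and slug are hypothetical, and Grafana's own dashboard links can also include the current variable values for you.

```python
from urllib.parse import urlencode

GRAFANA_URL = "http://grafana.example.com"  # hypothetical Grafana host


def dashboard_link(uid, slug, variables, time_from="now-6h", time_to="now"):
    """Link to /d/<uid>/<slug>, passing each variable as var-<name>=<value>."""
    params = []
    for name, value in variables.items():
        # Multi-value variables repeat the same parameter once per value.
        for v in (value if isinstance(value, (list, tuple)) else [value]):
            params.append((f"var-{name}", v))
    params += [("from", time_from), ("to", time_to)]
    return f"{GRAFANA_URL}/d/{uid}/{slug}?{urlencode(params)}"


# Hypothetical target dashboard that also defines a "who" variable.
print(dashboard_link("abc123", "accesses-detail", {"who": ["prometheus", "bob"]}))
```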

Prometheus and Multi-select Variables
● With Prometheus specifically, when using Multi-select or All, use the regex matcher =~ in your queries (generally with .* or .+ as the All value)
{mything="$variable"} BAD
{mything=~"$variable"} GOOD
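`=` fails because a multi-select or All value is interpolated as a regular expression. Below is a rough sketch of that interpolation and of the two matchers. It approximates Grafana's behavior rather than reproducing it, and it ignores PromQL string escaping. It also shows why `.*` and `.+` differ as an All value: a series missing the label is treated as having an empty value.

```python
import re


def format_multi_value(values):
    """Approximation of Grafana's multi-select interpolation for Prometheus:
    regex-escape each value and join them as alternatives."""
    escaped = [re.escape(v) for v in values]
    return escaped[0] if len(escaped) == 1 else "(" + "|".join(escaped) + ")"


def equals(label_value, interpolated):
    # {mything="$variable"}: compared literally.
    return label_value == interpolated


def regex_match(label_value, interpolated):
    # {mything=~"$variable"}: Prometheus regex matchers are fully anchored.
    return re.fullmatch(interpolated, label_value) is not None


selected = format_multi_value(["home", "about"])
print(selected)                           # (home|about)
print(equals("home", selected))           # BAD: never matches the alternation
print(regex_match("home", selected))      # GOOD
print(regex_match("homepage", selected))  # anchored, so no partial matches
# A series without the label has an empty value: ".*" matches it, ".+" does not.
print(regex_match("", ".*"), regex_match("", ".+"))
```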

Latest Grafana Changes
● Newer versions of Grafana default to sane Prometheus Multi-select values
– Still need to use =~ in your queries
● Global variables for referencing the currently selected time range in queries
– Enables some really cool top-N graphing capabilities
http://docs.grafana.org/features/datasources/prometheus/#using-interval-and-range-variables
https://www.robustperception.io/graph-top-n-time-series-in-grafana
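The top-N idea ranks series once over the whole selected time range, for example using `$__range`. By contrast, `topk()` inside a graph ranks per step, so the chosen series can change from step to step. The sketch below shows the selection step and turns the result into a regex for an `=~` matcher. The totals are hypothetical and stand in for the result of a query such as `sum by (who) (increase(accesses_total[$__range]))`.

```python
import re


def top_n(totals, n):
    """Names of the n largest totals; ties broken alphabetically for stability."""
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:n]]


def as_regex(names):
    # Alternation suitable for a selector like {who=~"..."}.
    return "(" + "|".join(re.escape(name) for name in names) + ")"


# Hypothetical per-"who" totals over the selected dashboard time range.
totals = {"prometheus": 120.0, "bob": 3.0, "alice": 45.0, "carol": 45.0}
top = top_n(totals, 2)
print(top)                                          # ['prometheus', 'alice']
print(f'accesses_total{{who=~"{as_regex(top)}"}}')  # accesses_total{who=~"(prometheus|alice)"}
```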

Combining Time Series
● We can use on() and ignoring() in vector matching to whitelist/blacklist the labels used to match series
● We can specify group sides with group_left and group_right to tell Prometheus which side has the higher cardinality (many-to-one or one-to-many matching), optionally copying labels across from the other side
● We can reduce label sets when aggregating to only the labels wanted, keeping them with by() or dropping the rest with without()
● We can match disjoint time series and labels using label_replace to provide the joining label
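The next sketch shows the two join techniques on plain Python data. `group_left_multiply` mimics `many * on(instance) group_left(role) one`, copying a `role` label from an info-style series with one entry per instance. This is the machine-roles pattern. `label_replace` mimics PromQL's function of the same name for a single series and supports only `$1`-style references. All metric and label names are hypothetical.

```python
import re


def group_left_multiply(many, one, on, copy):
    """many * on(<on>) group_left(<copy>) one, for lists of (labels, value)."""
    index = {}
    for labels, value in one:
        key = tuple(labels.get(name, "") for name in on)
        if key in index:
            raise ValueError(f"duplicate series on the 'one' side for {key}")
        index[key] = (labels, value)
    result = []
    for labels, value in many:
        key = tuple(labels.get(name, "") for name in on)
        if key not in index:
            continue  # series without a partner drop out of the result
        one_labels, one_value = index[key]
        # Arithmetic drops the metric name; group_left copies the listed labels.
        merged = {k: v for k, v in labels.items() if k != "__name__"}
        merged.update({name: one_labels[name] for name in copy if one_labels.get(name)})
        result.append((merged, value * one_value))
    return result


def label_replace(labels, dst, replacement, src, regex):
    """label_replace(v, dst, replacement, src, regex) for one label set."""
    match = re.fullmatch(regex, labels.get(src, ""))  # regex is fully anchored
    if match is None:
        return dict(labels)  # no match: the series passes through unchanged
    value = match.expand(re.sub(r"\$(\d+)", r"\\g<\1>", replacement))
    out = dict(labels)
    if value:
        out[dst] = value
    else:
        out.pop(dst, None)  # an empty result removes the label
    return out


cpu = [
    ({"__name__": "cpu_seconds_total", "instance": "db1:9100"}, 100.0),
    ({"__name__": "cpu_seconds_total", "instance": "web1:9100"}, 50.0),
]
roles = [
    ({"__name__": "machine_role", "instance": "db1:9100", "role": "database"}, 1.0),
    ({"__name__": "machine_role", "instance": "web1:9100", "role": "web"}, 1.0),
]
print(group_left_multiply(cpu, roles, on=["instance"], copy=["role"]))
print(label_replace({"instance": "db1:9100"}, "host", "$1", "instance", "(.*):.*"))
```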

● https://www.robustperception.io/using-group_left-to-calculate-label-proportions
– demonstrates group_left with ignoring() and then without() to reduce labels twice
● https://www.robustperception.io/how-to-have-labels-for-machine-roles
– demonstrates group_left with on() and then by() to reduce labels twice
● https://www.robustperception.io/understanding-machine-cpu-usage
– demonstrates using by() to reduce labels

Performance | Recording Rules
● https://prometheus.io/docs/practices/rules/
● https://www.robustperception.io/relabelling-can-discard-targets-timeseries-and-alerts
● https://medium.com/quiq-blog/prometheus-relabeling-tricks-6ae62c56cbda
● https://www.robustperception.io/extracting-labels-from-legacy-metric-names
● https://www.robustperception.io/relabel_configs-vs-metric_relabel_configs
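Many of the relabeling tricks linked here come down to `keep`/`drop` actions. The values of `source_labels` are joined with a separator (`;` by default) and tested against a fully anchored regex. The following sketch applies that filtering step to in-memory label sets, for example dropping `go_*` series in `metric_relabel_configs` before they are stored. The function and data are hypothetical.

```python
import re


def relabel_filter(series, source_labels, regex, action="drop", separator=";"):
    """Keep or drop label sets whose joined source_labels fully match regex."""
    if action not in ("keep", "drop"):
        raise ValueError("only keep/drop are sketched here")
    pattern = re.compile(regex)
    kept = []
    for labels in series:
        # Missing labels contribute empty strings to the joined value.
        joined = separator.join(labels.get(name, "") for name in source_labels)
        matched = pattern.fullmatch(joined) is not None
        if matched == (action == "keep"):
            kept.append(labels)
    return kept


series = [
    {"__name__": "accesses_total", "who": "bob"},
    {"__name__": "go_goroutines"},
    {"__name__": "go_gc_duration_seconds", "quantile": "0.5"},
]
print(relabel_filter(series, ["__name__"], "go_.*"))  # only accesses_total survives
print(relabel_filter(series, ["__name__", "who"], "accesses_total;.+", action="keep"))
```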

Performance | Upstream
● Prefer changing instrumentation code or exporter configuration if possible!
● Consider pruning unused time series, identifying them with outlier queries
– https://www.robustperception.io/which-are-my-biggest-metrics
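The linked post finds the biggest metrics by counting series per metric name, roughly `topk(10, count by (__name__)({__name__=~".+"}))`. Below is an in-memory sketch of the same counting over hypothetical scraped label sets.

```python
from collections import Counter


def biggest_metrics(series, k=10):
    """Series count per metric name, largest first (like count by (__name__))."""
    return Counter(labels["__name__"] for labels in series).most_common(k)


# Hypothetical scrape: one histogram with 4 buckets x 3 handlers, plus two gauges.
series = [
    {"__name__": "http_request_duration_seconds_bucket", "le": le, "handler": h}
    for le in ("0.1", "0.5", "1", "+Inf")
    for h in ("/", "/api", "/login")
]
series += [{"__name__": "up"}, {"__name__": "process_start_time_seconds"}]
print(biggest_metrics(series, k=2))
```

Histograms multiply series by their bucket count, so they often top this list.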

Dashboards
● Many smaller dashboards generally load faster and are less confusing… assuming they are split out in a useful way.

Linking Charts
● Chart Drill-down links
● Dashboard Link charts
● Text chart

Alerts
● http://docs.grafana.org/alerting/notifications/
● https://prometheus.io/docs/alerting/overview/
● Grafana supports being a Prometheus scrape target
– http://docs.grafana.org/administration/metrics/

● ALERTS{alertname="<alert name>", alertstate="pending|firing"}
● http://docs.grafana.org/reference/annotations/
● http://docs.grafana.org/features/datasources/prometheus/#annotations
● https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/#inspecting-alerts-during-runtime

Effective Alerts
● How can you help the person who needs to take action actually take it?
● Think about what the person getting the notification will (have to) do.

● Prometheus and Alertmanager support powerful Go templating, e.g. for linking to a runbook:
– https://internal.myorg.net/wiki/alerts/{{ .GroupLabels.alertname }}
● https://prometheus.io/docs/alerting/notification_examples/
● https://prometheus.io/docs/prometheus/latest/configuration/template_examples/

Open Source
● http://docs.grafana.org/plugins/developing/development/
● http://docs.grafana.org/reference/export_import/
● https://prometheus.io/docs/operating/integrations/

Thanks!
● If you have any questions or would like to reach out:
● My name is Jasmine Hegman
– jasmine@jhegman.com
– http://twitter.com/hegpetz
– https://www.linkedin.com/in/jasminehegman
