Creative Commons Zero https://www.flickr.com/photos/freestocks/25668265836
Access to a lot of components
Range from the frontends to the databases
With 24x7 oncall shifts
In a DevOps world, more data, more awareness
More changes, different scale
How can we keep up?
The DevOps principles: CAMS
(a definition of DevOps)
(Damon Edwards and John Willis, 2010 http://devopsdictionary.com/wiki/CAMS)
This talk is about all of it.
What do we learn?
Predict users' habits
Deviations from the norm are not normal
It means that users can not reach us/use our
Why do business metrics matter?
Good service depends on: Linux health, DNS,
network, NTP, disk space, CPU, open files, database,
cache systems, load balancers, partners,
electricity, virtualization stack, NFS, ... and it moves
Customers won't call you because your disk is full!
Given that the End User matters
We have decided to standardize metrics
exchange between partners
Prometheus format used (soon to be
Everyone knows HTTP!
What do we exchange?
We are not interested in partners' internals (and
don't want to expose ours)
We are exchanging precomputed metrics (rate
over 5 minutes, duration over 5 minutes),
excluding servers, instances, ...
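As a minimal sketch of what "precomputed, without servers or instances" could look like, the snippet below computes a per-second rate over a 5-minute window for each server, sums the results, and renders a single line in the Prometheus text format. The metric name `shop_orders_rate5m`, the server names, and the helper functions are hypothetical; the rate calculation is deliberately simpler than the Prometheus `rate()` function (no extrapolation, no counter-reset correction).

```python
from typing import Dict, List, Tuple

# One sample of a counter: (unix timestamp in seconds, counter value).
Sample = Tuple[float, float]


def rate_over_window(samples: List[Sample], window_s: float = 300.0) -> float:
    """Per-second increase of a counter over the last `window_s` seconds.

    Simplified on purpose: returns 0.0 when there are fewer than two
    samples in the window or when the counter went backwards (reset).
    """
    if len(samples) < 2:
        return 0.0
    end_ts = samples[-1][0]
    in_window = [s for s in samples if s[0] >= end_ts - window_s]
    if len(in_window) < 2:
        return 0.0
    (t0, v0), (t1, v1) = in_window[0], in_window[-1]
    if t1 <= t0 or v1 < v0:
        return 0.0
    return (v1 - v0) / (t1 - t0)


def aggregate_for_partner(per_server: Dict[str, List[Sample]]) -> float:
    """Sum the per-server rates so that no server name leaves the organisation."""
    return sum(rate_over_window(samples) for samples in per_server.values())


def render_exposition(name: str, help_text: str, value: float) -> str:
    """Render one precomputed value as a gauge in the Prometheus text format."""
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name} {value}\n"
    )


# Hypothetical input: order counters scraped from two internal servers.
internal = {
    "web1": [(0.0, 100.0), (150.0, 400.0), (300.0, 700.0)],
    "web2": [(0.0, 0.0), (300.0, 300.0)],
}
shared = render_exposition(
    "shop_orders_rate5m",  # hypothetical metric name
    "Orders per second, averaged over 5 minutes.",
    aggregate_for_partner(internal),
)
```

The partner only ever sees the aggregated value: the `web1` and `web2` labels stay inside the function that does the sum.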
Identify, in the chain, the bottlenecks and the
We define our dashboards in two parts:
10 graphs on top about the business: RED,
USE, alerts, data from partners, monitoring
robots, state of the monitoring
Hidden by default: technical health (NTP, disk,
DB, network, JVM, ...)
Limited number of graphs
Errors in red
Attention points in yellow/orange
Business monitoring allows you to know early
when things are wrong
Provides clear answers to your customers in
minutes (no more "I will check")
Parallels to draw between technical and business
metrics (to find causes)
Is it REALLY fixed?
Until when (technical and business)?
What did I miss? What is the impact?
Because you run queries and alerts from a
You can run queries across targets/jobs
Detect faulty instances, alert for server X
based on metrics of server Y
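One way to picture "alert for server X based on metrics of server Y" is to compare each instance with its peers. The sketch below is an illustration only: it assumes you already have one error ratio per instance, and the function name, the `factor` of 3 and the `floor` of 1% are hypothetical choices rather than recommended values.

```python
from statistics import median
from typing import Dict, List


def faulty_instances(
    error_ratio: Dict[str, float],
    factor: float = 3.0,
    floor: float = 0.01,
) -> List[str]:
    """Return instances whose error ratio is far above that of their peers.

    Each instance is compared with the median of the *other* instances,
    so a single bad server cannot hide by dragging the baseline up.
    `floor` avoids flagging noise when every instance is close to zero.
    """
    flagged = []
    for name, value in error_ratio.items():
        peers = [v for other, v in error_ratio.items() if other != name]
        if not peers:
            continue  # nothing to compare a lone instance with
        baseline = max(median(peers), floor)
        if value > factor * baseline:
            flagged.append(name)
    return sorted(flagged)


# Hypothetical error ratios (errors / requests) for three servers.
suspects = faulty_instances({"web1": 0.010, "web2": 0.012, "web3": 0.200})
```

Because the comparison uses data from all targets, this kind of check only works from a place that can see every instance at once.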
Do not underestimate the monitoring of the
development / staging environments.
Business metrics are good candidates
to wake up someone at night.
Pull-based, metrics-centric
The targets (e.g. developers) choose the
metrics they expose => Empowering people
HTTP permits TLS, client auth, ... and cross-org
sharing of metrics
Becoming a standard in the industry
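To make the pull model concrete, here is a minimal sketch of a target that chooses which metrics it exposes and serves them over plain HTTP, using only the Python standard library. The `Metrics` class, the `shop_orders_total` counter, and the port are hypothetical; a real service would more likely use an official Prometheus client library and put TLS and client authentication in front of the endpoint.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Metrics:
    """Tiny thread-safe registry of counters chosen by the application."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counters = {}

    def inc(self, name: str, amount: float = 1.0) -> None:
        with self._lock:
            self._counters[name] = self._counters.get(name, 0.0) + amount

    def render(self) -> str:
        """Render all counters in the Prometheus text format."""
        with self._lock:
            lines = []
            for name in sorted(self._counters):
                lines.append(f"# TYPE {name} counter")
                lines.append(f"{name} {self._counters[name]}")
        return "\n".join(lines) + "\n"


METRICS = Metrics()


class MetricsHandler(BaseHTTPRequestHandler):
    """Answer GET /metrics; the monitoring server pulls, the target never pushes."""

    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = METRICS.render().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # keep the sketch quiet


def serve(port: int = 8000) -> None:
    """Block forever, serving the metrics on 127.0.0.1:<port>."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()


# The application decides what is worth counting, for example one order:
METRICS.inc("shop_orders_total")
```

Calling `serve()` and then fetching `http://127.0.0.1:8000/metrics` returns the counters as plain text, which is all a scraper needs.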
Central point for all teams
Show current and past status
Should give you the opportunity to answer
Focusing on business metrics is hard work that
will show benefits across teams and provide
visibility to management, enabling you to gain
trust and move on more quickly towards a DevOps