Application Metrics (with Prometheus examples)

We all know not to poke at alien life forms on another planet, right? But what about metrics: do you know how to pick, measure and draw conclusions from them? In this talk we will cover various Site Reliability Engineering topics, such as SLIs and SLOs, while exploring real-life examples of defining and implementing metrics in a system. The examples use Prometheus, an open-source system monitoring and alerting platform, to demonstrate implementation. Let's get back to some real science.

Published in: Technology

Application Metrics (with Prometheus examples)

  1. 1. Application Metrics with Prometheus examples Rafael Dohms @rdohms Backend Architect @
  3. 3. How do you do metrics?
  4. 4. “The Prometheus Scientist Method”
  5. 5. I hope not.
  6. 6. jobs.usabilla.com Rafael Dohms Staff Engineer rdohms doh.ms
  7. 7. Feedback jobs.usabilla.com Rafael Dohms Staff Engineer rdohms doh.ms We are hiring! jobs.usabilla.com
  8. 8. Let’s talk about metrics. But let’s do it with a concrete example.
  9. 9. Kafka / DDD / Autonomous Microservices / Monitoring
  12. 12. Metrics are insights into the current state of your application.
  13. 13. Metrics tell you if your service is healthy.
  14. 14. Metrics tell you what is wrong.
  15. 15. Metrics tell you what is right.
  16. 16. Metrics tell you what will soon be wrong.
  17. 17. Metrics tell you where to start looking.
  18. 18. Site Reliability Engineering
  19. 19. SLIs SLOs ◎ SLAs
  20. 20. SLIs Service Level Indicators “A quantitative measure of some aspect of your application” The response time of a request was 150ms Source: Site Reliability Engineering - O’Reilly
  21. 21. SLOs ◎ Service Level Objectives “A target value or a range of values for something measured by an SLI” Request response times should be below 200ms Source: Site Reliability Engineering - O’Reilly
  22. 22. SLOs ◎ Help you drive architectural decisions, like optimisation. Response time SLO: 150 ms. 95th percentile of processing time (PHP time): 5 ms. As a result we decided to invest more time in exploring the problem domain and not optimising our stack.
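A minimal sketch of how a 95th percentile like the one on slide 22 could be computed and compared against an SLO. It uses the nearest-rank method, which is only one of several percentile definitions; the function name `percentile` and the sample values are hypothetical and not taken from the talk.

```php
<?php
declare(strict_types=1);

/**
 * Nearest-rank percentile: the smallest sample such that at least
 * $p percent of all samples are less than or equal to it.
 */
function percentile(array $samples, float $p): float
{
    if ($samples === []) {
        throw new InvalidArgumentException('No samples given.');
    }
    sort($samples);
    // Rank is 1-based; multiply before dividing to avoid rounding noise.
    $rank = (int) ceil($p * count($samples) / 100);

    return (float) $samples[max($rank, 1) - 1];
}

// Hypothetical processing times in milliseconds.
$processingTimesMs = [3, 4, 4, 5, 2, 3, 4, 5, 3, 4];
$sloMs = 150.0; // response time SLO from the slide

$p95 = percentile($processingTimesMs, 95);
printf("p95 = %.1f ms, SLO met: %s\n", $p95, $p95 <= $sloMs ? 'yes' : 'no');
```

When the percentile sits far below the objective, as in the slide, optimising the stack is unlikely to be the best use of time.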
  23. 23. SLAs Service Level Agreements “An explicit or implicit contract with your customer, that includes consequences of missing their SLOs” The 99th percentile of request response times should meet our SLO, or we will refund users Source: Site Reliability Engineering - O’Reilly
  24. 24. Measuring
  25. 25. –Etsy Engineering “If it moves, we track it.” https://codeascraft.com/2011/02/15/measure-anything-measure-everything/
  26. 26. Metrics. Telemetry: What is happening right now? Statistics: How often does this happen?
  27. 27. Telemetry “the process of recording and transmitting the readings of an instrument”
  28. 28. Statistics / Analytics “the practice of collecting and analysing numerical data in large quantities”
  30. 30. I really miss Ayrton Senna Statistics / Analytics “the practice of collecting and analysing numerical data in large quantities”
  31. 31. Statistics: incoming feedback items with origin information. Telemetry: response time of public endpoints.
  32. 32. “If it moves, we track it.”
  33. 33. “If it moves, we track it.” Request Latency, System Throughput, Error Rate, Availability, Resource Usage
  34. 34. “If it moves, we track it.” Request Latency, System Throughput, Error Rate, Availability, Resource Usage. Incoming Data, Peak frequency. CPU, Memory, Disk Space, Bandwidth. node, PHP, Nginx, Database
  35. 35. “If it moves, we track it.” Request Latency, System Throughput, Error Rate, Availability, Resource Usage. Incoming Data, Peak frequency. CPU, Memory, Disk Space, Bandwidth. node, PHP, Nginx, Database. Measure Monitoring. Measure measurements.
  36. 36. Metrics, Everywhere.
  37. 37. SLIs
  38. 38. Picking good SLIs
  39. 39. SLIs may change according to who is looking at the data.
  40. 40. Understanding the nature of your system
  41. 41. User-Facing serving system? availability, throughput, latency
  42. 42. Storage System? availability, durability, latency
  43. 43. Big Data Systems? throughput, end-to-end latency
  44. 44. User-Facing and Big Data Systems
  45. 45. User-Facing and Big Data Systems ๏ SLIs - Response time in the “receive” endpoint - Turnaround time, from “receive” to “show” - Individual processing time per step - Data counting: how many, what nature
  46. 46. User-Facing and Big Data Systems ๏ SLIs (more relevant to development team) - Response time in the “receive” endpoint - Turnaround time, from “receive” to “show” - Individual processing time per step - Data counting: how many, what nature
  47. 47. User-Facing and Big Data Systems ๏ SLIs (more relevant to development team) - Response time in the “receive” endpoint - Turnaround time, from “receive” to “show” - Individual processing time per step - Data counting: how many, what nature ๏ Other Metrics - node, nginx, php-fpm, java metrics - server metrics: cpu, memory, disk space - Size of cluster - Kafka health
  48. 48. User-Facing and Big Data Systems ๏ SLIs (more relevant to development team) - Response time in the “receive” endpoint - Turnaround time, from “receive” to “show” - Individual processing time per step - Data counting: how many, what nature ๏ Other Metrics (more relevant to infrastructure team) - node, nginx, php-fpm, java metrics - server metrics: cpu, memory, disk space - Size of cluster - Kafka health
  49. 49. Picking Targets
  50. 50. Target value: SLI value >= target. Target range: lower bound <= SLI value <= upper bound.
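The two target shapes on slide 50 can be sketched as plain comparisons. The function names and the example numbers below are hypothetical; a target value suits SLIs where higher is better (for example availability), while a latency SLI fits the range form with the SLO as upper bound.

```php
<?php
declare(strict_types=1);

/** Target value: the SLI must be at or above the target. */
function meetsTargetValue(float $sli, float $target): bool
{
    return $sli >= $target;
}

/** Target range: the SLI must lie between both bounds, inclusive. */
function meetsTargetRange(float $sli, float $lower, float $upper): bool
{
    return $lower <= $sli && $sli <= $upper;
}

// Hypothetical availability of 99.95% against a 99.9% target.
var_dump(meetsTargetValue(99.95, 99.9));

// Hypothetical response time of 120 ms against a 0-150 ms range.
var_dump(meetsTargetRange(120.0, 0.0, 150.0));
```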
  51. 51. Don’t pick a target based on current performance What is the business need? What are users trying to achieve? How much impact does it have on the user experience?
  52. 52. How long can it take between the user clicking submit and a confirmation that our servers received the data?
  53. 53. How long can it take between the user clicking submit and a confirmation that our servers received the data? “Immediate” “We sell as real time” “500ms, too much HTML” “I don’t know”
  54. 54. How long can it take between the user clicking submit and a confirmation that our servers received the data? “Immediate” “We sell as real time” “500ms, too much HTML” “I don’t know” What is human perception of immediate? 100ms. Collection API should respond within 150ms.
  55. 55. Some, but not too many. Can you settle an argument or priority based on it?
  56. 56. Don’t overachieve. The Chubby example.
  57. 57. Adapt. Evolve. Redefine SLOs as your product evolves.
  58. 58. Meeting Expectations.
  59. 59. Attach consequences to your Objectives.
  60. 60. The night is dark and full of loopholes. take a friend from legal with you.
  61. 61. Safety Margins. like setting the alarm 5 minutes before the meeting.
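One way to read the safety margin idea is to keep the internal objective stricter than the limit promised to customers, so there is time to act before consequences apply. The sketch below assumes a latency SLI; the function name, the status labels and the 200 ms SLA limit are hypothetical (only the 150 ms objective appears in the talk).

```php
<?php
declare(strict_types=1);

/**
 * Classifies a latency measurement against an internal objective (SLO)
 * that is stricter than the externally promised limit (SLA).
 */
function classifyLatency(float $observedMs, float $sloMs, float $slaMs): string
{
    if ($observedMs > $slaMs) {
        return 'sla_breached'; // consequences apply
    }
    if ($observedMs > $sloMs) {
        return 'slo_missed';   // inside the safety margin: act now
    }

    return 'ok';
}

// 150 ms objective from the talk, hypothetical 200 ms SLA limit.
foreach ([120.0, 170.0, 230.0] as $observedMs) {
    printf("%.0f ms: %s\n", $observedMs, classifyLatency($observedMs, 150.0, 200.0));
}
```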
  62. 62. Metrics in Practice.
  63. 63. prometheus.io
  64. 64. Push Model scale this!
  65. 65. Pull Model scale this!
  66. 66. Prometheus Telemetry Statistics Prometheus StatsD, InfluxDB, etc… + Long Term Storage
  67. 67. Counter: cumulative metric that represents a single number that only increases. Histogram: samples and count of observations over time. Gauge: a counter that can go up or down. Summary: same as a histogram but with a stream of quantiles over a sliding window.
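The difference between a counter and a gauge can be illustrated without any library. The classes below are hypothetical stand-ins written for this sketch, not the API of a Prometheus client; they only show the rule that a counter never decreases while a gauge may move in both directions.

```php
<?php
declare(strict_types=1);

/** Sketch of a counter: a single number that only increases. */
final class SketchCounter
{
    private float $value = 0.0;

    public function incBy(float $amount): void
    {
        if ($amount < 0) {
            throw new InvalidArgumentException('A counter can only increase.');
        }
        $this->value += $amount;
    }

    public function value(): float
    {
        return $this->value;
    }
}

/** Sketch of a gauge: a single number that can go up or down. */
final class SketchGauge
{
    private float $value = 0.0;

    public function set(float $value): void
    {
        $this->value = $value;
    }

    public function incBy(float $amount): void
    {
        $this->value += $amount;
    }

    public function decBy(float $amount): void
    {
        $this->value -= $amount;
    }

    public function value(): float
    {
        return $this->value;
    }
}

$requests = new SketchCounter();
$requests->incBy(1);
$requests->incBy(5);

$queueSize = new SketchGauge();
$queueSize->set(10);
$queueSize->decBy(3);

var_dump($requests->value(), $queueSize->value());
```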
  68. 68. jimdo/prometheus_client_php
  69. 69. Your code writes to local storage. /metrics reads from local storage. Prometheus reads from /metrics.
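The flow on slide 69 can be sketched as follows. Because PHP requests are typically short-lived and share nothing in memory, metric values have to live in some shared store between requests. `LocalStorage`, `handleRequest` and `renderMetrics` are hypothetical names for this sketch; in a real setup the store would be something like APCu or Redis rather than an in-process array.

```php
<?php
declare(strict_types=1);

/** Hypothetical stand-in for shared local storage such as APCu or Redis. */
final class LocalStorage
{
    /** @var array<string, float> */
    private array $values = [];

    public function add(string $key, float $amount): void
    {
        $this->values[$key] = ($this->values[$key] ?? 0.0) + $amount;
    }

    /** @return array<string, float> */
    public function all(): array
    {
        return $this->values;
    }
}

/** "Your code": writes to storage while handling a request. */
function handleRequest(LocalStorage $storage): void
{
    $storage->add('my_app_count_total', 1);
}

/** The /metrics endpoint: only reads what is already in storage. */
function renderMetrics(LocalStorage $storage): string
{
    $lines = [];
    foreach ($storage->all() as $name => $value) {
        $lines[] = sprintf('%s %s', $name, $value);
    }

    return implode("\n", $lines) . "\n";
}

$storage = new LocalStorage();
handleRequest($storage);
handleRequest($storage);

// Prometheus would scrape this output over HTTP at its own interval.
echo renderMetrics($storage);
```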
  70. 70. <?php
      use Prometheus\Counter;
      use Prometheus\Histogram;
      use Prometheus\Storage\APC;

      require_once 'vendor/autoload.php';

      // Storage adapter shared by all metrics (slide note: APC / APCu, Redis).
      $adapter = new APC();

      // Arguments: storage, namespace, metric name, help, label names, buckets.
      $histogram = new Histogram(
          $adapter,
          'my_app',
          'response_time_ms',
          'This measures ....',
          ['status', 'url'],
          [0, 10, 50, 100]
      );
      // Arguments: measurement, label values.
      $histogram->observe(15, ['200', '/url']);

      // Arguments: storage, namespace, metric name, help, labels.
      $counter = new Counter($adapter, 'my_app', 'count_total', 'How many...', ['status', 'url']);
      $counter->inc(['200', '/url']);
      $counter->incBy(5, ['200', '/url']);
  78. 78. <?php
      use Prometheus\RenderTextFormat;
      use Prometheus\Storage\APC;

      require_once 'vendor/autoload.php';

      $adapter = new APC();

      // Render everything collected in storage in the Prometheus text format.
      $renderer = new RenderTextFormat();
      $result = $renderer->render($adapter->collect());
      echo $result;
  80. 80. Output shown over the same script:
      # HELP my_app_count_total How many...
      # TYPE my_app_count_total counter
      my_app_count_total{status="200",url="/url"} 6
      # HELP my_app_response_time_ms This measures ....
      # TYPE my_app_response_time_ms histogram
      my_app_response_time_ms_bucket{status="200",url="/url",le="0"} 0
      my_app_response_time_ms_bucket{status="200",url="/url",le="10"} 0
      my_app_response_time_ms_bucket{status="200",url="/url",le="50"} 1
      my_app_response_time_ms_bucket{status="200",url="/url",le="100"} 1
      my_app_response_time_ms_bucket{status="200",url="/url",le="+Inf"} 1
      my_app_response_time_ms_count{status="200",url="/url"} 1
      my_app_response_time_ms_sum{status="200",url="/url"} 16
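The bucket lines in the output on slide 80 are cumulative: each `le` bucket counts every observation less than or equal to its upper bound, and `+Inf` counts all of them. A minimal sketch of that bookkeeping, with a hypothetical function name `summarise` that is not part of any client library:

```php
<?php
declare(strict_types=1);

/**
 * Builds cumulative histogram buckets for a list of observations.
 *
 * @param array $bounds       bucket upper bounds ("le"), ascending
 * @param array $observations observed values
 */
function summarise(array $bounds, array $observations): array
{
    $buckets = [];
    foreach ($bounds as $bound) {
        $buckets[(string) $bound] = 0;
    }
    $buckets['+Inf'] = 0;

    foreach ($observations as $value) {
        // An observation increments every bucket whose bound it fits under.
        foreach ($bounds as $bound) {
            if ($value <= $bound) {
                $buckets[(string) $bound]++;
            }
        }
        $buckets['+Inf']++;
    }

    return [
        'buckets' => $buckets,
        'count'   => count($observations),
        'sum'     => (float) array_sum($observations),
    ];
}

// Same buckets and single observation as in the slide code.
$summary = summarise([0, 10, 50, 100], [15]);
foreach ($summary['buckets'] as $le => $count) {
    printf("le=\"%s\" %d\n", $le, $count);
}
```

With the buckets and the single `observe(15, ...)` call from the slide code, this gives the same bucket counts as the slide output: 0, 0, 1, 1 and 1 for `+Inf`.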
  81. 81. –Also Rafael (today) “I’ll just try this live demo again.” http://localhost:9090/graph http://localhost:8180/metrics –Rafael (yesterday) “Demos always fail.” http://localhost:8180/index https://github.com/rdohms/talk-app-metrics
  82. 82. You can’t act on what you can’t see.
  83. 83. Metrics without actionability are just numbers on a screen.
  84. 84. Act as soon as an SLO is threatened.
  85. 85. Thank you. Drop me some feedback at Usabilla and make this talk better. @rdohms http://slides.doh.ms https://joind.in/talk/bd0c9 we feedback
