Cloud Native Monitoring with Prometheus & Grafana
April 26th, 2019 – Dublin
@PierreVincent pvincent.io
Agile Lean Ireland - Workshop - Cloud Native Monitoring with Prometheus

Note that the provided environments will not be available outside the workshop. To run the environment yourself, follow the instructions at https://github.com/PierreVincent/prometheus-workshop.

In the world of cloud native and distributed applications, Prometheus has quickly risen to be one of the leading open-source monitoring tools. In this workshop, you will learn what you need to get started with Prometheus for monitoring a service-oriented architecture.

We will cover:
- The core concepts of Prometheus
- Instrumenting your code to expose metrics
- Querying Prometheus to gain insights on how your applications behave
- Defining rules to trigger alerts based on metrics and thresholds
- Building Grafana dashboards combining multiple metrics
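
To make "instrumenting your code to expose metrics" more concrete, here is a minimal sketch of what a service ends up serving on its /metrics endpoint: counters rendered in the Prometheus text exposition format. It uses only the Python standard library instead of an official client library (the workshop itself instruments Go code), and the `Counter` class with its `inc` and `render` methods is a hypothetical name chosen for illustration. Label-value escaping is left out for brevity.

```python
from collections import defaultdict


class Counter:
    """Hypothetical minimal counter; a real service would use a Prometheus client library."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self.values = defaultdict(float)  # label set -> running total

    def inc(self, **labels):
        # Counters only go up; each distinct label set is its own time series.
        self.values[tuple(sorted(labels.items()))] += 1

    def render(self):
        # Text exposition format: HELP and TYPE comments, then one line per series.
        # Assumption: label values need no escaping in this sketch.
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for labels, value in sorted(self.values.items()):
            label_str = ",".join(f'{k}="{v}"' for k, v in labels)
            lines.append(f"{self.name}{{{label_str}}} {value:g}")
        return "\n".join(lines) + "\n"


requests_total = Counter(
    "http_requests_total", "Total number of http requests by response status code"
)
for _ in range(3):
    requests_total.inc(endpoint="/login", status="200")
requests_total.inc(endpoint="/login", status="500")

exposition = requests_total.render()
print(exposition)
```

Prometheus would scrape this text at every scrape interval and store each line's value as one sample.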

  1. Cloud Native Monitoring with Prometheus & Grafana
     April 26th, 2019 – Dublin
     @PierreVincent pvincent.io
  2. Reaching production is only the beginning
  3. Pierre Vincent, Infrastructure & Reliability Manager
     @PierreVincent pvincent.io
  4. Workshop Overview
     Slides - Metrics & Prometheus basics
     Part 1 - Intro to Prometheus UI and Queries
     Part 2 - Building Grafana Dashboards
     Part 3 - Creating Prometheus Alerts
     Part 4 - Instrumenting Code (Golang)
  5. Metrics
     System metrics: CPU usage
     Application metrics: Error rates
     Business metrics: Customer conversions
  6. “Cloud Native” changes the game
     Monolithic architectures, Long-running instances, Long-running servers
     Loosely-coupled architectures, Short-lived instances, Short/Medium-lived servers
     Microservices, Auto-scaling deployments, Multiple deploys/day, Cloud VMs, Auto-scaling clusters, SOA
  7. Overview
     [Diagram: Servers / VMs, Appliances/Infra and Services, each with a /metrics endpoint; Prometheus]
  8. Scraping for samples
     [Diagram: User, Service, /metrics]
       # HELP http_requests_total Total number of http requests by response status code
       # TYPE http_requests_total counter
       http_requests_total{endpoint="/login",status="200"} 1584
       http_requests_total{endpoint="/login",status="500"} 9
       ...
     Every scrape interval, each value results in a sample:
       metric http_requests_total
       labels endpoint=/login status=200
       timestamp 1519205931
       value 1584
     Persist: tsdb
  9. Our example
     http-simulator /metrics
       http_requests_total
       http_request_duration_milliseconds
       + standard go metrics
     Option 1: Deploy on your own cluster
       See instructions in kubernetes/install
     OR
     Option 2: Use pre-deployed setup
       prometheus.prom-workshop.pvincent.io
       grafana.prom-workshop.pvincent.io
  10. http://grafana.prom-workshop.pvincent.io
      PierreVincent/prometheus-workshop
      http://prometheus.prom-workshop.pvincent.io
  11. Exercises 1 - Counters & Rates
      ● What's the overall request rate (with a 1 minute rolling-window) for the http-simulator service?
          sum(rate(http_requests_total{app="http-simulator"}[1m]))
      ● How many requests per minute are errors?
          60*sum(rate(http_requests_total{app="http-simulator", status="500"}[1m]))
      ● What's the error rate (in %) of requests to the /users endpoint?
          100 * sum(rate(http_requests_total{app="http-simulator", endpoint="/users", status="500"}[1m]))
          / sum(rate(http_requests_total{app="http-simulator", endpoint="/users"}[1m]))
  12. Exercises 2 - Latency distribution
      ● What is the median latency of all requests to the http-simulator service?
          histogram_quantile(0.5, sum by (le) (rate(http_request_duration_milliseconds_bucket{app="http-simulator"}[5m])))
      ● Does the /users endpoint fulfill the SLO of 3 nines (99.9%) of requests responding within 400ms?
          sum(http_request_duration_milliseconds_bucket{app="http-simulator", status="200", endpoint="/users", le="400"})
          / sum(http_request_duration_milliseconds_count{app="http-simulator", status="200", endpoint="/users"})
  13. Exercises 3 - Grafana widgets
      Some examples of widgets (or come up with your own ones):
      ● Graph of latency distribution
      ● Cumulative % graph of endpoint request rate
      ● Memory usage over time
      ● CPU usage over time
      ● Graph % of requests fulfilling the SLO of 400ms for /login endpoint
      ● ...
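
The arithmetic behind the queries in Exercises 1 and 2 can be sketched in plain Python. This is a simplified illustration, not how Prometheus is implemented: `per_second_rate` ignores counter resets and window-boundary extrapolation (both of which the real `rate()` handles), and `histogram_quantile` here only shows the linear interpolation idea over cumulative buckets. All sample values and bucket counts below are made up for the example.

```python
def per_second_rate(samples):
    """Simplified rate(): counter increase divided by elapsed seconds.

    Assumption: no counter resets and no extrapolation to the window edges.
    `samples` is a list of (timestamp_seconds, counter_value) pairs.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    Finds the bucket containing the target rank and interpolates linearly
    inside it, assuming observations are spread evenly within the bucket.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]  # the +Inf bucket counts every observation
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if upper_bound == float("inf") or count == lower_count:
                return lower_bound  # nothing to interpolate against
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical counter samples taken 60 seconds apart: (timestamp, value).
all_requests = [(0, 1000), (60, 1600)]
error_requests = [(0, 10), (60, 40)]

request_rate = per_second_rate(all_requests)             # requests per second
errors_per_minute = 60 * per_second_rate(error_requests)
error_pct = 100 * per_second_rate(error_requests) / request_rate

# Hypothetical cumulative latency buckets in milliseconds: (le, count).
latency_buckets = [(100, 20), (200, 60), (400, 90), (float("inf"), 100)]

median_ms = histogram_quantile(0.5, latency_buckets)
within_400ms = dict(latency_buckets)[400] / latency_buckets[-1][1]

print(request_rate, errors_per_minute, error_pct)  # 10.0 30.0 5.0
print(median_ms, within_400ms)                     # 175.0 0.9
print("SLO met:", within_400ms >= 0.999)           # SLO met: False
```

With these made-up numbers only 90% of requests finish within 400ms, so the three-nines SLO from Exercises 2 would not be met.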