
Agile Lean Ireland - Workshop - Cloud native monitoring with prometheus

Note that provided environments will not be available outside the workshop - you can follow instructions from https://github.com/PierreVincent/prometheus-workshop to run the environment yourself.

In the world of cloud native and distributed applications, Prometheus has quickly risen to be one of the leading open-source monitoring tools. In this workshop, you will learn what you need to get started with Prometheus for monitoring a service-oriented architecture.

We will cover:
- The core concepts of Prometheus
- Instrumenting your code to expose metrics
- Querying Prometheus to gain insights on how your applications behave
- Defining rules to trigger alerts based on metrics and thresholds
- Building Grafana dashboards combining multiple metrics

  1. Cloud Native Monitoring with Prometheus & Grafana. April 26th, 2019, Dublin. @PierreVincent, pvincent.io
  2. Reaching production is only the beginning
  3. Pierre Vincent, Infrastructure & Reliability Manager. @PierreVincent, pvincent.io
  4. Workshop Overview
     ● Slides: Metrics & Prometheus basics
     ● Part 1: Intro to Prometheus UI and Queries
     ● Part 2: Building Grafana Dashboards
     ● Part 3: Creating Prometheus Alerts
     ● Part 4: Instrumenting Code (Golang)
  5. Metrics
     ● System metrics: CPU usage
     ● Application metrics: Error rates
     ● Business metrics: Customer conversions
  6. “Cloud Native” changes the game
     ● Monolithic architectures, long-running instances, long-running servers
     ● Loosely-coupled architectures, short-lived instances, short/medium-lived servers
     ● Microservices, SOA
     ● Auto-scaling deployments, multiple deploys/day
     ● Cloud VMs, auto-scaling clusters
  7. Prometheus Overview: Prometheus scrapes the /metrics endpoints of Servers / VMs, Appliances/Infra, and Services.
  8. Scraping for samples. Every scrape interval, Prometheus reads the User Service's /metrics endpoint:
       # HELP http_requests_total Total number of http requests by response status code
       # TYPE http_requests_total counter
       http_requests_total{endpoint="/login",status="200"} 1584
       http_requests_total{endpoint="/login",status="500"} 9
       ...
     Each value results in a sample that is persisted to the TSDB:
       metric: http_requests_total
       labels: endpoint=/login status=200
       timestamp: 1519205931
       value: 1584
  9. Our example: http-simulator, which exposes on /metrics:
     ● http_requests_total
     ● http_request_duration_milliseconds
     ● standard Go metrics
     Option 1: Deploy on your own cluster. See instructions in kubernetes/install.
     OR
     Option 2: Use the pre-deployed setup: prometheus.prom-workshop.pvincent.io and grafana.prom-workshop.pvincent.io
  10. Links
     ● http://grafana.prom-workshop.pvincent.io
     ● http://prometheus.prom-workshop.pvincent.io
     ● PierreVincent/prometheus-workshop
  11. Exercises 1 - Counters & Rates
     ● What's the overall request rate (with a 1 minute rolling-window) for the http-simulator service?
       sum(rate(http_requests_total{app="http-simulator"}[1m]))
     ● How many requests per minute are errors?
       60*sum(rate(http_requests_total{app="http-simulator", status="500"}[1m]))
     ● What's the error rate (in %) of requests to the /users endpoint?
       100 * sum(rate(http_requests_total{app="http-simulator", endpoint="/users", status="500"}[1m])) / sum(rate(http_requests_total{app="http-simulator", endpoint="/users"}[1m]))
  12. Exercises 2 - Latency distribution
     ● What is the median latency of all requests to the http-simulator service?
       histogram_quantile(0.5, sum by (le) (rate(http_request_duration_milliseconds_bucket{app="http-simulator"}[5m])))
     ● Does the /users endpoint fulfill the SLO of three nines (99.9%) of requests responding within 400ms?
       sum(http_request_duration_milliseconds_bucket{app="http-simulator", status="200", endpoint="/users", le="400"}) / sum(http_request_duration_milliseconds_count{app="http-simulator", status="200", endpoint="/users"})
  13. Exercises 3 - Grafana widgets. Some examples of widgets (or come up with your own):
     ● Graph of latency distribution
     ● Cumulative % graph of endpoint request rate
     ● Memory usage over time
     ● CPU usage over time
     ● Graph % of requests fulfilling the SLO of 400ms for /login endpoint
     ● ...
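
To make the /metrics output from slide 8 more concrete, here is a minimal Go sketch of a service keeping a labelled request counter and rendering it in that text format. It uses only the standard library so it stays self-contained; the requestCounter type and its methods are hypothetical names invented for this illustration, and a real service would more likely rely on a Prometheus client library than on hand-written rendering.

```go
package main

import (
	"fmt"
	"net/http"
	"sort"
	"strings"
	"sync"
)

// requestCounter is a hypothetical, stdlib-only stand-in for a labelled
// counter. It is an illustration, not the workshop's actual instrumentation.
type requestCounter struct {
	mu     sync.Mutex
	counts map[[2]string]uint64 // key: {endpoint, status}
}

func newRequestCounter() *requestCounter {
	return &requestCounter{counts: make(map[[2]string]uint64)}
}

// Inc adds one to the counter for the given endpoint and status code.
func (c *requestCounter) Inc(endpoint, status string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[[2]string{endpoint, status}]++
}

// Render returns the counter in the text format shown on slide 8.
func (c *requestCounter) Render() string {
	c.mu.Lock()
	defer c.mu.Unlock()

	keys := make([][2]string, 0, len(c.counts))
	for k := range c.counts {
		keys = append(keys, k)
	}
	// Sort by endpoint, then status, so the output is stable between scrapes.
	sort.Slice(keys, func(i, j int) bool {
		if keys[i][0] != keys[j][0] {
			return keys[i][0] < keys[j][0]
		}
		return keys[i][1] < keys[j][1]
	})

	var b strings.Builder
	b.WriteString("# HELP http_requests_total Total number of http requests by response status code\n")
	b.WriteString("# TYPE http_requests_total counter\n")
	for _, k := range keys {
		// %q quoting is adequate for simple ASCII label values like these.
		fmt.Fprintf(&b, "http_requests_total{endpoint=%q,status=%q} %d\n", k[0], k[1], c.counts[k])
	}
	return b.String()
}

// ServeHTTP lets the counter be mounted directly as a /metrics handler.
func (c *requestCounter) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4")
	fmt.Fprint(w, c.Render())
}

func main() {
	counter := newRequestCounter()
	counter.Inc("/login", "200")
	counter.Inc("/login", "200")
	counter.Inc("/login", "500")
	fmt.Print(counter.Render())
}
```

To serve this over HTTP, the counter could be registered with http.Handle("/metrics", counter) before calling http.ListenAndServe; the example prints the rendering instead so that it terminates when run.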
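
The arithmetic behind the rate() queries in Exercises 1 can be sketched in a few lines of Go. This is a simplified model under stated assumptions: it uses only two samples per series, whereas Prometheus's rate() works over every sample in the window and extrapolates to the window boundaries. The sample values, the sample type, and the function names are all hypothetical.

```go
package main

import "fmt"

// sample is one scraped counter value with its Unix timestamp in seconds.
type sample struct {
	timestamp float64
	value     float64
}

// perSecondRate returns the average per-second increase between two samples.
// A drop in value is treated as a counter reset (e.g. a process restart), in
// which case the newer value is counted as the whole increase.
func perSecondRate(older, newer sample) float64 {
	elapsed := newer.timestamp - older.timestamp
	if elapsed <= 0 {
		return 0
	}
	increase := newer.value - older.value
	if increase < 0 {
		increase = newer.value
	}
	return increase / elapsed
}

// errorPercent expresses the error rate as a percentage of the total rate.
func errorPercent(errorRate, totalRate float64) float64 {
	if totalRate == 0 {
		return 0
	}
	return 100 * errorRate / totalRate
}

func main() {
	// Hypothetical samples taken 60 seconds apart (not real workshop data).
	allOlder, allNewer := sample{1519205931, 1593}, sample{1519205991, 1893}
	errOlder, errNewer := sample{1519205931, 9}, sample{1519205991, 15}

	total := perSecondRate(allOlder, allNewer)
	errors := perSecondRate(errOlder, errNewer)

	fmt.Printf("request rate: %.2f req/s\n", total)
	fmt.Printf("errors per minute: %.1f\n", 60*errors)
	fmt.Printf("error rate: %.1f%%\n", errorPercent(errors, total))
}
```

Multiplying the per-second error rate by 60 mirrors the 60*sum(rate(...)) query, and the percentage mirrors the ratio of the two summed rates.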
