Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Upcoming SlideShare
What to Upload to SlideShare
Next
Download to read offline and view in fullscreen.

Share

The RED Method: How to monitoring your microservices.

Download to read offline

The RED Method defines three key metrics you should measure for every microservice in your architecture; inspired by the USE Method from Brendan Gregg, it gives developers a template for instrumenting their services and building dashboards in a consistent, repeatable fashion.

In this talk we will discuss patterns of application instrumentation, where and when they are applicable, and how they can be implemented with Prometheus. We’ll cover Google’s Four Golden Signals, the RED Method, the USE Method, and Dye Testing. We’ll also discuss why consistency is an important approach for reducing cognitive load. Finally we’ll talk about the limitations of these approaches and what can be done to overcome them.

Related Books

Free with a 30 day trial from Scribd

See all

The RED Method: How to monitoring your microservices.

  1. 1. The RED Method Patterns for instrumentation & monitoring. Tom Wilkie tom@grafana.com @tom_wilkie github.com/tomwilkie
  2. 2. Introduction Why does this matter? The USE Method Utilisation, Saturation, Errors The RED Method Requests Rate, Errors, Duration.. The Four Golden Signals RED + Saturation
  3. 3. Introduction
  4. 4. The USE Method
  5. 5. For every resource, monitor: • Utilisation: % time that the resource was busy • Saturation: amount of work resource has to do, often queue length • Errors: the count of error events http://www.brendangregg.com/usemethod.html
  6. 6. Utilisation Saturation Errors CPU ✔ ✔ ✔ Memory ✔ ✔ ✔ Disk ✔ ✔ ✔ Network ✔ ✔ ✖
  7. 7. CPU USE in Prometheus CPU Utilisation: 1 - avg(rate(node_cpu{job="default/node-exporter",mode="idle"}[1m])) CPU Saturation: sum(node_load1{job="default/node-exporter"}) / sum(node:node_num_cpu:sum)
  8. 8. Memory USE in Prometheus Memory Utilisation: 1 - sum( node_memory_MemFree{job=“…”} + node_memory_Cached{job=“…”} + node_memory_Buffers{job=“…”} ) / sum(node_memory_MemTotal{job=“…”}) Memory Saturation: 1e3 * sum( rate(node_vmstat_pgpgin{job=“…”}[1m]) + rate(node_vmstat_pgpgout{job=“…”}[1m])) )
  9. 9. Interesting / Hard Cases • CPU Errors, Memory Errors • Hard Disk Errors! • Disk Capacity vs Disk IO • Network Utilisation • Interconnects
  10. 10. Demo Time
  11. 11. More Details • “The USE Method” - Brendan Gregg • Kubernetes Mixin - https://github.com/grafana/jsonnet-libs
  12. 12. The RED Method
  13. 13. For every service, monitor request: • Rate - number of requests per second • Errors - the number of those requests that are failing • Duration - the amount of time those requests take The RED Method
  14. 14. Prometheus Implementation import ( “github.com/prometheus/client_golang/prometheus" ) var requestDuration = prometheus.NewHistogramVec(prometheus.HistogramOpts{ Name: "request_duration_seconds", Help: "Time (in seconds) spent serving HTTP requests.", Buckets: prometheus.DefBuckets, }, []string{"method", "route", "status_code"}) func init() { prometheus.MustRegister(requestDuration) }
  15. 15. func wrap(h http.Handler) http.Handler { return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { m := httpsnoop.CaptureMetrics(h, w, r) requestDuration.WithLabelValues(r.Method, r.URL.Path, strconv.Itoa(m.Code)).Observe(m.Duration.Seconds()) }) } func server(addr string) { http.Handle("/metrics", prometheus.Handler()) http.Handle("/greeter", wrap(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { ... })) } Prometheus Implementation
  16. 16. Easy to query Rate: sum(rate(request_duration_seconds_count{job=“…”}[1m])) Errors: sum(rate(request_duration_seconds_count{job=“…”,  status_code!~”2..”}[1m])) Duration: histogram_quantile(0.99,  sum(rate(request_duration_seconds_bucket{job=“…}[1m])) by (le))
  17. 17. Demo Time
  18. 18. DAG of Services
  19. 19. Latencies & Averages
  20. 20. More Details • “Monitoring Microservices” - Weaveworks (slides) • "The RED Method: key metrics for microservices architecture” - Weaveworks • “Monitoring and Observability with USE and RED” - VividCortex • “RED Method for Prometheus – 3 Key Metrics for Monitoring” - Rancher Labs • “Logs and Metrics” - Cindy Sridharan • "Logging v. instrumentation”, “Go best practices, six years in” - Peter Bourgon
  21. 21. The Four Golden Signals
  22. 22. The Four Golden Signals For each service, monitor: • Latency - time taken to serve a request • Traffic - how much demand is places on your system • Errors - rate or requests that are failing • Saturation - how “full” your services is
  23. 23. Demo Time
  24. 24. More Details • “The Four Golden Signals” - The Google SRE Book • “How to Monitor the SRE Golden Signals” - Steve Mushero
  25. 25. Summary
  26. 26. Thanks! Questions?
  27. 27. Tom Wilkie VP Product, Grafana Labs Previously: Kausal, Weaveworks, Google, Acunu, Xensource Twitter: @tom_wilkie Email: tom@grafana.com
  28. 28. + Grafana Cloud is a hosted and fully managed SaaS metrics platform that helps Ops and Dev teams using Grafana to understand the behavior of their applications and infrastructure Grafana Cloud allows users to provision and manage the best open source observability tools - Grafana and Prometheus - all through a simple UI and single API. What is Grafana Cloud? Store, visualize and alert without the headache of scaling or managing your own monitoring stack. Your complete, fully managed, hosted metrics platform. Grafana Cloud:
  • IanLi1

    May. 23, 2020

The RED Method defines three key metrics you should measure for every microservice in your architecture; inspired by the USE Method from Brendan Gregg, it gives developers a template for instrumenting their services and building dashboards in a consistent, repeatable fashion. In this talk we will discuss patterns of application instrumentation, where and when they are applicable, and how they can be implemented with Prometheus. We’ll cover Google’s Four Golden Signals, the RED Method, the USE Method, and Dye Testing. We’ll also discuss why consistency is an important approach for reducing cognitive load. Finally we’ll talk about the limitations of these approaches and what can be done to overcome them.

Views

Total views

882

On Slideshare

0

From embeds

0

Number of embeds

12

Actions

Downloads

28

Shares

0

Comments

0

Likes

1

×