SlideShare a Scribd company logo
The RED Method
Patterns for instrumentation & monitoring.
Tom Wilkie
tom@grafana.com @tom_wilkie
github.com/tomwilkie
Introduction
Why does this matter?

The USE Method
Utilisation, Saturation, Errors

The RED Method
Requests Rate, Errors, Duration..

The Four Golden Signals
RED + Saturation
Introduction
The USE Method
For every resource, monitor:

• Utilisation: % time that the resource was busy

• Saturation: amount of work resource has to do, often queue length

• Errors: the count of error events
http://www.brendangregg.com/usemethod.html
Utilisation Saturation Errors
CPU ✔ ✔ ✔
Memory ✔ ✔ ✔
Disk ✔ ✔ ✔
Network ✔ ✔ ✖
CPU USE in Prometheus
CPU Utilisation:
1 - avg(rate(node_cpu{job="default/node-exporter",mode="idle"}[1m]))
CPU Saturation:
sum(node_load1{job="default/node-exporter"})
/
sum(node:node_num_cpu:sum)
Memory USE in Prometheus
Memory Utilisation:
1 - sum(
node_memory_MemFree{job=“…”} +
node_memory_Cached{job=“…”} +
node_memory_Buffers{job=“…”}
)
/ sum(node_memory_MemTotal{job=“…”})
Memory Saturation:
1e3 * sum(
rate(node_vmstat_pgpgin{job=“…”}[1m]) +
rate(node_vmstat_pgpgout{job=“…”}[1m]))
)
Interesting / Hard Cases
• CPU Errors, Memory Errors

• Hard Disk Errors!

• Disk Capacity vs Disk IO

• Network Utilisation

• Interconnects
Demo
Time
More Details
• “The USE Method” - Brendan Gregg

• Kubernetes Mixin - https://github.com/grafana/jsonnet-libs
The RED Method
For every service, monitor request:

• Rate - number of requests per second

• Errors - the number of those requests that are failing

• Duration - the amount of time those requests take
The RED Method
Prometheus Implementation
import (
“github.com/prometheus/client_golang/prometheus"
)
var requestDuration = prometheus.NewHistogramVec(prometheus.HistogramOpts{
Name: "request_duration_seconds",
Help: "Time (in seconds) spent serving HTTP requests.",
Buckets: prometheus.DefBuckets,
}, []string{"method", "route", "status_code"})
func init() {
prometheus.MustRegister(requestDuration)
}
func wrap(h http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
m := httpsnoop.CaptureMetrics(h, w, r)
requestDuration.WithLabelValues(r.Method, r.URL.Path,
strconv.Itoa(m.Code)).Observe(m.Duration.Seconds())
})
}
func server(addr string) {
http.Handle("/metrics", prometheus.Handler())
http.Handle("/greeter", wrap(http.HandlerFunc(func(w
http.ResponseWriter, r *http.Request) {
...
}))
}
Prometheus Implementation
Easy to query
Rate:
sum(rate(request_duration_seconds_count{job=“…”}[1m]))
Errors:
sum(rate(request_duration_seconds_count{job=“…”, 
status_code!~”2..”}[1m]))
Duration:
histogram_quantile(0.99,
 sum(rate(request_duration_seconds_bucket{job=“…}[1m])) by (le))
Demo
Time
DAG of Services
Latencies & Averages
More Details
• “Monitoring Microservices” - Weaveworks (slides)

• "The RED Method: key metrics for microservices architecture” - Weaveworks

• “Monitoring and Observability with USE and RED” - VividCortex

• “RED Method for Prometheus – 3 Key Metrics for Monitoring” - Rancher Labs

• “Logs and Metrics” - Cindy Sridharan

• "Logging v. instrumentation”, “Go best practices, six years in” - Peter Bourgon
The Four
Golden Signals
The Four Golden Signals
For each service, monitor:

• Latency - time taken to serve a request
• Traffic - how much demand is places on your system
• Errors - rate or requests that are failing
• Saturation - how “full” your services is
Demo
Time
More Details
• “The Four Golden Signals” - The Google SRE Book

• “How to Monitor the SRE Golden Signals” - Steve Mushero
Summary
Thanks!
Questions?
Tom Wilkie VP Product, Grafana Labs
Previously: Kausal, Weaveworks, Google, Acunu, Xensource
Twitter: @tom_wilkie Email: tom@grafana.com
+
Grafana Cloud is a hosted and fully managed SaaS metrics
platform that helps Ops and Dev teams using Grafana
to understand the behavior of their applications and
infrastructure
Grafana Cloud allows users to provision and manage
the best open source observability tools - Grafana and
Prometheus - all through a simple UI and single API.
What is Grafana Cloud?
Store, visualize and alert without the headache of scaling or managing
your own monitoring stack.
Your complete, fully managed, hosted metrics platform.
Grafana Cloud:

More Related Content

What's hot

SRE Demystified - 01 - SLO SLI and SLA
SRE Demystified - 01 - SLO SLI and SLASRE Demystified - 01 - SLO SLI and SLA
SRE Demystified - 01 - SLO SLI and SLA
Dr Ganesh Iyer
 
OpenTelemetry: From front- to backend (2022)
OpenTelemetry: From front- to backend (2022)OpenTelemetry: From front- to backend (2022)
OpenTelemetry: From front- to backend (2022)
Sebastian Poxhofer
 
Cloud Monitoring tool Grafana
Cloud Monitoring  tool Grafana Cloud Monitoring  tool Grafana
Cloud Monitoring tool Grafana
Dhrubaji Mandal ♛
 
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
SANG WON PARK
 
How to Monitoring the SRE Golden Signals (E-Book)
How to Monitoring the SRE Golden Signals (E-Book)How to Monitoring the SRE Golden Signals (E-Book)
How to Monitoring the SRE Golden Signals (E-Book)
Siglos
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
DataWorks Summit/Hadoop Summit
 
OpenTelemetry For Architects
OpenTelemetry For ArchitectsOpenTelemetry For Architects
OpenTelemetry For Architects
Kevin Brockhoff
 
Observability and its application
Observability and its applicationObservability and its application
Observability and its application
Thao Huynh Quang
 
Monitoring using Prometheus and Grafana
Monitoring using Prometheus and GrafanaMonitoring using Prometheus and Grafana
Monitoring using Prometheus and Grafana
Arvind Kumar G.S
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache Kafka
Jiangjie Qin
 
Apache kafka performance(throughput) - without data loss and guaranteeing dat...
Apache kafka performance(throughput) - without data loss and guaranteeing dat...Apache kafka performance(throughput) - without data loss and guaranteeing dat...
Apache kafka performance(throughput) - without data loss and guaranteeing dat...
SANG WON PARK
 
New Features in Apache Pinot
New Features in Apache PinotNew Features in Apache Pinot
New Features in Apache Pinot
Siddharth Teotia
 
Intro to open source observability with grafana, prometheus, loki, and tempo(...
Intro to open source observability with grafana, prometheus, loki, and tempo(...Intro to open source observability with grafana, prometheus, loki, and tempo(...
Intro to open source observability with grafana, prometheus, loki, and tempo(...
LibbySchulze
 
실시간 이상탐지를 위한 머신러닝 모델에 Druid _ Imply 활용하기
실시간 이상탐지를 위한 머신러닝 모델에 Druid _ Imply 활용하기실시간 이상탐지를 위한 머신러닝 모델에 Druid _ Imply 활용하기
실시간 이상탐지를 위한 머신러닝 모델에 Druid _ Imply 활용하기
Kee Hoon Lee
 
쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)
쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)
쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)
충섭 김
 
Splunk Distributed Management Console
Splunk Distributed Management Console                                         Splunk Distributed Management Console
Splunk Distributed Management Console
Splunk
 
Gatekeeper: API gateway
Gatekeeper: API gatewayGatekeeper: API gateway
Gatekeeper: API gateway
ChengHui Weng
 
Open Policy Agent
Open Policy AgentOpen Policy Agent
Open Policy Agent
Torin Sandall
 
Prometheus Storage
Prometheus StoragePrometheus Storage
Prometheus Storage
Fabian Reinartz
 
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin KnaufWebinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Ververica
 

What's hot (20)

SRE Demystified - 01 - SLO SLI and SLA
SRE Demystified - 01 - SLO SLI and SLASRE Demystified - 01 - SLO SLI and SLA
SRE Demystified - 01 - SLO SLI and SLA
 
OpenTelemetry: From front- to backend (2022)
OpenTelemetry: From front- to backend (2022)OpenTelemetry: From front- to backend (2022)
OpenTelemetry: From front- to backend (2022)
 
Cloud Monitoring tool Grafana
Cloud Monitoring  tool Grafana Cloud Monitoring  tool Grafana
Cloud Monitoring tool Grafana
 
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
 
How to Monitoring the SRE Golden Signals (E-Book)
How to Monitoring the SRE Golden Signals (E-Book)How to Monitoring the SRE Golden Signals (E-Book)
How to Monitoring the SRE Golden Signals (E-Book)
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 
OpenTelemetry For Architects
OpenTelemetry For ArchitectsOpenTelemetry For Architects
OpenTelemetry For Architects
 
Observability and its application
Observability and its applicationObservability and its application
Observability and its application
 
Monitoring using Prometheus and Grafana
Monitoring using Prometheus and GrafanaMonitoring using Prometheus and Grafana
Monitoring using Prometheus and Grafana
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache Kafka
 
Apache kafka performance(throughput) - without data loss and guaranteeing dat...
Apache kafka performance(throughput) - without data loss and guaranteeing dat...Apache kafka performance(throughput) - without data loss and guaranteeing dat...
Apache kafka performance(throughput) - without data loss and guaranteeing dat...
 
New Features in Apache Pinot
New Features in Apache PinotNew Features in Apache Pinot
New Features in Apache Pinot
 
Intro to open source observability with grafana, prometheus, loki, and tempo(...
Intro to open source observability with grafana, prometheus, loki, and tempo(...Intro to open source observability with grafana, prometheus, loki, and tempo(...
Intro to open source observability with grafana, prometheus, loki, and tempo(...
 
실시간 이상탐지를 위한 머신러닝 모델에 Druid _ Imply 활용하기
실시간 이상탐지를 위한 머신러닝 모델에 Druid _ Imply 활용하기실시간 이상탐지를 위한 머신러닝 모델에 Druid _ Imply 활용하기
실시간 이상탐지를 위한 머신러닝 모델에 Druid _ Imply 활용하기
 
쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)
쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)
쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)
 
Splunk Distributed Management Console
Splunk Distributed Management Console                                         Splunk Distributed Management Console
Splunk Distributed Management Console
 
Gatekeeper: API gateway
Gatekeeper: API gatewayGatekeeper: API gateway
Gatekeeper: API gateway
 
Open Policy Agent
Open Policy AgentOpen Policy Agent
Open Policy Agent
 
Prometheus Storage
Prometheus StoragePrometheus Storage
Prometheus Storage
 
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin KnaufWebinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
 

Similar to The RED Method: How to monitoring your microservices.

The RED Method: How To Instrument Your Services
The RED Method: How To Instrument Your ServicesThe RED Method: How To Instrument Your Services
The RED Method: How To Instrument Your Services
Kausal
 
The RED Method: How To Instrument Your Services
The RED Method: How To Instrument Your ServicesThe RED Method: How To Instrument Your Services
The RED Method: How To Instrument Your Services
Kausal
 
Measuring CDN performance and why you're doing it wrong
Measuring CDN performance and why you're doing it wrongMeasuring CDN performance and why you're doing it wrong
Measuring CDN performance and why you're doing it wrong
Fastly
 
THE RED METHOD: HOW TO INSTRUMENT YOUR SERVICES
THE RED METHOD: HOW TO INSTRUMENT YOUR SERVICESTHE RED METHOD: HOW TO INSTRUMENT YOUR SERVICES
THE RED METHOD: HOW TO INSTRUMENT YOUR SERVICES
InfluxData
 
Building data intensive applications
Building data intensive applicationsBuilding data intensive applications
Building data intensive applications
Amit Kejriwal
 
Introduction to Prometheus Monitoring (Singapore Meetup)
Introduction to Prometheus Monitoring (Singapore Meetup) Introduction to Prometheus Monitoring (Singapore Meetup)
Introduction to Prometheus Monitoring (Singapore Meetup)
Arseny Chernov
 
A Modern Approach to Performance Monitoring
A Modern Approach to Performance MonitoringA Modern Approach to Performance Monitoring
A Modern Approach to Performance Monitoring
Cliff Crocker
 
Big server-is-watching-you
Big server-is-watching-youBig server-is-watching-you
Big server-is-watching-you
mkherlakian
 
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
rschuppe
 
Edge 2014: A Modern Approach to Performance Monitoring
Edge 2014: A Modern Approach to Performance MonitoringEdge 2014: A Modern Approach to Performance Monitoring
Edge 2014: A Modern Approach to Performance Monitoring
Akamai Technologies
 
Add observability to your django application - PyCon FR 2019
Add observability to your django application - PyCon FR 2019Add observability to your django application - PyCon FR 2019
Add observability to your django application - PyCon FR 2019
Bleemeo
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with Prometheus
Grafana Labs
 
Scaling habits of ASP.NET
Scaling habits of ASP.NETScaling habits of ASP.NET
Scaling habits of ASP.NET
David Giard
 
Benchmarking NGINX for Accuracy and Results
Benchmarking NGINX for Accuracy and ResultsBenchmarking NGINX for Accuracy and Results
Benchmarking NGINX for Accuracy and Results
NGINX, Inc.
 
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
Redis Labs
 
Monitor some of the things
Monitor some of the thingsMonitor some of the things
Monitor some of the things
andertech
 
Capacity Planning for fun & profit
Capacity Planning for fun & profitCapacity Planning for fun & profit
Capacity Planning for fun & profit
Rodrigo Campos
 
Application Performance Management
Application Performance ManagementApplication Performance Management
Application Performance Management
Noriaki Tatsumi
 
Measuring CDN performance and why you're doing it wrong
Measuring CDN performance and why you're doing it wrongMeasuring CDN performance and why you're doing it wrong
Measuring CDN performance and why you're doing it wrong
Fastly
 
Performance Analysis: The USE Method
Performance Analysis: The USE MethodPerformance Analysis: The USE Method
Performance Analysis: The USE Method
Brendan Gregg
 

Similar to The RED Method: How to monitoring your microservices. (20)

The RED Method: How To Instrument Your Services
The RED Method: How To Instrument Your ServicesThe RED Method: How To Instrument Your Services
The RED Method: How To Instrument Your Services
 
The RED Method: How To Instrument Your Services
The RED Method: How To Instrument Your ServicesThe RED Method: How To Instrument Your Services
The RED Method: How To Instrument Your Services
 
Measuring CDN performance and why you're doing it wrong
Measuring CDN performance and why you're doing it wrongMeasuring CDN performance and why you're doing it wrong
Measuring CDN performance and why you're doing it wrong
 
THE RED METHOD: HOW TO INSTRUMENT YOUR SERVICES
THE RED METHOD: HOW TO INSTRUMENT YOUR SERVICESTHE RED METHOD: HOW TO INSTRUMENT YOUR SERVICES
THE RED METHOD: HOW TO INSTRUMENT YOUR SERVICES
 
Building data intensive applications
Building data intensive applicationsBuilding data intensive applications
Building data intensive applications
 
Introduction to Prometheus Monitoring (Singapore Meetup)
Introduction to Prometheus Monitoring (Singapore Meetup) Introduction to Prometheus Monitoring (Singapore Meetup)
Introduction to Prometheus Monitoring (Singapore Meetup)
 
A Modern Approach to Performance Monitoring
A Modern Approach to Performance MonitoringA Modern Approach to Performance Monitoring
A Modern Approach to Performance Monitoring
 
Big server-is-watching-you
Big server-is-watching-youBig server-is-watching-you
Big server-is-watching-you
 
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
 
Edge 2014: A Modern Approach to Performance Monitoring
Edge 2014: A Modern Approach to Performance MonitoringEdge 2014: A Modern Approach to Performance Monitoring
Edge 2014: A Modern Approach to Performance Monitoring
 
Add observability to your django application - PyCon FR 2019
Add observability to your django application - PyCon FR 2019Add observability to your django application - PyCon FR 2019
Add observability to your django application - PyCon FR 2019
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with Prometheus
 
Scaling habits of ASP.NET
Scaling habits of ASP.NETScaling habits of ASP.NET
Scaling habits of ASP.NET
 
Benchmarking NGINX for Accuracy and Results
Benchmarking NGINX for Accuracy and ResultsBenchmarking NGINX for Accuracy and Results
Benchmarking NGINX for Accuracy and Results
 
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 
Monitor some of the things
Monitor some of the thingsMonitor some of the things
Monitor some of the things
 
Capacity Planning for fun & profit
Capacity Planning for fun & profitCapacity Planning for fun & profit
Capacity Planning for fun & profit
 
Application Performance Management
Application Performance ManagementApplication Performance Management
Application Performance Management
 
Measuring CDN performance and why you're doing it wrong
Measuring CDN performance and why you're doing it wrongMeasuring CDN performance and why you're doing it wrong
Measuring CDN performance and why you're doing it wrong
 
Performance Analysis: The USE Method
Performance Analysis: The USE MethodPerformance Analysis: The USE Method
Performance Analysis: The USE Method
 

More from Grafana Labs

Cortex: Horizontally Scalable, Highly Available Prometheus
Cortex: Horizontally Scalable, Highly Available PrometheusCortex: Horizontally Scalable, Highly Available Prometheus
Cortex: Horizontally Scalable, Highly Available Prometheus
Grafana Labs
 
Monitoring the Hashistack with Prometheus
Monitoring the Hashistack with PrometheusMonitoring the Hashistack with Prometheus
Monitoring the Hashistack with Prometheus
Grafana Labs
 
Explore your prometheus data in grafana - Promcon 2018
Explore your prometheus data in grafana - Promcon 2018Explore your prometheus data in grafana - Promcon 2018
Explore your prometheus data in grafana - Promcon 2018
Grafana Labs
 
[PromCon2018] Prometheus Monitoring Mixins: Using Jsonnet to Package Together...
[PromCon2018] Prometheus Monitoring Mixins: Using Jsonnet to Package Together...[PromCon2018] Prometheus Monitoring Mixins: Using Jsonnet to Package Together...
[PromCon2018] Prometheus Monitoring Mixins: Using Jsonnet to Package Together...
Grafana Labs
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with Prometheus
Grafana Labs
 
Prometheus Monitoring Mixins
Prometheus Monitoring MixinsPrometheus Monitoring Mixins
Prometheus Monitoring Mixins
Grafana Labs
 

More from Grafana Labs (6)

Cortex: Horizontally Scalable, Highly Available Prometheus
Cortex: Horizontally Scalable, Highly Available PrometheusCortex: Horizontally Scalable, Highly Available Prometheus
Cortex: Horizontally Scalable, Highly Available Prometheus
 
Monitoring the Hashistack with Prometheus
Monitoring the Hashistack with PrometheusMonitoring the Hashistack with Prometheus
Monitoring the Hashistack with Prometheus
 
Explore your prometheus data in grafana - Promcon 2018
Explore your prometheus data in grafana - Promcon 2018Explore your prometheus data in grafana - Promcon 2018
Explore your prometheus data in grafana - Promcon 2018
 
[PromCon2018] Prometheus Monitoring Mixins: Using Jsonnet to Package Together...
[PromCon2018] Prometheus Monitoring Mixins: Using Jsonnet to Package Together...[PromCon2018] Prometheus Monitoring Mixins: Using Jsonnet to Package Together...
[PromCon2018] Prometheus Monitoring Mixins: Using Jsonnet to Package Together...
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with Prometheus
 
Prometheus Monitoring Mixins
Prometheus Monitoring MixinsPrometheus Monitoring Mixins
Prometheus Monitoring Mixins
 

Recently uploaded

Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
Rohit Gautam
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 

Recently uploaded (20)

Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 

The RED Method: How to monitoring your microservices.

  • 1. The RED Method Patterns for instrumentation & monitoring. Tom Wilkie tom@grafana.com @tom_wilkie github.com/tomwilkie
  • 2.
  • 3. Introduction Why does this matter? The USE Method Utilisation, Saturation, Errors The RED Method Requests Rate, Errors, Duration.. The Four Golden Signals RED + Saturation
  • 6. For every resource, monitor: • Utilisation: % time that the resource was busy • Saturation: amount of work resource has to do, often queue length • Errors: the count of error events http://www.brendangregg.com/usemethod.html
  • 7. Utilisation Saturation Errors CPU ✔ ✔ ✔ Memory ✔ ✔ ✔ Disk ✔ ✔ ✔ Network ✔ ✔ ✖
  • 8. CPU USE in Prometheus CPU Utilisation: 1 - avg(rate(node_cpu{job="default/node-exporter",mode="idle"}[1m])) CPU Saturation: sum(node_load1{job="default/node-exporter"}) / sum(node:node_num_cpu:sum)
  • 9. Memory USE in Prometheus Memory Utilisation: 1 - sum( node_memory_MemFree{job=“…”} + node_memory_Cached{job=“…”} + node_memory_Buffers{job=“…”} ) / sum(node_memory_MemTotal{job=“…”}) Memory Saturation: 1e3 * sum( rate(node_vmstat_pgpgin{job=“…”}[1m]) + rate(node_vmstat_pgpgout{job=“…”}[1m])) )
  • 10. Interesting / Hard Cases • CPU Errors, Memory Errors • Hard Disk Errors! • Disk Capacity vs Disk IO • Network Utilisation • Interconnects
  • 12. More Details • “The USE Method” - Brendan Gregg • Kubernetes Mixin - https://github.com/grafana/jsonnet-libs
  • 14. For every service, monitor request: • Rate - number of requests per second • Errors - the number of those requests that are failing • Duration - the amount of time those requests take The RED Method
  • 15.
  • 16. Prometheus Implementation import ( “github.com/prometheus/client_golang/prometheus" ) var requestDuration = prometheus.NewHistogramVec(prometheus.HistogramOpts{ Name: "request_duration_seconds", Help: "Time (in seconds) spent serving HTTP requests.", Buckets: prometheus.DefBuckets, }, []string{"method", "route", "status_code"}) func init() { prometheus.MustRegister(requestDuration) }
  • 17. func wrap(h http.Handler) http.Handler { return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { m := httpsnoop.CaptureMetrics(h, w, r) requestDuration.WithLabelValues(r.Method, r.URL.Path, strconv.Itoa(m.Code)).Observe(m.Duration.Seconds()) }) } func server(addr string) { http.Handle("/metrics", prometheus.Handler()) http.Handle("/greeter", wrap(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { ... })) } Prometheus Implementation
  • 21.
  • 23. More Details • “Monitoring Microservices” - Weaveworks (slides) • "The RED Method: key metrics for microservices architecture” - Weaveworks • “Monitoring and Observability with USE and RED” - VividCortex • “RED Method for Prometheus – 3 Key Metrics for Monitoring” - Rancher Labs • “Logs and Metrics” - Cindy Sridharan • "Logging v. instrumentation”, “Go best practices, six years in” - Peter Bourgon
  • 25.
  • 26. The Four Golden Signals For each service, monitor: • Latency - time taken to serve a request • Traffic - how much demand is places on your system • Errors - rate or requests that are failing • Saturation - how “full” your services is
  • 27.
  • 29. More Details • “The Four Golden Signals” - The Google SRE Book • “How to Monitor the SRE Golden Signals” - Steve Mushero
  • 32. Tom Wilkie VP Product, Grafana Labs Previously: Kausal, Weaveworks, Google, Acunu, Xensource Twitter: @tom_wilkie Email: tom@grafana.com
  • 33. + Grafana Cloud is a hosted and fully managed SaaS metrics platform that helps Ops and Dev teams using Grafana to understand the behavior of their applications and infrastructure Grafana Cloud allows users to provision and manage the best open source observability tools - Grafana and Prometheus - all through a simple UI and single API. What is Grafana Cloud? Store, visualize and alert without the headache of scaling or managing your own monitoring stack. Your complete, fully managed, hosted metrics platform. Grafana Cloud: