SlideShare a Scribd company logo
1 of 31
The RED Method
Patterns for instrumentation & monitoring.
Tom Wilkie, London Go Meetup
tom@kausal.co @tom_wilkie
github.com/tomwilkie
• Founder Kausal, “transforming observability”
• Prometheus developer
• Home brewer
Previously:
• Worked on Kubernetes & Prometheus at
Weaveworks
• SRE for Google Analytics
• Founder/CTO at Acunu, worked on Cassandra
Introduction
Why does this matter?
The RED Method
Requests Rate, Errors, Duration..
The USE Method
Utilisation, Saturation, Errors
The Four Golden Signals
RED + Saturation
Introduction
The RED Method
The RED Method
For every service, monitor request:
• Rate - number of requests per second
• Errors - the number of those requests that are failing
• Duration - the amount of time those requests take
Prometheus Implementation
import (
“github.com/prometheus/client_golang/prometheus"
)
var requestDuration = prometheus.NewHistogramVec(prometheus.HistogramOpts{
Name: "request_duration_seconds",
Help: "Time (in seconds) spent serving HTTP requests.",
Buckets: prometheus.DefBuckets,
}, []string{"method", "route", "status_code"})
func init() {
prometheus.MustRegister(requestDuration)
}
Prometheus Implementation
func wrap(h http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
m := httpsnoop.CaptureMetrics(h, w, r)
requestDuration.WithLabelValues(r.Method, r.URL.Path,
strconv.Itoa(m.Code)).Observe(m.Duration.Seconds())
})
}
func server(addr string) {
http.Handle("/metrics", prometheus.Handler())
http.Handle("/greeter", wrap(http.HandlerFunc(func(w http.ResponseWriter, r *h
...
}))
}
Easy to query
Rate:
sum(rate(request_duration_seconds_count{job=“…”}[1m]))
Errors:
sum(rate(request_duration_seconds_count{job=“…”, 
status_code!~”2..”}[1m]))
Duration:
histogram_quantile(0.99,
 sum(rate(request_duration_seconds_bucket{job=“…}[1m])) by (le))
Demo
Time
DAG of Services
Latencies & Averages
More Details
• “Monitoring Microservices” - Weaveworks (slides)
• "The RED Method: key metrics for microservices architecture” - Weaveworks
• “Monitoring and Observability with USE and RED” - VividCortex
• “RED Method for Prometheus – 3 Key Metrics for Monitoring” - Rancher Labs
• “Logs and Metrics” - Cindy Sridharan
• "Logging v. instrumentation”, “Go best practices, six years in” - Peter Bourgon
The USE Method
For every resource, monitor:
• Utilisation: % time that the resource was busy
• Saturation: amount of work resource has to do, often queue length
• Errors: the count of error events
USE Methodhttp://www.brendangregg.com/usemethod.html
Utilisation Saturation Errors
CPU
✔ ✔ ✔
Memory
✔ ✔ ✔
Disk
✔ ✔ ✔
Network
✔ ✔ ✖
Interesting / Hard Cases
• CPU Errors, Memory Errors
• Hard Disk Errors!
• Disk Capacity vs Disk IO
• Network Utilisation
• Interconnects
Demo
Time
More Details
• “The USE Method” - Brendan Gregg
The Four Golden Signals
The Four Golden Signals
For each service, monitor:
• Latency - time taken to serve a request
• Traffic - how much demand is places on your system
• Errors - rate or requests that are failing
• Saturation - how “full” your services is
Demo
Time
More Details
• “The Four Golden Signals” - The Google SRE Book
• “How to Monitor the SRE Golden Signals” - Steve Mushero
Summary
Future Work
• “Method R”
• “TSA Method”
Thanks!
Questions?

More Related Content

What's hot

2012 07 making disqus realtime@euro python
2012 07 making disqus realtime@euro python2012 07 making disqus realtime@euro python
2012 07 making disqus realtime@euro python
Adam Hitchcock
 

What's hot (20)

Altitude NY 2018: 132 websites, 1 service: Your local news runs on Fastly
Altitude NY 2018: 132 websites, 1 service: Your local news runs on FastlyAltitude NY 2018: 132 websites, 1 service: Your local news runs on Fastly
Altitude NY 2018: 132 websites, 1 service: Your local news runs on Fastly
 
Monitoring kubernetes with prometheus
Monitoring kubernetes with prometheusMonitoring kubernetes with prometheus
Monitoring kubernetes with prometheus
 
What is your application doing right now? An introduction to Prometheus
What is your application doing right now? An introduction to PrometheusWhat is your application doing right now? An introduction to Prometheus
What is your application doing right now? An introduction to Prometheus
 
Web Real-time Communications
Web Real-time CommunicationsWeb Real-time Communications
Web Real-time Communications
 
Monitoring Cloud Native Applications with Prometheus
Monitoring Cloud Native Applications with PrometheusMonitoring Cloud Native Applications with Prometheus
Monitoring Cloud Native Applications with Prometheus
 
Monitoring your Python with Prometheus (Python Ireland April 2015)
Monitoring your Python with Prometheus (Python Ireland April 2015)Monitoring your Python with Prometheus (Python Ireland April 2015)
Monitoring your Python with Prometheus (Python Ireland April 2015)
 
Prometheus Is Good for Your Small Startup - ShuttleCloud Corp. - 2016
Prometheus Is Good for Your Small Startup - ShuttleCloud Corp. - 2016Prometheus Is Good for Your Small Startup - ShuttleCloud Corp. - 2016
Prometheus Is Good for Your Small Startup - ShuttleCloud Corp. - 2016
 
Infrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using PrometheusInfrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using Prometheus
 
Data Science Apps: Beyond Notebooks - Natalino Busa - Codemotion Amsterdam 2017
Data Science Apps: Beyond Notebooks - Natalino Busa - Codemotion Amsterdam 2017Data Science Apps: Beyond Notebooks - Natalino Busa - Codemotion Amsterdam 2017
Data Science Apps: Beyond Notebooks - Natalino Busa - Codemotion Amsterdam 2017
 
Nginx monitoring with graphite
Nginx monitoring with graphiteNginx monitoring with graphite
Nginx monitoring with graphite
 
Lifting the Blinds: Monitoring Windows Server 2012
Lifting the Blinds: Monitoring Windows Server 2012Lifting the Blinds: Monitoring Windows Server 2012
Lifting the Blinds: Monitoring Windows Server 2012
 
2012 07 making disqus realtime@euro python
2012 07 making disqus realtime@euro python2012 07 making disqus realtime@euro python
2012 07 making disqus realtime@euro python
 
Project Reactor By Example
Project Reactor By ExampleProject Reactor By Example
Project Reactor By Example
 
Monitoring Kafka w/ Prometheus
Monitoring Kafka w/ PrometheusMonitoring Kafka w/ Prometheus
Monitoring Kafka w/ Prometheus
 
Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015
Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015
Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015
 
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, EverAltitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
 
Altitude NY 2018: Programming the edge workshop
Altitude NY 2018: Programming the edge workshopAltitude NY 2018: Programming the edge workshop
Altitude NY 2018: Programming the edge workshop
 
Prezo at-mesos con2015-final
Prezo at-mesos con2015-finalPrezo at-mesos con2015-final
Prezo at-mesos con2015-final
 
Microservices and Prometheus (Microservices NYC 2016)
Microservices and Prometheus (Microservices NYC 2016)Microservices and Prometheus (Microservices NYC 2016)
Microservices and Prometheus (Microservices NYC 2016)
 
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
 

Similar to The RED Method: How To Instrument Your Services

Comet: by pushing server data, we push the web forward
Comet: by pushing server data, we push the web forwardComet: by pushing server data, we push the web forward
Comet: by pushing server data, we push the web forward
NOLOH LLC.
 

Similar to The RED Method: How To Instrument Your Services (20)

The RED Method: How to monitoring your microservices.
The RED Method: How to monitoring your microservices.The RED Method: How to monitoring your microservices.
The RED Method: How to monitoring your microservices.
 
Measuring CDN performance and why you're doing it wrong
Measuring CDN performance and why you're doing it wrongMeasuring CDN performance and why you're doing it wrong
Measuring CDN performance and why you're doing it wrong
 
Add observability to your django application - PyCon FR 2019
Add observability to your django application - PyCon FR 2019Add observability to your django application - PyCon FR 2019
Add observability to your django application - PyCon FR 2019
 
Measuring CDN performance and why you're doing it wrong
Measuring CDN performance and why you're doing it wrongMeasuring CDN performance and why you're doing it wrong
Measuring CDN performance and why you're doing it wrong
 
Introduction to Prometheus Monitoring (Singapore Meetup)
Introduction to Prometheus Monitoring (Singapore Meetup) Introduction to Prometheus Monitoring (Singapore Meetup)
Introduction to Prometheus Monitoring (Singapore Meetup)
 
Edge 2014: A Modern Approach to Performance Monitoring
Edge 2014: A Modern Approach to Performance MonitoringEdge 2014: A Modern Approach to Performance Monitoring
Edge 2014: A Modern Approach to Performance Monitoring
 
Comet: by pushing server data, we push the web forward
Comet: by pushing server data, we push the web forwardComet: by pushing server data, we push the web forward
Comet: by pushing server data, we push the web forward
 
Debugging Microservices - key challenges and techniques - Microservices Odesa...
Debugging Microservices - key challenges and techniques - Microservices Odesa...Debugging Microservices - key challenges and techniques - Microservices Odesa...
Debugging Microservices - key challenges and techniques - Microservices Odesa...
 
Tech talk microservices debugging
Tech talk microservices debuggingTech talk microservices debugging
Tech talk microservices debugging
 
A Modern Approach to Performance Monitoring
A Modern Approach to Performance MonitoringA Modern Approach to Performance Monitoring
A Modern Approach to Performance Monitoring
 
PyCon HK 2015 - Monitoring the performance of python web applications
PyCon HK 2015 -  Monitoring the performance of python web applicationsPyCon HK 2015 -  Monitoring the performance of python web applications
PyCon HK 2015 - Monitoring the performance of python web applications
 
Building data intensive applications
Building data intensive applicationsBuilding data intensive applications
Building data intensive applications
 
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming Applications
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming ApplicationsMetrics Are Not Enough: Monitoring Apache Kafka and Streaming Applications
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming Applications
 
Building Awesome APIs with Lumen
Building Awesome APIs with LumenBuilding Awesome APIs with Lumen
Building Awesome APIs with Lumen
 
Scaling habits of ASP.NET
Scaling habits of ASP.NETScaling habits of ASP.NET
Scaling habits of ASP.NET
 
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
 
Big server-is-watching-you
Big server-is-watching-youBig server-is-watching-you
Big server-is-watching-you
 
The 3 Top Techniques for Web Security Testing Using a Proxy
The 3 Top Techniques for Web Security Testing Using a ProxyThe 3 Top Techniques for Web Security Testing Using a Proxy
The 3 Top Techniques for Web Security Testing Using a Proxy
 
Benchmarking NGINX for Accuracy and Results
Benchmarking NGINX for Accuracy and ResultsBenchmarking NGINX for Accuracy and Results
Benchmarking NGINX for Accuracy and Results
 
Application Performance Management
Application Performance ManagementApplication Performance Management
Application Performance Management
 

Recently uploaded

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Recently uploaded (20)

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

The RED Method: How To Instrument Your Services