SlideShare a Scribd company logo
Techniques for
monitoring
Microservices
William Brander
@williambza
Particular Software
An average production system
Database
• Is the web server up?
• Is the database up?
• Can the webserver talk
to the db?
What are you actually monitoring?
Business
Capability
Application
Infrastructure
Are my servers running?Is my application process running?Can users place an order?
Monitoring Area
Monitoring Concerns
Capacity
Performance
Health
Is the server up?Is there high CPU?Do I have enough disk space?
Is my application generating exceptions?
How quickly is my system processing messages?
Can I handle month end batch jobs?
Is the server up?
Is there high CPU?
Do I have enough disk space?
Application
Infrastructure
Can users access the checkout cart?
Are we meeting our SLAs?
What is the impact of adding another customer?
Business
Capability
Interaction Type
Proactive
Reactive
Passive
The monitoring system can display
metrics
The monitoring system alerts me when
something happens
The monitoring system automatically
takes actions to repair the system
A Monitoring Philosophy
Business
Capability
Application
Infrastructure
Capacity
Performance
Health
Monitoring Area Monitoring Concern
Proactive
Reactive
Passive
Interaction Type
Recap: What are we monitoring?
Database
• Is the web server up?
• Is the database up?
• Can the webserver talk
to the db?
Infrastructure PassiveHealth
27 28 29 30 31 32 33 34 35 37 40 41 42 43 45
Recap: What are we monitoring?
• Warn me with the queue
length exceeds 50
Infrastructure ReactivePerformance
A Monitoring Philosophy
Business
Capability
Application
Infrastructure
Capacity
Performance
Health
Monitoring Area Monitoring Concern
Proactive
Reactive
Passive
Interaction Type
What happens when we
distribute the systems?
Going Distributed
EmailPDF
CRM
Let’s look at queue length
Queue Length
• Queue length is an indicator of work still outstanding
• High queue length doesn’t necessarily indicate a problem though
Stable or
decreasing
is good
Increasing
is bad
Infrastructure Performance
Processing Time
⏱️
⌛✔
Processing Time
• Processing Time is the time taken to successfully process a message
• Processing Time does not include error handling time
• It is independent of queue wait time
Stable or decreasing could
be good
Increasing is bad
PerformanceApplication
✔⌛
⏱️
Critical time
⏱️
Critical time = The entire time taken to process a
message successfully
• Critical Time is the total duration between when a message is created
to when it is processed
Critical Time = Time in Queue +
Processing Time +
Retry Time +
Network Latency Time
Critical Time
Stable or decreasing could
be good
Increasing is bad
Putting these together
• Each of these metrics presents a piece of the puzzle
• Look at them from an endpoint’s perspective, not per message
• Looking at them together gives great insight into your system
Critical Time Processing Time Queue LengthCritical Time Processing Time Queue LengthCritical Time Processing Time Queue Length
Detecting Connectivity
• Distributed systems typically work when other parts aren’t available
• How do you know the endpoint you’re sending messages to is
actually processing messages?
Detecting Connectivity
Peer-to-peer connectivity tells us if an endpoint is
actually processing messages from another
How do we collect all this info?
⏱️
• Processing Time
• Critical Time
• Queue Length
• Connectivity
• Reporting Metric
• Message Type
• Timestamp
• Value
• Reporting Metric (N bytes)
• Message Type (N bytes)
• Timestamp (8 bytes)
• Value (8 bytes)
How do we collect all this info?
• Epoch time (8 bytes)
• Dictionary of Metric Types (n* (N + 4) bytes)
• Dictionary of Message Types (n * (N + 4) bytes)
• An array of:
• Reporting Metric index (4 bytes)
• Message Type index (4 bytes)
• Epoch offset (4 bytes)
• Value (8 bytes)
Getting all the data
Getting all the data
Getting all the data
Techniques for
monitoring
Microservices
William Brander
@williambza
Particular Software

More Related Content

What's hot

C* Summit 2013: Eventual Consistency != Hopeful Consistency by Christos Kalan...
C* Summit 2013: Eventual Consistency != Hopeful Consistency by Christos Kalan...C* Summit 2013: Eventual Consistency != Hopeful Consistency by Christos Kalan...
C* Summit 2013: Eventual Consistency != Hopeful Consistency by Christos Kalan...DataStax Academy
 
A Coherent Discussion About Performance
A Coherent Discussion About PerformanceA Coherent Discussion About Performance
A Coherent Discussion About PerformanceTheo Schlossnagle
 
Performance is a Shape, Not a Number
Performance is a Shape, Not a NumberPerformance is a Shape, Not a Number
Performance is a Shape, Not a NumberDevOps.com
 
Donatas Mažionis, Building low latency web APIs
Donatas Mažionis, Building low latency web APIsDonatas Mažionis, Building low latency web APIs
Donatas Mažionis, Building low latency web APIsTanya Denisyuk
 
Just In Time Scalability Agile Methods To Support Massive Growth Presentation
Just In Time Scalability  Agile Methods To Support Massive Growth PresentationJust In Time Scalability  Agile Methods To Support Massive Growth Presentation
Just In Time Scalability Agile Methods To Support Massive Growth PresentationEric Ries
 
Cassandra Day SV 2014: A Netflix Experiment Eventual Consistency != Hopeful C...
Cassandra Day SV 2014: A Netflix Experiment Eventual Consistency != Hopeful C...Cassandra Day SV 2014: A Netflix Experiment Eventual Consistency != Hopeful C...
Cassandra Day SV 2014: A Netflix Experiment Eventual Consistency != Hopeful C...DataStax Academy
 
Using machine learning to determine drivers of bounce and conversion
Using machine learning to determine drivers of bounce and conversionUsing machine learning to determine drivers of bounce and conversion
Using machine learning to determine drivers of bounce and conversionTammy Everts
 
Webisite globalization Clay Tablet
Webisite globalization Clay TabletWebisite globalization Clay Tablet
Webisite globalization Clay TabletRDC
 
Performance Forensics - Understanding Application Performance
Performance Forensics - Understanding Application PerformancePerformance Forensics - Understanding Application Performance
Performance Forensics - Understanding Application PerformanceAlois Reitbauer
 
Evolution of the Prometheus TSDB (Percona Live Europe 2017)
Evolution of the Prometheus TSDB  (Percona Live Europe 2017)Evolution of the Prometheus TSDB  (Percona Live Europe 2017)
Evolution of the Prometheus TSDB (Percona Live Europe 2017)Brian Brazil
 

What's hot (10)

C* Summit 2013: Eventual Consistency != Hopeful Consistency by Christos Kalan...
C* Summit 2013: Eventual Consistency != Hopeful Consistency by Christos Kalan...C* Summit 2013: Eventual Consistency != Hopeful Consistency by Christos Kalan...
C* Summit 2013: Eventual Consistency != Hopeful Consistency by Christos Kalan...
 
A Coherent Discussion About Performance
A Coherent Discussion About PerformanceA Coherent Discussion About Performance
A Coherent Discussion About Performance
 
Performance is a Shape, Not a Number
Performance is a Shape, Not a NumberPerformance is a Shape, Not a Number
Performance is a Shape, Not a Number
 
Donatas Mažionis, Building low latency web APIs
Donatas Mažionis, Building low latency web APIsDonatas Mažionis, Building low latency web APIs
Donatas Mažionis, Building low latency web APIs
 
Just In Time Scalability Agile Methods To Support Massive Growth Presentation
Just In Time Scalability  Agile Methods To Support Massive Growth PresentationJust In Time Scalability  Agile Methods To Support Massive Growth Presentation
Just In Time Scalability Agile Methods To Support Massive Growth Presentation
 
Cassandra Day SV 2014: A Netflix Experiment Eventual Consistency != Hopeful C...
Cassandra Day SV 2014: A Netflix Experiment Eventual Consistency != Hopeful C...Cassandra Day SV 2014: A Netflix Experiment Eventual Consistency != Hopeful C...
Cassandra Day SV 2014: A Netflix Experiment Eventual Consistency != Hopeful C...
 
Using machine learning to determine drivers of bounce and conversion
Using machine learning to determine drivers of bounce and conversionUsing machine learning to determine drivers of bounce and conversion
Using machine learning to determine drivers of bounce and conversion
 
Webisite globalization Clay Tablet
Webisite globalization Clay TabletWebisite globalization Clay Tablet
Webisite globalization Clay Tablet
 
Performance Forensics - Understanding Application Performance
Performance Forensics - Understanding Application PerformancePerformance Forensics - Understanding Application Performance
Performance Forensics - Understanding Application Performance
 
Evolution of the Prometheus TSDB (Percona Live Europe 2017)
Evolution of the Prometheus TSDB  (Percona Live Europe 2017)Evolution of the Prometheus TSDB  (Percona Live Europe 2017)
Evolution of the Prometheus TSDB (Percona Live Europe 2017)
 

Similar to Monitoring microservices

What to consider when monitoring microservices
What to consider when monitoring microservicesWhat to consider when monitoring microservices
What to consider when monitoring microservicesParticular Software
 
Monitoring and Managing Java Applications
Monitoring and Managing Java ApplicationsMonitoring and Managing Java Applications
Monitoring and Managing Java ApplicationsAlois Reitbauer
 
JUG CH September 2021 - Debugging distributed systems
JUG CH September 2021 - Debugging distributed systemsJUG CH September 2021 - Debugging distributed systems
JUG CH September 2021 - Debugging distributed systemsBert Jan Schrijver
 
How to improve your system monitoring
How to improve your system monitoringHow to improve your system monitoring
How to improve your system monitoringAndrew White
 
Kanban to #003 - Metrics
Kanban to #003 - MetricsKanban to #003 - Metrics
Kanban to #003 - MetricsFernando Cuenca
 
Patterns of Distributed Application Design
Patterns of Distributed Application DesignPatterns of Distributed Application Design
Patterns of Distributed Application DesignGlobalLogic Ukraine
 
JavaLand 2022 - Debugging distributed systems
JavaLand 2022 - Debugging distributed systemsJavaLand 2022 - Debugging distributed systems
JavaLand 2022 - Debugging distributed systemsBert Jan Schrijver
 
GOTO night April 2022 - Debugging distributed systems
GOTO night April 2022 - Debugging distributed systemsGOTO night April 2022 - Debugging distributed systems
GOTO night April 2022 - Debugging distributed systemsBert Jan Schrijver
 
Designing distributed, scalable and reliable systems using NServiceBus
Designing distributed, scalable and reliable systems using NServiceBusDesigning distributed, scalable and reliable systems using NServiceBus
Designing distributed, scalable and reliable systems using NServiceBusMauro Servienti
 
Mastering Microservices 2022 - Debugging distributed systems
Mastering Microservices 2022 - Debugging distributed systemsMastering Microservices 2022 - Debugging distributed systems
Mastering Microservices 2022 - Debugging distributed systemsBert Jan Schrijver
 
Patterns of Distributed Application Design
Patterns of Distributed Application DesignPatterns of Distributed Application Design
Patterns of Distributed Application DesignOrkhan Gasimov
 
Devoxx Belgium 2022 - Debugging distributed systems
Devoxx Belgium 2022 - Debugging distributed systemsDevoxx Belgium 2022 - Debugging distributed systems
Devoxx Belgium 2022 - Debugging distributed systemsBert Jan Schrijver
 
Arnhem JUG March 2023 - Debugging distributed systems
Arnhem JUG March 2023 - Debugging distributed systemsArnhem JUG March 2023 - Debugging distributed systems
Arnhem JUG March 2023 - Debugging distributed systemsBert Jan Schrijver
 
How Can Monitoring Save Your Bacon - build stuff 2018
How Can Monitoring Save Your Bacon - build stuff 2018How Can Monitoring Save Your Bacon - build stuff 2018
How Can Monitoring Save Your Bacon - build stuff 2018Sean Farmar
 
Scaling Systems: Architectures that grow
Scaling Systems: Architectures that growScaling Systems: Architectures that grow
Scaling Systems: Architectures that growGibraltar Software
 
Building data intensive applications
Building data intensive applicationsBuilding data intensive applications
Building data intensive applicationsAmit Kejriwal
 
Using Machine Learning to Optimize DevOps Practices
Using Machine Learning to Optimize DevOps PracticesUsing Machine Learning to Optimize DevOps Practices
Using Machine Learning to Optimize DevOps PracticesPeter Varhol
 

Similar to Monitoring microservices (20)

What to consider when monitoring microservices
What to consider when monitoring microservicesWhat to consider when monitoring microservices
What to consider when monitoring microservices
 
Monitoring and Managing Java Applications
Monitoring and Managing Java ApplicationsMonitoring and Managing Java Applications
Monitoring and Managing Java Applications
 
JUG CH September 2021 - Debugging distributed systems
JUG CH September 2021 - Debugging distributed systemsJUG CH September 2021 - Debugging distributed systems
JUG CH September 2021 - Debugging distributed systems
 
How to improve your system monitoring
How to improve your system monitoringHow to improve your system monitoring
How to improve your system monitoring
 
Kanban to #003 - Metrics
Kanban to #003 - MetricsKanban to #003 - Metrics
Kanban to #003 - Metrics
 
Patterns of Distributed Application Design
Patterns of Distributed Application DesignPatterns of Distributed Application Design
Patterns of Distributed Application Design
 
Debugging distributed systems
Debugging distributed systemsDebugging distributed systems
Debugging distributed systems
 
JavaLand 2022 - Debugging distributed systems
JavaLand 2022 - Debugging distributed systemsJavaLand 2022 - Debugging distributed systems
JavaLand 2022 - Debugging distributed systems
 
GOTO night April 2022 - Debugging distributed systems
GOTO night April 2022 - Debugging distributed systemsGOTO night April 2022 - Debugging distributed systems
GOTO night April 2022 - Debugging distributed systems
 
Designing distributed, scalable and reliable systems using NServiceBus
Designing distributed, scalable and reliable systems using NServiceBusDesigning distributed, scalable and reliable systems using NServiceBus
Designing distributed, scalable and reliable systems using NServiceBus
 
Debugging distributed systems
Debugging distributed systemsDebugging distributed systems
Debugging distributed systems
 
Mastering Microservices 2022 - Debugging distributed systems
Mastering Microservices 2022 - Debugging distributed systemsMastering Microservices 2022 - Debugging distributed systems
Mastering Microservices 2022 - Debugging distributed systems
 
Patterns of Distributed Application Design
Patterns of Distributed Application DesignPatterns of Distributed Application Design
Patterns of Distributed Application Design
 
See through software
See through softwareSee through software
See through software
 
Devoxx Belgium 2022 - Debugging distributed systems
Devoxx Belgium 2022 - Debugging distributed systemsDevoxx Belgium 2022 - Debugging distributed systems
Devoxx Belgium 2022 - Debugging distributed systems
 
Arnhem JUG March 2023 - Debugging distributed systems
Arnhem JUG March 2023 - Debugging distributed systemsArnhem JUG March 2023 - Debugging distributed systems
Arnhem JUG March 2023 - Debugging distributed systems
 
How Can Monitoring Save Your Bacon - build stuff 2018
How Can Monitoring Save Your Bacon - build stuff 2018How Can Monitoring Save Your Bacon - build stuff 2018
How Can Monitoring Save Your Bacon - build stuff 2018
 
Scaling Systems: Architectures that grow
Scaling Systems: Architectures that growScaling Systems: Architectures that grow
Scaling Systems: Architectures that grow
 
Building data intensive applications
Building data intensive applicationsBuilding data intensive applications
Building data intensive applications
 
Using Machine Learning to Optimize DevOps Practices
Using Machine Learning to Optimize DevOps PracticesUsing Machine Learning to Optimize DevOps Practices
Using Machine Learning to Optimize DevOps Practices
 

Recently uploaded

Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Dr.Costas Sachpazis
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacksgerogepatton
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdfAhmedHussein950959
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfPipe Restoration Solutions
 
shape functions of 1D and 2 D rectangular elements.pptx
shape functions of 1D and 2 D rectangular elements.pptxshape functions of 1D and 2 D rectangular elements.pptx
shape functions of 1D and 2 D rectangular elements.pptxVishalDeshpande27
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxR&R Consult
 
Antenna efficency lecture course chapter 3.pdf
Antenna  efficency lecture course chapter 3.pdfAntenna  efficency lecture course chapter 3.pdf
Antenna efficency lecture course chapter 3.pdfAbrahamGadissa
 
Courier management system project report.pdf
Courier management system project report.pdfCourier management system project report.pdf
Courier management system project report.pdfKamal Acharya
 
fluid mechanics gate notes . gate all pyqs answer
fluid mechanics gate notes . gate all pyqs answerfluid mechanics gate notes . gate all pyqs answer
fluid mechanics gate notes . gate all pyqs answerapareshmondalnita
 
Online blood donation management system project.pdf
Online blood donation management system project.pdfOnline blood donation management system project.pdf
Online blood donation management system project.pdfKamal Acharya
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdfKamal Acharya
 
Hall booking system project report .pdf
Hall booking system project report  .pdfHall booking system project report  .pdf
Hall booking system project report .pdfKamal Acharya
 
LIGA(E)11111111111111111111111111111111111111111.ppt
LIGA(E)11111111111111111111111111111111111111111.pptLIGA(E)11111111111111111111111111111111111111111.ppt
LIGA(E)11111111111111111111111111111111111111111.pptssuser9bd3ba
 
Digital Signal Processing Lecture notes n.pdf
Digital Signal Processing Lecture notes n.pdfDigital Signal Processing Lecture notes n.pdf
Digital Signal Processing Lecture notes n.pdfAbrahamGadissa
 
A case study of cinema management system project report..pdf
A case study of cinema management system project report..pdfA case study of cinema management system project report..pdf
A case study of cinema management system project report..pdfKamal Acharya
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdfPratik Pawar
 
Explosives Industry manufacturing process.pdf
Explosives Industry manufacturing process.pdfExplosives Industry manufacturing process.pdf
Explosives Industry manufacturing process.pdf884710SadaqatAli
 
Online resume builder management system project report.pdf
Online resume builder management system project report.pdfOnline resume builder management system project report.pdf
Online resume builder management system project report.pdfKamal Acharya
 
Laundry management system project report.pdf
Laundry management system project report.pdfLaundry management system project report.pdf
Laundry management system project report.pdfKamal Acharya
 

Recently uploaded (20)

Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
 
shape functions of 1D and 2 D rectangular elements.pptx
shape functions of 1D and 2 D rectangular elements.pptxshape functions of 1D and 2 D rectangular elements.pptx
shape functions of 1D and 2 D rectangular elements.pptx
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
 
Antenna efficency lecture course chapter 3.pdf
Antenna  efficency lecture course chapter 3.pdfAntenna  efficency lecture course chapter 3.pdf
Antenna efficency lecture course chapter 3.pdf
 
Courier management system project report.pdf
Courier management system project report.pdfCourier management system project report.pdf
Courier management system project report.pdf
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 
fluid mechanics gate notes . gate all pyqs answer
fluid mechanics gate notes . gate all pyqs answerfluid mechanics gate notes . gate all pyqs answer
fluid mechanics gate notes . gate all pyqs answer
 
Online blood donation management system project.pdf
Online blood donation management system project.pdfOnline blood donation management system project.pdf
Online blood donation management system project.pdf
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
 
Hall booking system project report .pdf
Hall booking system project report  .pdfHall booking system project report  .pdf
Hall booking system project report .pdf
 
LIGA(E)11111111111111111111111111111111111111111.ppt
LIGA(E)11111111111111111111111111111111111111111.pptLIGA(E)11111111111111111111111111111111111111111.ppt
LIGA(E)11111111111111111111111111111111111111111.ppt
 
Digital Signal Processing Lecture notes n.pdf
Digital Signal Processing Lecture notes n.pdfDigital Signal Processing Lecture notes n.pdf
Digital Signal Processing Lecture notes n.pdf
 
A case study of cinema management system project report..pdf
A case study of cinema management system project report..pdfA case study of cinema management system project report..pdf
A case study of cinema management system project report..pdf
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
 
Explosives Industry manufacturing process.pdf
Explosives Industry manufacturing process.pdfExplosives Industry manufacturing process.pdf
Explosives Industry manufacturing process.pdf
 
Online resume builder management system project report.pdf
Online resume builder management system project report.pdfOnline resume builder management system project report.pdf
Online resume builder management system project report.pdf
 
Laundry management system project report.pdf
Laundry management system project report.pdfLaundry management system project report.pdf
Laundry management system project report.pdf
 

Monitoring microservices