What to consider when monitoring microservices

William Brander and Sean Farmar show how the monitoring game changes when a system becomes distributed and you start delving into the world of microservices.

Learn:
* Why monitoring changes in distributed systems
* A monitoring philosophy that ensures all bases are covered
* The aspects of monitoring that affect asynchronous messaging systems

Published in: Software

  1. 1. http://particular.net What to consider when monitoring microservices Sean Farmar holds the world record for answering the most NServiceBus questions - even more than Udi. With over 20 years of experience, he specializes in providing simple solutions for complex business requirements using NServiceBus and applying SOA principles inspired by Udi Dahan. As a solution architect with Particular Software, the creators of NServiceBus, Sean provides support, training and consulting for customers using NServiceBus and the Particular Platform. A professional geek, William works for Particular Software writing amazing software like NServiceBus. Passionate about the web and security, he is engaged in a sordid love affair with JavaScript, and spends most of his free time trying to convince others of its beauty and elegance. When not behind his laptop hacking away, this amateur beer enthusiast can often be found playing board games or drinking cold-brew coffee.
  2. 2. William Brander http://particular.net Sean Farmar What to consider when monitoring microservices
  3. 3. Agenda • Introduction • A Philosophy on Monitoring • How things change when they’re distributed • Monitoring Metrics • Q & A
  4. 4. An average production system Database • Is the web server up? • Is the database up? • Can the webserver talk to the db?
  5. 5. What are you actually monitoring? • Infrastructure: Are my servers running? • Application: Is my application process running? • Business Capability: Can users access the system?
  6. 6. A Monitoring Philosophy • Monitoring Area: Business Capability, Application, Infrastructure • Monitoring Concern: Capacity, Performance, Health
  7. 7. Monitoring Concerns (Health / Performance / Capacity) • Infrastructure: Is the server up? / Is there high CPU? / Do I have enough disk space? • Application: Is my application generating exceptions? / How quickly is my system processing messages? / Can I handle month end batch jobs? • Business Capability: Can users access the system? / Are we meeting our SLAs? / What is the impact of adding another customer?
  8. 8. A Monitoring Philosophy • Monitoring Area: Business Capability, Application, Infrastructure • Monitoring Concern: Capacity, Performance, Health • Interaction Type: Proactive, Reactive, Passive
  9. 9. An average production system Database • Is the web server up? • Is the database up? • Can the webserver talk to the db? Infrastructure Passive Health
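The three dimensions of the philosophy (area, concern, interaction type) can be treated as tags on each check, which makes it easy to see which cells of the grid have no monitoring yet. The following is only a sketch under assumptions: the `Check` class and the `uncovered` helper are hypothetical names introduced here, not something from the slides or from NServiceBus.

```python
from dataclasses import dataclass
from enum import Enum


class Area(Enum):
    INFRASTRUCTURE = "Infrastructure"
    APPLICATION = "Application"
    BUSINESS_CAPABILITY = "Business Capability"


class Concern(Enum):
    HEALTH = "Health"
    PERFORMANCE = "Performance"
    CAPACITY = "Capacity"


class Interaction(Enum):
    PASSIVE = "Passive"
    REACTIVE = "Reactive"
    PROACTIVE = "Proactive"


@dataclass(frozen=True)
class Check:
    """A single monitoring question tagged with the three dimensions (hypothetical type)."""
    question: str
    area: Area
    concern: Concern
    interaction: Interaction


def uncovered(checks):
    """Return every (area, concern) cell that no check addresses."""
    covered = {(c.area, c.concern) for c in checks}
    return [(a, c) for a in Area for c in Concern if (a, c) not in covered]


# The three checks of the "average production system" all land in one cell.
average_system = [
    Check(q, Area.INFRASTRUCTURE, Concern.HEALTH, Interaction.PASSIVE)
    for q in (
        "Is the web server up?",
        "Is the database up?",
        "Can the webserver talk to the db?",
    )
]
gaps = uncovered(average_system)
```

Here `gaps` lists the eight remaining cells of the three-by-three grid, which is one way to read the point that the typical setup only covers passive infrastructure health.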
  10. 10. Going Distributed UI BL DAL DB Email PDF CRM
  11. 11. Going Distributed Email PDF CRM SQL
  12. 12. Monitoring distributed systems • Multiple processes, servers, and queues • We want to monitor the time it takes for a message to be processed • We need to monitor the message queues
  13. 13. Let’s look at queue length
  14. 14. Queue Length • Queue length is an indicator of work still outstanding • High queue length doesn’t necessarily indicate a problem though • Stable or decreasing is good • Increasing is bad
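Because the trend matters more than the absolute number, a monitor can compare recent queue-length samples with older ones. This is a minimal sketch, assuming samples are collected at regular intervals; the `queue_trend` function and its 5% `tolerance` are illustrative choices, not values from the talk.

```python
from statistics import mean


def queue_trend(samples, tolerance=0.05):
    """Classify queue-length samples (oldest first) as increasing, decreasing or stable."""
    if len(samples) < 4:
        raise ValueError("need at least four samples")
    half = len(samples) // 2
    older, newer = mean(samples[:half]), mean(samples[half:])
    # Relative change between the two halves; max() guards against an empty queue.
    change = (newer - older) / max(older, 1)
    if change > tolerance:
        return "increasing"
    if change < -tolerance:
        return "decreasing"
    return "stable"


growing = queue_trend([10, 12, 15, 20, 26, 35])   # short queue, but growing
high_but_flat = queue_trend([500, 510, 495, 505])  # long queue, not growing
draining = queue_trend([90, 70, 40, 10])
```

In this sketch the queue holding about 500 messages is reported as stable, while the much shorter but growing queue is the one flagged as increasing.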
  15. 15. Processing Time ✔ ⌛ ⏱️ ⌛ ✔
  16. 16. Processing Time and fault tolerance • Processing Time does not include error handling time • Avoid losing data due to exceptions or temporary connectivity issues • If all else fails, move the message to the error queue
  17. 17. Exception Diagnostics – Immediate Retries • Input Queue → Invoke • On failure (✘): Immediate Retries x n • If all retries fail (✘): Start Delayed Retries (Timeout Queue)
  18. 18. Exception Diagnostics – Delayed Retries • Timeout Queue → Invoke • On failure (✘): Return to timeout queue, Retry x times • If all fails (✘): move to error queue
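The two retry stages can be sketched as nested loops: a burst of immediate retries, a pause standing in for the timeout queue, and finally the error queue. This is an illustration of the flow only. It is not the NServiceBus API, and `process_with_retries`, its parameters and the list used as an error queue are all hypothetical.

```python
import time


def process_with_retries(message, handler, error_queue,
                         immediate_retries=3, delayed_retries=2,
                         delay_seconds=0.0, sleep=time.sleep):
    """Invoke handler with immediate then delayed retries; return the number of attempts."""
    attempts = 0
    last_error = None
    for delayed_round in range(delayed_retries + 1):
        if delayed_round > 0:
            # Stand-in for parking the message in the timeout queue,
            # waiting a little longer on each delayed round.
            sleep(delay_seconds * delayed_round)
        # One first attempt plus the immediate retries for this round.
        for _ in range(immediate_retries + 1):
            attempts += 1
            try:
                handler(message)
                return attempts
            except Exception as exc:
                last_error = exc
    # If all else fails, keep the message and the last error instead of losing data.
    error_queue.append((message, repr(last_error)))
    return attempts
```

With the defaults above, a message that always fails is attempted twelve times (four attempts in each of three rounds) before it lands in the error queue, so a temporary connectivity issue has several chances to clear.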
  19. 19. Detecting Connectivity • Distributed systems typically keep working when other parts aren’t available • How do you know the endpoint you’re sending messages to is actually processing messages?
  20. 20. Detecting Connectivity
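The transcript does not say which mechanism the speakers use to answer that question. One common approach, assumed here purely for illustration, is a heartbeat: each endpoint periodically reports that it is alive, and the monitor flags any endpoint that has been silent for too long. The `silent_endpoints` function, the endpoint names and the 30 second threshold are all hypothetical.

```python
def silent_endpoints(last_seen, now, max_silence=30.0):
    """Return the names of endpoints whose last heartbeat is older than max_silence seconds."""
    return sorted(
        name for name, seen in last_seen.items() if now - seen > max_silence
    )


# Timestamps (in seconds) of the most recent heartbeat per endpoint.
last_seen = {"Sales": 100.0, "Billing": 60.0, "Shipping": 95.0}
overdue = silent_endpoints(last_seen, now=105.0)
```

In this example only `Billing` is reported, because its last heartbeat is 45 seconds old. Senders keep queueing messages for it without error, which is why a separate signal like this is needed.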
  21. 21. Critical time ⏱️ • Critical time = the entire time taken to process a message successfully
  22. 22. Critical Time • Critical Time is the total duration from when a message is created to when it is processed • Critical Time = Time in Queue + Processing Time + Retry Time + Network Time • Stable or decreasing could be good • Increasing is bad
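The formula on this slide can be worked through with a small example. The `MessageTimings` class and the numbers below are made up for illustration; the four components are the ones named on the slide.

```python
from dataclasses import dataclass


@dataclass
class MessageTimings:
    """Per-message durations in seconds (illustrative values only)."""
    time_in_queue: float
    processing_time: float
    retry_time: float
    network_time: float

    @property
    def critical_time(self):
        # Critical Time = Time in Queue + Processing Time + Retry Time + Network Time
        return (self.time_in_queue + self.processing_time
                + self.retry_time + self.network_time)


timings = MessageTimings(
    time_in_queue=2.0,
    processing_time=0.5,
    retry_time=30.0,
    network_time=0.25,
)
total = timings.critical_time
```

Here the handler itself took half a second, yet the message needed 32.75 seconds end to end because most of the time went into retries. Processing time alone would hide that, since it does not include error handling time.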
  23. 23. Putting these together • Each of these metrics presents a piece of the puzzle • Look at them from an endpoint’s perspective, not per message • Looking at them together gives great insight into your system • Metrics shown: Critical Time, Processing Time, Queue Length
  24. 24. Keeping your eye on everything • These 5 metrics can give a lot of insight • Some individual metrics are meaningful • But most tell a story when combined with others • Let the monitoring philosophy guide what you focus on • NServiceBus already provides a lot of these metrics for you, letting you focus on monitoring the metrics that impact your business
  25. 25. Learn more • Try NServiceBus + the Particular Service Platform • https://docs.particular.net/tutorials/quickstart/ • Take a look at the NServiceBus.Metrics NuGet package • Follow us to find out about the next webinar in the series!
  26. 26. Q&A
  27. 27. Thank you! @farmar sean.farmar@particular.net @williambza william.brander@particular.net https://www.particular.net
