2. An average production system
@farmar2
Database
• Is the web server up?
• Is the database up?
• Can the web server talk to the db?
3. What are you actually monitoring?
Infrastructure
• Are my servers running?
Application
• Is my application process running?
Business Capability
• Can users access the system?
4. What are you actually monitoring?
Infrastructure
• Is the server up?
• Is there high CPU?
• Do I have enough disk space?
Application
• Is my application generating exceptions?
• How quickly is my system processing messages?
• Can I handle month end batch jobs?
Business Capability
• Can users access the system?
• Are we meeting our SLAs?
• What is the impact of adding another customer?
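As an illustration of an infrastructure-level check, the disk space question can be answered with the Python standard library. This is a minimal sketch: the function names and the 10% threshold are assumptions chosen for the example, not values from the talk.

```python
import shutil


def disk_free_fraction(path: str = "/") -> float:
    """Return the free space on the volume holding path, as a fraction of total."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def disk_alert(path: str = "/", min_free: float = 0.10) -> bool:
    """Return True when free space has dropped below the min_free fraction."""
    return disk_free_fraction(path) < min_free
```

A scheduler or monitoring agent would call `disk_alert` periodically and raise a notification when it returns True.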
8. Monitoring distributed systems
Multiple processes, servers and queues
We want to monitor the time it takes for a message to be processed
We need to monitor the message queues
9. Monitoring distributed systems
Turn the lights on…
A distributed system is built for failure
Once you have covered the basics, cover the other aspects of monitoring
So let’s look at a simple scenario
We have a layered architecture with a tiered deployment and we have a high I/O operation like sending emails.
We want to move the email functionality to its own component (service), so that our business logic code can send it a message as an asynchronous, fire-and-forget operation and improve performance.
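The fire-and-forget idea can be sketched with an in-process queue and a worker thread. This is only an illustration under stated assumptions: `queue.Queue` stands in for a real message queue, the `sent` list stands in for a mail gateway, and `place_order` is a hypothetical piece of business logic.

```python
import queue
import threading

email_queue: queue.Queue = queue.Queue()
sent: list[dict] = []  # stand-in for a real mail gateway (assumption)


def email_worker() -> None:
    """Consume messages until a None sentinel arrives."""
    while True:
        message = email_queue.get()
        try:
            if message is None:  # sentinel: shut the worker down
                return
            sent.append(message)  # the slow I/O would happen here
        finally:
            email_queue.task_done()


def place_order(order_id: int, customer: str) -> str:
    """Business logic: enqueue the email and return without waiting for it."""
    email_queue.put({"to": customer, "subject": f"Order {order_id} confirmed"})
    return "accepted"


worker = threading.Thread(target=email_worker, daemon=True)
worker.start()
```

The caller gets its answer as soon as the message is enqueued. The cost is that a failure to send is no longer visible to the caller, which is why the queue and the worker need monitoring of their own.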
Next, let’s do the same for our PDF converter functionality.
Now we can go on and add an integration with our CRM, synchronising events between our business logic code and our CRM system using publish-subscribe.
So from a layered architecture with a tiered deployment (a web tier and a database tier), we now have a much more distributed system: a couple of processes and servers talking to each other using queues and messaging.
We also introduced publish subscribe for integration with our CRM.
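Publish-subscribe can be sketched in a few lines. This is an in-memory illustration only: the `EventBus` class and the `CustomerUpdated` event name are hypothetical, and a real deployment would deliver events through a message broker rather than direct function calls.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """Minimal in-memory publish-subscribe: publishers do not know subscribers."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        """Register handler to receive every event of the given type."""
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> int:
        """Deliver payload to each subscriber; return how many received it."""
        handlers = self._subscribers.get(event_type, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)
```

For example, a CRM integration could call `bus.subscribe("CustomerUpdated", crm_handler)` while the business logic only ever calls `bus.publish("CustomerUpdated", {...})`. New subscribers can then be added without changing the publisher.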
We can see here a microservices-style deployment, whereby each service has its own database.
Regardless of the physical deployment, you want to monitor all the different components of the system and the flow between them.
So now we have distributed our system.
We added a couple of processes.
And we added a queuing mechanism.
Certain elements of queues and messaging should be monitored. These elements are different from those in a traditional web-based environment using HTTP/REST messages.
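One queue-specific signal is queue depth, the number of messages waiting. The sketch below assumes depth is sampled periodically from whatever queueing technology is in use; the `QueueDepthMonitor` class, its window size and its limit are hypothetical choices for illustration.

```python
from collections import deque


class QueueDepthMonitor:
    """Track recent queue depth samples and flag two warning conditions."""

    def __init__(self, max_depth: int, window: int = 5) -> None:
        self.max_depth = max_depth
        self.samples: deque[int] = deque(maxlen=window)

    def record(self, depth: int) -> None:
        """Store the latest sampled depth, discarding the oldest if full."""
        self.samples.append(depth)

    def is_backing_up(self) -> bool:
        """True when the window is full and every sample exceeds the previous one."""
        s = list(self.samples)
        return len(s) == self.samples.maxlen and all(b > a for a, b in zip(s, s[1:]))

    def is_over_limit(self) -> bool:
        """True when the most recent sample is above max_depth."""
        return bool(self.samples) and self.samples[-1] > self.max_depth
```

A steadily growing queue suggests that consumers cannot keep up with producers, even while every individual process still reports itself as up.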
We want to monitor the time it takes for a message to be processed end to end, and there are a couple of metrics we can look at.
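One way to measure this is to stamp each message when it is sent and compare timestamps when it is handled. The split into time in queue and processing time below is an assumption made for illustration, as are the `Message`, `Timing` and `handle_with_timing` names. Note that `time.monotonic` is only comparable within one machine; across servers you would need synchronised wall-clock timestamps.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Message:
    body: str
    # Stamped when the message is created, i.e. at send time.
    sent_at: float = field(default_factory=time.monotonic)


@dataclass
class Timing:
    time_in_queue: float    # from send until the handler starts
    processing_time: float  # time spent inside the handler

    @property
    def end_to_end(self) -> float:
        return self.time_in_queue + self.processing_time


def handle_with_timing(message, handler, clock=time.monotonic) -> Timing:
    """Run handler(message) and report how long the message waited and ran."""
    started = clock()
    handler(message)
    finished = clock()
    return Timing(started - message.sent_at, finished - started)
```

A long time in queue with a short processing time points at too few consumers, while the reverse points at a slow handler.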