2. An average production system
• Is the web server up?
• Is the database up?
• Can the web server talk to the database?
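The three checks on this slide can be approximated with plain TCP connectivity probes. The sketch below is a minimal illustration that assumes each component listens on a known host and port. The function names and the example hosts and ports are hypothetical. A real "can the web server talk to the database?" check should run the probe *from* the web server host, not from a separate monitoring box.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


def run_checks(checks: dict[str, tuple[str, int]]) -> dict[str, bool]:
    """Run each named connectivity check and report pass/fail."""
    return {name: is_port_open(host, port) for name, (host, port) in checks.items()}


if __name__ == "__main__":
    # Hypothetical endpoints: replace with your own web server and database.
    results = run_checks({
        "web server up": ("localhost", 80),
        "database up": ("localhost", 5432),
    })
    for name, ok in results.items():
        print(f"{name}: {'OK' if ok else 'FAIL'}")
```

A port that accepts connections only shows that something is listening; it says nothing about whether the application behind it is healthy, which is the point the following slides make.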
3. What are you actually monitoring?
Are my servers
Is my application
Can users access
4. What are you actually monitoring?
• Is my application generating exceptions?
• How quickly is my system processing messages?
• Can I handle month end batch jobs?
• Is the server up?
• Is there high CPU?
• Do I have enough disk space?
• Can users access the system?
• Are we meeting our SLAs?
• What is the impact of adding another customer?
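Each of these questions needs a concrete metric and threshold before it can be monitored. As one small example, "Do I have enough disk space?" can be turned into a yes/no check using Python's standard `shutil.disk_usage`. The function names and the 10% free-space threshold below are illustrative assumptions, not recommendations. Questions such as "Are we meeting our SLAs?" cannot be answered by a host-level probe like this one and need application-level measurements instead.

```python
import shutil


def free_fraction(total_bytes: int, free_bytes: int) -> float:
    """Fraction of the filesystem that is still free (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes


def disk_space_ok(path: str = "/", min_free_fraction: float = 0.10) -> bool:
    """True if the filesystem holding `path` has at least `min_free_fraction` free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return free_fraction(usage.total, usage.free) >= min_free_fraction


if __name__ == "__main__":
    # Hypothetical threshold: alert when less than 10% of the disk is free.
    print("Enough disk space:", disk_space_ok("/", 0.10))
```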
So let’s look at a simple scenario.
We have a layered architecture with a tiered deployment, and an I/O-heavy operation such as sending emails.
We want to move the email functionality into its own component (a service), so that our business logic code can send it a message as an asynchronous, fire-and-forget operation. The business logic no longer waits for the email to go out, which improves performance.
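The fire-and-forget pattern described above can be sketched in a single Python process. This is a minimal illustration under a loud assumption: an in-process `queue.Queue` and a background thread stand in for a real message broker and a separately deployed email service. `send_email`, `place_order` and the message fields are hypothetical names.

```python
import queue
import threading

# Stand-in for the email provider; records what would have been sent.
sent_emails: list[tuple[str, str]] = []


def send_email(recipient: str, subject: str) -> None:
    """Hypothetical slow, I/O-bound email call (e.g. an SMTP request)."""
    sent_emails.append((recipient, subject))


# In-process queue standing in for a real message broker (assumption).
email_queue: queue.Queue = queue.Queue()


def email_worker() -> None:
    """The 'email service': takes messages off the queue and sends them."""
    while True:
        message = email_queue.get()
        try:
            send_email(message["to"], message["subject"])
        except Exception as exc:
            # A real service would retry or move the message to an error queue.
            print(f"failed to send email: {exc}")
        finally:
            # Mark the message as handled so email_queue.join() can return.
            email_queue.task_done()


def place_order(customer_email: str) -> str:
    """Business logic: enqueue the email and return without waiting for it."""
    email_queue.put({"to": customer_email, "subject": "Order confirmation"})
    return "order placed"


# Run the worker in the background; daemon=True so it does not block exit.
threading.Thread(target=email_worker, daemon=True).start()
```

`place_order(...)` returns as soon as the message is queued; `email_queue.join()` is only needed when you want to wait for outstanding messages, for example in a test. With a durable broker in place of the in-memory queue, messages survive a restart of the email service, which is also why queue length and message age become worth monitoring.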
Next, let’s do the same for our PDF converter functionality.
Now we can go on to add an integration with our CRM, synchronising events between our business logic code and the CRM system using publish/subscribe.
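Publish/subscribe differs from the point-to-point email message above: the publisher announces that something happened and does not know who is listening. The sketch below is a minimal in-memory event bus that illustrates the idea. It is an assumption-laden stand-in for a real broker: delivery here is synchronous and in-process, whereas a broker would deliver asynchronously across processes. `EventBus`, `sync_to_crm` and the `CustomerUpdated` event name are hypothetical.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """Minimal in-memory publish/subscribe bus (stand-in for a real broker)."""

    def __init__(self) -> None:
        self._subscribers: defaultdict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        """Register a handler to be called for every event of `event_type`."""
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> int:
        """Deliver the event to every subscriber; return how many received it."""
        handlers = self._subscribers.get(event_type, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


def sync_to_crm(event: dict) -> None:
    """Hypothetical subscriber that would push the change to the CRM."""
    print(f"CRM sync: customer {event['customer_id']} updated")


if __name__ == "__main__":
    bus = EventBus()
    bus.subscribe("CustomerUpdated", sync_to_crm)
    # Business logic publishes without knowing who is subscribed.
    bus.publish("CustomerUpdated", {"customer_id": 42})
```

Adding another consumer of the same event (say, a reporting service) is just another `subscribe` call; the publishing code does not change.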
So, starting from a layered architecture with a tiered deployment (a web tier and a database tier), we now have a much more distributed system: a couple of processes and servers talking to each other using queues and messaging.
We also introduced publish/subscribe for integration with our CRM.
We can see here a microservices-style deployment in which each service has its own database.
Regardless of the physical deployment, you want to monitor all the different components of the system and the flows between them.
So now we have distributed our system.
We added a couple of processes.
And we added a queuing mechanism.
There are certain elements of queues and messaging that should be monitored; these elements are different from those in a traditional web-based environment using HTTP/REST requests.
We want to monitor the time it takes for a message to be processed end to end, and there are a couple of metrics we can look at.
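One way to make "end to end" concrete is to record timestamps on each message and derive durations from them. The sketch below assumes three hypothetical timestamps (when the message was sent, when processing started, when it finished) and splits the total into time waiting in the queue and time spent in the handler. These field and property names are illustrative and are not necessarily the metrics this talk goes on to describe.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MessageTiming:
    """Timestamps captured for one message (hypothetical field names)."""
    sent_at: datetime                # when the sender put the message on the queue
    processing_started_at: datetime  # when the receiver picked it up
    processing_ended_at: datetime    # when the receiver finished handling it

    @property
    def queue_wait(self) -> timedelta:
        """Time the message sat in the queue before being picked up."""
        return self.processing_started_at - self.sent_at

    @property
    def processing_time(self) -> timedelta:
        """Time spent inside the message handler."""
        return self.processing_ended_at - self.processing_started_at

    @property
    def end_to_end(self) -> timedelta:
        """Total time from send to completion (queue wait + processing)."""
        return self.processing_ended_at - self.sent_at


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0, 0)
    m = MessageTiming(t0, t0 + timedelta(seconds=3), t0 + timedelta(seconds=3.5))
    print(m.queue_wait.total_seconds(),
          m.processing_time.total_seconds(),
          m.end_to_end.total_seconds())  # prints: 3.0 0.5 3.5
```

Splitting the total this way helps with diagnosis: a growing queue wait with a steady processing time typically suggests the receivers cannot keep up with the incoming rate, rather than each message being slow to handle. Because `sent_at` is usually recorded on the sender's machine and the other two on the receiver's, clock skew between servers feeds directly into the end-to-end figure.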