Observability at Scale

Presented By: Rahul Miglani
VP Engineering - DevOps Practice Head
Knoldus Inc.

About Knoldus
Knoldus is a technology consulting firm with a focus on modernizing digital systems
at the pace your business demands.
DevOps
Functional. Reactive. Cloud Native

Our Agenda
01 What Is Observability in DevOps?
02 Components of Observability
03 Benefits of Observability
04 Common Pitfalls in Observability
05 Observability at Scale and Best Practices

What Is Observability in DevOps?
Observability is the foundation of reliability. When things
inevitably go wrong, observability enables engineers to quickly
diagnose and fix issues. The more complex a system gets, and
the higher user expectations of reliability become, the more
important it is to invest in advanced observability methods
to reason about what is going on.

Components of Observability
LOGGING
METRICS
TRACING
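
The three components above can be illustrated with a minimal sketch that uses only the Python standard library. All names here (`log_event`, `count`, `span`, `handle_request`, and the in-memory `METRICS` and `SPANS` stores) are hypothetical and chosen for illustration; a production system would typically export these signals to dedicated backends instead.

```python
import json
import logging
import time
import uuid
from collections import Counter
from contextlib import contextmanager

# Hypothetical in-memory stores; a real system would export to a backend.
METRICS = Counter()
SPANS = []

logger = logging.getLogger("checkout")


def log_event(message, **fields):
    """Logging: emit one structured (JSON) record per discrete event."""
    record = json.dumps({"message": message, **fields}, sort_keys=True)
    logger.info(record)
    return record


def count(name, value=1):
    """Metrics: aggregate numeric measurements under a name."""
    METRICS[name] += value


@contextmanager
def span(name, trace_id):
    """Tracing: record how long one step of a request took."""
    start = time.perf_counter()
    try:
        yield
    finally:
        SPANS.append({
            "trace_id": trace_id,
            "name": name,
            "duration_s": time.perf_counter() - start,
        })


def handle_request(order_id):
    """Handle one request and emit all three signals for it."""
    trace_id = uuid.uuid4().hex  # correlates the log line with the span
    with span("handle_request", trace_id):
        count("requests_total")
        log_event("order received", order_id=order_id, trace_id=trace_id)
    return trace_id
```

Sharing one `trace_id` between the log record and the span is what lets an engineer move from an aggregate metric to the individual request behind it.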

Observability Pipeline

Benefits of Observability
● It gives an IT organization a complete understanding
of the internal workings of its systems.
● Observability reduces the time spent resolving issues,
and therefore downtime, because it brings the possible
causes of an issue into focus.
● It gives the DevOps team the ability to identify the root
causes of issues.
● Observability makes debugging and troubleshooting
easier.
● Observability helps companies monitor the
performance of the application or system.
● It helps reduce the Mean Time to Detection
(MTTD) and the Mean Time to Resolution (MTTR) for
software infrastructure and services.
● Observability also enhances customer satisfaction when
teams use data from logs and metrics to improve
services.
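
As a rough illustration of the MTTD and MTTR figures mentioned above, the sketch below computes both from a list of incident timestamps. The incident data is invented for the example, and it assumes MTTR is measured from the start of the incident to its resolution; some teams measure it from detection instead.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_minutes(incidents, start_key, end_key):
    """Average gap, in minutes, between two timestamps across incidents."""
    return mean(
        (incident[end_key] - incident[start_key]).total_seconds() / 60
        for incident in incidents
    )


# Invented sample data: two incidents that start at the same time.
t0 = datetime(2024, 1, 1, 9, 0)
incidents = [
    {"started": t0,
     "detected": t0 + timedelta(minutes=10),
     "resolved": t0 + timedelta(minutes=70)},
    {"started": t0,
     "detected": t0 + timedelta(minutes=20),
     "resolved": t0 + timedelta(minutes=50)},
]

mttd = mean_minutes(incidents, "started", "detected")  # 15.0 minutes
mttr = mean_minutes(incidents, "started", "resolved")  # 60.0 minutes
```

Tracking both numbers over time shows whether an investment in observability is actually shortening detection and resolution.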

Common Pitfalls in Observability
Pitfall 1: Uneven Distribution of Information
Pitfall 2: Working Without the Right Tools
Pitfall 3: Poor Alerting System

Best Practices in Observability
● Don’t try to monitor everything. Instead, gather only the necessary data.
● Focus on monitoring essential components and fixing them when they fail.
● Avoid storing every available log or data point. Instead, store those that give insight into critical events.
● Set up alerts on critical events.
● Create graphs that every team member can easily understand, as this improves
the usability of the information.
MEASURE EVERYTHING
● Changes made to monitoring configuration.
● "Out of hours" alerts.
● Team alerting balance.
● False positives.
● False negatives.
● Alert creation.
● Alert acknowledgement.
● Alert silencing and silence duration.
● Unactionable alerts.
● Usability: alerts, runbooks, dashboards.
● MTTD, MTTR, impact.
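
Several of the alerting measures listed above (false positives, "out of hours" alerts, unactionable alerts) can be derived from a simple alert history. The sketch below is one possible way to do that; the `Alert` record, its fields, and the 09:00 to 18:00 working window are assumptions made for the example, not part of any particular alerting tool.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    hour: int            # hour of day the alert fired (0-23)
    real_incident: bool  # False means the alert was a false positive
    actionable: bool     # False means nobody could act on it


def alert_report(alerts, work_start=9, work_end=18):
    """Return the share of alerts that fall into each problem category."""
    total = len(alerts)
    if total == 0:
        return {"false_positive_rate": 0.0,
                "out_of_hours_rate": 0.0,
                "unactionable_rate": 0.0}
    false_positives = sum(not a.real_incident for a in alerts)
    out_of_hours = sum(not (work_start <= a.hour < work_end) for a in alerts)
    unactionable = sum(not a.actionable for a in alerts)
    return {
        "false_positive_rate": false_positives / total,
        "out_of_hours_rate": out_of_hours / total,
        "unactionable_rate": unactionable / total,
    }


# Invented sample history of four alerts.
history = [
    Alert(hour=3, real_incident=True, actionable=True),
    Alert(hour=10, real_incident=False, actionable=False),
    Alert(hour=14, real_incident=True, actionable=True),
    Alert(hour=22, real_incident=True, actionable=False),
]
report = alert_report(history)
```

Reviewing such a report regularly helps a team decide which alerts to tune, silence, or remove.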

Rahul Miglani
DevOps Practice Head
DevOps@Knoldus.com
Thank You!
