What is Platform Observability? An Overview

Complex Tech Stack: SCCM, DCM, Symantec EP, Security, Group Policy, Tanium, Tenable, Adaptiva, Active Directory, Aternity Cloud, Exploit / Credential Guard, Elastic, Winlogbeat, Cisco AnyConnect
OPS Challenges
• Challenge #1: Numerous requests (from different channels) and interruptions
• Challenge #2: No easy way to determine which part of the tech stack is causing an issue
• Challenge #3: Recurring issues
• Challenge #4: Long resolution times and lost issues
• Challenge #5: Repetitive, time-consuming tasks

Platform Observability
Platform observability “is when you infer the internal state of a system only by observing the data it generates, such as logs, metrics, and traces.”
When observability is implemented well, a system will not require operations teams to spend much effort on understanding its internal state.

Monitoring - something you do to determine the state of an application, a system, or a service.
Observability - based on the acquisition of data that allows you to ask questions you didn’t know you had and solve problems you never thought of.
Observability is all about data: at scale, at speed, and analytics-driven.
Monitoring tells you when something is wrong. Observability lets you ask why.
Monitoring: things we are aware of but DON’T understand
Observability: things we are NOT aware of and DON’T understand
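To make the distinction concrete, here is a minimal Python sketch. Monitoring is modelled as a rule decided in advance (an assumed 90% disk-usage threshold), while observability is modelled as an ad hoc question asked over raw, structured events after the fact. The event fields (`host`, `service`, `latency_ms`, `disk_pct`), the sample data, and the function names are hypothetical and not taken from any specific tool.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical structured events, as an observability pipeline might store them.
EVENTS = [
    {"host": "web-1", "service": "vpn", "latency_ms": 120, "disk_pct": 71},
    {"host": "web-2", "service": "vpn", "latency_ms": 950, "disk_pct": 64},
    {"host": "web-2", "service": "sso", "latency_ms": 880, "disk_pct": 64},
    {"host": "web-3", "service": "sso", "latency_ms": 110, "disk_pct": 93},
]

DISK_ALERT_PCT = 90  # assumed threshold, chosen before any incident happens


def monitor_disk(events):
    """Monitoring: evaluate a rule written up front for a known scenario."""
    return sorted({e["host"] for e in events if e["disk_pct"] >= DISK_ALERT_PCT})


def explore(events, group_by, field):
    """Observability: ask a new question by grouping raw events on any field."""
    groups = defaultdict(list)
    for e in events:
        groups[e[group_by]].append(e[field])
    return {key: mean(values) for key, values in groups.items()}


if __name__ == "__main__":
    print(monitor_disk(EVENTS))  # ['web-3']
    print(explore(EVENTS, "host", "latency_ms"))
```

The rule in `monitor_disk` only ever answers the disk question it was written for, whereas `explore` can answer a question nobody wrote a rule for (here, which host is slow) because the raw events were kept.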

Observability: Impact and Importance for Effective SRE/DevOps
Cloud Migration
• Hybrid cloud monitoring
• Cloud cost management
• Cloud capacity planning
Cloud Monitoring
• Cloud services monitoring
• Kubernetes and container monitoring
• Serverless monitoring
• KPI monitoring with custom metrics
• Observability-as-a-service
App Optimization
• Application modernization
• Microservices monitoring and troubleshooting
• Business SLA monitoring
• DevOps application lifecycle monitoring
Outcomes: higher visibility, better workflow, faster alerts, discovery of unknown issues, reduced operational cost, and increased developer velocity.

Three Pillars of Observability
Metrics: What is happening? (Detect)
• Quantify performance
• Produce alerts, such as when a system is down or load balancers reach capacity
• Monitor events for anomalous activities
Traces: Where is it happening? (Troubleshoot)
• API queries
• Server-to-server workload
• Internal API calls
• Frontend API traffic
Events / Logs: Why is it happening? (Pinpoint)
• Answer the “who, what, where, when, and how”
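A minimal Python sketch of how a single request could produce all three signals, using only the standard library. A `Counter` stands in for a metrics backend, a hand-built span record stands in for a tracing library, and `logging` emits a structured JSON event. The names `handle_request`, `METRICS`, and `SPANS`, and the Prometheus-style metric name, are hypothetical illustrations.

```python
import json
import logging
import time
import uuid
from collections import Counter

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("orders")

METRICS = Counter()  # metrics: aggregate counts answer "what is happening?"
SPANS = []           # traces: timed operations answer "where is it happening?"


def handle_request(user: str, path: str, fail: bool = False) -> str:
    """Hypothetical handler that emits a metric, a trace span, and a log event."""
    trace_id = uuid.uuid4().hex  # shared ID lets the three signals be correlated
    start = time.perf_counter()
    status = 500 if fail else 200  # real request handling would happen here
    duration_ms = (time.perf_counter() - start) * 1000

    # Metric: a cheap counter suitable for dashboards and alert thresholds.
    METRICS[f"requests_total{{status={status}}}"] += 1
    # Trace span: records which operation ran and how long it took.
    SPANS.append({"trace_id": trace_id, "name": f"GET {path}", "duration_ms": duration_ms})
    # Log event: structured context (who, what, where, when, how) for root cause analysis.
    log.info(json.dumps({
        "trace_id": trace_id, "ts": time.time(), "user": user,
        "path": path, "status": status,
    }))
    return trace_id


if __name__ == "__main__":
    handle_request("alice", "/orders")
    handle_request("bob", "/orders", fail=True)
    print(dict(METRICS))
```

In a real deployment a library such as OpenTelemetry would create spans and propagate the trace ID across services; the shared `trace_id` here only illustrates why the pillars are more useful together: an alert on the 500 counter leads to the span, and the span's ID leads to the log event that explains it.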

Centralized Observability – Key benefits
Current Benefits
• Faster issue detection and resolution for improved end-user satisfaction
• Improved performance and platform reliability
• Lower infrastructure costs by right-sizing and identifying bottlenecks
• Improved time to market by enabling agile, frequent product delivery through automated CI/CD pipelines without impacting quality
Outlook (Gartner)
• The observability landscape is in its early stages. Enterprises are becoming frustrated with the limitations of monitoring tools and, despite decades of investment in them, continue to rely on customers to notice outages.
• By 2024, over 30% of enterprises implementing distributed systems will have adopted observability techniques to improve service performance, up from less than 10% in 2020.

Observability Stack
UI and Use Cases
• Log analysis, real user monitoring, infrastructure monitoring, search, dashboards, machine learning, alerting
Data (common data model, open access)
• Logs, metrics, traces
Any Data / Any Source
• Applications, infrastructure, cloud, middleware, network, social, business, etc.
Outcomes
• Proactively detect issues
• Reduce MTTD and MTTR
• Continually optimize
• Deliver on SLAs
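MTTD (mean time to detect) and MTTR (mean time to resolve) are averages over incident timestamps. The sketch below assumes MTTD runs from when an issue starts to when it is detected, and MTTR from detection to resolution; organizations define these windows differently (some measure MTTR from the start of the outage), so treat both definitions as assumptions. The incident records are made up for illustration.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical incidents: (started, detected, resolved)
INCIDENTS = [
    ("2024-03-01 09:00", "2024-03-01 09:20", "2024-03-01 10:20"),
    ("2024-03-05 14:00", "2024-03-05 14:10", "2024-03-05 14:40"),
]

FMT = "%Y-%m-%d %H:%M"


def minutes(delta: timedelta) -> float:
    return delta.total_seconds() / 60


def mttd_mttr(incidents):
    """Return (MTTD, MTTR) in minutes under the assumed definitions above."""
    parsed = [tuple(datetime.strptime(t, FMT) for t in row) for row in incidents]
    mttd = mean(minutes(detected - started) for started, detected, _ in parsed)
    mttr = mean(minutes(resolved - detected) for _, detected, resolved in parsed)
    return mttd, mttr


if __name__ == "__main__":
    print(mttd_mttr(INCIDENTS))  # (15.0, 45.0)
```

Tracking the two numbers separately shows whether better observability is shortening detection, resolution, or both.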

Continuous Feedback Loop
Figure: a continuous loop connecting DEV, Ops, Monitoring, Service Desk, Issue Triage, Postmortem / Incident Review, and Error Budget decisions.
• Monitor data using dashboards
• Proactive reaction to automated alarms
• Application metrics and proactive prediction of application load
• Postmortem analysis and avoidance of future errors
• Simplification of the development process
• Acceleration of provisioning
• CI/CD pipelines
• Releasing new features, expected system changes, and planned downtime
Service Level Objectives and Indicators (SLOs and SLIs)
• Define service quality in advance
• Determine metrics and indicators
Risk Acceptance and Mitigation Plan
• Assess risk viability and reliability
• Evaluate status against the predefined standard
Automation
• Automate repetitive tasks
• Ensure faster delivery and execution of tasks
Proactive Monitoring
• Prepare a continuous improvement plan to minimize incidents
• Establish well-monitored environments
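An availability SLO translates directly into an error budget, which is what drives the error-budget decisions in the feedback loop. The sketch below assumes the SLI is the fraction of successful requests in a window and uses a hypothetical 99.9% SLO; the request counts and the `release_allowed` policy (ship features only while budget remains) are illustrative assumptions, not a prescribed process.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float              # measured fraction of good requests (the SLI)
    budget_total: int       # failed requests the SLO tolerates in this window
    budget_remaining: int   # failures still tolerated; negative means the SLO is breached


def error_budget(good: int, total: int, slo: float) -> SloReport:
    """Compute an availability SLI and the remaining error budget for one window."""
    if total <= 0 or not 0 < slo < 1:
        raise ValueError("total must be positive and slo must be between 0 and 1")
    bad = total - good
    allowed = round(total * (1 - slo))  # e.g. 0.1% of requests for a 99.9% SLO
    return SloReport(sli=good / total, budget_total=allowed, budget_remaining=allowed - bad)


def release_allowed(report: SloReport) -> bool:
    """Hypothetical policy: ship new features only while error budget remains."""
    return report.budget_remaining > 0


if __name__ == "__main__":
    report = error_budget(good=999_400, total=1_000_000, slo=0.999)
    print(report.budget_remaining, release_allowed(report))  # 400 True
```

With one million requests, a 99.9% SLO tolerates 1,000 failures; 600 have been spent, so 400 remain and new releases may proceed under this assumed policy.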

Observability is a Framework
• Observability is not a tool but a framework that enables quick interrogation of services to identify the underlying cause of issues, even issues that have never occurred before. It has great potential to improve operational efficiency and, ultimately, the overall stability of our platforms.
• The more data points we ingest into the observability platform, the more insight and value we can gain from it.
• It requires a willingness to adopt new ways of working, shifting away from existing siloed technical domains that lack transparency toward full visibility into the inner workings of our platforms.
Workplace and Collaboration teams depend on

Quotes on Monitoring vs Observability
“Monitoring is a single plane for the most part. You set up rules, aggregations, and alerts on when a known scenario plays out (e.g., a trajectory towards 100% disk usage is an indication of an issue in the imminent future). Observability, on the other hand, is the means to map an environment or context and the ability to fluidly traverse that map, thus reaching a greater awareness of ‘what is.’” - Ryan Sheldrake, field CTO, Lacework
“Observability is about putting mechanisms in place that allow teams to actively debug their system. It is based on exploring properties and patterns not defined in advance. The main purpose of observability is to use the system’s outputs to gather insights and act on them.” - Parveen Arora, co-founder and director, VVnT SeQuor
“The key difference between observability and monitoring is that compared to monitoring, site observability gives a more complete assessment of the overall environment in which the application resides and hence is more effective in fulfilling the key success factor for an application – that of site reliability.” - Sushant Mehta, senior manager application development, Diyar United Company
“Monitoring is the process of using observability. When monitoring occurs, one has already decided which events and applications will be tracked. Observability creates the potential to monitor different events along the pipeline and the overall software development lifecycle. As processes get built, the potential for observability should be included across a broad spectrum. Monitoring finds specific events across the system and creates artifacts and reports that can be integrated into overall metrics.” - Mark Peters, technical lead, Novetta
Ref: https://enterprisersproject.com/article/2021/9/devops-monitoring-vs-observability
