PLATFORM OBSERVABILITY
Complex Tech Stack
• SCCM
• DCM
• Symantec EP
• Security
• DCM
• Group Policy
• Tanium
• Tenable
• Adaptiva
• Active Directory
• Aternity Cloud
• Exploit / Credential Guard
• Elastic Winlogbeat
• Cisco AnyConnect
OPS Challenges
• Challenge #1: Numerous requests (from different channels) and interruptions
• Challenge #2: No easy way to determine which tech stack is causing the issue
• Challenge #3: Recurring issues
• Challenge #4: Long resolution times and lost issues
• Challenge #5: Repetitive, time-consuming tasks
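Challenge #2 is largely a correlation problem. The sketch below is a minimal illustration, assuming events from each tool have already been normalized into a hypothetical `Event` record (source tool, host, timestamp, severity). It ranks which source tools produced the most error events on the affected host within a window around the reported issue time. The field names, host names, and 15-minute window are illustrative assumptions, not any product's schema.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    # Hypothetical normalized event; field names are illustrative assumptions.
    source: str          # e.g. "Tanium", "Cisco AnyConnect", "Group Policy"
    host: str
    timestamp: datetime
    severity: str        # "info", "warning", or "error"


def suspect_sources(events, host, reported_at, window=timedelta(minutes=15)):
    """Rank source tools by error count on `host` within +/- `window` of `reported_at`."""
    counts = Counter(
        e.source
        for e in events
        if e.host == host
        and e.severity == "error"
        and abs(e.timestamp - reported_at) <= window
    )
    return counts.most_common()


if __name__ == "__main__":
    t0 = datetime(2023, 5, 1, 9, 0)
    events = [
        Event("Cisco AnyConnect", "LAPTOP-01", t0 - timedelta(minutes=5), "error"),
        Event("Cisco AnyConnect", "LAPTOP-01", t0 - timedelta(minutes=2), "error"),
        Event("Group Policy", "LAPTOP-01", t0 - timedelta(minutes=1), "error"),
        Event("Tanium", "LAPTOP-01", t0 - timedelta(hours=3), "error"),  # outside window
        Event("Tanium", "LAPTOP-02", t0, "error"),                       # different host
    ]
    print(suspect_sources(events, "LAPTOP-01", t0))
    # [('Cisco AnyConnect', 2), ('Group Policy', 1)]
```

This only narrows the search; it assumes clocks are synchronized across tools, which is itself something an observability platform has to ensure.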
Platform Observability
Observability “is when you infer the internal state of a system only by observing the data it generates, such as logs, metrics, and traces.”
When observability is implemented well, a system will
not require operations teams to spend much effort on
understanding its internal state.
Monitoring: something you do to determine the state of an application, a system, a service, and so on.
Observability: based on the acquisition of data that allows you to ask questions you didn’t know you had and solve problems you never thought of.
Observability is all about data: at scale, at speed, and analytics-driven.
Monitoring tells you when something is wrong. Observability lets you ask why.
Monitoring covers the things we are aware of but DON’T understand; observability covers the things we are NOT aware of and DON’T understand.
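To make the distinction concrete, here is a minimal sketch. The monitoring side is a rule decided in advance (alert when CPU exceeds a fixed threshold). The observability side keeps raw, richly attributed request events so that a question nobody planned for, such as "which client version is slow?", can be answered afterwards by grouping on any field. The event fields and the 90% threshold are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Monitoring: a known scenario, encoded as a predefined rule.
CPU_ALERT_THRESHOLD = 0.90  # hypothetical threshold


def cpu_alert(cpu_utilization: float) -> bool:
    """Return True when the predefined rule fires."""
    return cpu_utilization > CPU_ALERT_THRESHOLD


# Observability: keep wide, raw events and slice them by any attribute later.
def mean_latency_by(events, field):
    """Group request events by an arbitrary field and average their latency."""
    groups = defaultdict(list)
    for event in events:
        groups[event[field]].append(event["latency_ms"])
    return {key: mean(values) for key, values in groups.items()}


if __name__ == "__main__":
    events = [
        {"region": "eu", "client_version": "5.1", "latency_ms": 120},
        {"region": "eu", "client_version": "4.9", "latency_ms": 900},
        {"region": "us", "client_version": "5.1", "latency_ms": 110},
        {"region": "us", "client_version": "4.9", "latency_ms": 950},
    ]
    print(cpu_alert(0.45))                            # False: the known rule sees nothing
    print(mean_latency_by(events, "region"))          # {'eu': 510, 'us': 530}
    print(mean_latency_by(events, "client_version"))  # {'5.1': 115, '4.9': 925}
```

Grouping by region hides the problem; grouping by client version, a question asked only after the fact, exposes it.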
Observability Impacts & Importance on Effective SRE/DevOps
Cloud Migration
• Hybrid cloud monitoring
• Cloud cost management
• Cloud capacity planning
Cloud Monitoring
• Cloud services monitoring
• Kubernetes & container monitoring
• Serverless monitoring
• KPI monitoring with custom metrics
• Observability-as-a-service
App Optimization
• Application modernization
• Microservices monitoring & troubleshooting
• Business SLA monitoring
• DevOps application lifecycle monitoring
Impacts
• Higher visibility
• Better workflow
• Faster alerts
• Finding unknown issues
• Reduced operational cost
• Increased developer velocity
Three Pillars of Observability
Metrics: What is happening? (Detect)
• Quantify performance
• Produce alerts, such as when a system is down or load balancers reach capacity
• Monitor events for anomalous activities
Traces: Where is it happening? (Troubleshoot)
• API queries
• Server-to-server workload
• Internal API calls
• Frontend API traffic
Events / Logs: Why is it happening? (Pinpoint)
• Answer the “who, what, where, when, and how”
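As a sketch of how the three pillars relate, the snippet below emits all three signals for one simulated request using only the Python standard library: a latency metric (what), two trace spans sharing one trace ID (where), and a structured JSON log line (why). The span names and log fields are hypothetical; in practice an instrumentation library such as OpenTelemetry would typically produce these signals rather than hand-rolled classes like these.

```python
import json
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class Span:
    """A minimal trace span: one timed unit of work within a request."""
    trace_id: str
    name: str
    start: float = field(default_factory=time.perf_counter)
    end: float = 0.0

    def finish(self) -> None:
        self.end = time.perf_counter()

    @property
    def duration_ms(self) -> float:
        return (self.end - self.start) * 1000


metrics: dict[str, list[float]] = {}  # metric name -> observed values
spans: list[Span] = []                # finished spans
logs: list[str] = []                  # structured (JSON) log lines


def handle_request(user: str) -> str:
    """Serve one simulated request and emit a metric, two spans, and a log line."""
    trace_id = uuid.uuid4().hex
    root = Span(trace_id, "frontend GET /login")      # hypothetical span names
    child = Span(trace_id, "internal-api auth.check")
    time.sleep(0.01)                                  # simulated work in the API call
    child.finish()
    root.finish()
    # Traces: WHERE time was spent, linked by a shared trace_id.
    spans.extend([child, root])
    # Metrics: WHAT is happening, as a numeric time series.
    metrics.setdefault("request_latency_ms", []).append(root.duration_ms)
    # Logs/events: context for WHY (who, what, when), correlated via trace_id.
    logs.append(json.dumps({"trace_id": trace_id, "user": user,
                            "event": "login", "status": "ok"}))
    return trace_id


if __name__ == "__main__":
    handle_request("alice")
    print(len(metrics["request_latency_ms"]), len(spans), len(logs))  # 1 2 1
```

The shared `trace_id` is the design choice that matters: it lets a spike in the metric be followed to the slow span and then to the log lines for that exact request.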
Centralized Observability – Key benefits
Current Benefits
• Faster issue detection and resolution for improved end-user satisfaction
• Improved performance and platform reliability
• Lower infrastructure costs through right-sizing and identifying bottlenecks
• Improved time to market by enabling agile, frequent product delivery through automated CI/CD pipelines without impacting quality
Outlook (Gartner)
• The observability landscape is in its early stages. Enterprises are becoming frustrated with the limitations of monitoring tools and, despite decades of investment in them, continue to rely on customers to notice outages.
• By 2024, over 30% of enterprises implementing distributed systems will have adopted observability techniques to improve service performance, up from less than 10% in 2020.
Observability Stack
• Any Data / Any Source: Applications, Infrastructure, Cloud, Middleware, Network, Social, Business, etc.
• Open Access Data: Logs, Metrics, Traces
• Common data model
• UI and Use Cases: Log Analysis, Real User Monitoring, Infrastructure Monitoring, Search, Dashboards, Machine Learning, Alerting
• Outcomes: Proactively detect issues, reduce MTTD and MTTR, continually optimize, deliver on SLAs
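MTTD and MTTR are simple averages over incident timestamps. A minimal sketch, assuming each incident records when impact started, when it was detected, and when it was resolved (hypothetical field names). Definitions vary between organizations; here MTTD runs from start of impact to detection and MTTR from detection to resolution.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical incident record; field names are assumptions.
    started: datetime
    detected: datetime
    resolved: datetime


def mean_delta(deltas):
    """Average a sequence of timedeltas."""
    deltas = list(deltas)
    if not deltas:
        raise ValueError("no incidents to average")
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents):
    """Mean time to detect: start of impact -> detection."""
    return mean_delta(i.detected - i.started for i in incidents)


def mttr(incidents):
    """Mean time to resolve, measured here from detection -> resolution."""
    return mean_delta(i.resolved - i.detected for i in incidents)


if __name__ == "__main__":
    t = datetime(2023, 1, 1, 8, 0)
    incidents = [
        Incident(t, t + timedelta(minutes=30), t + timedelta(hours=2, minutes=30)),
        Incident(t, t + timedelta(minutes=10), t + timedelta(hours=1, minutes=10)),
    ]
    print(mttd(incidents))  # 0:20:00
    print(mttr(incidents))  # 1:30:00
```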
Continuous Feedback Loop
Dev / Ops
Stages: Monitoring, Service Desk, Issue Triage, Postmortem, Incident Review, Error Budget decisions
• Monitor data using dashboards
• Proactive reaction to automated alarms
• Application metrics and preventive prediction of application load
• Postmortem analysis
• Avoidance of future errors
• Simplification of the development process
• Acceleration of provisioning
• CI/CD pipelines
• Releasing new features
• Expected system changes
• Planned downtime
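"Preventive prediction of application load" can start as simply as fitting a straight-line trend to recent samples and estimating when it will cross a limit. The sketch below uses ordinary least squares; the hourly CPU samples and the 90% limit are hypothetical, and a linear trend is an assumption that real workloads (with daily or weekly seasonality) often violate.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept.
    Requires at least two distinct x values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_until_limit(hours, load, limit):
    """Estimate hours after the last sample until the fitted trend reaches `limit`.
    Returns None when the trend is flat or decreasing."""
    slope, intercept = fit_line(hours, load)
    if slope <= 0:
        return None
    crossing = (limit - intercept) / slope
    return max(0.0, crossing - hours[-1])


if __name__ == "__main__":
    hours = [0, 1, 2, 3, 4]
    cpu = [50, 55, 60, 65, 70]  # percent; hypothetical samples
    print(hours_until_limit(hours, cpu, limit=90))  # 4.0
```

An alarm wired to this estimate fires before the limit is reached, which is the proactive reaction the loop describes.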
Service Level Objectives and Indicators (SLOs and SLIs)
• Define service quality in advance
• Determine metrics and indicators
Risk Acceptance and Mitigation Plan
• Assess risk viability and reliability
• Evaluate status against the standard set in advance
Automation
• Automate repetitive tasks
• Ensure faster delivery and execution of tasks
Proactive Monitoring
• Prepare a continuous improvement plan to minimize incidents
• Establish well-monitored environments
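The SLO/SLI and error-budget practices above reduce to a little arithmetic. A minimal sketch: an availability SLI is the fraction of good requests, and the error budget is the number of failures the SLO permits. The 99.9% target, the request counts, and the "release only while budget remains" policy are hypothetical examples, not something this deck prescribes.

```python
def availability_sli(good: int, total: int) -> float:
    """SLI: fraction of requests that succeeded."""
    return good / total


def error_budget_remaining(good: int, total: int, slo: float) -> float:
    """Fraction of the error budget still unspent (negative when overspent)."""
    allowed_failures = (1 - slo) * total
    actual_failures = total - good
    return 1 - actual_failures / allowed_failures


def can_release(good: int, total: int, slo: float) -> bool:
    # Hypothetical policy: ship new features only while budget remains.
    return error_budget_remaining(good, total, slo) > 0


if __name__ == "__main__":
    total, good = 1_000_000, 999_400
    print(availability_sli(good, total))                         # 0.9994
    print(round(error_budget_remaining(good, total, 0.999), 2))  # 0.4
    print(can_release(good, total, 0.999))                       # True
```

With a 99.9% SLO over one million requests, the budget is about 1,000 failures; 600 have been spent, so roughly 40% remains.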
Observability is a Framework
• Observability is not a tool but a framework that enables quick interrogation of services to identify the underlying cause of issues, even when those issues have never occurred before. It has great potential to improve operational efficiency and, ultimately, the overall stability of our platforms.
• The more data points ingested into the observability platform, the more insights and value we can gain from it.
• It requires a willingness to adopt new ways of working, shifting away from existing siloed technical domains that lack transparency toward full visibility into the inner workings of the platforms that Workplace and Collaboration teams depend on.
Quotes on Monitoring vs Observability
“Monitoring is a single plane for the most part. You set up rules,
aggregations, and alerts on when a known scenario plays out
(e.g., a trajectory towards 100% disk usage is an indication of an
issue in the imminent future). Observability, on the other hand, is
the means to map an environment or context and the ability to
fluidly traverse that map, thus reaching a greater awareness of
‘what is.’” - Ryan Sheldrake, field CTO, Lacework
“Observability is about putting mechanisms in place
that allow teams to actively debug their system. It is
based on exploring properties and patterns not
defined in advance. The main purpose of observability
is to use the system’s outputs to gather insights and
act on them.” - Parveen Arora, co-founder and
director, VVnT SeQuor
“The key difference between observability and
monitoring is that compared to monitoring, site
observability gives a more complete assessment of
the overall environment in which the application
resides and hence is more effective in fulfilling the
key success factor for an application – that of site
reliability.” - Sushant Mehta, senior manager application
development, Diyar United Company
"Monitoring is the process of using observability. When monitoring
occurs, one has already decided which events and applications
will be tracked. Observability creates the potential to monitor
different events along the pipeline and the overall software
development lifecycle. As processes get built, the potential for
observability should be included across a broad spectrum.
Monitoring finds specific events across the system and creates
artifacts and reports that can be integrated into overall metrics." -
Mark Peters, technical lead, Novetta
Ref: https://enterprisersproject.com/article/2021/9/devops-monitoring-vs-observability

What is Platform Observability? An Overview


Editor's Notes

  • #4 Observability is the ability to understand a system's internal state by analyzing the data it generates, such as logs, metrics, and traces.