© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Building an effective observability strategy
Ania Develter (she/her)
Sr. Specialist SA
AWS
Agenda
Observability maturity model
Value of observability
Observing what matters
Collecting signals
Extracting data & demo
Designing alerts and dashboards
Selecting the right tools & demo
AWS observability maturity model
Capability by maturity stage:
• Stage 1: Foundational monitoring (collecting telemetry data)
• Stage 2: Intermediate monitoring (telemetry analysis and insights)
• Stage 3: Advanced observability (correlation and anomaly detection)
• Stage 4: Proactive observability (automatic and proactive root cause identification)
Stage 1: Foundational monitoring
COLLECTING TELEMETRY DATA
Current state:
• Disparate monitoring solutions
• Siloed tools
Plan of action:
• Establish an understanding of current state
• Set realistic goals for improvement
Stage 2: Intermediate monitoring
TELEMETRY ANALYSIS AND INSIGHTS
Current state:
• Well-defined process for collecting signals
• Spend a lot of time debugging issues
Plan of action:
• Deploy policies and practices
• Define actionable KPIs
Stage 3: Advanced observability
CORRELATION AND ANOMALY DETECTION
Current state:
• Org-wide strategy
• End-to-end observability
Plan of action:
• Integrate with other critical systems
• Automate operations using AI/ML tools
Stage 4: Proactive observability
AUTOMATIC AND PROACTIVE ROOT CAUSE IDENTIFICATION
Current state:
• Well-trained AI/ML models for identification of root cause
• Proactive remediation
Plan of action:
• Continuous improvement
Reactive monitoring
Blissful ignorance
Confusion
False hope
Stress
Desperation
Enlightenment
But it’s more than . . .
• Resolving incidents
It’s also:
• Understanding impact
• Data-driven decisions
• Happy customers!!!
Demo
Customer requirements
• Location
• Price
• Security
• Page speed
• Search
• Choice
Demo
1. Observe what matters
Define what matters to your:
• Customers
• Business
• Internal stakeholders
• Project
2. Measure your objectives
• Success metrics (KPIs/SLAs/SLOs/other)
• Know what good looks like
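
One way to make "know what good looks like" concrete is to express an objective as an SLO and track how much of its error budget is left. The sketch below is illustrative only: the function name `error_budget_remaining`, the 99.9% target, and the request counts are assumptions, not values from this talk.

```python
def error_budget_remaining(slo_target: float, total: int, failed: int) -> float:
    """Fraction of the error budget still unspent; negative means the SLO is breached."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    if total <= 0:
        raise ValueError("total must be positive")
    # Number of failed requests the SLO tolerates over this window.
    allowed_failures = (1.0 - slo_target) * total
    return 1.0 - failed / allowed_failures


# Hypothetical numbers: 99.9% availability SLO, 1,000,000 requests, 250 failures.
remaining = error_budget_remaining(0.999, 1_000_000, 250)
print(f"{remaining:.0%} of the error budget remains")
```

A value near 1.0 means the service is comfortably within its objective; a value at or below zero means the objective has been missed for the window being measured.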
3. Identify sources
• Is the data available?
• Extract data
• Plan ahead
Logs
Metrics
Traces
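
As a rough illustration of "extract data", a metric can sometimes be derived from logs that already exist. The sketch below assumes JSON-formatted log lines with a hypothetical `latency_ms` field; the field name and the sample lines are invented for the example and do not come from the demo.

```python
import json


def latencies_from_logs(lines, field="latency_ms"):
    """Collect numeric values of `field` from JSON log lines, skipping everything else."""
    values = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON line, so no metric can be extracted from it
        if isinstance(record, dict) and isinstance(record.get(field), (int, float)):
            values.append(float(record[field]))
    return values


# Hypothetical log lines: two carry the field, two do not.
sample = [
    '{"path": "/search", "latency_ms": 120}',
    "plain text line without structure",
    '{"path": "/search"}',
    '{"path": "/checkout", "latency_ms": 80.5}',
]
print(latencies_from_logs(sample))
```

If the field is missing from most lines, that answers the "is the data available?" question and points to instrumentation work to plan ahead for.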
Demo
4. Alerting strategy
• Define criteria (warning/critical)
• Define actions
• Avoid alert fatigue
• Review after incidents
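
The criteria and actions above can be sketched as a threshold check paired with an action per severity. This is a minimal sketch under assumed values: the `classify` function, the thresholds, and the action strings are hypothetical and would be replaced by whatever your alerting tool and runbooks define.

```python
from enum import Enum


class Severity(Enum):
    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"


# Hypothetical actions; every alerting severity should map to something a person can do.
ACTIONS = {
    Severity.WARNING: "open a ticket for the owning team",
    Severity.CRITICAL: "page the on-call engineer",
}


def classify(value: float, warning: float, critical: float) -> Severity:
    """Map a metric value to a severity, assuming higher values are worse."""
    if warning > critical:
        raise ValueError("warning threshold must not exceed critical threshold")
    if value >= critical:
        return Severity.CRITICAL
    if value >= warning:
        return Severity.WARNING
    return Severity.OK


# Example: error rate in percent, with assumed thresholds of 2% and 5%.
severity = classify(3.1, warning=2.0, critical=5.0)
print(severity.value, "->", ACTIONS.get(severity, "no action"))
```

Leaving `Severity.OK` out of `ACTIONS` is deliberate: a state with no action attached should not notify anyone, which is one simple guard against alert fatigue.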
5. Dashboard strategy
Stakeholder dashboards
• Cost, service audit, capacity planning
Low-level dashboards
• Infrastructure, microservice, dependency
High-level dashboards
• Customer experience, system level, service instance
Additional requirements
[Diagram: Clients, API microservice, Backend microservice, AWS Lambda functions, and Amazon EC2 instances, annotated with Customer experience, System, Service audit, Microservice, Dependency, and Infrastructure dashboards]
6. Tool selection
• Right tool for the job
• Pick features you need
• Consolidate & standardize
• Define exception process
Demo
7. Bring it all together
• Document
• Build into internal processes
• Operational readiness
• Implement but don’t “boil the ocean”
8. Iterate
• Know your baselines
• Review routinely
• Review after incidents
[Diagram labels: Collect, Act, Improve; Customer experience; Business stakeholders; Customer needs; KPIs]
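
"Know your baselines" can be made measurable by summarizing recent history and checking new values against it. The sketch below uses a simple mean and standard deviation rule; the three-sigma cutoff, the function names, and the sample latencies are assumptions chosen for illustration, and other baseline methods may suit your data better.

```python
import statistics


def baseline(samples):
    """Return (mean, sample standard deviation) of historical values."""
    return statistics.mean(samples), statistics.stdev(samples)


def deviates(value, samples, n_sigmas=3.0):
    """True if `value` is more than n_sigmas standard deviations from the baseline mean."""
    mean, stdev = baseline(samples)
    return abs(value - mean) > n_sigmas * stdev


# Hypothetical history of p50 latencies in milliseconds.
history = [100, 102, 98, 101, 99]
print(deviates(110, history))
print(deviates(103, history))
```

Recomputing the baseline during routine reviews and after incidents keeps the comparison tied to how the system behaves now rather than how it behaved at launch.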
Summary
WRAPPING UP
Observe what matters
• To your customers, business, project/business unit
• Work with business stakeholders
Measure your objectives
• Know your success criteria (business KPIs, SLAs, SLOs)
• Know what good looks like
Identify sources
• Collect the right signals to measure business objectives
Establish where you are
• Review observability maturity model
Summary
WRAPPING UP
Define process
• Document
• Don’t boil the ocean
Alerting strategy
• Alert (only) when business outcomes are at risk
• Avoid alert fatigue
Dashboarding strategy
• Build dashboards for operational visibility
Tool selection
• Right tool for the job
• Features you need
• Consolidate & standardize
Iterate
• Review after incidents
• Keep focus on customers
Thank you!
Ania Develter
linkedin.com/in/ania-develter
