SlideShare a Scribd company logo
Brought to you by
From SLOs to GOTY
How observability and service-level objectives
can help get you to “Game of the Year”
Charity Majors
CTO of Honeycomb
@mipsytipsy
How observability and service-level objectives can help get you to
“Game Of The Year”
From SLOs to GOTY
@mipsytipsy
“Your System Is Broken”
@mipsytipsy
engineer/cofounder/CTO
https://charity.wtf
“Second Life Is Not A Game” — Linden Lab,
incessantly
Your imagination
Your tools & telemetry
Compelling game design / play
Fast, glitch-free experience
Monitoring
Observability
+ SLOs
Observability
Averages are bullshit.
99th% is bullshit.
99.999% is bullshit.
Every individual player experience counts.
Observability
Any ONE player who can’t log in can start
a shitstorm on the forums.
“My system has four nines”— this statement can be true, and yet:
• Everybody who logged into the game today with state saved on an unresponsive
shard can think you are 100% down
• Latency on the /login endpoint could be timing out for everybody in a
particular region
• Upserts to /payment could be failing upon registration
• Certain Android device types could be silently dropping push notifications
• Game state saves could be pointed at a read replica instead of a write replica
Observability
Without observability, your team will resort to
guessing and iterating blindly, without evidence,
and you will struggle to connect feedback loops
that result in a fast, glitch-free experience.
Observability is the hidden link between
engineering experience and player
experience.
Observability lets you inspect cause and
effect at a granular level. Observability
enables engineers to have ownership
over the lifecycle of their software.
Can you understand what’s happening inside your games, just by asking
questions from the outside? Can you debug your code and reconstruct
any user’s experience using the output of your tools?
Can you understand new scenarios without shipping new code?
o11y for game developers:
If you can’t see it, you can’t improve it
Game devs were some of the first to run up
against the limits of low-cardinality tools
https://www.eveonline.com/news/view/introducing-
quasar
• Complex, highly distributed architectures
• Designed and developed by a multitude of teams
• Played across thousands of device types
• Enormous concurrency issues and thundering
herds
• High cardinality
• High dimensionality
• Based on arbitrarily-wide structured events
• …with span ids, to support tracing
• Exploratory, open-ended investigation of raw events
• No indexes, schemas, or pre-aggregation
• Bundles the full context of the request across network hops
First Principles of Observability
https://www.honeycomb.io/blog/so-you-want-to-build-an-observability-tool/
• Achievable with metrics-based tools (Prometheus, DataDog, etc)
• Compatible with write-time aggregation
• The same thing as monitoring
• Anything whatsoever to do with “pillars”
Observability IS NOT
The fundamental building block of
observability tools is the arbitrarily-wide
structured data blob, or ‘canonical log line’,
which can have hundreds of k/v pairs.
The fundamental building block of monitoring tools is the metric.
If we rely on metrics, logs, and post-hoc
monitoring, we will find most of our problems
via customer reporting. This sucks.
Complexity is exploding everywhere,
but our tools were designed
for predictable worlds
Most bugs will never turn up in any staging
environment or be caught by our test
suites. Many bugs aren’t even bugs! —
they’re simply user interactions!
We need to shift our focus away from writing
infinity tests and monitoring checks and trying
to predict what will break, or checking for the
same error states again and again.
We need to embrace the fact that we all test in production,
and give ourselves the tools to do this well.
Instrument code for observability, shrink the deploy cycle to
a few minutes, ship one mergeset by one developer at a
time, and look at our code in production.
Write code with instrumentation
Run your systems with
SLOs
Tighten up your feedback loops with deploys
The solution:
O.D.D.
Observability-Driven Development
Write code with instrumentation
• Kickstart the virtuous cycle of “you build it, you own it”
• By instrumenting your code as you write it
• Never accept a PR unless you can explain how to tell if it breaks
• Watch your code go out as it deploys
• Observe your code through the lens of your instrumentation, asking:
• “Is it working as intended? Does anything else look weird?”
Tight feedback loops with fast deploys
Fifteen minutes or bust.
One mergeset by one dev at a time
Many times a
day
Run your systems with SLOs.
Management:
Engineering:
Operations:
Service-Level Objectives
In search of a common
language:
How broken is too broken?
What does good enough mean?
Combatting alert fatigue
Eligible: “Had an http status code”
Good: “… that was a 200, and was served in
under 500 ms”
Good events
———————
Eligible events
Service Level Indicator =
Service-Level Objectives
Service-Level Objectives
We ALWAYS store incoming user data
Default dashboards USUALLY load in <1
sec
Queries OFTEN return in <10 sec
99.99%
99.9%
99%
~4.3 minutes
~45 minutes
7.3 hours
(Honeycomb
SLOs)
Monitoring Checks, Symptom-based alerts
• A disk is 89% full on one of your database primaries
• CPU saturation on your export cluster is at 90% average
• 5% of requests to a partner API are returning HTTP 500
• The queue of iOS push notifications has been filling up and not
draining for the past 15 minutes
Instead of alerting on hundreds or thousands of symptom-
based monitoring checks, alert only on a few precious SLOs
that directly reflect user pain. Less noise, better service.
Status of an SLO
How have we done?
SLO examples
…
Service-Level Objectives +
Observability…for easy debugging
You have an observable system
when your team can quickly and reliably diagnose
any new behavior with no prior knowledge.
Observability begins with
rich instrumentation, putting you in
constant conversation with your code
It brings everyone up to the level
of the best debugger in each area
With SLOs, it is how you get to a
fast, glitch-free experience
This is how you get a jump on
glitches.
Instrument your code for observability.
Tighten up your feedback loops by deploying a single merges by a single
engineer at a time, very fast, many times a day (fifteen minutes or less!)
Replace your floods of symptom alerts with a few well-chosen
SLOs
Watch your code as it goes out.
Find the errors before your users do.
… and that’s
okay.
Charity Majors
@mipsytipsy
Brought to you by
Charity Majors
CTO of Honeycomb

More Related Content

Similar to From SLO to GOTY

6 ways DevOps helped PrepSportswear move from monolith to microservices
6 ways DevOps helped PrepSportswear move from monolith to microservices6 ways DevOps helped PrepSportswear move from monolith to microservices
6 ways DevOps helped PrepSportswear move from monolith to microservices
Dynatrace
 
Creating Havoc using Human Interface Device
Creating Havoc using Human Interface DeviceCreating Havoc using Human Interface Device
Creating Havoc using Human Interface DevicePositive Hack Days
 
Zero Trust And Best Practices for Securing Endpoint Apps on May 24th 2021
Zero Trust And Best Practices for Securing Endpoint Apps on May 24th 2021Zero Trust And Best Practices for Securing Endpoint Apps on May 24th 2021
Zero Trust And Best Practices for Securing Endpoint Apps on May 24th 2021
Teemu Tiainen
 
Good Security Starts with Software Assurance - Software Assurance Market Plac...
Good Security Starts with Software Assurance - Software Assurance Market Plac...Good Security Starts with Software Assurance - Software Assurance Market Plac...
Good Security Starts with Software Assurance - Software Assurance Market Plac...Phil Agcaoili
 
How HipChat Ships and Recovers Fast with DevOps Practices
How HipChat Ships and Recovers Fast with DevOps PracticesHow HipChat Ships and Recovers Fast with DevOps Practices
How HipChat Ships and Recovers Fast with DevOps Practices
Atlassian
 
Continuous delivery @wcap 5-09-2013
Continuous delivery   @wcap 5-09-2013Continuous delivery   @wcap 5-09-2013
Continuous delivery @wcap 5-09-2013
David Funaro
 
Chaos Engineering Without Observability ... Is Just Chaos
Chaos Engineering Without Observability ... Is Just ChaosChaos Engineering Without Observability ... Is Just Chaos
Chaos Engineering Without Observability ... Is Just Chaos
Charity Majors
 
Putting the pro in programmer
Putting the pro in programmerPutting the pro in programmer
Putting the pro in programmer
Switch Systems Ltd
 
DevOps - Boldly Go for Distro
DevOps - Boldly Go for DistroDevOps - Boldly Go for Distro
DevOps - Boldly Go for Distro
Paul Boos
 
How to Embed Codeless Test Automation Into DevOps
How to Embed Codeless Test Automation Into DevOpsHow to Embed Codeless Test Automation Into DevOps
How to Embed Codeless Test Automation Into DevOps
Perfecto by Perforce
 
SplunkLive! London 2016 Splunk for Devops
SplunkLive! London 2016 Splunk for DevopsSplunkLive! London 2016 Splunk for Devops
SplunkLive! London 2016 Splunk for Devops
Splunk
 
pentest
pentestpentest
pentest
mevom8177
 
Driving application development through behavior driven development
Driving application development through behavior driven developmentDriving application development through behavior driven development
Driving application development through behavior driven developmentEinar Ingebrigtsen
 
Confoo-Montreal-2016: Controlling Your Environments using Infrastructure as Code
Confoo-Montreal-2016: Controlling Your Environments using Infrastructure as CodeConfoo-Montreal-2016: Controlling Your Environments using Infrastructure as Code
Confoo-Montreal-2016: Controlling Your Environments using Infrastructure as Code
Steve Mercier
 
Enhancing Developer Productivity with Code Forensics
Enhancing Developer Productivity with Code ForensicsEnhancing Developer Productivity with Code Forensics
Enhancing Developer Productivity with Code Forensics
TechWell
 
Solving the 3 Biggest Questions in Continuous Testing
Solving the 3 Biggest Questions in Continuous TestingSolving the 3 Biggest Questions in Continuous Testing
Solving the 3 Biggest Questions in Continuous Testing
Perfecto by Perforce
 
DevOpsDays Houston 2019 -Kevin Crawley - Practical Guide to Not Building Anot...
DevOpsDays Houston 2019 -Kevin Crawley - Practical Guide to Not Building Anot...DevOpsDays Houston 2019 -Kevin Crawley - Practical Guide to Not Building Anot...
DevOpsDays Houston 2019 -Kevin Crawley - Practical Guide to Not Building Anot...
DevOpsDays Houston
 
5 Steps to Jump Start Your Test Automation
5 Steps to Jump Start Your Test Automation5 Steps to Jump Start Your Test Automation
5 Steps to Jump Start Your Test Automation
Sauce Labs
 
Heartbleed
HeartbleedHeartbleed
Heartbleed
Punit Goswami
 
Owasp tds
Owasp tdsOwasp tds
Owasp tds
snyff
 

Similar to From SLO to GOTY (20)

6 ways DevOps helped PrepSportswear move from monolith to microservices
6 ways DevOps helped PrepSportswear move from monolith to microservices6 ways DevOps helped PrepSportswear move from monolith to microservices
6 ways DevOps helped PrepSportswear move from monolith to microservices
 
Creating Havoc using Human Interface Device
Creating Havoc using Human Interface DeviceCreating Havoc using Human Interface Device
Creating Havoc using Human Interface Device
 
Zero Trust And Best Practices for Securing Endpoint Apps on May 24th 2021
Zero Trust And Best Practices for Securing Endpoint Apps on May 24th 2021Zero Trust And Best Practices for Securing Endpoint Apps on May 24th 2021
Zero Trust And Best Practices for Securing Endpoint Apps on May 24th 2021
 
Good Security Starts with Software Assurance - Software Assurance Market Plac...
Good Security Starts with Software Assurance - Software Assurance Market Plac...Good Security Starts with Software Assurance - Software Assurance Market Plac...
Good Security Starts with Software Assurance - Software Assurance Market Plac...
 
How HipChat Ships and Recovers Fast with DevOps Practices
How HipChat Ships and Recovers Fast with DevOps PracticesHow HipChat Ships and Recovers Fast with DevOps Practices
How HipChat Ships and Recovers Fast with DevOps Practices
 
Continuous delivery @wcap 5-09-2013
Continuous delivery   @wcap 5-09-2013Continuous delivery   @wcap 5-09-2013
Continuous delivery @wcap 5-09-2013
 
Chaos Engineering Without Observability ... Is Just Chaos
Chaos Engineering Without Observability ... Is Just ChaosChaos Engineering Without Observability ... Is Just Chaos
Chaos Engineering Without Observability ... Is Just Chaos
 
Putting the pro in programmer
Putting the pro in programmerPutting the pro in programmer
Putting the pro in programmer
 
DevOps - Boldly Go for Distro
DevOps - Boldly Go for DistroDevOps - Boldly Go for Distro
DevOps - Boldly Go for Distro
 
How to Embed Codeless Test Automation Into DevOps
How to Embed Codeless Test Automation Into DevOpsHow to Embed Codeless Test Automation Into DevOps
How to Embed Codeless Test Automation Into DevOps
 
SplunkLive! London 2016 Splunk for Devops
SplunkLive! London 2016 Splunk for DevopsSplunkLive! London 2016 Splunk for Devops
SplunkLive! London 2016 Splunk for Devops
 
pentest
pentestpentest
pentest
 
Driving application development through behavior driven development
Driving application development through behavior driven developmentDriving application development through behavior driven development
Driving application development through behavior driven development
 
Confoo-Montreal-2016: Controlling Your Environments using Infrastructure as Code
Confoo-Montreal-2016: Controlling Your Environments using Infrastructure as CodeConfoo-Montreal-2016: Controlling Your Environments using Infrastructure as Code
Confoo-Montreal-2016: Controlling Your Environments using Infrastructure as Code
 
Enhancing Developer Productivity with Code Forensics
Enhancing Developer Productivity with Code ForensicsEnhancing Developer Productivity with Code Forensics
Enhancing Developer Productivity with Code Forensics
 
Solving the 3 Biggest Questions in Continuous Testing
Solving the 3 Biggest Questions in Continuous TestingSolving the 3 Biggest Questions in Continuous Testing
Solving the 3 Biggest Questions in Continuous Testing
 
DevOpsDays Houston 2019 -Kevin Crawley - Practical Guide to Not Building Anot...
DevOpsDays Houston 2019 -Kevin Crawley - Practical Guide to Not Building Anot...DevOpsDays Houston 2019 -Kevin Crawley - Practical Guide to Not Building Anot...
DevOpsDays Houston 2019 -Kevin Crawley - Practical Guide to Not Building Anot...
 
5 Steps to Jump Start Your Test Automation
5 Steps to Jump Start Your Test Automation5 Steps to Jump Start Your Test Automation
5 Steps to Jump Start Your Test Automation
 
Heartbleed
HeartbleedHeartbleed
Heartbleed
 
Owasp tds
Owasp tdsOwasp tds
Owasp tds
 

More from ScyllaDB

Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
ScyllaDB
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream Processing
ScyllaDB
 
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
ScyllaDB
 
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
ScyllaDB
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
ScyllaDB
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQL
ScyllaDB
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & Pitfalls
ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
ScyllaDB
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
ScyllaDB
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
ScyllaDB
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
ScyllaDB
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
ScyllaDB
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
ScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
ScyllaDB
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
ScyllaDB
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDB
ScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
ScyllaDB
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
ScyllaDB
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
ScyllaDB
 

More from ScyllaDB (20)

Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream Processing
 
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
 
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQL
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & Pitfalls
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
 

Recently uploaded

FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 

From SLO to GOTY

  • 1. Brought to you by From SLOs to GOTY How observability and service-level objectives can help get you to “Game of the Year” Charity Majors CTO of Honeycomb
  • 2. @mipsytipsy How observability and service-level objectives can help get you to “Game Of The Year” From SLOs to GOTY
  • 5. Your imagination Your tools & telemetry Compelling game design / play Fast, glitch-free experience
  • 8. Averages are bullshit. 99th% is bullshit. 99.999% is bullshit. Every individual player experience counts. Observability Any ONE player who can’t log in can start a shitstorm on the forums.
  • 9. “My system has four nines”— this statement can be true, and yet: • Everybody who logged into the game today with state saved on an unresponsive shard can think you are 100% down • Latency on the /login endpoint could be timing out for everybody in a particular region • Upserts to /payment could be failing upon registration • Certain Android device types could be silently dropping push notifications • Game state saves could be pointed at a read replica instead of a write replica Observability
  • 10. Without observability, your team will resort to guessing and iterating blindly, without evidence, and you will struggle to connect feedback loops that result in a fast, glitch-free experience. Observability is the hidden link between engineering experience and player experience. Observability lets you inspect cause and effect at a granular level. Observability enables engineers to have ownership over the lifecycle of their software.
  • 11. Can you understand what’s happening inside your games, just by asking questions from the outside? Can you debug your code and reconstruct any user’s experience using the output of your tools? Can you understand new scenarios without shipping new code? o11y for game developers: If you can’t see it, you can’t improve it
  • 12. Game devs were some of the first to run up against the limits of low-cardinality tools https://www.eveonline.com/news/view/introducing- quasar • Complex, highly distributed architectures • Designed and developed by a multitude of teams • Played across thousands of device types • Enormous concurrency issues and thundering herds
  • 13.
  • 14. • High cardinality • High dimensionality • Based on arbitrarily-wide structured events • …with span ids, to support tracing • Exploratory, open-ended investigation of raw events • No indexes, schemas, or pre-aggregation • Bundles the full context of the request across network hops First Principles of Observability
  • 15. https://www.honeycomb.io/blog/so-you-want-to-build-an-observability-tool/ • Achievable with metrics-based tools (Prometheus, DataDog, etc) • Compatible with write-time aggregation • The same thing as monitoring • Anything whatsoever to do with “pillars” Observability IS NOT
  • 16. The fundamental building block of observability tools is the arbitrarily-wide structured data blob, or ‘canonical log line’, which can have hundreds of k/v pairs. The fundamental building block of monitoring tools is the metric.
  • 17. If we rely on metrics, logs, and post-hoc monitoring, we will find most of our problems via customer reporting. This sucks. Complexity is exploding everywhere, but our tools were designed for predictable worlds Most bugs will never turn up in any staging environment or be caught by our test suites. Many bugs aren’t even bugs! — they’re simply user interactions!
  • 18. We need to shift our focus away from writing infinity tests and monitoring checks and trying to predict what will break, or checking for the same error states again and again. We need to embrace the fact that we all test in production, and give ourselves the tools to do this well. Instrument code for observability, shrink the deploy cycle to a few minutes, ship one mergeset by one developer at a time, and look at our code in production.
  • 19. Write code with instrumentation Run your systems with SLOs Tighten up your feedback loops with deploys The solution:
  • 20. O.D.D. Observability-Driven Development Write code with instrumentation • Kickstart the virtuous cycle of “you build it, you own it” • By instrumenting your code as you write it • Never accept a PR unless you can explain how to tell if it breaks • Watch your code go out as it deploys • Observe your code through the lens of your instrumentation, asking: • “Is it working as intended? Does anything else look weird?”
  • 21. Tight feedback loops with fast deploys Fifteen minutes or bust. One mergeset by one dev at a time Many times a day
  • 22. Run your systems with SLOs.
  • 23. Management: Engineering: Operations: Service-Level Objectives In search of a common language: How broken is too broken? What does good enough mean? Combatting alert fatigue
  • 24. Eligible: “Had an http status code” Good: “… that was a 200, and was served in under 500 ms” Good events ——————— Eligible events Service Level Indicator = Service-Level Objectives
  • 25. Service-Level Objectives We ALWAYS store incoming user data Default dashboards USUALLY load in <1 sec Queries OFTEN return in <10 sec 99.99% 99.9% 99% ~4.3 minutes ~45 minutes 7.3 hours (Honeycomb SLOs)
  • 26. Monitoring Checks, Symptom-based alerts • A disk is 89% full on one of your database primaries • CPU saturation on your export cluster is at 90% average • 5% of requests to a partner API are returning HTTP 500 • The queue of iOS push notifications has been filling up and not draining for the past 15 minutes Instead of alerting on hundreds or thousands of symptom- based monitoring checks, alert only on a few precious SLOs that directly reflect user pain. Less noise, better service.
  • 27. Status of an SLO How have we done? SLO examples …
  • 29.
  • 30. You have an observable system when your team can quickly and reliably diagnose any new behavior with no prior knowledge. Observability begins with rich instrumentation, putting you in constant conversation with your code It brings everyone up to the level of the best debugger in each area With SLOs, it is how you get to a fast, glitch-free experience
  • 31. This is how you get a jump on glitches. Instrument your code for observability. Tighten up your feedback loops by deploying a single merges by a single engineer at a time, very fast, many times a day (fifteen minutes or less!) Replace your floods of symptom alerts with a few well-chosen SLOs Watch your code as it goes out. Find the errors before your users do.
  • 34. Brought to you by Charity Majors CTO of Honeycomb