"What does it really mean for your system to be available, or how to define what to measure", Daniil Mazepin
Agenda
Who is Daniil?
Terminology
Why SLOs?
What to measure?
Who is Daniil?
Software Engineering Manager / Head of
Engineering with over 13 years of experience
at companies of varying sizes and stages of
maturity, ranging from small start-ups to
Facebook.
Experience spans multiple domains including
fintech, social media, e-commerce, and
gambling, utilising both top-down and
bottom-up approaches.
Terminology
Reliability
The system or service performs in the expected
way, when it’s required to do so.
Service Level Indicator (SLI)
A quantifiable measure
of service reliability.
Service Level Objective (SLO)
A reliability target
for an SLI.
Service Level Agreement (SLA)
A contract (usually legally binding) between providers and customers that specifies what happens if an SLO is not met.
Error Budget
An SLO implies an acceptable level of unreliability.
100% - SLO = Error Budget for the next X days
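The formula above can be turned into concrete numbers. The following is a minimal sketch, assuming the SLO is expressed as a percentage and the window in days; the function names `error_budget` and `allowed_bad_requests` are illustrative and not part of the talk.

```python
def error_budget(slo_percent: float, window_days: int) -> dict:
    """Return the error budget implied by an SLO over a window of days."""
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be in (0, 100]")
    budget_percent = 100.0 - slo_percent            # 100% - SLO
    window_minutes = window_days * 24 * 60
    return {
        "budget_percent": budget_percent,
        # Time-based view: minutes of full unavailability the budget allows.
        "allowed_bad_minutes": budget_percent / 100.0 * window_minutes,
    }


def allowed_bad_requests(slo_percent: float, total_requests: int) -> float:
    """Request-based view: how many requests may fail within the window."""
    return total_requests * (100.0 - slo_percent) / 100.0


budget = error_budget(99.9, 30)
# A 99.9% SLO over 30 days leaves roughly 43.2 minutes of budget.
print(round(budget["allowed_bad_minutes"], 1))
```

The same budget can be read either as time or as a count of failed requests; which view fits better depends on how the SLI is defined.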
SLO & Error Budget Windows
Fixed:
Calendar-based: per week, per month, per quarter, etc.
Works well for internal reporting purposes.
Crucial for planning reliability work.
Rolling:
More closely aligned with the user experience, because users’ trust does not magically recover on the first day of each month.
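The difference between the two window types shows up right after an incident. The sketch below is an illustration under assumptions: the per-day `(good, total)` counts are made-up sample data, and the helper names are hypothetical.

```python
from datetime import date, timedelta


def sli_over(days: dict, start: date, end: date):
    """Fraction of good events between start and end, inclusive."""
    good = sum(g for d, (g, t) in days.items() if start <= d <= end)
    total = sum(t for d, (g, t) in days.items() if start <= d <= end)
    return good / total if total else None


def fixed_month_sli(days: dict, today: date):
    """Fixed window: from the first day of the current calendar month."""
    return sli_over(days, today.replace(day=1), today)


def rolling_sli(days: dict, today: date, window_days: int = 30):
    """Rolling window: the last window_days days, including today."""
    return sli_over(days, today - timedelta(days=window_days - 1), today)


# Sample data (assumed): 1000 requests per day from Jan 1 to Feb 1,
# with 100 failures on Jan 31.
daily = {date(2024, 1, 1) + timedelta(days=i): (1000, 1000) for i in range(32)}
daily[date(2024, 1, 31)] = (900, 1000)
today = date(2024, 2, 1)

print(fixed_month_sli(daily, today))  # the window restarted on Feb 1
print(rolling_sli(daily, today))      # still includes the Jan 31 incident
```

On the first day of the new month the fixed window no longer sees yesterday's incident, while the rolling window still does.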
Burn Rates
The rate at which the allowed number of errors is consumed.
[Figure: error budget for the next X days, consumed at burn rate < 1 versus burn rate > 1]
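One common way to compute a burn rate is to divide the observed error rate by the error rate the SLO allows. With that definition, a burn rate of 1 uses up the budget exactly at the end of the window, and a burn rate above 1 uses it up sooner. A minimal sketch, assuming a request-based SLI and an SLO given as a fraction (the function names are illustrative):

```python
import math


def burn_rate(bad: int, total: int, slo: float) -> float:
    """Observed error rate divided by the error rate the SLO allows."""
    if total == 0:
        return 0.0
    allowed_error_rate = 1.0 - slo
    return (bad / total) / allowed_error_rate


def days_until_exhausted(burn: float, window_days: int) -> float:
    """Days until a full budget runs out if the burn rate stays constant."""
    if burn <= 0:
        return math.inf
    return window_days / burn


rate = burn_rate(bad=20, total=10_000, slo=0.999)
# 0.2% errors against a 0.1% allowance is a burn rate of about 2,
# so a 30-day budget would last about 15 days.
print(round(rate, 2), round(days_until_exhausted(rate, 30), 1))
```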
Why SLOs?
But why would we invest in defining
and measuring SLOs?
To address the tension between the pace of
innovation and service reliability.
What to measure?
Choose your SLO
A service level indicator (SLI): A metric of a specific aspect of your service.
Duration: The window over which the SLI is measured. This can be calendar-based or a rolling window.
A target: The value (or range of values) that the SLI should meet in the given duration in a healthy service.
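The three parts of an SLO (an SLI, a duration, and a target) map naturally onto a small data structure. This is a sketch only; the class and field names are assumptions made for illustration, and the target is modelled as a single lower bound rather than a range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    sli_name: str      # which indicator is measured, e.g. "availability"
    window_days: int   # duration over which the SLI is evaluated
    rolling: bool      # True for a rolling window, False for calendar-based
    target: float      # value the SLI should meet, as a fraction

    def is_met(self, measured: float) -> bool:
        """True if the measured SLI meets or exceeds the target."""
        return measured >= self.target


checkout_availability = SLO("availability", window_days=28, rolling=True, target=0.995)
print(checkout_availability.is_met(0.997))
```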
Characteristics of a good metric
The metric directly relates to user happiness.
Deterioration of the metric correlates with outages.
The metric provides a good signal-to-noise ratio.
The metric scales monotonically and linearly with customer happiness.
Request-driven services
Availability: The fraction of valid requests served successfully.
Latency: The fraction of valid requests served faster than a threshold.
Quality (*): The fraction of valid requests served without degradation of service.
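Availability and latency, as defined above, can both be computed as "good events over valid events". The sketch below makes two assumptions that are not from the talk: client errors (4xx) are treated as not valid, and any status below 500 counts as success. Both choices are service-specific.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int         # HTTP status code
    latency_ms: float   # time taken to serve the request


def is_valid(r: Request) -> bool:
    """Assumption: 4xx responses are client errors and are excluded."""
    return not (400 <= r.status < 500)


def availability(requests):
    """Fraction of valid requests served successfully."""
    valid = [r for r in requests if is_valid(r)]
    if not valid:
        return None
    return sum(r.status < 500 for r in valid) / len(valid)


def latency_sli(requests, threshold_ms: float):
    """Fraction of valid requests served faster than the threshold."""
    valid = [r for r in requests if is_valid(r)]
    if not valid:
        return None
    return sum(r.latency_ms < threshold_ms for r in valid) / len(valid)


sample = [
    Request(200, 120), Request(200, 450), Request(500, 30),
    Request(404, 10), Request(200, 80),
]
print(availability(sample), latency_sli(sample, threshold_ms=100))
```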
Availability?.. Uptime?..
Availability still answers whether the system is up, but in a more precise way than measuring the time since the system was last down.
Today, services can be partially down, which uptime does not capture well.
Data processing services
Coverage: The amount of data that has been processed, expressed as a fraction. For example, 95%.
Correctness: The fraction of output data deemed to be correct. For example, 99.99%.
Freshness: The freshness of the source data or aggregated output data, expressed as a fraction.
Throughput: The fraction of time where the data processing rate was faster than a threshold.
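Each of the four indicators above reduces to a simple ratio. The sketch below is one possible reading, with assumptions stated in the comments: in particular, freshness is interpreted here as the fraction of records younger than an age threshold, and throughput is sampled at equal time intervals.

```python
def fraction(part: float, whole: float):
    """Return part / whole, or None when there is nothing to measure."""
    return part / whole if whole else None


def coverage(processed: int, expected: int):
    """Fraction of the expected data that has been processed."""
    return fraction(processed, expected)


def correctness(correct: int, produced: int):
    """Fraction of output records deemed correct."""
    return fraction(correct, produced)


def freshness(ages_minutes, max_age_minutes: float):
    """Assumption: fraction of records updated more recently than a threshold."""
    fresh = sum(age < max_age_minutes for age in ages_minutes)
    return fraction(fresh, len(ages_minutes))


def throughput(rates, min_rate: float):
    """Fraction of equal-length intervals where the rate beat the threshold."""
    fast = sum(rate > min_rate for rate in rates)
    return fraction(fast, len(rates))


print(coverage(950, 1000))                      # 0.95
print(freshness([5, 20, 90, 10], 60))           # 0.75
print(throughput([120, 80, 150, 200], 100))     # 0.75
```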
How many 9s do we need?
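To get a feel for what each extra 9 costs, it helps to translate the target into allowed downtime. A minimal sketch, assuming a time-based availability SLO and a 30-day window (the function name is illustrative):

```python
def allowed_downtime_minutes(nines: int, window_days: int = 30) -> float:
    """Allowed downtime for an availability target with the given number of 9s.

    nines=2 means 99%, nines=3 means 99.9%, and so on.
    """
    unavailability = 10.0 ** -nines
    return unavailability * window_days * 24 * 60


for n in range(2, 6):
    print(f"{n} nines: {allowed_downtime_minutes(n):.2f} minutes per 30 days")
```

Each additional 9 cuts the allowed downtime by a factor of ten: roughly 432 minutes at two 9s, 43.2 at three, 4.32 at four, and under half a minute at five.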
Why not 100%?
“100% is the wrong reliability target for basically
everything.”
Ben Treynor Sloss, founder of SRE at Google
Iterate!
“Picking the wrong number is better than picking no
number.”
from SRE.Google
Iterate! x2
Align dependencies.
Build complex SLOs where it makes sense.
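One reason to align dependencies is that a service cannot reliably promise more than the things it depends on. As a rough illustration (not from the talk), assume every dependency is on the critical path of each request and that failures are independent; the best achievable availability is then the product of the individual availabilities.

```python
import math


def composite_availability(availabilities) -> float:
    """Upper bound on availability when all dependencies must succeed.

    Assumes hard, serial dependencies with independent failures.
    """
    return math.prod(availabilities)


# Hypothetical targets for a service and two of its dependencies.
deps = [0.999, 0.999, 0.9995]
print(round(composite_availability(deps), 5))
```

Under these assumptions, three components at roughly three 9s each combine to less than 99.8%, so an SLO above that for the whole request path would not be achievable without retries, caching, or other mitigation.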
Thank you!
Do you have any questions?

CiscoIconsLibrary cours de réseau VLAN.ppt
 
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptxUse Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
 

"What does it really mean for your system to be available, or how to define what to measure", Daniil Mazepin

  • 2. Agenda: Who is Daniil? Why SLOs? Terminology. What to measure?
  • 3. Who is Daniil? Software Engineering Manager / Head of Engineering with over 13 years of experience at companies of varying sizes and stages of maturity, ranging from small start-ups to Facebook. Experience spans multiple domains including fintech, social media, e-commerce, and gambling, utilising both top-down and bottom-up approaches.
  • 4. Terminology: Reliability. The system or service performs in the expected way when it is required to do so.
  • 5. Terminology: Service Level Indicator (SLI). A quantifiable measure of service reliability.
  • 6. Terminology: Service Level Objective (SLO). A reliability target for an SLI.
  • 7. Terminology: Service Level Agreement (SLA). A contract (usually legally binding) between providers and customers that defines what happens if an SLO is not met.
  • 8. Terminology: Error Budget. An SLO implies an acceptable level of unreliability. 100% - SLO = error budget for the next X days.
  • 9. Terminology: SLO & Error Budget Windows. Two window types: fixed and rolling. Fixed windows follow calendars - per week, per month, per quarter etc. Works well for internal reporting purposes. Crucial for planning reliability work. Rolling windows are more closely aligned with the user experience because users’ trust does not magically recover on the first day of each month.
  • 10. Terminology: Burn Rates. The rate at which the allowed number of errors is consumed. (Diagram: error budget for the next X days, consumed at Burn Rate < 1 versus Burn Rate > 1.)
  • 11. Why SLOs? But why would we invest in defining and measuring SLOs? To address the tension between the pace of innovation and service reliability.
  • 12. What to measure? Choose your SLO. A service level indicator (SLI): a metric of a specific aspect of your service. Duration: the window over which the SLI is measured. This can be calendar-based or a rolling window. A target: the value (or range of values) that the SLI should meet in the given duration in a healthy service.
  • 13. What to measure? Characteristics of a good metric. The metric directly relates to user happiness. The metric's deterioration correlates with outages. The metric provides a good signal-to-noise ratio. The metric scales monotonically and linearly with customer happiness.
  • 14. What to measure? Request-driven services. Availability: The fraction of valid requests served successfully. Latency: The fraction of valid requests served faster than a threshold. Quality (*): The fraction of valid requests served without degradation of service.
  • 15. What to measure? Availability?.. Uptime?.. Availability still answers whether the system is up, but more precisely than measuring the time since the system was last down. Today services can be partially down, which uptime does not capture well.
  • 16. What to measure? Data processing services. Coverage: The amount of data that has been processed, expressed as a fraction. For example, 95%. Correctness: The fraction of output data deemed to be correct. For example, 99.99%. Freshness: The freshness of the source data or aggregated output data, expressed as a fraction. Throughput: The fraction of time during which the data processing rate was faster than a threshold.
  • 17. What to measure? How many 9s do we need?
  • 18. What to measure? Why not 100%? “100% is the wrong reliability target for basically everything.” Ben Treynor Sloss, founder of SRE at Google
  • 19. What to measure? Iterate! “Picking the wrong number is better than picking no number.” from SRE.Google
  • 20. What to measure? Iterate! x2. Align dependencies. Build complex SLOs where it makes sense.
  • 21. Thank you! Do you have any questions?
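
As a worked illustration of slides 8, 10 and 17, the sketch below turns an SLO into an error budget, the amount of full downtime that budget allows over a window, and a burn rate. It is a minimal sketch and not part of the talk: the `Slo` class, the `burn_rate` function, and the choice to count errors per request (rather than per minute) are assumptions made for illustration. Burn rate is taken here as the observed error rate divided by the error budget, so a value of 1 uses up the budget exactly at the end of the window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Hypothetical container for an SLO target and its window."""

    target_percent: float  # e.g. 99.9
    window_days: int       # e.g. 30

    def __post_init__(self) -> None:
        # A 100% target leaves no error budget (see slide 18), so reject it.
        if not 0.0 < self.target_percent < 100.0:
            raise ValueError("target_percent must be strictly between 0 and 100")
        if self.window_days <= 0:
            raise ValueError("window_days must be positive")

    @property
    def error_budget_percent(self) -> float:
        # Slide 8: 100 - SLO = error budget.
        return 100.0 - self.target_percent

    def allowed_bad_minutes(self) -> float:
        # Budget expressed as minutes of complete outage within the window.
        total_minutes = self.window_days * 24 * 60
        return total_minutes * self.error_budget_percent / 100.0


def burn_rate(slo: Slo, bad_events: int, total_events: int) -> float:
    """Observed error rate relative to the rate the budget allows."""
    if total_events == 0:
        return 0.0  # assumption: no traffic consumes no budget
    observed_error_percent = 100.0 * bad_events / total_events
    return observed_error_percent / slo.error_budget_percent


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        slo = Slo(target_percent=target, window_days=30)
        print(f"{target}% over 30 days: "
              f"{slo.allowed_bad_minutes():.1f} minutes of downtime allowed")
```

For a 99.9% SLO over 30 days the budget is 0.1%, or about 43.2 minutes of full downtime. If 2 requests out of 1,000 fail, the burn rate is about 2, meaning the budget would run out roughly halfway through the window if that rate held.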
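
The request-driven SLIs on slide 14 are all "good requests divided by valid requests", which can be sketched as follows. This is an illustrative sketch under stated assumptions, not the speaker's implementation: the `Request` record, its field names, and the decision to return `None` when there are no valid requests are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Request:
    """Hypothetical request record; field names are assumptions."""

    valid: bool        # whether the request counts towards the SLI at all
    ok: bool           # whether it was served successfully
    latency_ms: float  # time taken to serve it


def _fraction(good: int, valid: int) -> Optional[float]:
    # With no valid requests the SLI is undefined, so return None.
    return good / valid if valid else None


def availability(requests: Iterable[Request]) -> Optional[float]:
    """Fraction of valid requests served successfully."""
    valid = [r for r in requests if r.valid]
    return _fraction(sum(r.ok for r in valid), len(valid))


def latency_sli(requests: Iterable[Request], threshold_ms: float) -> Optional[float]:
    """Fraction of valid requests served faster than threshold_ms."""
    valid = [r for r in requests if r.valid]
    return _fraction(sum(r.latency_ms < threshold_ms for r in valid), len(valid))


if __name__ == "__main__":
    sample = [
        Request(valid=True, ok=True, latency_ms=120.0),
        Request(valid=True, ok=False, latency_ms=900.0),
        Request(valid=False, ok=False, latency_ms=50.0),  # excluded: not valid
        Request(valid=True, ok=True, latency_ms=300.0),
        Request(valid=True, ok=True, latency_ms=80.0),
    ]
    print("availability:", availability(sample))
    print("latency SLI (<200 ms):", latency_sli(sample, threshold_ms=200.0))
```

Because both indicators are fractions of the same denominator, a partially degraded service shows up as a lower fraction rather than as a binary up/down state, which is the distinction slide 15 draws between availability and uptime.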