1. SRE is the discipline of applying software engineering practices to operations problems in order to build reliable systems.
2. Service level terminology includes Service Level Indicators (SLIs), which are quantitative measures of service aspects such as latency or error rate; Service Level Objectives (SLOs), which are goals for specific metrics; and Service Level Agreements (SLAs), which are agreements with customers that contain one or more SLOs.
3. Choosing the right SLIs, crafting meaningful SLOs, collecting indicator data, and meeting customer expectations through SLAs are important for building reliable services.
SRE: Service Level
Terminology
Presented By:
● Mukesh Yadav
● Sakshi Gawande
Software Consultant
Knoldus Inc.
About Knoldus
Knoldus is a technology consulting firm focused on modernizing digital systems
at the pace your business demands.
DevOps
Functional. Reactive. Cloud Native
Our Agenda
Introduction
SRE
Service Level Terminology
Best Practices
Service Outage
Indicators in Practice
Collecting Indicators
Site Reliability
Engineer
SRE is the discipline that
emerges when a software
engineer is asked to solve
operations problems.
(Diagram: SRE, with the labels "Code", "System" and "Load of server")
Service Level Indicator
● An SLI (service level indicator) is a carefully defined
quantitative measure of some aspect of the level
of service that is provided, e.g. latency or error rate.
● Commonly expressed as the ratio of good events to all events:
good events / (good events + bad events)
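A minimal sketch of computing an SLI from good and bad event counts, assuming the indicator is reported as the fraction of good events among all events. The function name `sli_ratio` and the sample counts are illustrative assumptions, not taken from the slides.

```python
def sli_ratio(good_events: int, bad_events: int) -> float:
    """Return the fraction of good events among all events (0.0 to 1.0)."""
    if good_events < 0 or bad_events < 0:
        raise ValueError("event counts must be non-negative")
    total = good_events + bad_events
    if total == 0:
        # Assumption: with no traffic there is nothing to violate,
        # so the indicator is reported as fully met.
        return 1.0
    return good_events / total


# Hypothetical sample: 9,990 successful requests and 10 failed ones.
availability = sli_ratio(good_events=9_990, bad_events=10)
print(f"Availability SLI: {availability:.2%}")
```

The zero-traffic case is a design choice; some teams prefer to report "no data" instead of 100%.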
Service Level Objective
● An SLO (Service Level Objective) is a goal that the service provider wants to reach.
● An SLO is an agreement within an SLA about a specific metric, such as uptime or response
time.
● SLOs are the individual promises you're making to the customer.
● SLOs are the goals developers need to hit and measure themselves against.
● Phases: Design, Build, Review
Best Practices for SLI, SLO, SLA
Craft SLAs around customer expectations
Build in an error budget
Not every trackable metric should be an SLI
Use plain language in SLAs
With SLOs, less is more
Include factors outside the IT team’s control
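To make "build in an error budget" concrete, here is a rough sketch that turns an availability SLO into allowed downtime. It assumes a time-based availability SLO and a 30-day window; the function names and the sample numbers are hypothetical and not from the slides.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over a window."""
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in (0, 1]")
    window_minutes = window_days * 24 * 60
    # The budget is whatever the SLO leaves over: 1 - target.
    return (1.0 - slo_target) * window_minutes


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Minutes of budget left; a negative value means the SLO was missed."""
    return error_budget_minutes(slo_target, window_days) - downtime_minutes


# Hypothetical example: a 99.9% target with 20 minutes of downtime so far.
print(f"Budget: {error_budget_minutes(0.999):.1f} min")
print(f"Remaining: {budget_remaining(0.999, 20):.1f} min")
```

A 100% target leaves a budget of zero, which is one reason to avoid promising it.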
Who Defines Service Level Terminology
(Diagram: who defines each term. SRE appears under each of SLI, SLO and SLA; Product Owner, Sales and User are the other roles shown.)
Service Outage
A banking service with a proper monitoring solution
Problem:
Over time → service outages
Increased customer frustration
Solution:
Set up an SLA and SLOs for the service
Effect:
The service owner is able to track service availability
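As an illustration of how a service owner might track availability once an SLO is in place, the sketch below derives measured availability from a list of outage durations. The outage values, the 30-day window and the 99.9% target are all made-up assumptions for the example.

```python
from datetime import timedelta


def measured_availability(outages: list[timedelta], window: timedelta) -> float:
    """Fraction of the window during which the service was up."""
    downtime = sum(outages, timedelta())
    # Dividing one timedelta by another yields a plain float.
    return 1.0 - downtime / window


# Hypothetical month for the banking service: two outages, 12 and 48 minutes.
outages = [timedelta(minutes=12), timedelta(minutes=48)]
window = timedelta(days=30)
slo_target = 0.999  # assumed availability objective

availability = measured_availability(outages, window)
slo_met = availability >= slo_target
print(f"Availability: {availability:.4%}, SLO met: {slo_met}")
```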
Indicators in Practice
Which indicators to choose?
Choosing too many indicators makes it hard to pay attention to the ones that matter
What do you and your users care about?
Is the service free or paid?
What are the service-specific indicators?
● User-facing systems → availability, latency, and throughput
● Storage systems → latency, availability, and durability
● Big data systems → throughput and end-to-end latency
Collecting Indicators
Where can indicators be collected?
● Server side, e.g. Borgmon, Prometheus
● Client side
How can we collect metrics?
● Raw measurements
● Aggregation
○ 200 requests/s in even-numbered seconds and 0 in the others: the instantaneous load is twice the average
○ Constant 100 requests/s: the same average load
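The aggregation point can be checked with a small sketch. It assumes the first pattern means 200 requests/s in even-numbered seconds and 0 in the others, compared against a constant 100 requests/s over one minute; the variable names are illustrative.

```python
from statistics import mean

# One minute of per-second request counts for two hypothetical services.
bursty = [200 if second % 2 == 0 else 0 for second in range(60)]
steady = [100] * 60

# Averaged over the minute, the two services look identical...
print("average:", mean(bursty), "vs", mean(steady))

# ...but the peak instantaneous load differs by a factor of two.
print("peak:", max(bursty), "vs", max(steady))
```

This is why an average alone can hide behaviour that raw measurements or a distribution would reveal.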
Conclusion
1. Meet customers' high expectations.
2. Know whether services are reliable or not.
3. SLIs, SLOs and SLAs should be part of any reliable system.
4. Maintain a good relationship with customers.