SlideShare a Scribd company logo
PROPRIETARY AND CONFIDENTIAL
Site Reliability Engineering
Michael Blakeney
"an SRE team is responsible for
the availability, latency,
performance, efficiency, change
management, monitoring,
emergency response, and capacity
planning of their service(s)."
What is SRE?
• Ensuring a Durable Focus on
Engineering
• Pursuing Maximum Change
Velocity Without Violating a
Service’s SLO
• Monitoring
• Emergency Response
• Change Management
• Demand Forecasting and
Capacity Planning
• Provisioning
• Efficiency and Performance
PROPRIETARY AND CONFIDENTIAL
Availability
Time Based Aggregate Based
3
"If you haven't tried it, assume it's broken"
Too binary for distributed systems that
can enter partial downtime or degraded states
Much broader and able to capture user facing
experience more effectively
Service Level Indicators
Service Level Objectives
Service Level Agreement
SLI, SLO, SLA
Database state should be 100% recovered in
no more than 1 day.
"99% of pipeline runs cover 100% of the
data."
90% ( averaged over 1 minute ) of http
requests to the backend should complete in
less than 10ms
4
https://landing.google.com/sre/workbook/chapters/slo-document/
PROPRIETARY AND CONFIDENTIAL
the time it takes for your
service to process a
request
Four Golden Signals
5
Latency
the measurement of the
requests the service is
handling
Traffic
the request rate of errors
Errors
How much a resource
with limited quantity is
utilized, usually
measured as a
Percentage of that
resource
Saturation
PROPRIETARY AND CONFIDENTIAL
Error Budgets
• Error budgets enable teams to make objective decisions regarding prioritization of
features versus reliability.
• Given an availability target the error budget defines the tolerable amount of service
unavailability. i.e. 99.99% availability => 0.01% unavailability or 12.96 minutes per
quarter
https://landing.google.com/sre/sre-book/chapters/availability-table/
https://landing.google.com/sre/workbook/chapters/error-budget-policy/
6
"Ways in which things go wrong are special cases of the ways in which things
go right"
PROPRIETARY AND CONFIDENTIAL
Being Agile with SLOs
• Transparency - the SLO and error budget policies along with all other
relevant material should be made available to the team and stake holders
• Inspection - the team should regularly review and analyze the effectiveness
and relevancy of the policies
• Adaptation - The team should be willing to adjust the policies so as to
maximize the value delivered to customers.
7
PROPRIETARY AND CONFIDENTIAL
References
8
https://landing.google.com/sre/books/

More Related Content

What's hot

Overview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practicesOverview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practices
Ashutosh Agarwal
 
SRE-iously! Reliability!
SRE-iously! Reliability!SRE-iously! Reliability!
SRE-iously! Reliability!
New Relic
 
Site Reliability Engineering: An Enterprise Adoption Story (an ITSM Academy W...
Site Reliability Engineering: An Enterprise Adoption Story (an ITSM Academy W...Site Reliability Engineering: An Enterprise Adoption Story (an ITSM Academy W...
Site Reliability Engineering: An Enterprise Adoption Story (an ITSM Academy W...
ITSM Academy, Inc.
 
SRE 101
SRE 101SRE 101
SRE 101
Diego Pacheco
 
How Small Team Get Ready for SRE (public version)
How Small Team Get Ready for SRE (public version)How Small Team Get Ready for SRE (public version)
How Small Team Get Ready for SRE (public version)
Setyo Legowo
 
DevOps Torino Meetup - SRE Concepts
DevOps Torino Meetup - SRE ConceptsDevOps Torino Meetup - SRE Concepts
DevOps Torino Meetup - SRE Concepts
Rauno De Pasquale
 
SRE 101 (Site Reliability Engineering)
SRE 101 (Site Reliability Engineering)SRE 101 (Site Reliability Engineering)
SRE 101 (Site Reliability Engineering)
Hussain Mansoor
 
SRE Demystified - 01 - SLO SLI and SLA
SRE Demystified - 01 - SLO SLI and SLASRE Demystified - 01 - SLO SLI and SLA
SRE Demystified - 01 - SLO SLI and SLA
Dr Ganesh Iyer
 
A Crash Course in Building Site Reliability
A Crash Course in Building Site ReliabilityA Crash Course in Building Site Reliability
A Crash Course in Building Site Reliability
Acquia
 
Service Level Terminology : SLA ,SLO & SLI
Service Level Terminology : SLA ,SLO & SLIService Level Terminology : SLA ,SLO & SLI
Service Level Terminology : SLA ,SLO & SLI
Knoldus Inc.
 
SRE-iously! Defining the Principles, Habits, and Practices of Site Reliabilit...
SRE-iously! Defining the Principles, Habits, and Practices of Site Reliabilit...SRE-iously! Defining the Principles, Habits, and Practices of Site Reliabilit...
SRE-iously! Defining the Principles, Habits, and Practices of Site Reliabilit...
Tori Wieldt
 
Building an SRE Organization @ Squarespace
Building an SRE Organization @ SquarespaceBuilding an SRE Organization @ Squarespace
Building an SRE Organization @ Squarespace
Franklin Angulo
 
What's an SRE at Criteo - Meetup SRE Paris
What's an SRE at Criteo - Meetup SRE ParisWhat's an SRE at Criteo - Meetup SRE Paris
What's an SRE at Criteo - Meetup SRE Paris
Clément Michaud
 
The Next Wave of Reliability Engineering
The Next Wave of Reliability EngineeringThe Next Wave of Reliability Engineering
The Next Wave of Reliability Engineering
Michael Kehoe
 
SRE Demystified - 05 - Toil Elimination
SRE Demystified - 05 - Toil EliminationSRE Demystified - 05 - Toil Elimination
SRE Demystified - 05 - Toil Elimination
Dr Ganesh Iyer
 
DevOps vs. Site Reliability Engineering (SRE) in Age of Kubernetes
DevOps vs. Site Reliability Engineering (SRE) in Age of KubernetesDevOps vs. Site Reliability Engineering (SRE) in Age of Kubernetes
DevOps vs. Site Reliability Engineering (SRE) in Age of Kubernetes
DevOps.com
 
SRE From Scratch
SRE From ScratchSRE From Scratch
SRE From Scratch
Grier Johnson
 
Setting SLOs and SLIs in the Real World
Setting SLOs and SLIs in the Real WorldSetting SLOs and SLIs in the Real World
Setting SLOs and SLIs in the Real World
New Relic
 
Site Reliability Engineer (SRE), We Keep The Lights On 24/7
Site Reliability Engineer (SRE), We Keep The Lights On 24/7Site Reliability Engineer (SRE), We Keep The Lights On 24/7
Site Reliability Engineer (SRE), We Keep The Lights On 24/7
NUS-ISS
 
What is Site Reliability Engineering (SRE)
What is Site Reliability Engineering (SRE)What is Site Reliability Engineering (SRE)
What is Site Reliability Engineering (SRE)
jeetendra mandal
 

What's hot (20)

Overview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practicesOverview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practices
 
SRE-iously! Reliability!
SRE-iously! Reliability!SRE-iously! Reliability!
SRE-iously! Reliability!
 
Site Reliability Engineering: An Enterprise Adoption Story (an ITSM Academy W...
Site Reliability Engineering: An Enterprise Adoption Story (an ITSM Academy W...Site Reliability Engineering: An Enterprise Adoption Story (an ITSM Academy W...
Site Reliability Engineering: An Enterprise Adoption Story (an ITSM Academy W...
 
SRE 101
SRE 101SRE 101
SRE 101
 
How Small Team Get Ready for SRE (public version)
How Small Team Get Ready for SRE (public version)How Small Team Get Ready for SRE (public version)
How Small Team Get Ready for SRE (public version)
 
DevOps Torino Meetup - SRE Concepts
DevOps Torino Meetup - SRE ConceptsDevOps Torino Meetup - SRE Concepts
DevOps Torino Meetup - SRE Concepts
 
SRE 101 (Site Reliability Engineering)
SRE 101 (Site Reliability Engineering)SRE 101 (Site Reliability Engineering)
SRE 101 (Site Reliability Engineering)
 
SRE Demystified - 01 - SLO SLI and SLA
SRE Demystified - 01 - SLO SLI and SLASRE Demystified - 01 - SLO SLI and SLA
SRE Demystified - 01 - SLO SLI and SLA
 
A Crash Course in Building Site Reliability
A Crash Course in Building Site ReliabilityA Crash Course in Building Site Reliability
A Crash Course in Building Site Reliability
 
Service Level Terminology : SLA ,SLO & SLI
Service Level Terminology : SLA ,SLO & SLIService Level Terminology : SLA ,SLO & SLI
Service Level Terminology : SLA ,SLO & SLI
 
SRE-iously! Defining the Principles, Habits, and Practices of Site Reliabilit...
SRE-iously! Defining the Principles, Habits, and Practices of Site Reliabilit...SRE-iously! Defining the Principles, Habits, and Practices of Site Reliabilit...
SRE-iously! Defining the Principles, Habits, and Practices of Site Reliabilit...
 
Building an SRE Organization @ Squarespace
Building an SRE Organization @ SquarespaceBuilding an SRE Organization @ Squarespace
Building an SRE Organization @ Squarespace
 
What's an SRE at Criteo - Meetup SRE Paris
What's an SRE at Criteo - Meetup SRE ParisWhat's an SRE at Criteo - Meetup SRE Paris
What's an SRE at Criteo - Meetup SRE Paris
 
The Next Wave of Reliability Engineering
The Next Wave of Reliability EngineeringThe Next Wave of Reliability Engineering
The Next Wave of Reliability Engineering
 
SRE Demystified - 05 - Toil Elimination
SRE Demystified - 05 - Toil EliminationSRE Demystified - 05 - Toil Elimination
SRE Demystified - 05 - Toil Elimination
 
DevOps vs. Site Reliability Engineering (SRE) in Age of Kubernetes
DevOps vs. Site Reliability Engineering (SRE) in Age of KubernetesDevOps vs. Site Reliability Engineering (SRE) in Age of Kubernetes
DevOps vs. Site Reliability Engineering (SRE) in Age of Kubernetes
 
SRE From Scratch
SRE From ScratchSRE From Scratch
SRE From Scratch
 
Setting SLOs and SLIs in the Real World
Setting SLOs and SLIs in the Real WorldSetting SLOs and SLIs in the Real World
Setting SLOs and SLIs in the Real World
 
Site Reliability Engineer (SRE), We Keep The Lights On 24/7
Site Reliability Engineer (SRE), We Keep The Lights On 24/7Site Reliability Engineer (SRE), We Keep The Lights On 24/7
Site Reliability Engineer (SRE), We Keep The Lights On 24/7
 
What is Site Reliability Engineering (SRE)
What is Site Reliability Engineering (SRE)What is Site Reliability Engineering (SRE)
What is Site Reliability Engineering (SRE)
 

Similar to Site reliability engineering - Lightning Talk

TaaS Webinar
TaaS WebinarTaaS Webinar
TaaS Webinar
Abhinav Das
 
3 Enterprise Storage Assessment
3 Enterprise Storage Assessment3 Enterprise Storage Assessment
3 Enterprise Storage Assessment
Jeremiah Loscalzo
 
Azure governance
Azure governanceAzure governance
Azure governance
girish goudar
 
Design patterns and plan for developing high available azure applications
Design patterns and plan for developing high available azure applicationsDesign patterns and plan for developing high available azure applications
Design patterns and plan for developing high available azure applications
Himanshu Sahu
 
Dimension of quality in Cloud Database Services
Dimension of quality in Cloud Database ServicesDimension of quality in Cloud Database Services
Dimension of quality in Cloud Database Services
Imran Khan
 
SFDC ODS Preludesys
SFDC ODS PreludesysSFDC ODS Preludesys
SFDC ODS Preludesys
Bhaskaran Thangaraj
 
Performance Testing Strategy for Cloud-Based System using Open Source Testing...
Performance Testing Strategy for Cloud-Based System using Open Source Testing...Performance Testing Strategy for Cloud-Based System using Open Source Testing...
Performance Testing Strategy for Cloud-Based System using Open Source Testing...
MIMOS Berhad/Open University Malaysia/Universiti Teknologi Malaysia
 
2 Storage Readiness Assessment
2 Storage Readiness Assessment2 Storage Readiness Assessment
2 Storage Readiness Assessment
Jeremiah Loscalzo
 
Ncerc rlmca202 adm m4 ssm
Ncerc rlmca202 adm m4 ssmNcerc rlmca202 adm m4 ssm
Ncerc rlmca202 adm m4 ssm
ssmarar
 
Types of performance testing
Types of performance testingTypes of performance testing
Types of performance testing
NaveenKumar Namachivayam
 
Rational Quality Manager
Rational Quality ManagerRational Quality Manager
Rational Quality Manager
Strongback Consulting
 
Hidden Costs of Chasing the Mythical 'Five Nines'
Hidden Costs of Chasing the Mythical 'Five Nines'Hidden Costs of Chasing the Mythical 'Five Nines'
Hidden Costs of Chasing the Mythical 'Five Nines'
DevOpsDays DFW
 
Jagadeesh_Resume_5 + Years
Jagadeesh_Resume_5 + YearsJagadeesh_Resume_5 + Years
Jagadeesh_Resume_5 + Years
jagadeeshbabu rangu
 
S.R.E - create ultra-scalable and highly reliable systems
S.R.E - create ultra-scalable and highly reliable systemsS.R.E - create ultra-scalable and highly reliable systems
S.R.E - create ultra-scalable and highly reliable systems
Ricardo Amaro
 
typesofperformancetesting-130505055525-phpapp02.pdf
typesofperformancetesting-130505055525-phpapp02.pdftypesofperformancetesting-130505055525-phpapp02.pdf
typesofperformancetesting-130505055525-phpapp02.pdf
SRIRAMKIRAN9
 
Top 10 Business Reasons for ALM
Top 10 Business Reasons for ALMTop 10 Business Reasons for ALM
Top 10 Business Reasons for ALM
Imaginet
 
Top Business Benefits of Application Lifecycle Management (ALM)
Top Business Benefits of Application Lifecycle Management (ALM)Top Business Benefits of Application Lifecycle Management (ALM)
Top Business Benefits of Application Lifecycle Management (ALM)
Imaginet
 
Welingkar First Year Project- ProjectWeLike
Welingkar First Year Project- ProjectWeLikeWelingkar First Year Project- ProjectWeLike
Welingkar First Year Project- ProjectWeLike
PrinceTrivedi4
 
Cloudbyz ppm, integrated enterprise ppm-alm-apm on force.com
Cloudbyz ppm,   integrated enterprise ppm-alm-apm on force.comCloudbyz ppm,   integrated enterprise ppm-alm-apm on force.com
Cloudbyz ppm, integrated enterprise ppm-alm-apm on force.com
Dinesh Sheshadri
 
Module -4 Resource Management.pdf
Module -4 Resource Management.pdfModule -4 Resource Management.pdf
Module -4 Resource Management.pdf
Sitamarhi Institute of Technology
 

Similar to Site reliability engineering - Lightning Talk (20)

TaaS Webinar
TaaS WebinarTaaS Webinar
TaaS Webinar
 
3 Enterprise Storage Assessment
3 Enterprise Storage Assessment3 Enterprise Storage Assessment
3 Enterprise Storage Assessment
 
Azure governance
Azure governanceAzure governance
Azure governance
 
Design patterns and plan for developing high available azure applications
Design patterns and plan for developing high available azure applicationsDesign patterns and plan for developing high available azure applications
Design patterns and plan for developing high available azure applications
 
Dimension of quality in Cloud Database Services
Dimension of quality in Cloud Database ServicesDimension of quality in Cloud Database Services
Dimension of quality in Cloud Database Services
 
SFDC ODS Preludesys
SFDC ODS PreludesysSFDC ODS Preludesys
SFDC ODS Preludesys
 
Performance Testing Strategy for Cloud-Based System using Open Source Testing...
Performance Testing Strategy for Cloud-Based System using Open Source Testing...Performance Testing Strategy for Cloud-Based System using Open Source Testing...
Performance Testing Strategy for Cloud-Based System using Open Source Testing...
 
2 Storage Readiness Assessment
2 Storage Readiness Assessment2 Storage Readiness Assessment
2 Storage Readiness Assessment
 
Ncerc rlmca202 adm m4 ssm
Ncerc rlmca202 adm m4 ssmNcerc rlmca202 adm m4 ssm
Ncerc rlmca202 adm m4 ssm
 
Types of performance testing
Types of performance testingTypes of performance testing
Types of performance testing
 
Rational Quality Manager
Rational Quality ManagerRational Quality Manager
Rational Quality Manager
 
Hidden Costs of Chasing the Mythical 'Five Nines'
Hidden Costs of Chasing the Mythical 'Five Nines'Hidden Costs of Chasing the Mythical 'Five Nines'
Hidden Costs of Chasing the Mythical 'Five Nines'
 
Jagadeesh_Resume_5 + Years
Jagadeesh_Resume_5 + YearsJagadeesh_Resume_5 + Years
Jagadeesh_Resume_5 + Years
 
S.R.E - create ultra-scalable and highly reliable systems
S.R.E - create ultra-scalable and highly reliable systemsS.R.E - create ultra-scalable and highly reliable systems
S.R.E - create ultra-scalable and highly reliable systems
 
typesofperformancetesting-130505055525-phpapp02.pdf
typesofperformancetesting-130505055525-phpapp02.pdftypesofperformancetesting-130505055525-phpapp02.pdf
typesofperformancetesting-130505055525-phpapp02.pdf
 
Top 10 Business Reasons for ALM
Top 10 Business Reasons for ALMTop 10 Business Reasons for ALM
Top 10 Business Reasons for ALM
 
Top Business Benefits of Application Lifecycle Management (ALM)
Top Business Benefits of Application Lifecycle Management (ALM)Top Business Benefits of Application Lifecycle Management (ALM)
Top Business Benefits of Application Lifecycle Management (ALM)
 
Welingkar First Year Project- ProjectWeLike
Welingkar First Year Project- ProjectWeLikeWelingkar First Year Project- ProjectWeLike
Welingkar First Year Project- ProjectWeLike
 
Cloudbyz ppm, integrated enterprise ppm-alm-apm on force.com
Cloudbyz ppm,   integrated enterprise ppm-alm-apm on force.comCloudbyz ppm,   integrated enterprise ppm-alm-apm on force.com
Cloudbyz ppm, integrated enterprise ppm-alm-apm on force.com
 
Module -4 Resource Management.pdf
Module -4 Resource Management.pdfModule -4 Resource Management.pdf
Module -4 Resource Management.pdf
 

Recently uploaded

Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 

Recently uploaded (20)

Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 

Site reliability engineering - Lightning Talk

  • 1. PROPRIETARY AND CONFIDENTIAL Site Reliability Engineering Michael Blakeney
  • 2. "an SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their service(s)." What is SRE? • Ensuring a Durable Focus on Engineering • Pursuing Maximum Change Velocity Without Violating a Service’s SLO • Monitoring • Emergency Response • Change Management • Demand Forecasting and Capacity Planning • Provisioning • Efficiency and Performance
  • 3. PROPRIETARY AND CONFIDENTIAL Availability Time Based Aggregate Based 3 "If you haven't tried it, assume it's broken" Too binary for distributed systems that can enter partial downtime or degraded states Much broader and able to capture user facing experience more effectively
  • 4. Service Level Indicators Service Level Objectives Service Level Agreement SLI, SLO, SLA Database state should be 100% recovered in no more than 1 day. "99% of pipeline runs cover 100% of the data." 90% ( averaged over 1 minute ) of http requests to the backend should complete in less than 10ms 4 https://landing.google.com/sre/workbook/chapters/slo-document/
  • 5. PROPRIETARY AND CONFIDENTIAL the time it takes for your service to process a request Four Golden Signals 5 Latency the measurement of the requests the service is handling Traffic the request rate of errors Errors How much a resource with limited quantity is utilized, usually measured as a Percentage of that resource Saturation
  • 6. PROPRIETARY AND CONFIDENTIAL Error Budgets • Error budgets enable teams to make objective decisions regarding prioritization of features versus reliability. • Given an availability target the error budget defines the tolerable amount of service unavailability. i.e. 99.99% availability => 0.01% unavailability or 12.96 minutes per quarter https://landing.google.com/sre/sre-book/chapters/availability-table/ https://landing.google.com/sre/workbook/chapters/error-budget-policy/ 6 "Ways in which things go wrong are special cases of the ways in which things go right"
  • 7. PROPRIETARY AND CONFIDENTIAL Being Agile with SLOs • Transparency - the SLO and error budget policies along with all other relevant material should be made available to the team and stake holders • Inspection - the team should regularly review and analyze the effectiveness and relevancy of the policies • Adaptation - The team should be willing to adjust the policies so as to maximize the value delivered to customers. 7