
OSMC 2023 | IGNITE: Metrics, Margins, Mutiny – How to make your SREs (not) run away by Daniel Bodky

NETWAYS

SL-something, error budgets, on-call shifts – SREs know it all, and many of us know SREs. But what’s the reality behind the job description first used at Google, and how do they operate? Are they glorified system administrators? DevOps folks gone platform engineers? Something entirely different? In this ignite, we will dip our toes in the waters of SRE, establishing a basic vocabulary and understanding, and take a look at how to (not) treat your SRE teams – because nobody likes a mutiny on their ships!



  • 1. Metrics, Margins, Mutiny: How to make your SREs (not) run away. Wednesday, Nov 8, 2023. @d_bodky
  • 2. About me: Consultant at NETWAYS since 2021. Working with technologies like Kubernetes, Ansible, Prometheus, and Grafana. Interested in DevOps and SRE practices.
  • 3. Site: In the beginning, "the site" was google.com; nowadays, it is an arbitrary service (Maps, Mail, Ads) provided by an SRE team and consumed by users.
  • 4. Reliability: The ability of a service to perform as expected upon request. This is not the same as availability: a service can be available but not reliable, for example because of high latency or high error rates.
  • 5. Engineering: Members of SRE teams are… ⛔ classical system administrators; ⛔ mainly systems engineers; ✅ software engineers, first and foremost.
  • 6. “[At Google,] Common to all SREs is the belief in and aptitude for developing software systems to solve complex problems.” (Benjamin Treynor Sloss, Vice President, Google Engineering, founder of Google SRE)
  • 8. SLIs, SLOs, and Error Budgets
  • 9. SLIs, SLOs, and Error Budgets: SLIs (Service Level Indicators) are key metrics for your service(s). SLOs (Service Level Objectives) are targets for your SLIs. Error Budgets are the difference between your SLOs and your actual SLIs.
  • 10. Metrics Collection: Looking at all metrics all the time is not feasible. Focus on metrics that matter for your end users and can be used to forge meaningful SLIs. KISS - Keep It Short and Simple!
  • 11. SLI Generalization: Generalize SLIs by choosing sane defaults for aggregation: intervals (e.g. 5 minutes), methods (e.g. average), regions (e.g. cluster-wide), and resolutions (e.g. 10 seconds). DRY - Don’t Repeat Yourself!
  • 13. Identifying Toil: Recurring, boring tasks that don’t lead to long-term benefits. 🗑️ Manually restarting services; 🗑️ manually executing scripts; ♻️ handling pager alerts (for the first time); ♻️ refactoring code to reduce technical debt.
  • 14. Managing Toil: Distribute it evenly across the team. Do it immediately. Chip away at it, week by week.
  • 15. Minimizing Toil: Aim for automatic, not automated, systems and solutions; needed maintenance that scales sublinearly (< O(n)); and a shared mindset that some toil is inevitable, but too much is unacceptable.
  • 17. On-Call is Toil: On-call time is a natural lower bound on toil that can’t be reduced. Be careful with introducing or allowing additional toil.
  • 18. Balance is Key: At Google, two incidents per shift are seen as a good balance, leaving enough time for proper handling and postmortems. More incidents, and handling becomes hasty. Fewer incidents, and the on-call engineer’s time gets wasted.
  • 19. Keep in mind: Let your SREs engineer, not just operate. Stay on top of your SLIs, SLOs, Error Budgets, and Toil. Act accordingly. Staff your SRE team(s) appropriately.
  • 20. There’s so much more: Release Engineering, Engineering for Automation, Incident Management … and much more. Maybe another time!
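
The relationship between SLIs, SLOs, and error budgets from slide 9, together with the aggregation defaults from slide 11, can be sketched roughly as follows. This is only an illustration and not code from the talk: the names `SliSpec`, `success_ratio`, and `remaining_error_budget`, the request counts, and the 99.9% target are all assumptions made up for the example. The sketch follows the slide's wording and treats the error budget as the difference between the measured SLI and the SLO target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SliSpec:
    """Hypothetical SLI description using the example defaults from slide 11."""
    name: str
    interval_minutes: int = 5        # aggregation interval
    method: str = "average"          # aggregation method
    region: str = "cluster-wide"     # aggregation scope
    resolution_seconds: int = 10     # sampling resolution


def success_ratio(good_requests: int, total_requests: int) -> float:
    """SLI: fraction of requests that were served as expected."""
    if total_requests == 0:
        # Assumption: a window without traffic counts as fully reliable.
        return 1.0
    return good_requests / total_requests


def remaining_error_budget(sli: float, slo: float) -> float:
    """Difference between the measured SLI and the SLO target.

    A positive value means budget is left; a negative value means
    the objective is currently being missed.
    """
    return sli - slo


# Made-up numbers for illustration only.
spec = SliSpec(name="http_success_ratio")
sli = success_ratio(good_requests=99_950, total_requests=100_000)
slo = 0.999
budget = remaining_error_budget(sli, slo)
print(f"{spec.name}: SLI={sli:.4f}, SLO={slo:.3f}, remaining budget={budget:+.4f}")
```

Defining the defaults once on the spec, instead of per SLI, is one way to apply the DRY point from slide 11: a new SLI only needs a name unless it deviates from the shared settings.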