Proprietary & Confidential
Reducing Trauma in Organizations with SLOs and Chaos Engineering
Mandi Walls
DevOps Advocate
PagerDuty
@Lnxchk
Julie Gunderson
Sr. Reliability Advocate
Gremlin
@julie_gund
Introduction
Centering user experience is key to prioritizing what will best serve the users and increase engagement.
We do this with qualitative practices like Full Service Ownership and quantitative practices like SLIs and SLOs.
Measuring the Cost of Downtime
Cost = R + E + C + ( B + A )
During the Outage
R = Revenue Lost
E = Employee Productivity
After the Outage
C = Customer Chargebacks
(SLA Breaches)
Unquantifiable
B = Brand Defamation
A = Employee Attrition
Amazon is estimated to lose $13.22 million/hour, or about $220,000/minute
Your company? The average is $300,000/hour
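The cost formula above can be sketched in a few lines. This is only an illustration: the class and function names are invented here, the dollar amounts for E and C are made up, and treating the $300,000/hour average as the hourly revenue rate is an assumption. B and A are left out because the slide lists them as unquantifiable.

```python
from dataclasses import dataclass


@dataclass
class OutageCost:
    """Quantifiable terms of Cost = R + E + C + (B + A), in dollars."""
    revenue_lost: float            # R, during the outage
    employee_productivity: float   # E, during the outage
    customer_chargebacks: float    # C, after the outage (SLA breaches)

    def quantifiable_total(self) -> float:
        # B (brand) and A (attrition) are omitted: the slide calls them unquantifiable.
        return self.revenue_lost + self.employee_productivity + self.customer_chargebacks


def revenue_lost(hourly_revenue: float, outage_minutes: float) -> float:
    """Pro-rate an hourly figure over the length of the outage."""
    return hourly_revenue * outage_minutes / 60


# Hypothetical 45-minute outage, assuming $300,000/hour as the revenue rate.
cost = OutageCost(
    revenue_lost=revenue_lost(300_000, 45),
    employee_productivity=12_000,   # made-up figure
    customer_chargebacks=25_000,    # made-up figure
)
print(f"Quantifiable cost: ${cost.quantifiable_total():,.0f}")
```

Keeping B and A out of the total makes it explicit that the printed number is a lower bound on the real cost.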
What are SLAs, SLIs, and SLOs?
SLIs and SLOs
Indicators - our metrics
Objectives - our goals for those metrics
Error Budgets
Product
Development
Capacity Planning
Testing & Release Procedures
Post-incident Analysis
Incident Response
Monitoring & Observability
E.g. SLOs & SLIs
E.g. Blameless Postmortems
E.g. Canary Deployments
E.g. Error Budgets
Centering the User Experience
The Users are the Point
What is important to the users?
How do you know?
Is it different for different parts of the application?
Focus on What Users Care About
News Stream Loads Fast on Scroll
No Missing Images
Center Module Loads First
No Errors on Main Page
Fast Load Time
What else?
Translate User Experience to Useful Metrics
Photo by Luke Chesser on Unsplash
How do we know if our SLOs/SLIs are working as expected?
We inject failure proactively to validate SLOs/SLIs.
Prerequisites for setting SLIs and SLOs
Telemetry
• Monitoring - keep track of what you know
• Logging - scan for errors after the fact
• Tracing - follow the user through the service ecosystem
• Observability - the baseline characteristic
Photo by Mikail McVerry on Unsplash
Dependency Mapping
The success of your SLOs depends on the SLOs of your service's upstream dependencies
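One rough way to see why dependency SLOs matter is to multiply them. The sketch below assumes every dependency is a hard, serial dependency and that failures are independent, which real systems rarely satisfy exactly. The service names and targets are hypothetical.

```python
import math


def best_case_availability(dependency_slos_percent):
    """Multiply availabilities of hard, serial dependencies (assumed independent)."""
    return 100 * math.prod(slo / 100 for slo in dependency_slos_percent)


# Hypothetical upstream services and their published availability SLOs.
upstream = {"auth": 99.9, "database": 99.95, "payments-api": 99.5}
ceiling = best_case_availability(upstream.values())
print(f"Best-case availability: {ceiling:.3f}%")
```

Under these assumptions the result is always lower than the weakest dependency, so an SLO set above that ceiling cannot be met without caching, fallbacks, or redundancy.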
Creating SLIs, SLOs, and Error Budgets
[SLI][SLO][t]
SLI = (good events / valid events) * 100
Error budget = 100 - SLO
Example
Valid Events | Good Events | SLO     | Error Budget | Allowable Bad Events
100          | 99          | 99%     | 1%           | 1
1000         | 999         | 99.9%   | 0.1%         | 1
1000         | 990         | 99%     | 1%           | 10
10,000       | 9999        | 99.99%  | 0.01%        | 1
100,000      | 99,000      | 99%     | 1%           | 1000
100,000      | 99,999      | 99.999% | 0.001%       | 1
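The rows of the example table can be reproduced with a short script. This is a sketch, not a prescribed implementation: the function names are invented here, the error budget is derived from the SLO target (matching the SLO and Error Budget columns), and SLO values are passed as strings so that `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def sli_percent(good_events: int, valid_events: int) -> float:
    """SLI = (good / valid) * 100."""
    if valid_events <= 0:
        raise ValueError("need at least one valid event")
    return 100 * good_events / valid_events


def error_budget_percent(slo_percent: str) -> Fraction:
    """Error budget = 100 - SLO, kept exact by parsing the SLO from a string."""
    return 100 - Fraction(slo_percent)


def allowable_bad_events(valid_events: int, slo_percent: str) -> int:
    """Whole number of bad events the budget permits, rounded down."""
    return int(valid_events * error_budget_percent(slo_percent) // 100)


# (valid events, SLO target) pairs taken from the example table.
rows = [(100, "99"), (1000, "99.9"), (1000, "99"),
        (10_000, "99.99"), (100_000, "99"), (100_000, "99.999")]
for valid, slo in rows:
    budget = float(error_budget_percent(slo))
    print(f"{valid:>7} valid, SLO {slo}%: budget {budget:g}%, "
          f"{allowable_bad_events(valid, slo)} bad events allowed")
```

Plain floats would give `100 - 99.9 = 0.0999...`, which rounds down to zero allowable bad events for the second row, so exact fractions are used for the budget.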
Perfect: 100% of web requests have 0ms latency all the time!
SLA: 90% of web requests have latency <500ms for the month… or customer gets money back.
SLO: 95% of web requests have latency <500ms over a rolling month.
SLI: web requests latency <500ms
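The relationship between the three terms can be sketched as follows, using the 500ms threshold, the 95% SLO, and the 90% SLA from the slide. The function names and the sample latencies are made up for illustration.

```python
def latency_sli(latencies_ms, threshold_ms=500):
    """Percent of requests faster than the threshold (the SLI)."""
    if not latencies_ms:
        raise ValueError("no requests in the window")
    good = sum(1 for ms in latencies_ms if ms < threshold_ms)
    return 100 * good / len(latencies_ms)


def classify(sli, slo=95.0, sla=90.0):
    """Compare the measured SLI with the internal SLO and the contractual SLA."""
    if sli >= slo:
        return "within SLO"
    if sli >= sla:
        return "SLO missed, SLA still met"
    return "SLA breached"


window = [120] * 92 + [800] * 8   # made-up window: 92 fast requests, 8 slow
sli = latency_sli(window)
print(sli, classify(sli))
```

The gap between the SLO and the SLA is the warning zone: the team learns it is off target before any customer is owed money back.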
Instance downtime occurs
Datadog picks up that the instance is down for the SLI calculation (metric) and automatically tracks that the SLO is now impacted (monitor)
PagerDuty fires an alert that the uptime SLO has been breached
Gremlin "Downtime SLO" scenario is run on staging
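The metric, monitor, and alert steps in the flow above can be modeled in plain Python to show the logic being validated. This sketch deliberately does not call the Datadog, PagerDuty, or Gremlin APIs. Every name in it is hypothetical, and the health-check data stands in for what an injected outage on staging might produce.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Alert:
    summary: str
    severity: str


def uptime_sli(checks: List[bool]) -> float:
    """Percent of health checks that found the instance up (the metric step)."""
    return 100 * sum(checks) / len(checks)


def evaluate_slo(checks: List[bool], slo_percent: float) -> Optional[Alert]:
    """Return an alert when the uptime SLI drops below the SLO (the monitor step)."""
    sli = uptime_sli(checks)
    if sli < slo_percent:
        return Alert(f"Uptime SLO breached: {sli:.2f}% < {slo_percent}%", "critical")
    return None


# Hypothetical staging experiment: 1,000 health checks, 5 during injected downtime.
checks = [True] * 995 + [False] * 5
alert = evaluate_slo(checks, slo_percent=99.9)
print(alert.summary if alert else "SLO still met")
```

If the experiment causes downtime and no alert comes out the other end, the gap is in the monitoring chain rather than in the service.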
Revising and Revisiting
Photo by Jonathan Kemper on Unsplash
These are internal tools!
You can change them if they no longer work for you!
Use Chaos Engineering to test out new features and focus on your SLOs
Development Staging Production
Working with Upstream Dependencies
Do your dependencies publish their own SLOs?
Can you defensively code around bad performance?
Do you need to explore alternatives?
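One common form of defensive coding is a timeout with a fallback value. The sketch below uses only the Python standard library. `call_with_fallback`, `slow_recommendations`, and the fallback data are hypothetical names chosen for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=4)


def call_with_fallback(fn, fallback, timeout_s):
    """Run fn in a worker thread; return fallback if it is slow or raises."""
    future = _pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return fallback   # dependency blew its latency budget
    except Exception:
        return fallback   # dependency failed outright


def slow_recommendations():
    time.sleep(0.3)       # stand-in for a degraded upstream service
    return ["personalized"]


result = call_with_fallback(slow_recommendations, fallback=["popular"], timeout_s=0.05)
print(result)
```

The slow call keeps running in its worker thread after the timeout, so this protects the caller's latency but not the pool's capacity. A production version would also need cancellation or a circuit breaker.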
Use Chaos Engineering to validate your dependencies
Unplanned Work
Incidents can indicate that work needs to be done
Your SLOs and error budgets are part of your postmortem discussion
Revisit and prioritize work based on the outcomes of a major incident
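For a postmortem discussion it can help to express an incident as a share of the error budget. The sketch below assumes a time-based availability SLO over a 30-day window. The function names, the 99.9% target, and the 20-minute incident are illustrative, not taken from the talk.

```python
from fractions import Fraction


def budget_minutes(slo_percent: str, window_days: int = 30) -> Fraction:
    """Minutes of allowed unavailability in the window for a time-based SLO."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (100 - Fraction(slo_percent)) / 100


def budget_consumed_percent(incident_minutes, slo_percent: str, window_days: int = 30) -> float:
    """Share of the window's error budget used up by the incident(s)."""
    return float(100 * Fraction(incident_minutes) / budget_minutes(slo_percent, window_days))


total = budget_minutes("99.9")               # 43.2 minutes in a 30-day window
used = budget_consumed_percent(20, "99.9")   # one hypothetical 20-minute incident
print(f"budget: {float(total):.1f} min, consumed: {used:.1f}%")
```

A single number like this makes it easier to argue for or against reprioritizing reliability work after a major incident.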
Lifecycle
• Research user behavior
• Measure and monitor for reliability and performance
• Set goals, write SLIs, establish SLOs
• Work to keep SLOs in the green
• Verify SLOs and error budgets in incident postmortems
• Adjust SLOs to new business requirements
Summary
SLIs prioritize the User Experience
SLOs turn “good” vs. “bad” experience into a quantitative goal
Error Budgets tell your team where you stand
They all feed back into the work prioritization process
You can change them when they no longer work for you
Resources
Talks at SLOConf: https://www.sloconf.com/talks
Google’s SRE Books are available online: https://sre.google/books/
Implementing SLOs: https://www.oreilly.com/library/view/implementing-service-level/9781492076803/
Gremlin Free: https://www.gremlin.com/buttons/
Sign up for a PagerDuty trial at https://pagerduty.com/sign-up
Gremlin Certified Chaos Engineering Professional certification: https://www.gremlin.com/certification
Julie’s random
slides
Moving to the cloud
Verify host failure, autoscaling rules, and memory.
Migrating to microservices
Validate that each new service can fail independently.
Protect against cascading failures and knock-on effects.
Adopting Kubernetes
The devil is in the details. Have you configured everything correctly?
Are you running one large cluster?
Find your monitoring gaps, reduce signal to noise
“We’ll get paged if that breaks”, until you don’t.
A false sense of security is worse than nothing.
Train your teams
We run fire drills and train firefighters and first responders.
Are you investing in your operations teams?
Hand Rolled Applicative User Validation Code Kata
 
Microservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we workMicroservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we work
 
Malibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed RoundMalibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed Round
 
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
 
Modelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - AmsterdamModelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - Amsterdam
 
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
 

Addo reducing trauma in organizations with SLOs and chaos engineering

  • 1. Proprietary & Confidential Reducing Trauma in Organizations with SLOs and Chaos Engineering Mandi Walls DevOps Advocate PagerDuty @Lnxchk Julie Gunderson Sr. Reliability Advocate Gremlin @julie_gund
  • 2. Introduction Centering user experience is key to prioritizing what will best serve the users and increase engagement. We do this with qualitative practices like Full Service Ownership and quantitative practices like SLIs and SLOs.
  • 3. Measuring the Cost of Downtime. Cost = R + E + C + (B + A). During the Outage: R = Revenue Lost, E = Employee Productivity. After the Outage: C = Customer Chargebacks (SLA Breaches). Unquantifiable: B = Brand Defamation, A = Employee Attrition. Amazon is estimated to lose $13.22MM/hour or $220,000/min. Your company? Average is $300,000/hour.
  • 4. What are SLAs, SLIs, and SLOs?
  • 5. SLIs and SLOs Indicators - our metrics Objectives - our goals for those metrics
  • 7. Product Development Capacity Planning Testing & Release Procedures Post-incident Analysis Incident Response Monitoring & Observability E.g. SLOs & SLIs E.g. Blameless Postmortems E.g. Canary Deployments E.g. Error Budgets
  • 9. The Users are the Point. What is important to the users? How do you know? Is it different for different parts of the application?
  • 10. Focus on What Users Care About: News Stream Loads Fast on Scroll; No Missing Images; Center Module Loads First; No Errors on Main Page; Fast Load Time. What else?
  • 11. Translate User Experience to Useful Metrics Photo by Luke Chesser on Unsplash
  • 12. How do we know if our SLO/SLIs are working as expected?
  • 13. We inject failure proactively to validate SLOs/SLIs. ✓
  • 15. Telemetry • Monitoring - keep track of what you know • Logging - scan for errors after the fact • Tracing - follow the user through the service ecosystem • Observability - the baseline characteristic Photo by Mikail McVerry on Unsplash
  • 16. Dependency Mapping: Success of your SLOs depends on the SLOs of your service's upstream dependencies.
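To make the dependency point concrete, here is a minimal sketch of how upstream SLOs cap what a service can promise. It assumes every dependency is a hard dependency (if it fails, the request fails) and that failures are independent; both are simplifying assumptions, and the function name and the numbers are hypothetical, not from the talk.

```python
from math import prod


def availability_ceiling(own_availability, dependency_slos):
    """Best-case availability for a service with hard, independent dependencies.

    All values are fractions (0.999 means 99.9%). Under the stated
    assumptions, the ceiling is the product of every availability involved.
    """
    return own_availability * prod(dependency_slos)


# Hypothetical numbers: our service at 99.9%, two upstreams at 99.9% and 99.95%.
ceiling = availability_ceiling(0.999, [0.999, 0.9995])
print(f"Best achievable availability: {ceiling:.4%}")
```

Under these assumptions the ceiling falls below the service's own 99.9%, so an SLO stricter than that would not be achievable without caching, fallbacks, or better upstreams.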
  • 17. Creating SLIs, SLOs, and Error Budgets
  • 18. [SLI][SLO][t] SLI = (good / valid) * 100; eb = 100 - SLI
  • 19. Example
      Valid Events | Good Events | SLO     | Error Budget | Allowable Bad Events
      100          | 99          | 99%     | 1%           | 1
      1000         | 999         | 99.9%   | 0.1%         | 1
      1000         | 990         | 99%     | 1%           | 10
      10,000       | 9999        | 99.99%  | 0.01%        | 1
      100,000      | 99,000      | 99%     | 1%           | 1000
      100,000      | 99,999      | 99.999% | 0.001%       | 1
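The arithmetic behind the example table can be sketched in a few lines. The function names are illustrative only; the calculation follows the slide's SLI = (good / valid) * 100 and treats the error budget as the percentage of valid events allowed to be bad under a given SLO target.

```python
def sli_percent(good_events, valid_events):
    """SLI as a percentage: (good / valid) * 100."""
    return good_events / valid_events * 100


def allowable_bad_events(valid_events, slo_percent):
    """Number of bad events the error budget permits for a given SLO target."""
    error_budget_percent = 100 - slo_percent
    # round() absorbs floating-point noise such as 0.9999999999
    return round(valid_events * error_budget_percent / 100)


# Rows from the example table: (valid events, SLO target in percent)
for valid, slo in [(100, 99), (1000, 99.9), (1000, 99), (10_000, 99.99),
                   (100_000, 99), (100_000, 99.999)]:
    print(valid, slo, allowable_bad_events(valid, slo))
```

Note how the same 1% budget allows 1 bad event at 100 requests but 1000 at 100,000 requests: the budget scales with traffic, while the percentage target does not.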
  • 20. Perfect: 100% of web requests have 0ms latency all the time!
  • 21. Perfect: 100% of web requests have 0ms latency all the time! SLA: 90% of web requests have latency <500ms for the month… or customer gets money back.
  • 22. Perfect: 100% of web requests have 0ms latency all the time! SLA: 90% of web requests have latency <500ms for the month… or customer gets money back. SLO: 95% of web requests have latency <500ms over a rolling month.
  • 23. Perfect: 100% of web requests have 0ms latency all the time! SLA: 90% of web requests have latency <500ms for the month… or customer gets money back. SLO: 95% of web requests have latency <500ms over a rolling month. SLI: web request latency <500ms
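As a rough illustration of how the three levels relate, the sketch below computes the latency SLI from a list of request latencies and compares it with the 95% SLO and the 90% SLA from the slide. The sample data and function name are made up for the example; a real SLI would come from your monitoring system over the rolling window.

```python
SLO_PERCENT = 95.0   # internal objective from the slide
SLA_PERCENT = 90.0   # contractual threshold from the slide


def latency_sli(latencies_ms, threshold_ms=500):
    """Percentage of requests that completed faster than the threshold."""
    good = sum(1 for ms in latencies_ms if ms < threshold_ms)
    return good / len(latencies_ms) * 100


# Hypothetical window: 93 fast requests and 7 slow ones.
sample = [120] * 93 + [800] * 7
sli = latency_sli(sample)

meets_slo = sli >= SLO_PERCENT   # the internal goal is missed
meets_sla = sli >= SLA_PERCENT   # the customer-facing promise still holds
print(f"SLI={sli:.1f}% meets_slo={meets_slo} meets_sla={meets_sla}")
```

The gap between the SLO and the SLA is the point of the design: the team gets a signal to act while the customer-facing commitment is still intact.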
  • 24. Instance Downtime Occurs. Datadog picks up that the instance is down for the SLI calculation (metric) and automatically tracks that the SLO is now impacted (monitor). PagerDuty fires an alert that the uptime SLO has been breached. Gremlin Downtime SLO Scenario is run on staging.
  • 25. Revising and Revisiting Photo by Jonathan Kemper on Unsplash These are internal tools! You can change them if they no longer work for you!
  • 26. Use Chaos Engineering to test out new features and focus on your SLOs
  • 28. Working with Upstream Dependencies Do your dependencies publish their own SLOs? Can you defensively code around bad performance? Do you need to explore alternatives?
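One common way to code defensively around a slow or failing dependency is a timeout with a fallback value. The sketch below uses only the Python standard library; `call_with_fallback`, `slow_dependency`, and the "cached" value are hypothetical names for illustration, not something the talk prescribes.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=4)


def call_with_fallback(fn, timeout_s, fallback):
    """Run fn in a worker thread; return fallback if it is too slow or raises."""
    future = _pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # The worker thread keeps running in the background; we just stop waiting.
        return fallback
    except Exception:
        return fallback


def slow_dependency():
    """Stand-in for an upstream call that is slower than we can tolerate."""
    time.sleep(0.3)
    return "fresh"


print(call_with_fallback(lambda: "fresh", timeout_s=1.0, fallback="cached"))
print(call_with_fallback(slow_dependency, timeout_s=0.05, fallback="cached"))
```

Whether a stale or default answer is acceptable depends on what users care about for that part of the application, which is the same question that drives the SLI in the first place.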
  • 29. Use Chaos Engineering to validate your dependencies
  • 30. Unplanned Work: Incidents can indicate that work needs to be done. Your SLOs and error budgets are part of your postmortem discussion. Revisit and prioritize work based on the outcomes of a major incident.
  • 31. Lifecycle • Research user behavior • Measure and monitor for reliability and performance • Set goals, write SLIs, establish SLOs • Work to keep SLOs in the green • Verify SLOs and error budgets in incident post mortems • Adjust SLOs to new business requirements
  • 32. Summary: SLIs prioritize the User Experience. SLOs tie “good” vs “bad” experience to a quantitative goal. Error Budgets tell your team where you stand. They all feed back into the work prioritization process. You can change them when they no longer work for you.
  • 33. Resources: Talks at SLOConf: https://www.sloconf.com/talks ; Google’s SRE Books are available online: https://sre.google/books/ ; Implementing SLOs: https://www.oreilly.com/library/view/implementing-service-level/9781492076803/ ; Gremlin Free: https://www.gremlin.com/buttons/ ; Sign up for a PagerDuty trial at https://pagerduty.com/sign-up ; Gremlin Certified Chaos Engineering Professional certification: https://www.gremlin.com/certification
  • 35. Moving to the cloud: Verify host failure, autoscaling rules, and memory. Migrating to microservices: Validate that each new service can fail independently. Protect against cascading failures and knock-on effects. Adopting Kubernetes: The devil is in the details. Have you configured everything correctly? Are you running one large cluster? Find your monitoring gaps, reduce signal to noise: “We’ll get paged if that breaks”, until you don’t. A false sense of security is worse than nothing. Train your teams: We run fire drills and train firefighters and first responders. Are you investing in your operations teams?