Hidden Costs of Chasing the Mythical 'Five Nines'

DevOpsDays DFW

“Five nines” refers to 99.999% availability, a figure often treated as synonymous with “highly available.” Does every highly available service require five nines? Not by a long shot. Yet the general state of the practice is to chase this typically unrealistic goal almost blindly, often leading to unnecessarily high costs in both operational and development resources. Even less aggressive availability goals are often over-specified compared to true business drivers.

This talk will cover:
* The history of “five nines”
* Common reasons why many organizations inadvertently over-specify availability requirements
* The costs of such over-specification
* How service agility is negatively affected
* Examples of highly available systems with reasonable availability requirements
* Techniques for avoiding over-specification, based on Site Reliability Engineering principles
* Ways to spend your Error Budget (once you have one) most effectively

Applying these techniques should result in a more cost-effective service that keeps end users and management happy, with fewer alerts for the on-call DevOps engineer.
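To make the gap between availability targets concrete, here is a minimal Python sketch that converts a target percentage into the downtime it permits per year. It is not taken from the talk: the function name `allowed_downtime_minutes` and the choice of a 365.25-day average year are assumptions made for illustration. Under those assumptions, five nines leaves roughly 5.26 minutes of downtime per year, while three nines leaves about 526 minutes.

```python
# Illustrative sketch only; names and the 365.25-day year are assumptions,
# not something specified in the talk.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, including leap days


def allowed_downtime_minutes(availability_percent: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the downtime budget, in minutes, for a target availability."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    # The unavailable fraction of the period is the downtime budget.
    return period_minutes * (100.0 - availability_percent) / 100.0


if __name__ == "__main__":
    # Compare two, three, four, and five nines side by side.
    for target in (99.0, 99.9, 99.99, 99.999):
        print(f"{target}% -> {allowed_downtime_minutes(target):.2f} min/year")
```

Each additional nine cuts the budget by a factor of ten, which is one way to see why the last nines tend to be the most expensive to deliver.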

Steve Fox
Founder / CEO, AutoScalr
How Did We Become So Obsessed With 9’s?
New Perspective
Application Quality
• Quality
  • Higher availability is better
  • Fewer errors are better
  • Fewer slow responses are better
• Perfect Quality
  • 100% available
  • Zero errors / outages
  • Zero slow responses
Perfect Quality the Enemy of Progress?

LLMOps for Your Data: Best Practices to Ensure Safety, Quality, and Cost
 
scale-model-slides.pptx
scale-model-slides.pptxscale-model-slides.pptx
scale-model-slides.pptx
 
OpenText Cybersecurity Tabletop Exercise
OpenText Cybersecurity Tabletop ExerciseOpenText Cybersecurity Tabletop Exercise
OpenText Cybersecurity Tabletop Exercise
 
Cost and performance aware scheduling technique for cloud computing environment
Cost and performance aware scheduling technique for cloud  computing environmentCost and performance aware scheduling technique for cloud  computing environment
Cost and performance aware scheduling technique for cloud computing environment
 
Voxxed Days CERN 2024 - Spring Boot <3 Testcontainers.pdf
Voxxed Days CERN 2024 - Spring Boot <3 Testcontainers.pdfVoxxed Days CERN 2024 - Spring Boot <3 Testcontainers.pdf
Voxxed Days CERN 2024 - Spring Boot <3 Testcontainers.pdf
 

Hidden Costs of Chasing the Mythical 'Five Nines'

  • 1. Hidden Costs of Chasing the Mythical “Five Nines” Steve Fox Founder / CEO AutoScalr
  • 3. How Did We Become So Obsessed With 9’s?
  • 5. Application Quality • Quality • Higher Availability is better • Fewer errors better • Fewer slow responses better • Perfect Quality • 100% Available • Zero errors / outages • Zero slow responses
  • 6. Perfect Quality the Enemy of Progress?
  • 7. Progress of an Application? (Success) Profitability? = Revenue – Cost Market Share? • Responsive to Market Needs • Solve a Meaningful Problem • Drives Revenue • Operational Cost: key component of profitability • Velocity: key component to market share • how fast you can adapt to market needs, deliver new features, and capture market share Would Chasing Five Nines affect either of these?
  • 8. Definition of Five Nines • 99.999% available • 26 seconds downtime per month • 5.26 minutes downtime per year
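The downtime figures on the slide above follow directly from the availability percentage. A minimal sketch of that arithmetic, assuming an average year of 365.25 days and a month of one twelfth of a year (the helper name `allowed_downtime_seconds` is illustrative, not from the talk):

```python
# Allowed downtime for a given availability target.
# Assumption: average year of 365.25 days; a month is 1/12 of that year.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def allowed_downtime_seconds(availability_pct: float, period_seconds: float) -> float:
    """Return the downtime (in seconds) permitted in a period at this availability."""
    return period_seconds * (1 - availability_pct / 100)


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        per_year = allowed_downtime_seconds(pct, SECONDS_PER_YEAR)
        per_month = per_year / 12
        print(f"{pct}%: {per_year / 60:.2f} min/year, {per_month:.1f} s/month")
```

Each extra nine shrinks the allowance by a factor of ten, which is why the step from four to five nines leaves only a few minutes per year.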
  • 9. Origin of Five Nines • Telecom Industry 1990s • Used interchangeably with “carrier-grade” • De Facto Standard for telecom hardware components • Driven by the need to manage cascading dependencies
  • 10. System vs Component Availability • Series Components = C1 x C2 x … • Two 50% components in series: 25% • Two 99.9% components in series: 99.8% • Ten 99.9% components in series: 99%
  • 11. System vs Component Availability • Parallel Components = 1 - (1 - Cx)² • Two 50% components in parallel: 75% • Two 99% components in parallel: 99.99%
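The series and parallel rules on the two slides above can be checked with a few lines of code. This is a sketch that assumes component failures are independent, which real systems often violate; the function names `series` and `parallel` are illustrative.

```python
from math import prod


def series(*components: float) -> float:
    """Every component must work: multiply the availabilities."""
    return prod(components)


def parallel(*components: float) -> float:
    """At least one component must work: 1 minus the product of failure probabilities."""
    return 1 - prod(1 - c for c in components)


if __name__ == "__main__":
    print(f"two 50% in series:    {series(0.5, 0.5):.2%}")
    print(f"ten 99.9% in series:  {series(*[0.999] * 10):.2%}")
    print(f"two 50% in parallel:  {parallel(0.5, 0.5):.2%}")
    print(f"two 99% in parallel:  {parallel(0.99, 0.99):.2%}")
```

Chaining dependencies in series erodes availability quickly, which is the cascading-dependency pressure that pushed telecom hardware components toward five nines.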
  • 12. A is Born • Five Nines jumped from hardware component to entire network system level • Driven by competitive sales / marketing pressure • Failures to meet SLA often carry financial penalties • Small relative to contract value • Some vendors “over-committed” on the SLA
  • 13. SLA Fine Print • Network Provider Advertised 99.999% availability • Refunds only paid if total outages over an HOUR monthly • In effect, 99.85% http://www.longpelaexpertise.com.au/ezine/FiveNinesAvailability.php
  • 14. Want versus Need • We all “want” our systems to be working 100% of the time • Do we really “need” them to? • Can we really “expect” them to? • And how much are we willing to pay to avoid a few minutes of downtime? • 2X, 3X, 10X? • 99.999% availability is typically rooted in a “want” not “need”
  • 15. Is Five Nines Realistically Possible? • ITIC Survey indicated 44% of outages were caused by human error • Can those be fixed in under 5 minutes? • S3 Outage in Virginia Region (Feb 2017) • One wrong parameter on an admin command brought S3 in that region down for 4 hours • Prior to that, S3 was exceeding its published SLA • http://www.longpelaexpertise.com.au/ezine/FiveNinesAvailability.php • https://aws.amazon.com/message/41926/
  • 16. Do Mobile Service Providers Deliver Five Nines? Not even close Average: 98.8%
  • 17. Do ISPs Deliver Five Nines? • Closer to Three Nines: 99.9% • Google estimates a page failure rate between 0.01% and 1% (2 to 4 nines)
  • 18. Will Consumers Notice Higher Availability Than Their ISP? • Average: 99.9% • Tip: Consider User Perception in Error Handling Strategy • Better Maybe No Response? Users might blame it on the ISP
  • 19. Cloud Provider Published “Highly Available” SLAs (Application / Service: SLA) • AWS Single Availability Zone: 99.95% • AWS Region (Multi-AZ): 99.99% • AWS S3: 99.9% • Google Compute Engine (GCE): 99.95% • Google Apps for Work: 99.9% • Azure: 99.9%
  • 20. True 99.999% Systems Do Exist • Banking, Healthcare, Airlines, etc. • Consumer of the service not coming over public Internet • For the majority of cloud apps it is an over-specification • Can be done, but requires multi-region and mature DevOps, and is expensive • Question whether the added cost and complexity are justified • Is Four Nines even justifiable?
  • 21. Hypothetical XYZ.com • $100M online annually • CTO/CFO: “We need five nines because every minute the site is down we lose revenue” • $100M annually => $190 / minute • Assume 25% of revenue in total DevOps cost ($25M) • Going from 4 to 5 nines cuts downtime from 52 minutes to 5 minutes per year • Spend $10M to save ~$10K? • Stronger case for 3 nines (99.9%)
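The XYZ.com arithmetic above can be reproduced as a rough estimate. The sketch below assumes revenue is spread evenly over the year and that every minute of downtime loses its full share of revenue with no deferred purchases; both are simplifications, and the function names are illustrative.

```python
# Assumption: average year of 365.25 days, revenue spread evenly across it.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def revenue_per_minute(annual_revenue: float) -> float:
    """Average revenue earned per minute."""
    return annual_revenue / MINUTES_PER_YEAR


def downtime_minutes(availability_pct: float) -> float:
    """Minutes of downtime per year permitted at this availability."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


def revenue_saved(annual_revenue: float, from_pct: float, to_pct: float) -> float:
    """Revenue recovered per year by raising availability from from_pct to to_pct."""
    saved_minutes = downtime_minutes(from_pct) - downtime_minutes(to_pct)
    return saved_minutes * revenue_per_minute(annual_revenue)


if __name__ == "__main__":
    annual = 100_000_000
    print(f"revenue per minute: ${revenue_per_minute(annual):,.0f}")
    print(f"saved, 4 -> 5 nines: ${revenue_saved(annual, 99.99, 99.999):,.0f}")
    print(f"saved, 3 -> 4 nines: ${revenue_saved(annual, 99.9, 99.99):,.0f}")
```

Under these assumptions the four-to-five-nines step recovers roughly $9,000 a year, consistent with the slide's "~$10K", while the three-to-four-nines step recovers about ten times that.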
  • 22. Be Wary of “Big Number Rationale” • Amazon Prime Day “Glitch” • $72 M lost revenue • $4.19 B total revenue • 1.7% lost revenue • Revenue +74% YOY • 98.3% • Customer Sat / Brand Perception Cost https://www.digitalcommerce360.com/article/amazon-prime-day-data
  • 24. Newton’s Laws of Motion Applied to Software 1st Law: Objects at rest tend to stay at rest until an outside force is applied • Applications tend to run reliably when no changes are applied 2nd Law: Momentum change is proportional to the amount of force applied • The more changes made to an application the higher the likelihood errors will be introduced
  • 26. Velocity Loss: Largest Hidden Cost of Chasing Five Nines • [Diagram: 99.999% availability versus velocity] • Site Reliability Engineering (SRE) concept of an Error Budget explicitly manages this relationship
  • 27. Site Reliability Engineering (SRE) • Term coined by Google • Codified Definitions for how Google does DevOps • Free online book: https://landing.google.com/sre/book.html • “Seeking SRE”: https://www.amazon.com/Seeking-SRE-Conversations-Running-Production/dp/1491978864 • One core tenet: “Pursuing Maximum Change Velocity Without Violating a Service’s SLO”
  • 28. SRE Terminology • Service Level Indicator (SLI): Metric for measuring service quality. Example: Error Rate • Service Level Objective (SLO): Target value for an SLI. Example: Max Error Rate of 0.1% (99.9% success) • Service Level Agreement (SLA): Set of SLOs that make up a contractual obligation, often with financial penalties. Example: Refund if any of 10 SLOs not met in a month
  • 29. SRE Availability Definition • Shift from “uptime” to “successful request percentage” • Finer granularity of the “partial” outages where only a percentage of requests are failing
  • 30. Error Budget • Engineering Approach to Managing Service Risk • Translates SLO to acceptable error count • SLO for Error Rate 0.1% (99.9%) • Average 1M requests a month • 1M x 0.1% = 1000 requests for Monthly Error Budget (99.999% would give you only 10 per month) • Organizational Asset Similar to Financial Budget • Judiciously “Spend” this Asset to Maximize Velocity while maintaining SLO
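The Error Budget arithmetic above, together with the "successful request percentage" definition of availability, can be sketched as follows. The function names are illustrative and the request counts are the slide's own example figures.

```python
def error_budget(total_requests: int, slo_success_pct: float) -> int:
    """Number of failed requests the SLO tolerates over the period."""
    return round(total_requests * (1 - slo_success_pct / 100))


def request_availability(total_requests: int, failed_requests: int) -> float:
    """Availability measured as the percentage of successful requests."""
    return 100 * (1 - failed_requests / total_requests)


def budget_remaining(total_requests: int, slo_success_pct: float, failed_requests: int) -> int:
    """Error budget left after the failures seen so far (negative if overspent)."""
    return error_budget(total_requests, slo_success_pct) - failed_requests


if __name__ == "__main__":
    monthly_requests = 1_000_000
    print(error_budget(monthly_requests, 99.9))           # budget at three nines
    print(error_budget(monthly_requests, 99.999))         # budget at five nines
    print(budget_remaining(monthly_requests, 99.9, 250))  # after 250 failed requests
```

At the same traffic level, a five-nines SLO leaves a budget one hundred times smaller than three nines, so far less room for risky changes.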
  • 31. Velocity When Under Budget, Increase Velocity, Take More Risk • More Features • Less QA • Larger Canaries • Move Faster
  • 32. Velocity As Budget is Consumed, Reduce Velocity • Fewer Features • More QA • Smaller Canaries • Move Slower
  • 33. Velocity If Budget Totally Consumed, Stop Deployments • More analysis on where budget was spent and why • Invest in tooling / processes to be more efficient
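The three velocity states above amount to a simple policy keyed on how much of the Error Budget has been spent. A minimal sketch follows; the talk does not say where "under budget" ends and "being consumed" begins, so the 50% `caution_fraction` default is a hypothetical threshold, as are the names `Posture` and `release_posture`.

```python
from enum import Enum


class Posture(Enum):
    INCREASE_VELOCITY = "increase velocity"
    REDUCE_VELOCITY = "reduce velocity"
    STOP_DEPLOYMENTS = "stop deployments"


def release_posture(budget: int, errors_so_far: int, caution_fraction: float = 0.5) -> Posture:
    """Map error budget consumption to a release posture.

    caution_fraction is a hypothetical cut-off, not a value from the talk.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    consumed = errors_so_far / budget
    if consumed >= 1.0:
        return Posture.STOP_DEPLOYMENTS   # budget fully spent: freeze and analyze
    if consumed >= caution_fraction:
        return Posture.REDUCE_VELOCITY    # more QA, smaller canaries
    return Posture.INCREASE_VELOCITY      # room to take more risk


if __name__ == "__main__":
    for errors in (100, 700, 1000):
        print(errors, release_posture(1000, errors).value)
```

In practice a team would tune the threshold, and might weigh how much of the period remains, rather than rely on a fixed cut-off.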
  • 34. Track and Analyze Error Budget Usage • [Chart: Error Budget (0 to 1200) by day of month (1 to 31), annotated with “Deployment Freeze” and “Deployment Problem with No Canary”]
  • 35. Error Budgets • Support Intelligent Quantitative Trade-Offs between Velocity and Availability • Some Errors are Good! Indicator of Velocity. • Practical Considerations: • What if Budget Spent on 3rd Party Outage?
  • 36. Unspent Error Budget? • Consistently Over-Delivering on SLO • Consider Introducing Errors • Keeps dependent services from “assuming” a higher SLO based upon history, examples: • S3 • Google’s Chubby
  • 37. Performance Budget • Traditionally Hard to Detect “Over-Provisioned” State • Performance Budget can detect • Same concept as Error Budget but for SLO on Performance • Trade-off is with Operational Costs instead of Velocity • “Spend” your Performance Budget on Lowering Operational Costs • Engineering approach to quantitatively manage provisioning levels to avoid both over and under-provisioning.
  • 38. Cost Effective When Under Performance Budget, More Cost Effective • Less Spare Capacity • “Closer to Edge” • Higher CPU levels • More Spot Instances
  • 39. Cost Effective As Budget is Consumed, Increase Spare Capacity • Medium Spare Capacity • Balanced Spot Instances • Reduce budget consumption
  • 40. Higher Cost Keep Adding Spare Capacity to Maintain SLO • High Spare Capacity • Higher % On-Demand Instances • Over-Provision
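The Performance Budget described above applies the Error Budget idea to a latency SLO, trading against operational cost instead of velocity. The sketch below is one possible reading, not the talk's implementation: the latency threshold, the SLO percentage, the 50% cut-off, and the function names are all hypothetical.

```python
from typing import Sequence


def performance_budget_consumed(
    latencies_ms: Sequence[float], threshold_ms: float, slo_fast_pct: float
) -> float:
    """Fraction of the slow-request budget used by these latency samples."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    allowed_slow = len(latencies_ms) * (1 - slo_fast_pct / 100)
    slow = sum(1 for ms in latencies_ms if ms > threshold_ms)
    if allowed_slow == 0:
        return float("inf") if slow else 0.0
    return slow / allowed_slow


def provisioning_posture(consumed: float, caution_fraction: float = 0.5) -> str:
    """Map budget consumption to a spare-capacity level (hypothetical cut-offs)."""
    if consumed >= 1.0:
        return "high spare capacity"      # SLO at risk: add capacity
    if consumed >= caution_fraction:
        return "medium spare capacity"    # slow down budget consumption
    return "less spare capacity"          # under budget: run closer to the edge


if __name__ == "__main__":
    samples = [100.0] * 997 + [500.0] * 3  # 3 slow requests out of 1000
    used = performance_budget_consumed(samples, threshold_ms=300, slo_fast_pct=99)
    print(f"{used:.0%} of budget used -> {provisioning_posture(used)}")
```

A consistently low consumption figure is the over-provisioning signal the slides describe: the system is meeting its performance SLO with capacity to spare.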
  • 43. Performance Budget • Most Production Systems Over-Provisioned relative to SLO • Under-Provisioned “Felt”, Over-Provisioned Largely Unnoticed • Rationalized as “Safety Net” • “Right-Sized” Systems Often 20-30% Less Operations Cost • Performance Budget can help identify and reclaim those costs
  • 44. Summary • Application Quality • Higher Availability is better • Fewer errors better • Fewer slow responses better • Important, but need to trade-off with Hidden Costs of: • Velocity • Operational Cost Steve Fox Email: steve@autoscalr.com LinkedIn: https://www.linkedin.com/in/stevecfox/ Twitter: @MrSteveFox, @autoscalr