Embracing Chaos
Introducing Chaos Engineering to your organization
“Chaos Engineering is the discipline of experimenting on a
distributed system in order to build confidence in the system's
capability to withstand turbulent conditions in production.”
-- http://principlesofchaos.org/
Introduction
• Paul Osman - Senior Engineering Manager
• posman@underarmour.com
• Previous Lives: PagerDuty, 500px, SoundCloud
Game Days: Planned fault injection exercises.
Game Days
• Imagine what could fail.
• Figure out how to prevent it from affecting the business, and
implement that.
• Cause the failure scenario to happen in production,
hopefully proving the non-effect of the event and thus gaining
confidence in the system.
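
The three steps above can be sketched as a small harness. This is only an illustration, not the tooling used in the talk: `GameDayScenario`, `run_scenario`, and the three callbacks are hypothetical names, and a real exercise would inject the failure through infrastructure tooling rather than a Python callback.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GameDayScenario:
    """One planned fault-injection exercise (all names are hypothetical)."""
    name: str
    inject: Callable[[], None]        # causes the failure
    restore: Callable[[], None]       # undoes the failure
    steady_state: Callable[[], bool]  # True while the business is unaffected


def run_scenario(scenario: GameDayScenario) -> bool:
    """Inject the failure, check the hypothesis, and always restore."""
    if not scenario.steady_state():
        # Never start an exercise on a system that is already unhealthy.
        raise RuntimeError("system is not healthy; do not start the exercise")
    scenario.inject()
    try:
        # True means the failure had no visible effect, as hoped.
        return scenario.steady_state()
    finally:
        scenario.restore()
```

Restoring in a `finally` block means the injected failure is undone even when the steady-state check itself raises.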
Trust
• Engineers <> Engineers
• Engineers <> Managers
• Non-Engineers <> Engineers
Engineers <> Engineers
This is just a healthy team. A few things I've found build trust on
a team:
• Embrace failures. Learn from them.
• Incident Response Process (STAT)
• Practice blame-free retrospectives.
• Embrace ownership - engineers own alerts.
Engineers <> Managers
What can managers do to build trust?
• Nurture a blame-free and just culture.
• Protect time for action items.
Engineers <> Non-Engineers
How about building trust between Engineers and Non-Engineering
stakeholders (e.g. product, executives, customer support)?
• Metrics that show business impact
• Be Transparent about Incidents
• Talk loudly about Chaos Engineering
Operational Maturity Checklist
• Incident Response Process
• Blame Free Retrospectives
• Action Items
• Metrics on Incidents
• Talk Loudly about Resiliency
Our First Game Day
Failure Scenarios
• Scenario A - Weather HTTP Service Unavailable
• Scenario B - Weather MySQL RDS Unavailable
• Scenario C - The Weather Channel API - High Latency
• Scenario D - Workout Service Unavailable
• Scenario E - Weather Async Service Unavailable
Scenario A - Weather HTTP Service Unavailable
• Workout still shown, just without weather
• PagerDuty alert? Should fire a low urgency alert
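
A minimal sketch of the behaviour expected in Scenario A, assuming the workout view calls the weather service through an injected function. `render_workout` and `fetch_weather` are hypothetical names, not the actual service code; the warning log stands in for whatever signal would drive the low urgency alert.

```python
import logging
from typing import Callable

logger = logging.getLogger("workout")


def render_workout(workout: dict, fetch_weather: Callable[[str], dict]) -> dict:
    """Return the workout view, attaching weather only if the lookup succeeds."""
    view = dict(workout)
    try:
        view["weather"] = fetch_weather(workout["id"])
    except Exception:
        # Weather is optional: record the failure and still show the workout.
        logger.warning("weather unavailable for workout %s", workout["id"])
        view["weather"] = None
    return view
```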
Scenario B - Weather MySQL RDS Unavailable
• Expected 503s when the database was down, but the service was
returning 504s
• Had to restart the service after the database was brought back up -
connections were not being recycled
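
One way the two findings above could be addressed, sketched under assumptions: `DatabaseUnavailable`, `RecyclingConnection`, and `handle_request` are hypothetical names, and a real service would more likely rely on its database driver's or connection pool's own reconnect settings. The idea is to drop a connection after any failure so the next request reconnects, and to answer 503 (Service Unavailable) rather than 504 (Gateway Timeout) while the database is down.

```python
from typing import Callable, Tuple


class DatabaseUnavailable(Exception):
    """Hypothetical error raised when the database cannot be reached."""


class RecyclingConnection:
    """Holds one connection and discards it after any failure."""

    def __init__(self, connect: Callable[[], object]):
        self._connect = connect
        self._conn = None

    def run(self, query: Callable[[object], object]) -> object:
        if self._conn is None:
            self._conn = self._connect()  # may raise DatabaseUnavailable
        try:
            return query(self._conn)
        except DatabaseUnavailable:
            # Drop the dead connection so the next call reconnects.
            self._conn = None
            raise


def handle_request(db: RecyclingConnection, query) -> Tuple[int, object]:
    """Return (status, body); a database outage maps to 503, not 504."""
    try:
        return 200, db.run(query)
    except DatabaseUnavailable:
        return 503, "weather database unavailable"
```

With this shape, no restart is needed once the database comes back: the first request after recovery opens a fresh connection.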
Scenario C - High Latency from The Weather Channel API
• Requests time out - should fire low urgency alert
• Action item: audit timeouts
• Expectation: asynchronous tasks are still processed
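
As an illustration of the "audit timeouts" action item, here is a generic sketch that bounds how long a caller waits on a slow upstream. `call_with_timeout` is a hypothetical helper; in practice the HTTP client's own timeout setting is usually the first thing to audit, and this wrapper only shows the principle.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable, Optional

_pool = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(fn: Callable[[], dict], timeout_s: float) -> Optional[dict]:
    """Run fn in a worker thread; give up waiting after timeout_s seconds."""
    future = _pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # The caller stops waiting; the worker thread itself is not interrupted.
        future.cancel()
        return None
```

Returning `None` on timeout lets the caller fall back to a response without weather instead of holding the request open.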
Takeaways!
• We learned a ton!
• Scheduled some valuable action items
• Just thinking about this stuff was worthwhile
• Less alert fatigue!
• Let's do more!
Next steps
• More teams doing more game days more frequently
• Build failure injection into our release process (production
readiness)
• Automate automate automate (hi Gremlin!)
Resources
• PagerDuty Incident Response Docs - https://response.pagerduty.com/
• Principles of Chaos - https://principlesofchaos.org/
• Fault Injection in Production - https://queue.acm.org/detail.cfm?id=2353017
• Gremlin Blog - https://www.gremlin.com/blog/
Thank you!
Psst https://careers.underarmour.com/
Or just talk to me:
posman@underarmour.com
@paulosman
