SlideShare a Scribd company logo
CHAOS!
Breaking your systems to make them unbreakable
–Barack Obama
“You can’t let your failures define you.
You have to let your failures teach you.”
– Corey Bertram
“Hope is not a strategy.”
http://bit.ly/netflix-5-things
Chaos Monkey
J A S O N Y E E
Te c h n i c a l E v a n g e l i s t
Tr a v e l H a c k e r
W h i s k e y H u n t e r
P o k e m o n Tr a i n e r
Tw : @ g i t b i s e c t
E m : j y e e @ d a t a d o g h q . c o m
D ATA D O G
S a a S - b a s e d o b s e r v a b i l i t y
p l a t f o r m :
• M e t r i c s
• Tr a c e s ( A P M )
• L o g s
• S y n t h e t i c s
Tw : @ d a t a d o g h q
We ’ re h i r i n g :
j o b s . d a t a d o g h q . c o m
–Netflix Technology Blog, 2011
http://bit.ly/netflix-chaos
“By running Chaos Monkey in the middle of a
business day, in a carefully monitored environment
with engineers standing by to address any problems,
we can still learn the lessons about the weaknesses
of our system, and build automatic recovery
mechanisms to deal with them.
So next time an instance fails at 3 am on a Sunday,
we won’t even notice.”
“By running Chaos Monkey in the middle of a
business day, in a carefully monitored environment
with engineers standing by to address any problems,
we can still learn the lessons about the weaknesses
of our system, and build automatic recovery
mechanisms to deal with them.
So next time an instance fails at 3 am on a Sunday,
we won’t even notice.”
–Netflix Technology Blog, 2011
http://bit.ly/netflix-chaos
“By running Chaos Monkey in the middle of a
business day, in a carefully monitored environment
with engineers standing by to address any problems,
we can still learn the lessons about the weaknesses
of our system, and build automatic recovery
mechanisms to deal with them.
So next time an instance fails at 3 am on a Sunday,
we won’t even notice.”
–Netflix Technology Blog, 2011
http://bit.ly/netflix-chaos
“By running Chaos Monkey in the middle of a
business day, in a carefully monitored environment
with engineers standing by to address any problems,
we can still learn the lessons about the weaknesses
of our system, and build automatic recovery
mechanisms to deal with them.
So next time an instance fails at 3 am on a Sunday,
we won’t even notice.”
–Netflix Technology Blog, 2011
http://bit.ly/netflix-chaos
“By running Chaos Monkey in the middle of a
business day, in a carefully monitored environment
with engineers standing by to address any problems,
we can still learn the lessons about the weaknesses
of our system, and build automatic recovery
mechanisms to deal with them.
So next time an instance fails at 3 am on a Sunday,
we won’t even notice.”
–Netflix Technology Blog, 2011
http://bit.ly/netflix-chaos
Don’t be a jerk!
Game Days
90 Minutes
90 Minutes
30 minutes planning.
90 Minutes
30 minutes planning.
50 minutes playing.
90 Minutes
30 minutes planning.
50 minutes playing.
10 minutes reporting.
The Team
The Team
1 subject matter expert
The Team
1 subject matter expert
1 SRE
The Team
1 subject matter expert
1 SRE
Sometimes, 1 junior
engineer/developer
Before
• Schedule it.
Before
• Schedule it.
• Pick tests. Start easy.
Before
• Schedule it.
• Pick tests. Start easy.
• Write down what you expect to happen.
Before
• Schedule it.
• Pick tests. Start easy.
• Write down what you expect to happen.
• Write down your plan if things go wrong.
Before
• Schedule it.
• Pick tests. Start easy.
• Write down what you expect to happen.
• Write down your plan if things go wrong.
• Share your document!
During
• Staging -> Production (off-peak) -> Production (primetime)
During
• Staging -> Production (off-peak) -> Production (primetime)
• Announce start in group chat.
During
• Staging -> Production (off-peak) -> Production (primetime)
• Announce start in group chat.
• Maintain discussion in group chat.
During
• Staging -> Production (off-peak) -> Production (primetime)
• Announce start in group chat.
• Maintain discussion in group chat.
• Monitor for outages.
During
• Staging -> Production (off-peak) -> Production (primetime)
• Announce start in group chat.
• Maintain discussion in group chat.
• Monitor for outages.
• Run your test and take notes!
After
• Create cards/tickets to track issues that need work.
After
• Create cards/tickets to track issues that need work.
• Write a summary & key lessons.
After
• Create cards/tickets to track issues that need work.
• Write a summary & key lessons.
• Email to all of engineering.
After
• Create cards/tickets to track issues that need work.
• Write a summary & key lessons.
• Email to all of engineering.
• Celebrate!
🍾🎉🐶
Game Levels
0. Terminate the service. Block access to 1 dependency.
Game Levels
0. Terminate the service. Block access to 1 dependency.
1. Block access to all dependencies.
Game Levels
0. Terminate the service. Block access to 1 dependency.
1. Block access to all dependencies.
2. Terminate the host.
Game Levels
0. Terminate the service. Block access to 1 dependency.
1. Block access to all dependencies.
2. Terminate the host.
3. Degrade the environment.
Monitor & alert on Latency, Errors, Traffic, & Saturation
Game Levels
0. Terminate the service. Block access to 1 dependency.
1. Block access to all dependencies.
2. Terminate the host.
3. Degrade the environment.
4. Spike traffic.
Game Levels
0. Terminate the service. Block access to 1 dependency.
1. Block access to all dependencies.
2. Terminate the host.
3. Degrade the environment.
4. Spike traffic.
5. Terminate the region/cloud.
Experiment time!
Review
Share!
Plan!
Have fun!
Additional Resources
• systemctl, kill, iptables
Additional Resources
• systemctl, kill, iptables
• Comcast - https://github.com/tylertreat/comcast
Additional Resources
• systemctl, kill, iptables
• Comcast - https://github.com/tylertreat/comcast
• Vegeta - https://github.com/tsenart/vegeta
Additional Resources
• systemctl, kill, iptables
• Comcast - https://github.com/tylertreat/comcast
• Vegeta - https://github.com/tsenart/vegeta
• Kubernetes & Istio -
https://istio.io/docs/tasks/traffic-management/fault-injection/
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: wordpress-virtualservice
spec:
hosts:
- "*"
gateways:
- wordpress-gateway
http:
- route:
- destination:
host: wordpress
fault:
abort:
percentage:
value: 50.0
httpStatus: 500
delay:
percentage:
value: 50.0
fixedDelay: 30s
Additional Resources
• Gremlin - https://www.gremlin.com/
• ChaosToolkit - https://chaostoolkit.org/
Questions?
jason.yee@datadoghq.com
@gitbisect
Slides: bit.ly/cmr-chaos

Documents: bit.ly/cmr-chaos-docs
jason.yee@datadoghq.com
@gitbisect

More Related Content

Similar to Jason Yee - Chaos! - Codemotion Rome 2019

The hunt of the unicorn, to capture productivity
The hunt of the unicorn, to capture productivityThe hunt of the unicorn, to capture productivity
The hunt of the unicorn, to capture productivity
Brainhub
 
Larry Maccherone: "Probabilistic Decision Making"
Larry Maccherone: "Probabilistic Decision Making"Larry Maccherone: "Probabilistic Decision Making"
Larry Maccherone: "Probabilistic Decision Making"
RedHatAgileDay
 
Five Cliches of Online Game Development
Five Cliches of Online Game DevelopmentFive Cliches of Online Game Development
Five Cliches of Online Game Development
iandundore
 
How I failed to build a runbook automation system
How I failed to build a runbook automation systemHow I failed to build a runbook automation system
How I failed to build a runbook automation system
TimothyBonci
 
Hushcon 2016 Keynote: Test for Echo
Hushcon 2016 Keynote: Test for EchoHushcon 2016 Keynote: Test for Echo
Hushcon 2016 Keynote: Test for Echo
Deja vu Security
 
Continuous Automated Testing - Cast conference workshop august 2014
Continuous Automated Testing - Cast conference workshop august 2014Continuous Automated Testing - Cast conference workshop august 2014
Continuous Automated Testing - Cast conference workshop august 2014
Noah Sussman
 
Software testing (2) trainingin-mumbai
Software testing (2) trainingin-mumbaiSoftware testing (2) trainingin-mumbai
Software testing (2) trainingin-mumbai
vibrantuser
 
Scrum Plus Extreme Programming (XP) for Hyper Productivity
Scrum Plus Extreme Programming (XP) for Hyper ProductivityScrum Plus Extreme Programming (XP) for Hyper Productivity
Scrum Plus Extreme Programming (XP) for Hyper Productivity
Ron Quartel
 
Secrets and Mysteries of Automated Execution Keynote slides
Secrets and Mysteries of Automated Execution Keynote slidesSecrets and Mysteries of Automated Execution Keynote slides
Secrets and Mysteries of Automated Execution Keynote slides
Alan Richardson
 
Angus Fletcher - Error Handling in Concurrent Systems
Angus Fletcher - Error Handling in Concurrent SystemsAngus Fletcher - Error Handling in Concurrent Systems
Angus Fletcher - Error Handling in Concurrent Systems
Maritime DevCon
 
Let's Sharpen Your Agile Ax, It's Story Splitting Time
Let's Sharpen Your Agile Ax, It's Story Splitting TimeLet's Sharpen Your Agile Ax, It's Story Splitting Time
Let's Sharpen Your Agile Ax, It's Story Splitting Time
Excella
 
Sharpen your Agile Axe by Brian Sjorber
Sharpen your Agile Axe by Brian SjorberSharpen your Agile Axe by Brian Sjorber
Sharpen your Agile Axe by Brian Sjorber
Washington DC Scrum User Group
 
Getting Schooled DerbyCon 3.0
Getting Schooled DerbyCon 3.0Getting Schooled DerbyCon 3.0
Getting Schooled DerbyCon 3.0TonikJDK
 
Coaching teams in creative problem solving
Coaching teams in creative problem solvingCoaching teams in creative problem solving
Coaching teams in creative problem solving
Flowa Oy
 
Software testing (2) trainingin-mumbai...
Software testing (2) trainingin-mumbai...Software testing (2) trainingin-mumbai...
Software testing (2) trainingin-mumbai...
vibrantuser
 
Chaos Engineering Talk at DevOps Days Austin
Chaos Engineering Talk at DevOps Days AustinChaos Engineering Talk at DevOps Days Austin
Chaos Engineering Talk at DevOps Days Austin
matthewbrahms
 
DevOps Days Vancouver 2014 Slides
DevOps Days Vancouver 2014 SlidesDevOps Days Vancouver 2014 Slides
DevOps Days Vancouver 2014 Slides
Alex Cruise
 
How To Run a 5 Whys (With Humans, Not Robots)
How To Run a 5 Whys (With Humans, Not Robots)How To Run a 5 Whys (With Humans, Not Robots)
How To Run a 5 Whys (With Humans, Not Robots)
Dan Milstein
 
Spaghetti gate
Spaghetti gateSpaghetti gate
Spaghetti gate
Jon Bachelor
 
Put Some SRE in Your Shipped Software
Put Some SRE in Your Shipped SoftwarePut Some SRE in Your Shipped Software
Put Some SRE in Your Shipped Software
Theo Schlossnagle
 

Similar to Jason Yee - Chaos! - Codemotion Rome 2019 (20)

The hunt of the unicorn, to capture productivity
The hunt of the unicorn, to capture productivityThe hunt of the unicorn, to capture productivity
The hunt of the unicorn, to capture productivity
 
Larry Maccherone: "Probabilistic Decision Making"
Larry Maccherone: "Probabilistic Decision Making"Larry Maccherone: "Probabilistic Decision Making"
Larry Maccherone: "Probabilistic Decision Making"
 
Five Cliches of Online Game Development
Five Cliches of Online Game DevelopmentFive Cliches of Online Game Development
Five Cliches of Online Game Development
 
How I failed to build a runbook automation system
How I failed to build a runbook automation systemHow I failed to build a runbook automation system
How I failed to build a runbook automation system
 
Hushcon 2016 Keynote: Test for Echo
Hushcon 2016 Keynote: Test for EchoHushcon 2016 Keynote: Test for Echo
Hushcon 2016 Keynote: Test for Echo
 
Continuous Automated Testing - Cast conference workshop august 2014
Continuous Automated Testing - Cast conference workshop august 2014Continuous Automated Testing - Cast conference workshop august 2014
Continuous Automated Testing - Cast conference workshop august 2014
 
Software testing (2) trainingin-mumbai
Software testing (2) trainingin-mumbaiSoftware testing (2) trainingin-mumbai
Software testing (2) trainingin-mumbai
 
Scrum Plus Extreme Programming (XP) for Hyper Productivity
Scrum Plus Extreme Programming (XP) for Hyper ProductivityScrum Plus Extreme Programming (XP) for Hyper Productivity
Scrum Plus Extreme Programming (XP) for Hyper Productivity
 
Secrets and Mysteries of Automated Execution Keynote slides
Secrets and Mysteries of Automated Execution Keynote slidesSecrets and Mysteries of Automated Execution Keynote slides
Secrets and Mysteries of Automated Execution Keynote slides
 
Angus Fletcher - Error Handling in Concurrent Systems
Angus Fletcher - Error Handling in Concurrent SystemsAngus Fletcher - Error Handling in Concurrent Systems
Angus Fletcher - Error Handling in Concurrent Systems
 
Let's Sharpen Your Agile Ax, It's Story Splitting Time
Let's Sharpen Your Agile Ax, It's Story Splitting TimeLet's Sharpen Your Agile Ax, It's Story Splitting Time
Let's Sharpen Your Agile Ax, It's Story Splitting Time
 
Sharpen your Agile Axe by Brian Sjorber
Sharpen your Agile Axe by Brian SjorberSharpen your Agile Axe by Brian Sjorber
Sharpen your Agile Axe by Brian Sjorber
 
Getting Schooled DerbyCon 3.0
Getting Schooled DerbyCon 3.0Getting Schooled DerbyCon 3.0
Getting Schooled DerbyCon 3.0
 
Coaching teams in creative problem solving
Coaching teams in creative problem solvingCoaching teams in creative problem solving
Coaching teams in creative problem solving
 
Software testing (2) trainingin-mumbai...
Software testing (2) trainingin-mumbai...Software testing (2) trainingin-mumbai...
Software testing (2) trainingin-mumbai...
 
Chaos Engineering Talk at DevOps Days Austin
Chaos Engineering Talk at DevOps Days AustinChaos Engineering Talk at DevOps Days Austin
Chaos Engineering Talk at DevOps Days Austin
 
DevOps Days Vancouver 2014 Slides
DevOps Days Vancouver 2014 SlidesDevOps Days Vancouver 2014 Slides
DevOps Days Vancouver 2014 Slides
 
How To Run a 5 Whys (With Humans, Not Robots)
How To Run a 5 Whys (With Humans, Not Robots)How To Run a 5 Whys (With Humans, Not Robots)
How To Run a 5 Whys (With Humans, Not Robots)
 
Spaghetti gate
Spaghetti gateSpaghetti gate
Spaghetti gate
 
Put Some SRE in Your Shipped Software
Put Some SRE in Your Shipped SoftwarePut Some SRE in Your Shipped Software
Put Some SRE in Your Shipped Software
 

More from Codemotion

Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Codemotion
 
Pompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending storyPompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending story
Codemotion
 
Pastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storiaPastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storia
Codemotion
 
Pennisi - Essere Richard Altwasser
Pennisi - Essere Richard AltwasserPennisi - Essere Richard Altwasser
Pennisi - Essere Richard Altwasser
Codemotion
 
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Codemotion
 
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Codemotion
 
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Codemotion
 
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 - Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Codemotion
 
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Codemotion
 
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Codemotion
 
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Codemotion
 
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Codemotion
 
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Codemotion
 
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Codemotion
 
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Codemotion
 
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
Codemotion
 
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Codemotion
 
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Codemotion
 
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Codemotion
 
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Codemotion
 

More from Codemotion (20)

Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
 
Pompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending storyPompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending story
 
Pastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storiaPastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storia
 
Pennisi - Essere Richard Altwasser
Pennisi - Essere Richard AltwasserPennisi - Essere Richard Altwasser
Pennisi - Essere Richard Altwasser
 
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
 
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
 
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
 
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 - Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
 
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
 
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
 
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
 
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
 
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
 
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
 
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
 
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
 
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
 
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
 
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
 
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
 

Recently uploaded

zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
UiPathCommunity
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
Jen Stirrup
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Enhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZEnhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZ
Globus
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
Vlad Stirbu
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 

Recently uploaded (20)

zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Enhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZEnhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZ
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 

Jason Yee - Chaos! - Codemotion Rome 2019

  • 1. CHAOS! Breaking your systems to make them unbreakable
  • 2. –Barack Obama “You can’t let your failures define you. You have to let your failures teach you.”
  • 3. – Corey Bertram “Hope is not a strategy.”
  • 6. J A S O N Y E E Te c h n i c a l E v a n g e l i s t Tr a v e l H a c k e r W h i s k e y H u n t e r P o k e m o n Tr a i n e r Tw : @ g i t b i s e c t E m : j y e e @ d a t a d o g h q . c o m
  • 7. D ATA D O G S a a S - b a s e d o b s e r v a b i l i t y p l a t f o r m : • M e t r i c s • Tr a c e s ( A P M ) • L o g s • S y n t h e t i c s Tw : @ d a t a d o g h q We ’ re h i r i n g : j o b s . d a t a d o g h q . c o m
  • 8. –Netflix Technology Blog, 2011 http://bit.ly/netflix-chaos “By running Chaos Monkey in the middle of a business day, in a carefully monitored environment with engineers standing by to address any problems, we can still learn the lessons about the weaknesses of our system, and build automatic recovery mechanisms to deal with them. So next time an instance fails at 3 am on a Sunday, we won’t even notice.”
  • 9. “By running Chaos Monkey in the middle of a business day, in a carefully monitored environment with engineers standing by to address any problems, we can still learn the lessons about the weaknesses of our system, and build automatic recovery mechanisms to deal with them. So next time an instance fails at 3 am on a Sunday, we won’t even notice.” –Netflix Technology Blog, 2011 http://bit.ly/netflix-chaos
  • 10. “By running Chaos Monkey in the middle of a business day, in a carefully monitored environment with engineers standing by to address any problems, we can still learn the lessons about the weaknesses of our system, and build automatic recovery mechanisms to deal with them. So next time an instance fails at 3 am on a Sunday, we won’t even notice.” –Netflix Technology Blog, 2011 http://bit.ly/netflix-chaos
  • 11. “By running Chaos Monkey in the middle of a business day, in a carefully monitored environment with engineers standing by to address any problems, we can still learn the lessons about the weaknesses of our system, and build automatic recovery mechanisms to deal with them. So next time an instance fails at 3 am on a Sunday, we won’t even notice.” –Netflix Technology Blog, 2011 http://bit.ly/netflix-chaos
  • 12. “By running Chaos Monkey in the middle of a business day, in a carefully monitored environment with engineers standing by to address any problems, we can still learn the lessons about the weaknesses of our system, and build automatic recovery mechanisms to deal with them. So next time an instance fails at 3 am on a Sunday, we won’t even notice.” –Netflix Technology Blog, 2011 http://bit.ly/netflix-chaos
  • 13. Don’t be a jerk!
  • 17. 90 Minutes 30 minutes planning. 50 minutes playing.
  • 18. 90 Minutes 30 minutes planning. 50 minutes playing. 10 minutes reporting.
  • 20. The Team 1 subject matter expert
  • 21. The Team 1 subject matter expert 1 SRE
  • 22. The Team 1 subject matter expert 1 SRE Sometimes, 1 junior engineer/developer
  • 24. Before • Schedule it. • Pick tests. Start easy.
  • 25. Before • Schedule it. • Pick tests. Start easy. • Write down what you expect to happen.
  • 26. Before • Schedule it. • Pick tests. Start easy. • Write down what you expect to happen. • Write down your plan if things go wrong.
  • 27. Before • Schedule it. • Pick tests. Start easy. • Write down what you expect to happen. • Write down your plan if things go wrong. • Share your document!
  • 28. During • Staging -> Production (off-peak) -> Production (primetime)
  • 29. During • Staging -> Production (off-peak) -> Production (primetime) • Announce start in group chat.
  • 30. During • Staging -> Production (off-peak) -> Production (primetime) • Announce start in group chat. • Maintain discussion in group chat.
  • 31. During • Staging -> Production (off-peak) -> Production (primetime) • Announce start in group chat. • Maintain discussion in group chat. • Monitor for outages.
  • 32. During • Staging -> Production (off-peak) -> Production (primetime) • Announce start in group chat. • Maintain discussion in group chat. • Monitor for outages. • Run your test and take notes!
  • 33. After • Create cards/tickets to track issues that need work.
  • 34. After • Create cards/tickets to track issues that need work. • Write a summary & key lessons.
  • 35. After • Create cards/tickets to track issues that need work. • Write a summary & key lessons. • Email to all of engineering.
  • 36. After • Create cards/tickets to track issues that need work. • Write a summary & key lessons. • Email to all of engineering. • Celebrate!
  • 38. Game Levels 0. Terminate the service. Block access to 1 dependency.
  • 39. Game Levels 0. Terminate the service. Block access to 1 dependency. 1. Block access to all dependencies.
  • 40. Game Levels 0. Terminate the service. Block access to 1 dependency. 1. Block access to all dependencies. 2. Terminate the host.
  • 41. Game Levels 0. Terminate the service. Block access to 1 dependency. 1. Block access to all dependencies. 2. Terminate the host. 3. Degrade the environment. Monitor & alert on Latency, Errors, Traffic, & Saturation
  • 42. Game Levels 0. Terminate the service. Block access to 1 dependency. 1. Block access to all dependencies. 2. Terminate the host. 3. Degrade the environment. 4. Spike traffic.
  • 43. Game Levels 0. Terminate the service. Block access to 1 dependency. 1. Block access to all dependencies. 2. Terminate the host. 3. Degrade the environment. 4. Spike traffic. 5. Terminate the region/cloud.
  • 47. Additional Resources • systemctl, kill, iptables • Comcast - https://github.com/tylertreat/comcast
  • 48. Additional Resources • systemctl, kill, iptables • Comcast - https://github.com/tylertreat/comcast • Vegeta - https://github.com/tsenart/vegeta
  • 49. Additional Resources • systemctl, kill, iptables • Comcast - https://github.com/tylertreat/comcast • Vegeta - https://github.com/tsenart/vegeta • Kubernetes & Istio - https://istio.io/docs/tasks/traffic-management/fault-injection/
  • 50.
  • 51.
  • 52.
  • 53. apiVersion: networking.istio.io/v1alpha3 kind: VirtualService metadata: name: wordpress-virtualservice spec: hosts: - "*" gateways: - wordpress-gateway http: - route: - destination: host: wordpress fault: abort: percentage: value: 50.0 httpStatus: 500 delay: percentage: value: 50.0 fixedDelay: 30s
  • 54. Additional Resources • Gremlin - https://www.gremlin.com/ • ChaosToolkit - https://chaostoolkit.org/