Tammy Butow - Principal SRE, Gremlin
Ana Medina - Chaos Engineer, Gremlin
Next Level Chaos Engineering
@tammybutow & @ana_m_medina
What is your next level for Chaos Engineering?
Ana Medina
Chaos Engineer @ Gremlin. Previously Software Engineer / SRE @ Uber, worked on Chaos Engineering and Cloud Infrastructure. Also worked/interned @ SFEFCU, Google, Quicken Loans, Stanford University and Miami Dade College.
Tammy Butow
Principal SRE @ Gremlin. Previously SRE Manager @ Dropbox leading Databases, Block Storage and Code Workflows. IMOC (Incident Manager On-Call) for Dropbox. Also worked @ DigitalOcean, NAB and QUT.
The Why, How & What Of CE
Focus on impact!
Why Practice Chaos Engineering?
10x
100%
IPO
Strengthen New Products Through Failure Fridays
Battle Test New Cloud Infra Services Before You Use Them
Battle Test New Versions Of Cloud Infra Services Before You Use Them
How Do We Practice Chaos Engineering?
Chaos Engineering Tools, Talks & Guides
Make Failure Friday Open To Your Entire Company
On-Call Training With Chaos Engineering
Run Chaos Engineering Experiments 3x+ A Week Per Service
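One way to keep track of the "3x+ a week per service" target is a small cadence check over a log of experiment runs. The sketch below is only an illustration: the `(service_name, run_time)` record shape, the function name `services_below_target` and the sample service names are hypothetical and do not come from the talk or from any Gremlin API.

```python
from collections import Counter
from datetime import datetime, timedelta

# Target taken from the slide: at least 3 experiments per week for each service.
MIN_RUNS_PER_WEEK = 3


def services_below_target(runs, services, now, min_runs=MIN_RUNS_PER_WEEK):
    """Return the services with fewer than `min_runs` experiments in the 7 days before `now`.

    `runs` is an iterable of (service_name, run_time) pairs. This record
    shape is an assumption made for the example.
    """
    window_start = now - timedelta(days=7)
    # Count only the runs that fall inside the trailing 7-day window.
    counts = Counter(
        service for service, run_time in runs if window_start < run_time <= now
    )
    # Services with no runs at all count as 0 and are reported too.
    return sorted(s for s in services if counts[s] < min_runs)


if __name__ == "__main__":
    now = datetime(2018, 9, 28, 12, 0)
    runs = [
        ("api", now - timedelta(days=1)),
        ("api", now - timedelta(days=2)),
        ("api", now - timedelta(days=4)),
        ("db", now - timedelta(days=3)),
        ("db", now - timedelta(days=10)),  # outside the 7-day window
    ]
    print(services_below_target(runs, ["api", "db", "cache"], now))
    # prints ['cache', 'db']
```

Passing the full list of services separately matters here: a service that has never had an experiment would not appear in the run log at all, so it could not be flagged from the log alone.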
What Do We Do To Practice Chaos Engineering?
What experiments do we run to practice Chaos Engineering on new cutting-edge software?
We have Security Engineering bug bounty programs...
How can we get better at creating a culture of doing the same for reliability vulnerabilities?
Chaos Engineering!

Next Level Chaos Engineering - Chaos Conf 2018

Editor's Notes

  • #3 Both of us
  • #4 Ana to talk about this! Reference Kolton’s keynote! Level 0 - Chaos Monkey; Level 1 - Infra Failures; Level 1.5 - Network Failures; Level 2 - Application Failures. https://docs.google.com/presentation/d/1uKroG_Hnf-w_VfOXpTdKvPVCE5bEorlj46FSVKYYayU/edit#slide=id.g405a4a4491_1_18
  • #5 Ana
  • #6 Tammy
  • #7 Tammy
  • #8 Tammy
  • #9 Tammy - Achieved a 10x reduction in incidents @ Dropbox using CE
  • #10 Tammy - Achieved a 100% reduction in SEV 0s for 12 months @ Dropbox using CE
  • #11 Tammy - Achieved a 100% reduction in SEV 0s for 12 months @ Dropbox using CE
  • #12 Ana - ALFI Failure Fridays to strengthen products before launch @ Gremlin
  • #13 Ana - New software can dramatically improve reliability, reduce engineering/business/support cost, improve engineering happiness and increase feature velocity. New software/tools are constantly being rolled out (EKS, AKS, etc.). New versions of software are frequently released.
  • #14 Ana - New software/tools are constantly being rolled out (EKS, AKS, etc.). New versions of software are frequently released. New software can dramatically improve reliability, reduce engineering/business/support cost, improve engineering happiness and increase feature velocity.
  • #15 Tammy
  • #16 Tammy
  • #17 Tammy - open culture of chaos
  • #18 Ana
  • #19 Ana
  • #20 Ana
  • #21 Ana
  • #22 Ana - Chaos Engineering For Cutting Edge Software
  • #23 Tammy
  • #24 Tammy
  • #25 Ana
  • #26 Ana
  • #27 Tammy
  • #28 Tammy
  • #29 Tammy
  • #30 Tammy
  • #31 Ana
  • #32 Tammy
  • #33 Ana