If we do this one thing, it will
What’s the point if this is
just going to happen?
“Failure Happens. We’ll just
have to design for it.”
Successful companies say:
“Design For Failure”
“Healthy attitude about Failure”
“Resilient (to Failure)”
THE OUTAGES WILL CONTINUE
UNTIL THE APPROACH
Slide Courtesy of John Allspaw - http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-ﬂickr
An exercise designed to increase
Resilience through large-scale fault
injection across critical systems.
Part of a larger discipline called
Not new, just new to us ;-)
Resilience is a product of
People & Technology
GameDay increases Resilience in 3 ways
‣ Identification and mitigation of risks and impact from
‣ Reduces frequency of failure (MTBF)
‣ Reduces duration of recovery (MTTR)
‣ Builds confidence & competence responding to failure
and under stress.
‣ Strengthens individual and cultural ability to anticipate,
mitigate, respond to, and recover from failures of all
‣ Trigger and expose “latent defects”
‣ Choose when discover them, instead of letting that be
determined by the next real disaster.