Introduction to
Chaos Engineering
Kolton Andrus
CEO and Co-Founder
Gremlin
J. Crew: Website down during Black Friday 2018 sales (Newsweek, 11/23/18)
Alaska Airlines: Flights halted nationwide due to outage in Seattle (Business Insider, 1/6/19)
Slack Outage: Connectivity issues hit workplaces (WSJ, 6/27/18)
What is Chaos Engineering?
Inject something harmful to
build an immunity.
Chaos Engineering
Thoughtful, planned experiments
designed to reveal the
weakness in our systems.
We test proactively, instead of
waiting for an outage.
Distributed Systems
are fragile.
THE PROBLEM
[Diagram: Internet, Data Center, Data Center]
[Diagram: Business Logic, Web Server, Database]
The “Old” way
Kolton Andrus
CO-FOUNDER & CEO
Matthew Fornaciari
CO-FOUNDER & CTO
We’ve done this before.
How?
Design
Thoughtful
Experiments
Chaos Experiment | Example
Hypothesis (Expected Outcome):
  No customer impact expected
Attack Condition:
  Duration: 600s (10 min)
  Latency: 400ms
  Targets: 50% of available instances
Result (Actual Outcome):
  1.2 to 1.6 second latency
  Degraded user experience
  Cached data returned with HTTP 200 responses
“That’s a real miss in alerting and metrics.”
Slow Response from Database Primary
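
The example experiment above can be written down as data and checked mechanically. The sketch below is one possible way to do that, not a description of any particular tool: the class names, the 500 ms latency budget, and the fleet size of 10 are all assumptions made for illustration. It shows why a check based only on status codes would have passed while users were waiting more than a second.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Attack:
    duration_s: int         # how long the fault is injected
    latency_ms: int         # latency added to each affected call
    target_fraction: float  # share of available instances to attack


@dataclass(frozen=True)
class Observation:
    latency_ms: float  # end-to-end latency seen by the user
    status: int        # HTTP status code returned


def pick_target_count(available: int, attack: Attack) -> int:
    """Number of instances to attack, rounded up so small fleets are still tested."""
    return math.ceil(available * attack.target_fraction)


def hypothesis_holds(observations, max_latency_ms: float) -> bool:
    """'No customer impact' here means: no server errors AND latency within budget.

    Checking status codes alone would pass the slide's scenario, because
    cached data came back with HTTP 200 while responses took over a second.
    """
    return all(o.status < 500 and o.latency_ms <= max_latency_ms for o in observations)


# Attack values come from the slide; the budget and fleet size are assumptions.
attack = Attack(duration_s=600, latency_ms=400, target_fraction=0.5)
observed = [
    Observation(latency_ms=1200, status=200),
    Observation(latency_ms=1600, status=200),
]

print(pick_target_count(10, attack))                   # 5
print(hypothesis_holds(observed, max_latency_ms=500))  # False
```

Treating latency as part of the hypothesis is the design choice that matters here: the experiment fails even though every response was a 200.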
THE BEGINNING
Chaos Monkey
Level 0
MATURITY REQUIRED
Low
APPROACH TAKEN
Random
VALUE PROVIDED
Prepare for host failures in the cloud
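
The random approach can be illustrated with a few lines of code. This is a simplified simulation, not Chaos Monkey's actual implementation: the instance names and both helper functions are hypothetical, and termination is modeled by removing an entry from a list rather than by calling a cloud provider.

```python
import random


def pick_victim(instances, rng=None):
    """Pick one instance at random to terminate; None if the group is empty."""
    if not instances:
        return None
    rng = rng or random.Random()
    return rng.choice(instances)


def terminate(instances, victim):
    """Simulated termination: return the group without the victim.

    A real tool would call the cloud provider's API here; that call is
    deliberately left out of this sketch.
    """
    return [i for i in instances if i != victim]


group = ["web-1", "web-2", "web-3"]             # hypothetical instance names
victim = pick_victim(group, random.Random(42))  # seeded so a run can be repeated
survivors = terminate(group, victim)
print(f"terminated {victim}; {len(survivors)} instances remain")
```

Because the victim is chosen at random, the service has to tolerate the loss of any single host, which is the value this level provides.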
THE FIRST STEP
Infrastructure
Failures
Level 1
MATURITY REQUIRED
Basic Operations
APPROACH TAKEN
Disciplined
VALUE PROVIDED
Prepare for host-level failures
Intermediate
Network Failures
Level 1.5
MATURITY REQUIRED
Networking expertise
APPROACH TAKEN
Gameday
VALUE PROVIDED
Prepare for high impact events
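
A gameday usually needs a fault that can be switched on for the exercise and reliably switched off afterwards. The sketch below approximates added latency inside the process by wrapping a call, which is a stand-in for real network-level injection. The `Service` class and `inject_latency` helper are hypothetical names invented for this example.

```python
import time
from contextlib import contextmanager


class Service:
    """Stand-in for a dependency such as a database client (hypothetical)."""

    def query(self):
        return "row"


@contextmanager
def inject_latency(service, delay_s):
    """Add delay_s seconds to every service.query() call, then revert."""
    original = service.query

    def slow_query(*args, **kwargs):
        time.sleep(delay_s)
        return original(*args, **kwargs)

    service.query = slow_query
    try:
        yield service
    finally:
        # Revert even if the experiment raises, so the fault never outlives the test.
        service.query = original


db = Service()
with inject_latency(db, 0.4):  # 400 ms, as in the example experiment
    start = time.perf_counter()
    result = db.query()
    elapsed = time.perf_counter() - start
print(f"call returned {result!r} after {elapsed:.2f}s under attack")
```

The `finally` clause is the important part: the revert runs whether the block exits normally or with an exception.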
Truly comprehensive
fault injection.
Consume Resources
Change the State
Confuse the Network
World-class safety & security.
SUPPORT FOR SSO & MFA
NEVER RUN AS ROOT
REVERT
Astonishingly simple
and easy to use.
Intuitive CLI
Elegant API
Trusted by teams worldwide
Thank you.
