Using Security to drive Chaos Engineering
Dinis Cruz, CISO
22 Feb 2018
What is Chaos Engineering?
Chaos Engineering
Building Confidence
in System Behaviour
through experiments
Chaos Engineering is
the evolution of testing
https://www.slideshare.net/NoraJones1/choose-your-own-adventure-qcon-2017-1
Chaos Engineering is
about making controlled
changes and observing deviations
in system availability
Chaos Engineering is
specifically about Availability
Chaos Engineering is
carefully injecting harm into
our systems to test the system’s
ability to respond to it.
Chaos Engineering
is the discipline of experimenting on a
distributed system in order to build
confidence in the system’s capability
to withstand turbulent conditions in
production.
Chaos Engineering is
limited-scope, continuous
disaster recovery
Back to Chaos Engineering
Chaos in practice - 4 steps
http://principlesofchaos.org/
1. Start by defining ‘steady state’ as some measurable output of a system that
indicates normal behavior.
2. Hypothesize that this steady state will continue in both the control group and
the experimental group.
3. Introduce variables that reflect real world events like servers that crash, hard
drives that malfunction, network connections that are severed, etc.
4. Try to disprove the hypothesis by looking for a difference in steady state
between the control group and the experimental group.
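The four steps above can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not a real chaos tool: the "system" is a hypothetical pool of 4 servers, the steady-state metric is the request success rate, the "real world event" is one crashed server, and the 0.01 tolerance and the retry counts are illustrative values.

```python
import random
from statistics import mean


def success_rate(servers_up: list[bool], requests: int, retries: int,
                 rng: random.Random) -> float:
    """Fraction of requests served. Each attempt lands on a random server;
    an attempt that hits a crashed server is retried up to `retries` times."""
    served = 0
    for _ in range(requests):
        for _attempt in range(retries + 1):
            if rng.choice(servers_up):  # True means the server is up
                served += 1
                break
    return served / requests


def measure(servers_up: list[bool], retries: int, samples: int = 5,
            requests: int = 2000, seed: int = 42) -> list[float]:
    """Step 1: the steady-state metric (success rate), sampled several times."""
    rng = random.Random(seed)
    return [success_rate(servers_up, requests, retries, rng) for _ in range(samples)]


def hypothesis_holds(control: list[float], experiment: list[float],
                     tolerance: float = 0.01) -> bool:
    """Steps 2 and 4: the hypothesis survives if steady state differs by at most `tolerance`."""
    return abs(mean(control) - mean(experiment)) <= tolerance


def run_experiment(n_servers: int = 4, crashed: int = 1, retries: int = 3) -> bool:
    control_group = [True] * n_servers
    # Step 3: introduce a real-world event (crashed servers) in the experimental group only.
    experimental_group = [False] * crashed + [True] * (n_servers - crashed)
    control = measure(control_group, retries)
    experiment = measure(experimental_group, retries)
    return hypothesis_holds(control, experiment)


print(run_experiment(retries=3))  # retries should mask one crashed server out of four
print(run_experiment(retries=0))  # without retries roughly 25% of requests fail
```

With retries, the difference in steady state stays inside the tolerance and the hypothesis survives; without them the experiment disproves it, which is exactly the kind of finding step 4 is looking for.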
Advanced Principles
http://principlesofchaos.org/
1. Build a Hypothesis around Steady State Behavior
2. Vary Real-world Events
3. Run Experiments in Production
4. Automate Experiments to Run Continuously
5. Minimize Blast Radius
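Principles 4 and 5 (automate, minimize blast radius) can be illustrated together. The sketch below is hypothetical: host names, the `fraction` parameter and the `steady_state_ok` callback are assumptions standing in for whatever fleet inventory and monitoring a real setup would use.

```python
import math
import random
from typing import Callable


def pick_targets(hosts: list[str], fraction: float, rng: random.Random) -> list[str]:
    """Minimize blast radius: target at most `fraction` of the fleet, but at least one host."""
    if not hosts:
        return []
    k = max(1, math.floor(len(hosts) * fraction))
    return rng.sample(hosts, k)


def run_schedule(hosts: list[str], rounds: int, fraction: float,
                 steady_state_ok: Callable[[list[str]], bool],
                 seed: int = 0) -> list[tuple[int, list[str], bool]]:
    """Automate experiments: one small experiment per round, halting at the
    first round where the steady-state check fails."""
    rng = random.Random(seed)
    history = []
    for round_no in range(rounds):
        targets = pick_targets(hosts, fraction, rng)
        healthy = steady_state_ok(targets)
        history.append((round_no, targets, healthy))
        if not healthy:
            break  # stop so a human investigates before widening the blast radius
    return history


hosts = [f"web-{i}" for i in range(20)]  # hypothetical fleet
for round_no, targets, healthy in run_schedule(hosts, rounds=3, fraction=0.1,
                                               steady_state_ok=lambda t: True):
    print(round_no, targets, healthy)
```

The design choice is to fail closed: the automation never keeps going after a deviation, so continuous experiments stay small and safe.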
The idea that “Chaos
Engineering is not testing”
is caused by
the TDD tragedy
Chaos Engineering
is testing

The differences are the test abstractions
and the extra random layer
Security
Cyber Security is a ‘change
generation factory’
Security testers are chaos
creators
I have been called
“Director of Chaos” :)
Do you understand what is
going on in your network?
The biggest threat is not the issue itself,
but not having visibility
When do you know about
security incidents?
(or changes)
You need to know what the
attackers are doing on your
system
If you don’t already know what is in
the pentest report, you have a
big problem
(your SOC team should be able to tell you)
The best security model is one
based on the attacker making
a mistake (i.e. a change)
Use risks to understand reality
and to make the business
owners responsible for their
decisions
Use Threat Models to
understand how your system
works and to document it
Use tests to replicate known
behaviours and simulate
changes 

(with and without random events)
Properties of resilient
and secure systems
Availability
Plan for Failure
Ability to sustain failures
Validate and Sanitise
all requests
Authenticate and Authorise
all requests
Reduce capabilities and
features gracefully
Hostile to
insecure traffic
and
insecure code
Have error budgets 

(from Google SRE)
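In Google SRE terms the error budget is 1 minus the SLO: a 99.9% availability SLO over 30 days allows about 43.2 minutes of unavailability. The arithmetic is sketched below; the rule that chaos experiments may only run while budget remains is a hypothetical policy added for illustration, not something the slide states.

```python
import math


def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed unavailability in a window: (1 - SLO) x window length."""
    window_minutes = window_days * 24 * 60
    return (1.0 - slo) * window_minutes


def budget_remaining(slo: float, downtime_minutes: float, window_days: int = 30) -> float:
    return error_budget_minutes(slo, window_days) - downtime_minutes


def may_run_chaos_experiment(slo: float, downtime_minutes: float,
                             window_days: int = 30) -> bool:
    """Hypothetical policy: only spend budget on experiments while some is left."""
    return budget_remaining(slo, downtime_minutes, window_days) > 0


print(round(error_budget_minutes(0.999), 1))  # 43.2 (minutes per 30 days)
```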
Are easy to change
Are easy to refactor
(make changes with confidence)
Pushes to production
happen in minutes
(fully tested, and 100x a day if needed)
The bigger they
get, the faster they go
(it is smooth and safe to make changes)
Have 99% change coverage
Change coverage
It is not about
test code coverage
What matters is
change coverage
If you make changes
and they are not detected,
you are just making
random changes:
you are an
agent of chaos
Every change you make
has to have a corresponding
test change
(a much better pair programming model)
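Change coverage, as opposed to code coverage, asks whether each change came with a test change. A rough sketch, assuming a hypothetical repository layout where `src/.../orders.py` is tested by `tests/test_orders.py`; real projects would need their own mapping from code to tests.

```python
from pathlib import PurePosixPath


def expected_test_path(src_path: str) -> str:
    """Hypothetical convention: src/shop/orders.py is covered by tests/test_orders.py."""
    return str(PurePosixPath("tests") / f"test_{PurePosixPath(src_path).name}")


def change_coverage(changed_files: list[str]) -> float:
    """Fraction of changed source files whose corresponding test file changed too."""
    changed = set(changed_files)
    sources = [f for f in changed if f.startswith("src/")]
    if not sources:
        return 1.0  # no source changes, nothing to cover
    covered = [f for f in sources if expected_test_path(f) in changed]
    return len(covered) / len(sources)


def untested_changes(changed_files: list[str]) -> list[str]:
    """Changed source files with no matching test change: candidate 'random changes'."""
    changed = set(changed_files)
    return sorted(f for f in changed
                  if f.startswith("src/") and expected_test_path(f) not in changed)


commit = ["src/shop/orders.py", "tests/test_orders.py", "src/shop/payments.py"]
print(change_coverage(commit))          # 0.5
print(untested_changes(commit))         # ['src/shop/payments.py']
print(change_coverage(commit) >= 0.99)  # False: below the 99% change coverage target
```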
Some scenarios
If a server or app misbehaves
when do you know about it?
If your servers start running
30% slower what happens to
your application?
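A back-of-the-envelope way to answer the 30% slowdown question, assuming a request calls each backend hop in sequence and every server slows down uniformly. The hop latencies and the 500 ms caller timeout are hypothetical numbers chosen for illustration.

```python
def end_to_end_ms(hop_latencies_ms: list[float], slowdown: float = 1.0) -> float:
    """Latency of a request that calls each hop in sequence, every server slowed by `slowdown`."""
    return sum(ms * slowdown for ms in hop_latencies_ms)


def survives_slowdown(hop_latencies_ms: list[float], timeout_ms: float,
                      slowdown: float = 1.3) -> bool:
    """Does the request still finish inside the caller's timeout when servers run 30% slower?"""
    return end_to_end_ms(hop_latencies_ms, slowdown) <= timeout_ms


hops = [120, 200, 80]  # hypothetical per-hop latencies in ms
print(end_to_end_ms(hops))                      # 400.0
print(survives_slowdown(hops, timeout_ms=500))  # False: about 520 ms against a 500 ms timeout
```

A system that is comfortably inside its timeout on a normal day can start failing outright under a modest, uniform slowdown; a chaos experiment would inject that latency for real rather than model it.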
If your servers fail to reboot
after a patch, what happens
to your system?
A malicious or API-breaking
NuGet package
If a server on your cloud is
mining bitcoins, would you
know about it?
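One crude visibility check for the mining scenario: flag servers whose CPU stays pegged for a sustained period when nobody expects them to be busy. This is a heuristic sketch only; the 90% threshold, the six-sample window and the server names are assumptions, and a careful attacker could throttle below any fixed threshold.

```python
def sustained_high_cpu(samples: list[float], threshold: float = 90.0,
                       min_consecutive: int = 6) -> bool:
    """True if CPU stays at or above `threshold` percent for `min_consecutive` samples in a row."""
    run = 0
    for pct in samples:
        run = run + 1 if pct >= threshold else 0
        if run >= min_consecutive:
            return True
    return False


def suspicious_servers(cpu_by_server: dict[str, list[float]],
                       expected_busy: frozenset[str] = frozenset()) -> list[str]:
    """Servers pegged at high CPU that are not known to be busy (e.g. batch jobs)."""
    return sorted(name for name, samples in cpu_by_server.items()
                  if name not in expected_busy and sustained_high_cpu(samples))


fleet = {
    "web-1": [22, 35, 30, 41, 28, 33, 25],
    "batch-7": [96] * 12,  # known nightly batch job
    "web-2": [98] * 12,    # nobody expects this box to be busy
}
print(suspicious_servers(fleet, expected_busy=frozenset({"batch-7"})))  # ['web-2']
```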
Is chaos = change?
Chaos Engineering
is modern change
management
Chaos Engineering
is the programmatic
introduction of changes
https://en.wikipedia.org/wiki/Change_management_(engineering)
Should it then be called
Digital Change Engineering?
One more thing
We are recruiting at Photobox 

Group Security
Any questions?
@DinisCruz
