Using security to drive chaos engineering - April 2018

Using Security to drive
Chaos Engineering
Dinis Cruz, CISO, Photobox
April 2018

https://pbx-group-security.com

I’m a CISO focused on
securing our clients’ magic moments
by creating secure environments
that enable and accelerate the business
and contribute to the  
top and bottom line

Here are my challenges 
How to make rational risk based decisions
How to create high performance teams
How to scale Security knowledge
How to drive and enable change
How to map data as graphs

We are also hiring :)
 
Head of AppSec 
Head of Cloud Security

Success story
Netflix recommendations

Evolution of Testing
https://www.slideshare.net/NoraJones1/choose-your-own-adventure-qcon-2017-1

Let’s look at a number of
Chaos Engineering
definitions

Chaos Engineering
Building Confidence
in System Behaviour
through experiments

Chaos Engineering is
about trying controlled
changes to observe system
availability deviation

Chaos Engineering is  
carefully injecting harm into  
our systems
to test the system’s ability to
respond to it.

Chaos Engineering  
is the discipline of experimenting on a 
distributed system
in order to build confidence
in the system’s capability to withstand
turbulent conditions in production

(most business friendly)
Chaos Engineering is
limited scope,
continuous,
disaster recovery

Chaos in practice - 4 experiments
1. Start by defining ‘steady state’ as some measurable output of a system that
indicates normal behaviour.
2. Hypothesise that this steady state will continue in both the control group and
the experimental group.
3. Introduce variables that reflect real world events like servers that crash, hard
drives that malfunction, network connections that are severed, etc.
4. Try to disprove the hypothesis by looking for a difference in steady state
between the control group and the experimental group.
http://principlesofchaos.org/
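
The four steps above can be sketched as a small simulation. This is only an illustration under stated assumptions: `call_service`, the 20% injected failure rate, the 2% tolerance and the retry count are all hypothetical, and a real experiment would measure a live system rather than a random number generator.

```python
import random


def call_service(fail_probability, rng):
    """Hypothetical service call: True means the request succeeded."""
    return rng.random() >= fail_probability


def success_rate(n_requests, fail_probability, rng, retries):
    """Step 1: the steady-state metric (fraction of requests that succeed)."""
    succeeded = 0
    for _ in range(n_requests):
        for _attempt in range(retries + 1):
            if call_service(fail_probability, rng):
                succeeded += 1
                break
    return succeeded / n_requests


def run_experiment(injected_failure=0.2, retries=0, tolerance=0.02,
                   n_requests=10_000, seed=1):
    """Steps 2-4: compare a control group with a group that has faults injected."""
    rng = random.Random(seed)
    control = success_rate(n_requests, 0.0, rng, retries)
    experimental = success_rate(n_requests, injected_failure, rng, retries)
    return {
        "control": control,
        "experimental": experimental,
        # The hypothesis is disproved when the deviation exceeds the tolerance.
        "hypothesis_holds": (control - experimental) <= tolerance,
    }


if __name__ == "__main__":
    print("no retries:", run_experiment(retries=0))
    print("3 retries: ", run_experiment(retries=3))
```

In this sketch the client without retries fails the hypothesis, while the client that retries absorbs the injected faults and stays within the tolerance.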

Advanced Principles
1. Build a Hypothesis around Steady State Behavior
2. Vary Real-world Events
3. Run Experiments in Production
4. Automate Experiments to Run Continuously
5. Minimise Blast Radius
http://principlesofchaos.org/

The idea that “Chaos Engineering is
not testing”
is caused by
the failure to make TDD (Test-Driven
Development) scale

TDD Demo
 
1) Real-time
test execution
2) Real-time
code coverage

Chaos Engineering
is testing
the differences are the test abstractions
and the extra random layer

Cyber Security is a ‘change
generation factory’

Developers and Techops are
chaos creators

Security testing (and users)
are chaos creators

I have been called  
Director of chaos :)

The myth of the
single point of failure
(i.e. attackers only need to run
code and find a weak spot)

Do you understand what is
going on in your network?

The biggest threat is not the issue itself,
but not having visibility

When do you know about
security incidents?  
(or changes)

You need to know what the
attackers are  
doing on your system
(and users)

If you don’t know what is on
the pentest report …
you have a bigger problem
(i.e. your SOC should be able to tell you)

Best Security model is one
based on
the attacker making a mistake
(i.e. a change)

Use risks to understand reality
and to make the business
owners responsible for their
decisions

Use Threat Models to
understand how your system
works and to document it

Use tests to replicate
known behaviours, attacks
and simulate changes  
(with and without random events)

Which can also be called
Security tests
(which pass on the vulnerable
state and then act as regression tests)
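
A minimal sketch of that idea, assuming a hypothetical `render_greeting` function with a reflected XSS bug. The first check passes while the vulnerability exists (it proves the issue is real), and the second is the regression check that passes once the fix is in place. All names here are illustrative.

```python
import html

PAYLOAD = "<script>alert(1)</script>"


def render_greeting_vulnerable(name):
    """Hypothetical vulnerable code: user input is reflected unescaped."""
    return "<p>Hello " + name + "</p>"


def render_greeting_fixed(name):
    """Hypothetical fix: user input is HTML-escaped before rendering."""
    return "<p>Hello " + html.escape(name) + "</p>"


def check_vulnerable_state(render):
    """Passes while the vulnerability exists: the payload comes back intact."""
    assert PAYLOAD in render(PAYLOAD)


def check_regression(render):
    """Passes once fixed: the payload is neutralised and stays neutralised."""
    output = render(PAYLOAD)
    assert PAYLOAD not in output
    assert "&lt;script&gt;" in output


if __name__ == "__main__":
    check_vulnerable_state(render_greeting_vulnerable)
    check_regression(render_greeting_fixed)
    print("both security checks passed")
```

When the fix lands, the first check starts failing, which is the signal to flip it into the regression form.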

If a server on your cloud is
mining bitcoins
Would you know about it?

If a server or app misbehaves,
when do you know about it?

If your servers start running
30% slower?
What happens to your apps?
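
One cheap way to start answering this question is to replay recorded latencies with the slowdown applied and count how many calls would breach their timeout. This is a rough sketch: the latency samples, the 500 ms timeout and the function name are assumptions, not measurements from any real system.

```python
def timeouts_under_slowdown(latencies_ms, timeout_ms, slowdown=1.30):
    """Count requests that would exceed the timeout if servers ran slower."""
    slowed = [latency * slowdown for latency in latencies_ms]
    return sum(1 for latency in slowed if latency > timeout_ms)


if __name__ == "__main__":
    samples = [100, 200, 400, 450]  # hypothetical baseline latencies in ms
    print("baseline breaches:", timeouts_under_slowdown(samples, 500, slowdown=1.0))
    print("30% slower breaches:", timeouts_under_slowdown(samples, 500))
```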

If your servers fail to reboot
after a patch
What happens to your system?

When (not if) you have
malicious or API-breaking
dependencies,
how do you know about them?

If 3rd parties are using your
APIs (official or not) to dump
your users’ data (aka Facebook)
Would you know about it?

Properties of resilient
and secure systems

Plan for Failure
Ability to sustain failures

Validate and Sanitise
all requests

Authenticate and Authorise
all requests

Reduce capabilities and
features gracefully
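
As an illustration of graceful reduction, the sketch below falls back to a generic list when a personalised recommendations backend fails, and flags that it did so. The function names and the fallback items are hypothetical.

```python
def get_recommendations(user_id, fetch_personalised,
                        fallback=("popular-1", "popular-2")):
    """Return (items, degraded). Falls back to generic items on any failure."""
    try:
        return list(fetch_personalised(user_id)), False
    except Exception:
        # Reduced capability: still answer the request, but mark it as degraded
        # so monitoring can see that the fallback path was taken.
        return list(fallback), True


if __name__ == "__main__":
    def broken_backend(_user_id):
        raise TimeoutError("recommendations backend unavailable")

    print(get_recommendations(42, broken_backend))
```

Returning the `degraded` flag matters as much as the fallback itself: without it the failure would be invisible.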

Hostile to
insecure traffic
and
insecure code

Have error budgets  
(from Google SRE)
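
A minimal sketch of the error budget arithmetic, assuming a request-based SLO. The 99.9% target and the request counts are made-up numbers, and gating experiments on the remaining budget is one possible policy, not a prescribed one.

```python
def error_budget_remaining(slo, total_requests, failed_requests):
    """Failures still allowed in this window before the SLO is breached."""
    allowed_failures = (1.0 - slo) * total_requests
    return allowed_failures - failed_requests


def may_run_experiment(slo, total_requests, failed_requests):
    """Hypothetical policy: only inject more chaos while budget remains."""
    return error_budget_remaining(slo, total_requests, failed_requests) > 0


if __name__ == "__main__":
    print(error_budget_remaining(0.999, 1_000_000, 400))
    print(may_run_experiment(0.999, 1_000_000, 1_500))
```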

Are easy to refactor
(make changes with confidence)

Pushes to production
happen in minutes
(fully tested and 100x a day, if needed)

The bigger they
get the faster they go
 
(it is smooth and safe to make changes)

It is not about
test code coverage

What matters is
change coverage

If you make changes
and they are not detected

you are just making
random changes

Basically
you are an
agent of chaos

Every change you make
has to have a respective
test change  
(much better pair programming model)
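
One way to enforce this rule is a check that flags source changes arriving without a matching test change. The sketch below assumes a hypothetical repository layout where `src/foo.py` is tested by `tests/test_foo.py`; the list of changed files would come from the version control system.

```python
from pathlib import PurePosixPath


def untested_changes(changed_files):
    """Return changed source files whose matching test file was not changed."""
    changed = set(changed_files)
    missing = []
    for path in sorted(changed):
        p = PurePosixPath(path)
        # Assumed convention: src/<name>.py is covered by tests/test_<name>.py
        if p.parts[0] == "src" and p.suffix == ".py":
            expected_test = "tests/test_" + p.name
            if expected_test not in changed:
                missing.append(path)
    return missing


if __name__ == "__main__":
    change_set = ["src/auth.py", "tests/test_auth.py", "src/billing.py", "README.md"]
    print(untested_changes(change_set))
```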

Chaos Engineering
is modern change
management

Chaos Engineering
is the programmatic
introduction of changes

Following from 2017 edition
https://owaspsummit.org

Collaboration @ 16x per day for 5x days

Open Security Summit 2018 - London
https://open-security-summit.org/
4th - 8th of June
