Security incident response is a reactive and chaotic exercise. What if it were possible to flip the scenario on its head? Security-focused chaos engineering advances the security incident response apparatus by reversing the postmortem and preparation phases. Unlike Purple Team or Red Team game days, Security Chaos Engineering does not use threat actor tactics, techniques, and procedures. Instead, it develops teams through unique configuration, cyber threat, and user error scenarios that challenge responders to react to events outside their playbooks and comfort zones.
Security Chaos Engineering allows incident response and product teams to uncover previously unknown information about the state of security within their distributed systems. Within this new paradigm of instrumentation, where we proactively conduct “Pre-Incident” rather than “Post-Incident” reviews, we can more accurately measure how effective our security incident response teams, tools, skills, and procedures are in the heat of incident response.
In this session, Aaron Rinehart, the mind behind ChaoSlingr, the first open source Security Chaos Engineering tool, will introduce how Security Chaos Engineering can be applied to create highly secure, performant, and resilient distributed systems.
2. About A.A.Ron
● CTO of Stealthy Startup
● Former Chief Security Architect @UnitedHealth responsible for security engineering strategy
● Led the DevOps and Open Source Transformation at UnitedHealth Group
● Formerly at DoD, NASA, DHS, and College Board
● Frequent speaker and author on Chaos Engineering & Security
● Pioneer behind Security Chaos Engineering
● Led ChaoSlingr team at UnitedHealth
27. “Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s ability to withstand turbulent conditions”
43. Proactively Manage & Measure
● Validate Runbooks
● Measure Team Skills
● Determine Control Effectiveness
● Learn new insights into system behavior
● Transfer knowledge
● Build a learning culture
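"Determine Control Effectiveness" becomes measurable once each experiment records which controls responded. The sketch below assumes a hypothetical `ControlObservation` record (not part of ChaoSlingr or any named tool) and summarises, per control, how often it detected the injected event and how quickly. The observations are made-up values for illustration only.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class ControlObservation:
    """How one security control responded during one experiment run (hypothetical schema)."""
    experiment: str
    control: str
    detected: bool
    seconds_to_detect: Optional[float] = None  # None when the control never fired


def control_effectiveness(observations: List[ControlObservation]) -> Dict[str, dict]:
    """Per control: number of runs, fraction of runs it detected, mean detection time."""
    summary: Dict[str, dict] = {}
    for control in sorted({o.control for o in observations}):
        rows = [o for o in observations if o.control == control]
        times = [o.seconds_to_detect for o in rows
                 if o.detected and o.seconds_to_detect is not None]
        summary[control] = {
            "runs": len(rows),
            "detection_rate": sum(o.detected for o in rows) / len(rows),
            "mean_seconds_to_detect": mean(times) if times else None,
        }
    return summary


# Illustrative, made-up observations from two runs of a port-injection experiment.
observations = [
    ControlObservation("port-injection-1", "firewall", detected=False),
    ControlObservation("port-injection-1", "config-mgmt", detected=True, seconds_to_detect=42.0),
    ControlObservation("port-injection-2", "firewall", detected=True, seconds_to_detect=5.0),
    ControlObservation("port-injection-2", "config-mgmt", detected=True, seconds_to_detect=18.0),
]

for control, stats in control_effectiveness(observations).items():
    print(control, stats)
```

Tracking the same summary across repeated game days is one way to see whether runbook and tooling changes actually move detection rates.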
45. Security Crayon Differences
● Noisy distributed system behavior
● Not geared for Cascading Events
● Point-in-time even if Automated
● Performed by Security Teams with Specialized skill sets
46. Security Chaos Differences
● Distributed Systems Focus
● Goal: Experimentation
● Human Factors focused
● Small Isolated Scope
● Focus on Cascading Events
● Performed by Mixed Engineering Teams in game days
● During business hours
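Two of these traits, a small isolated scope and running during business hours, can be enforced as a guard before any experiment starts. A minimal sketch, assuming a 09:00 to 17:00 weekday window and a 10% blast-radius cap; both values and all function names are hypothetical placeholders, not taken from the talk.

```python
import math
from datetime import datetime, time
from typing import List

# Assumed business-hours window; placeholder values, not from the talk.
BUSINESS_START = time(9, 0)
BUSINESS_END = time(17, 0)


def within_business_hours(now: datetime) -> bool:
    """True Monday-Friday between BUSINESS_START (inclusive) and BUSINESS_END (exclusive)."""
    return now.weekday() < 5 and BUSINESS_START <= now.time() < BUSINESS_END


def limit_blast_radius(targets: List[str], max_fraction: float = 0.1) -> List[str]:
    """Keep the scope small: at most max_fraction of targets, but always at least one."""
    if not targets:
        return []
    limit = max(1, math.floor(len(targets) * max_fraction))
    return sorted(targets)[:limit]  # deterministic choice so runs are repeatable


def plan_gameday(now: datetime, targets: List[str]) -> List[str]:
    """Return the targets to experiment on, or [] if outside business hours."""
    return limit_blast_radius(targets) if within_business_hours(now) else []


instances = [f"i-{n:03d}" for n in range(25)]
print(plan_gameday(datetime(2024, 1, 3, 10, 30), instances))  # a Wednesday morning
print(plan_gameday(datetime(2024, 1, 6, 10, 30), instances))  # a Saturday
```

The deterministic selection is a design choice for repeatable game days; a random sample would spread coverage across the fleet over time.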
61. What is the system actually doing?
Has it done this before?
Why is it behaving that way?
What is it supposed to do next?
How did it get into this state?
65. ChaoSlingr Product Features
• ChatOps Integration
• Configuration-as-Code
• Example Code & Open Framework
• Serverless App in AWS
• 100% Native AWS
• Configurable Operational Mode & Frequency
• Opt-In | Opt-Out Model
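An opt-in or opt-out model decides which resources an experiment may touch. The sketch below shows one way to express both models with resource tags; the tag key `chaos-opt` and its values are hypothetical and are not ChaoSlingr's actual convention, and no AWS API is called.

```python
from dataclasses import dataclass, field
from typing import Dict, List

OPT_TAG = "chaos-opt"  # hypothetical tag key, not ChaoSlingr's actual convention


@dataclass
class Resource:
    resource_id: str
    tags: Dict[str, str] = field(default_factory=dict)


def eligible_targets(resources: List[Resource], model: str) -> List[str]:
    """Select experiment targets under an opt-in or opt-out model.

    opt-in:  only resources explicitly tagged OPT_TAG=in are eligible.
    opt-out: every resource is eligible unless tagged OPT_TAG=out.
    """
    if model == "opt-in":
        return [r.resource_id for r in resources if r.tags.get(OPT_TAG) == "in"]
    if model == "opt-out":
        return [r.resource_id for r in resources if r.tags.get(OPT_TAG) != "out"]
    raise ValueError(f"unknown model: {model!r}")


fleet = [
    Resource("sg-portal", {OPT_TAG: "in"}),
    Resource("sg-billing", {OPT_TAG: "out"}),
    Resource("sg-batch"),  # untagged
]
print(eligible_targets(fleet, "opt-in"))   # ['sg-portal']
print(eligible_targets(fleet, "opt-out"))  # ['sg-portal', 'sg-batch']
```

Opt-in is the safer default when introducing experiments; opt-out widens coverage once teams trust the process. The untagged resource is the case where the two models differ.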
66. Hypothesis: If someone accidentally or maliciously introduced a misconfigured port then we would immediately detect, block, and alert on the event.
[Diagram: Misconfigured Port Injection, with open questions for each control: Firewall? Log data? Config Mgmt? Alert SOC? IR Triage]
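The injection step of this experiment can be sketched against an in-memory stand-in for a firewall rule set. This is not ChaoSlingr's code and makes no AWS calls; `SecurityGroup`, `IngressRule`, and the "approved ports" policy are hypothetical names chosen for illustration. The sketch injects one rule on an unapproved port, lets a stand-in detective control look for it, and always rolls the change back.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class IngressRule:
    port: int
    cidr: str


@dataclass
class SecurityGroup:
    """In-memory stand-in for a cloud firewall rule set (no real AWS calls)."""
    group_id: str
    rules: List[IngressRule] = field(default_factory=list)


def inject_misconfigured_port(group: SecurityGroup, approved_ports: Set[int]) -> IngressRule:
    """Fault injection: open the lowest unprivileged port that is neither approved nor in use."""
    used = {r.port for r in group.rules}
    port = next(p for p in range(1024, 65536) if p not in approved_ports and p not in used)
    rule = IngressRule(port=port, cidr="0.0.0.0/0")
    group.rules.append(rule)
    return rule


def rollback(group: SecurityGroup, rule: IngressRule) -> None:
    """Undo the injected change once observations are collected."""
    group.rules.remove(rule)


def unapproved_rules(group: SecurityGroup, approved_ports: Set[int]) -> List[IngressRule]:
    """A stand-in detective control: flag ingress rules on unapproved ports."""
    return [r for r in group.rules if r.port not in approved_ports]


approved = {443}
portal = SecurityGroup("sg-portal", [IngressRule(443, "0.0.0.0/0")])
injected = inject_misconfigured_port(portal, approved)
try:
    detected = unapproved_rules(portal, approved)
    print("injected:", injected, "detected:", detected)
finally:
    rollback(portal, injected)  # the rollback runs even if observation fails
```

In a real environment the detective controls (firewall, logging, configuration management, SOC alerting) are independent systems, and the point of the experiment is to observe which of them actually respond.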
67. Result: Hypothesis disproved. The firewall did not detect or block the change on every instance. The standard port AAA security policy was out of sync on the Portal Team instances. The port change did not trigger an alert, and log data indicated a successful change audit. However, we unexpectedly learned that the configuration management tool caught the change and alerted the SOC.
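Comparing the hypothesis with the result is a diff between the controls the team relied on and the controls that actually fired. A minimal sketch follows; the boolean encoding of the port-injection result is our own simplified interpretation (for instance, it collapses "not on every instance" into a single `False`), and the key names are hypothetical.

```python
from typing import Dict, List


def evaluate_hypothesis(expected: Dict[str, bool], observed: Dict[str, bool]) -> dict:
    """Compare the controls the hypothesis relied on with what actually happened."""
    missed: List[str] = sorted(
        c for c, required in expected.items() if required and not observed.get(c, False))
    unexpected: List[str] = sorted(
        c for c, fired in observed.items() if fired and not expected.get(c, False))
    return {"supported": not missed, "missed": missed, "unexpected": unexpected}


# One simplified reading of the port-injection result (our interpretation).
expected = {"firewall_detect": True, "firewall_block": True, "alert_on_change": True}
observed = {"firewall_detect": False, "firewall_block": False,
            "alert_on_change": False, "config_mgmt_alert": True}
print(evaluate_hypothesis(expected, observed))
```

Reporting the `unexpected` list alongside `missed` keeps the surprise finding, configuration management alerting the SOC, from being lost in a simple pass/fail verdict.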
68. More Experiment Examples
● Internet-exposed Kubernetes API
● Unauthorized Bad Container Repo
● Unencrypted S3 Bucket
● Disable MFA
● Bad AWS Automated Block Rule
● Software Secret Clear Text Disclosure
● Permission collision in Shared IAM Role Policy
● Disabled Service Event Logging
● Introduce Latency on Security Controls
● API Gateway Shutdown
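For the "Permission collision in Shared IAM Role Policy" experiment, a team first needs a way to notice a collision. The sketch below is a simplified, hypothetical checker that reports actions matched by both an Allow and a Deny statement in an IAM-style policy document. It considers only `Effect` and `Action` and ignores `Resource`, `Condition`, and `NotAction`, so it is not a substitute for AWS's own policy evaluation, where an explicit Deny always overrides an Allow.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def _as_list(value) -> list:
    """IAM accepts a single value or a list for Statement/Action; normalise to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, (str, dict)) else list(value)


def permission_collisions(policy: Dict, probe_actions: List[str]) -> List[str]:
    """Return probe actions matched by both an Allow and a Deny statement.

    Simplified: only Effect/Action are considered. Matching is
    case-insensitive and supports the * and ? wildcards.
    """
    allowed, denied = set(), set()
    for stmt in _as_list(policy.get("Statement")):
        patterns = [p.lower() for p in _as_list(stmt.get("Action"))]
        for action in probe_actions:
            if any(fnmatchcase(action.lower(), p) for p in patterns):
                if stmt.get("Effect") == "Allow":
                    allowed.add(action)
                elif stmt.get("Effect") == "Deny":
                    denied.add(action)
    return sorted(allowed & denied)


# Illustrative shared-role policy (made up for this example).
shared_role_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
        {"Effect": "Deny", "Action": ["s3:DeleteBucket", "iam:PassRole"], "Resource": "*"},
    ],
}
print(permission_collisions(shared_role_policy,
                            ["s3:GetObject", "s3:DeleteBucket", "iam:PassRole"]))
```

A collision is not ambiguous to AWS, but it can surprise a team sharing the role when a permission it relies on is silently denied, which is exactly the kind of unexpected behavior such an experiment aims to surface.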