SlideShare a Scribd company logo
0/40
Automated Intrusion Response
CDIS Spring Conference
Kim Hammar
kimham@kth.se
Division of Network and Systems Engineering
KTH Royal Institute of Technology
May 22, 2024
1/40
Use Case: Intrusion Response
I A defender owns an infrastructure
I Consists of connected components
I Components run network services
I Defender defends the infrastructure
by monitoring and active defense
I Has partial observability
I An attacker seeks to intrude on the
infrastructure
I Has a partial view of the
infrastructure
I Wants to compromise specific
components
I Attacks by reconnaissance,
exploitation and pivoting
Attacker Clients
. . .
Defender
1 IPS
1
alerts
Gateway
7 8 9 10 11
6
5
4
3
2
12
13 14 15 16
17
18
19
21
23
20
22
24
25 26
27 28 29 30 31
2/40
Automated Intrusion Response
Levels of security automation
No automation.
Manual detection.
Manual prevention.
Lack of tools.
1980s 1990s 2000s-Now Research
Operator assistance.
Audit logs
Manual detection.
Manual prevention.
Partial automation.
Manual configuration.
Intrusion detection systems.
Intrusion prevention systems.
High automation.
System automatically
updates itself.
2/40
Automated Intrusion Response
Levels of security automation
No automation.
Manual detection.
Manual prevention.
Lack of tools.
1980s 1990s 2000s-Now Research
Operator assistance.
Audit logs
Manual detection.
Manual prevention.
Partial automation.
Manual configuration.
Intrusion detection systems.
Intrusion prevention systems.
High automation.
System automatically
updates itself.
Can we find effective security strategies through decision-theoretic methods?
3/40
Our Framework for Automated Intrusion Response
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Digital Twin
Target
Infrastructure
Model Creation &
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation System
Reinforcement Learning &
Optimization
Strategy evaluation &
Model estimation
Automation &
Self-learning systems
3/40
Our Framework for Automated Intrusion Response
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Digital Twin
Target
Infrastructure
Model Creation &
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation System
Reinforcement Learning &
Optimization
Strategy evaluation &
Model estimation
Automation &
Self-learning systems
3/40
Our Framework for Automated Intrusion Response
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Digital Twin
Target
Infrastructure
Model Creation &
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation System
Reinforcement Learning &
Optimization
Model estimation
Automation &
Self-learning systems
3/40
Our Framework for Automated Intrusion Response
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Digital Twin
Target
Infrastructure
Model Creation &
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation System
Reinforcement Learning &
Optimization
Model estimation
Automation &
Self-learning systems
3/40
Our Framework for Automated Intrusion Response
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Digital Twin
Target
Infrastructure
Model Creation &
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation System
Reinforcement Learning &
Optimization
Strategy evaluation &
Model estimation
Automation &
Self-learning systems
3/40
Our Framework for Automated Intrusion Response
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Digital Twin
Target
Infrastructure
Model Creation &
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation System
Reinforcement Learning &
Optimization
Strategy evaluation
Automation &
Self-learning systems
3/40
Our Framework for Automated Intrusion Response
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Digital Twin
Target
Infrastructure
Model Creation &
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation System
Reinforcement Learning &
Optimization
Strategy evaluation &
Model estimation
Automation &
Self-learning systems
3/40
Our Framework for Automated Intrusion Response
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Digital Twin
Target
Infrastructure
Model Creation &
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation System
Reinforcement Learning &
Optimization
Strategy evaluation &
Model estimation
Automation &
Self-learning systems
4/40
Creating a Digital Twin of the Target Infrastructure
Σ Configuration Space
σi
*
* *
172.18.4.0/24
172.18.19.0/24
172.18.61.0/24
Digital Twins
R1 R1 R1
I Given an infrastructure configuration, our framework
automates the creation of a digital twin.
I The configuration space defines the class of infrastructures
that we can emulate.
5/40
Example Infrastructure Configuration
I 64 nodes
I 24 ovs switches
I 3 gateways
I 6 honeypots
I 8 application servers
I 4 administration servers
I 15 compute servers
I 11 vulnerabilities
I cve-2010-0426
I cve-2015-3306
I etc.
I Management
I 1 sdn controller
I 1 Kafka server
I 1 elastic server
r&d zone
App servers Honeynet
dmz
admin
zone
workflow
Gateway idps
quarantine
zone
alerts
Defender
. . .
Attacker Clients
2
1
3 12
4
5
6
7
8
9
10
11
13
14
15
16
17
18
19
20
21
22 23
24
25
26
27
28
29
30 31
32
33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
6/40
Emulating Physical Components
Containers
Physical server
Operating system
Docker engine
Our framework
I We emulate physical components with Docker containers
I Focus on linux-based systems
I Our framework provides the orchestration layer
7/40
Emulating Network Connectivity
Management node 1
Emulated IT infrastructure
Management node 2
Emulated IT infrastructure
Management node n
Emulated IT infrastructure
VXLAN VXLAN . . . VXLAN
IP network
I We emulate network connectivity on the same host using
network namespaces
I Connectivity across physical hosts is achieved using VXLAN
tunnels with Docker swarm
8/40
Emulating Network Conditions
I Traffic shaping using NetEm
I Allows to configure:
I Delay
I Capacity
I Packet Loss
I Jitter
I Queueing delays
I etc.
User space
. . .
Application processes
Kernel
TCP/UDP
IP/Ethernet/802.11
OS
TCP/IP
stack
Queueing
discipline
Device driver
queue (FIFO)
NIC
Netem config:
latency,
jitter, etc.
Sockets
9/40
Emulating Clients Client population
. . .
Arrival rate λ Departure
Service time µ
. . .
.
.
.
.
.
.
.
.
.
w1 w2 w|W|
Workflows (Markov processes)
I Homogeneous client population
I Clients arrive according to Po(λ)
I Client service times Exp(µ)
I Service dependencies (St)t=1,2,... ∼ mc
10/40
Emulating The Attacker and The Defender
I API for automated
defender and attacker
actions
I Attacker actions:
I Exploits
I Reconnaissance
I Pivoting
I etc.
I Defender actions:
I Shut downs
I Redirect
I Isolate
I Recover
I Migrate
I etc.
Markov Decision Process
s1,1 s1,2 s1,3 . . . s1,4
s2,1 s2,2 s2,3 . . . s2,4
Digital Twin
. . .
Virtual
network
Virtual
devices
Emulated
services
Emulated
actors
IT Infrastructure
Configuration
& change events
System traces
Verified security policy
Optimized security policy
11/40
Software framework
Metastore
Python libraries
Management api
rest api cli
grpc
Management api
Digital twins
I More details about the software framework
I Source code: https://github.com/Limmen/csle
I Documentation: http://limmen.dev/csle/
I Demo: https://www.youtube.com/watch?v=iE2KPmtIs2A
I Installation:
https://www.youtube.com/watch?v=l_g3sRJwwhc
12/40
System Identification
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Digital Twin
Target
Infrastructure
Model Creation &
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation System
Reinforcement Learning &
Generalization
Strategy evaluation &
Model estimation
Automation &
Self-learning systems
13/40
System Model
H C
∅
Crashed
Healthy Compromised
Model
complexity
Static attacker
Small set of responses
Dynamic attacker
Small set of responses
Dynamic attacker
Large set of responses
I Intrusion response can be modeled in many ways
I As a parametric optimization problem
I As an optimal stopping problem
I As a dynamic program
I As a game
I etc.
14/40
Related Work on Learning Automated Intrusion Response
External validity
model
complexity
Goal
Georgia et al. 2000.
(Next generation
intrusion detection:
reinforcement learning)
Xu et al. 2005.
(An RL approach to
host-based
intrusion detection)
Servin et al. 2008.
(Multi-agent RL for
intrusion detection)
Malialis et al. 2013.
(Decentralized
RL response to
DDoS attacks)
Zhu et al. 2019.
(Adaptive
Honeypot engagement)
Apruzzese et al. 2020.
(Deep RL to
evade botnets)
Xiao et al. 2021.
(RL approach to APT)
etc. 2022-2023
Our work 2020-2023
15/40
Intrusion Response through Optimal Stopping
I Suppose
I The attacker follows a fixed strategy (no adaptation)
I We only have one response action, e.g., block the gateway
I Formulate intrusion response as optimal stopping
Intrusion event
time-step t = 1 Intrusion ongoing
t
t = T
Early stopping times Stopping times that
affect the intrusion
Episode
16/40
Intrusion Response from the Defender’s Perspective
20 40 60 80 100 120 140 160 180 200
0.5
1
t
b
20 40 60 80 100 120 140 160 180 200
50
100
150
200
t
Alerts
20 40 60 80 100 120 140 160 180 200
5
10
15
t
Logins
16/40
Intrusion Response from the Defender’s Perspective
20 40 60 80 100 120 140 160 180 200
0.5
1
t
b
20 40 60 80 100 120 140 160 180 200
50
100
150
200
t
Alerts
20 40 60 80 100 120 140 160 180 200
5
10
15
t
Logins
16/40
Intrusion Response from the Defender’s Perspective
20 40 60 80 100 120 140 160 180 200
0.5
1
t
b
20 40 60 80 100 120 140 160 180 200
50
100
150
200
t
Alerts
20 40 60 80 100 120 140 160 180 200
5
10
15
t
Logins
16/40
Intrusion Response from the Defender’s Perspective
20 40 60 80 100 120 140 160 180 200
0.5
1
t
b
20 40 60 80 100 120 140 160 180 200
50
100
150
200
t
Alerts
20 40 60 80 100 120 140 160 180 200
5
10
15
t
Logins
16/40
Intrusion Response from the Defender’s Perspective
20 40 60 80 100 120 140 160 180 200
0.5
1
t
b
20 40 60 80 100 120 140 160 180 200
50
100
150
200
t
Alerts
20 40 60 80 100 120 140 160 180 200
5
10
15
t
Logins
16/40
Intrusion Response from the Defender’s Perspective
20 40 60 80 100 120 140 160 180 200
0.5
1
t
b
20 40 60 80 100 120 140 160 180 200
50
100
150
200
t
Alerts
20 40 60 80 100 120 140 160 180 200
5
10
15
t
Logins
16/40
Intrusion Response from the Defender’s Perspective
20 40 60 80 100 120 140 160 180 200
0.5
1
t
b
20 40 60 80 100 120 140 160 180 200
50
100
150
200
t
Alerts
20 40 60 80 100 120 140 160 180 200
5
10
15
t
Logins
16/40
Intrusion Response from the Defender’s Perspective
20 40 60 80 100 120 140 160 180 200
0.5
1
t
b
20 40 60 80 100 120 140 160 180 200
50
100
150
200
t
Alerts
20 40 60 80 100 120 140 160 180 200
5
10
15
t
Logins
16/40
Intrusion Response from the Defender’s Perspective
20 40 60 80 100 120 140 160 180 200
0.5
1
t
b
20 40 60 80 100 120 140 160 180 200
50
100
150
200
t
Alerts
20 40 60 80 100 120 140 160 180 200
5
10
15
t
Logins
When to take a defensive action?
17/40
The Defender’s Optimal Stopping Problem (1/3)
I Infrastructure is a discrete-time dynamical system (st)T
t=1
I Defender observes a noisy observation process (ot)T
t=1
I Two options at each time t: (C)ontinue and (S)stop
I Find the optimal stopping time τ?:
τ?
∈ arg max
τ
Eτ
" τ−1
X
t=1
γt−1
RC
st st+1
+ γτ−1
RS
sτ sτ
#
where RS
ss0 & RC
ss0 are the stop/continue rewards and τ is
τ = inf{t : t > 0, at = S}
t
ot
τ∗
18/40
The Defender’s Optimal Stopping Problem (2/3)
I Objective: stop the attack as soon as possible
I Let the state space be S = {H, C, ∅}
H C
∅
Stopped
Healthy Compromised
19/40
The Defender’s Optimal Stopping Problem (3/3)
I Let the observation process (ot)T
t=1 represent ids alerts
0 2000 4000 6000 8000
probability
cve-2010-0426
0 2000 4000 6000 8000
cve-2015-3306
0 2000 4000 6000 8000
cve-2015-5602
0 2000 4000 6000 8000
cve-2016-10033
0 2000 4000 6000 8000
cwe-89
0 2000 4000 6000 8000
O
probability
cve-2017-7494
0 2000 4000 6000 8000
O
cve-2014-6271
0 5000 10000 15000 20000
O
ftp brute force
0 5000 10000 15000 20000
O
ssh brute force
0 5000 10000 15000 20000
O
telnet brute force
intrusion no intrusion intrusion model normal operation model
I Estimate the observation distribution based on M samples
from the twin
I E.g., compute empirical distribution b
Z as estimate of Z
I b
Z →a.s Z as M → ∞ (Glivenko-Cantelli theorem)
20/40
Optimal Stopping Strategy
I The defender can compute the belief
bt , P[St = C | b1, o1, o2, . . . ot]
I Stopping strategy:
π(b) : [0, 1] → {S, C}
Defender
Belief
20/40
Optimal Threshold Strategy
Theorem
There exists an optimal defender strategy of the form:
π?
(b) = S ⇐⇒ b ≥ α?
α?
∈ [0, 1]
i.e., the stopping set is S = [α?, 1]
b
0 1
belief space B = [0, 1]
S1
α?
1
21/40
Optimal Multiple Stopping
I Suppose the defender can take L ≥ 1 response actions
I Find the optimal stopping times τ?
L , τ?
L−1, . . . , τ?
1 :
(τ?
l )l=1,...,L ∈ arg max
τ1,...,τL
Eτ1,...,τL
" τL−1
X
t=1
γt−1
RC
st st+1
+ γτL−1
RS
sτL
sτL
+
τL−1−1
X
t=τL+1
γt−1
RC
st st+1
+ γτL−1−1
RS
sτL−1
sτL−2
+ . . . +
τ1−1
X
t=τ2+1
γt−1
RC
st st+1
+ γτ1−1
RS
sτ1 sτ1
#
where τl denotes the stopping time with l stops remaining.
t
ot
τ?
L τ?
L−1 τ?
L−2 . . .
22/40
Optimal Multi-Threshold Strategy
Theorem
I Stopping sets are nested Sl−1 ⊆ Sl for l = 2, . . . L.
I If (ot)t≥1 is totally positive of order 2 (TP2), there exists an
optimal defender strategy of the form:
π?
l (b) = S ⇐⇒ b ≥ α?
l , l = 1, . . . , L
where α?
l ∈ [0, 1] is decreasing in l.
b
0 1
belief space B = [0, 1]
S1
S2
.
.
.
SL
α?
1
α?
2
α?
L
. . .
23/40
Optimal Stopping Game
I Suppose the attacker is dynamic and decides when to start
and abort its intrusion.
Attacker
Defender
t = 1
t = T
τ1,1 τ1,2 τ1,3
τ2,1
t
Stopped
Game episode
Intrusion
I Find the optimal stopping times
maximize
τD,1,...,τD,L
minimize
τA,1,τA,2
E[J]
where J is the defender’s objective.
24/40
Best-Response Multi-Threshold Strategies (1/2)
Theorem
I The defender’s best response is of the form:
π̃D,l (b) = S ⇐⇒ b ≥ α̃l , l = 1, . . . , L
I The attacker’s best response is of the form:
π̃A,l (b) = C ⇐⇒ π̃D,l (S | b) ≥ β̃H,l , l = 1, . . . , L, s = H
π̃A,l (b) = S ⇐⇒ π̃D,l (S | b) ≥ β̃C,l , l = 1, . . . , L, s = C
25/40
Best-Response Multi-Threshold Strategies (2/2)
b
Defender
0 1
S
(D)
1,π2,l
S
(D)
2,π2,l
.
.
.
S
(D)
L,π2,l
α̃1
α̃2
α̃L . . .
b
Attacker
0 1
S
(A)
C,1,πD,l
S
(A)
C,L,πD,l
β̃C,1
β̃C,L
β̃H,1
. . . β̃H,L
. . .
S
(A)
H,1,πD,l
S
(A)
H,L,πD,l
26/40
Efficient Computation of Best Responses
Algorithm 1: Threshold Optimization
1 Input: Objective function J, number of thresholds L,
parametric optimizer PO
2 Output: A approximate best response strategy π̂θ
3 Algorithm
4 Θ ← [0, 1]L
5 For each θ ∈ Θ, define πθ(bt) as
6 πθ(bt) ,
(
S if bt ≥ θi
C otherwise
7 Jθ ← Eπθ
[J]
8 π̂θ ← PO(Θ, Jθ)
9 return π̂θ
I Examples of parameteric optimization algorithmns: CEM, BO,
CMA-ES, DE, SPSA, etc.
27/40
Threshold-Fictitious Play to Approximate an Equilibrium
π̃A ∈ BA(πD)
πA
πD
π̃D ∈ BD(πA)
π̃0
A ∈ BA(π0
D)
π0
A
π0
D
π̃0
D ∈ BD(π0
A)
. . .
π?
A ∈ BA(π?
D)
π?
D ∈ BD(π?
A)
Fictitious play: iterative averaging of best responses.
I Learn best response strategies iteratively
I Average best responses to approximate the equilibrium
28/40
Comparison against State-of-the-art Algorithms
0 10 20 30 40 50 60
training time (min)
0
50
100
Reward per episode against Novice
0 10 20 30 40 50 60
training time (min)
Reward per episode against Experienced
0 10 20 30 40 50 60
training time (min)
Reward per episode against Expert
PPO ThresholdSPSA Shiryaev’s Algorithm (α = 0.75) HSVI upper bound
29/40
Learning Curves in Simulation and Digital Twin
0
50
100
Novice
Reward per episode
0
50
100
Episode length (steps)
0.0
0.5
1.0
P[intrusion interrupted]
0.0
0.5
1.0
1.5
P[early stopping]
5
10
Duration of intrusion
−50
0
50
100
experienced
0
50
100
150
0.0
0.5
1.0
0.0
0.5
1.0
1.5
0
5
10
0 20 40 60
training time (min)
−50
0
50
100
expert
0 20 40 60
training time (min)
0
50
100
150
0 20 40 60
training time (min)
0.0
0.5
1.0
0 20 40 60
training time (min)
0.0
0.5
1.0
1.5
2.0
0 20 40 60
training time (min)
0
5
10
15
20
πθ,l simulation πθ,l emulation (∆x + ∆y) ≥ 1 baseline Snort IPS upper bound
0 20 40 60 80 100
# training iterations
0
1
2
3
Exploitability
0 20 40 60 80 100
# training iterations
−5.0
−2.5
0.0
2.5
5.0
Defender reward per episode
0 20 40 60 80 100
# training iterations
0
1
2
3
Intrusion length
(π1,l, π2,l) emulation (π1,l, π2,l) simulation Snort IPS ot ≥ 1 upper bound
29/40
Learning Curves in Simulation and Digital Twin
0
50
100
Novice
Reward per episode
0
50
100
Episode length (steps)
0.0
0.5
1.0
P[intrusion interrupted]
0.0
0.5
1.0
1.5
P[early stopping]
5
10
Duration of intrusion
−50
0
50
100
experienced
0
50
100
150
0.0
0.5
1.0
0.0
0.5
1.0
1.5
0
5
10
0 20 40 60
training time (min)
−50
0
50
100
expert
0 20 40 60
training time (min)
0
50
100
150
0 20 40 60
training time (min)
0.0
0.5
1.0
0 20 40 60
training time (min)
0.0
0.5
1.0
1.5
2.0
0 20 40 60
training time (min)
0
5
10
15
20
πθ,l simulation πθ,l emulation (∆x + ∆y) ≥ 1 baseline Snort IPS upper bound
0 20 40 60 80 100
# training iterations
0
1
2
3
Exploitability
0 20 40 60 80 100
# training iterations
−5.0
−2.5
0.0
2.5
5.0
Defender reward per episode
0 20 40 60 80 100
# training iterations
0
1
2
3
Intrusion length
(π1,l, π2,l) emulation (π1,l, π2,l) simulation Snort IPS ot ≥ 1 upper bound
Stopping is about timing; now we consider timing + action selection
30/40
General Intrusion Response Game
I Suppose the defender and the attacker
can take L actions per node
I G = h{gw} ∪ V, Ei: directed tree
representing the virtual infrastructure
I V: set of virtual nodes
I E: set of node dependencies
I Z: set of zones
r&d zone
App servers Honeynet
dmz
admin
zone
workflow
Gateway idps
quarantine
zone
alerts
Defender
. . .
Attacker Clients
2
1
3 12
4
5
6
7
8
9
10
11
13
14
15
16
17
18
19
20
21
22 23
24
25
26
27
28
29
30 31
32
33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
30/40
General Intrusion Response Game
I Suppose the defender and the attacker
can take L actions per node
I G = h{gw} ∪ V, Ei: directed tree
representing the virtual infrastructure
I V: set of virtual nodes
I E: set of node dependencies
I Z: set of zones
r&d zone
App servers Honeynet
dmz
admin
zone
workflow
Gateway idps
quarantine
zone
alerts
Defender
. . .
Attacker Clients
2
1
3 12
4
5
6
7
8
9
10
11
13
14
15
16
17
18
19
20
21
22 23
24
25
26
27
28
29
30 31
32
33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
31/40
State Space
I Each i ∈ V has a state
vi,t = (v
(Z)
t,i
|{z}
D
, v
(I)
t,i , v
(R)
t,i
| {z }
A
)
I System state st = (vt,i )i∈V ∼ St
I Markovian time-homogeneous
dynamics:
st+1 ∼ f (· | St, At)
At = (A
(A)
t , A
(D)
t ) are the actions.
s1 s2 s3
s4 s5 s4
.
.
.
.
.
.
.
.
.
31/40
State Space
I Each i ∈ V has a state
vi,t = (v
(Z)
t,i
|{z}
D
, v
(I)
t,i , v
(R)
t,i
| {z }
A
)
I System state st = (vt,i )i∈V ∼ St
I Markovian time-homogeneous
dynamics:
st+1 ∼ f (· | St, At)
At = (A
(A)
t , A
(D)
t ) are the actions.
s1 s2 s3
s4 s5 s4
.
.
.
.
.
.
.
.
.
31/40
State Space
I Each i ∈ V has a state
vi,t = (v
(Z)
t,i
|{z}
D
, v
(I)
t,i , v
(R)
t,i
| {z }
A
)
I System state st = (vt,i )i∈V ∼ St
I Markovian time-homogeneous
dynamics:
st+1 ∼ f (· | St, At)
At = (A
(A)
t , A
(D)
t ) are the actions.
s1 s2 s3
s4 s5 s4
.
.
.
.
.
.
.
.
.
32/40
Observations
I idpss inspect network traffic and
generate alert vectors:
ot ,

ot,1, . . . , ot,|V|

∈ N
|V|
0
ot,i is the number of alerts related to
node i ∈ V at time-step t.
I ot = (ot,1, . . . , ot,|V|) is a realization
of the random vector Ot with joint
distribution Z
idps
idps
idps
idps
alerts
Defender
. . .
Attacker Clients
2
1
3 12
4
5
6
7
8
9
10
11
13
14
15
16
17
18
19
20
21
22 23
24
25
26
27
28
29
30 31
32
33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
32/40
Observations
I idpss inspect network traffic and
generate alert vectors:
ot ,

ot,1, . . . , ot,|V|

∈ N
|V|
0
ot,i is the number of alerts related to
node i ∈ V at time-step t.
I ot = (ot,1, . . . , ot,|V|) is a realization
of the random vector Ot with joint
distribution Z
idps
idps
idps
idps
alerts
Defender
. . .
Attacker Clients
2
1
3 12
4
5
6
7
8
9
10
11
13
14
15
16
17
18
19
20
21
22 23
24
25
26
27
28
29
30 31
32
33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
33/40
probability
ZO1
ZO2
ZO3
ZO4
ZO5
ZO6
ZO7
ZO8
probability
ZO9
ZO10
ZO11
ZO12
ZO13
ZO14
ZO15
ZO16
probability
ZO17
ZO18
ZO19
ZO20
ZO21
ZO22
ZO23
ZO24
probability
ZO25
ZO26
ZO27
ZO28
ZO29
ZO30
ZO31
ZO32
probability
ZO33
ZO34
ZO35
ZO36
ZO37
ZO38
ZO39
ZO40
probability
ZO41
ZO42
ZO43
ZO44
ZO45
ZO46
ZO47
ZO48
probability
ZO49
ZO50
ZO51
ZO52
ZO53
ZO54
ZO55
ZO56
250 500 750
O
probability
ZO57
250 500 750
O
ZO58
250 500 750
O
ZO59
250 500 750
O
ZO60
250 500 750
O
ZO61
250 500 750
O
ZO62
250 500 750
O
ZO63
250 500 750
O
ZO64
Distributions of # alerts weighted by priority ZOi
(Oi | S
(D)
i , A
(A)
i ) per node i ∈ V
no intrusion intrusion
34/40
The (General) Intrusion Response Problem
maximizeπD∈ΠD
minimizeπA∈ΠA
E(πD,πA) [J]
E(πD,πA) denotes the expectation of the random vectors
(St, Ot, At)t∈{1,...,T} when following the strategy profile (πD, πA)
35/40
The Curse of Dimensionality
I Solving the game is computationally intractable. The state,
action, and observation spaces of the game grow
exponentially with |V|.
1 2 3 4 5
104
105
2
105
|S|
|O|
|Ai |
|V|
Growth of |S|, |O|, and |Ai | in function of the number of nodes |V|
35/40
The Curse of Dimensionality
I While (1) has a solution (i.e the game Γ has a value (Thm
1)), computing it is intractable since the state, action, and
observation spaces of the game grow exponentially with |V|.
1 2 3 4 5
104
105
2
105
|S|
|O|
|Ai |
|V|
Growth of |S|, |O|, and |Ai | in function of the number of nodes |V|
We tackle the scability challenge with decomposition
36/40
Intuitively..
rd zone
App servers Honeynet
dmz
admin
zone
workflow
Gateway idps
quarantine
zone
alerts
Defender
. . .
Attacker Clients
2
1
3 12
4
5
6
7
8
9
10
11
13
14
15
16
17
18
19
20
21
22 23
24
25
26
27
28
29
30 31
32
33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 65
The optimal
action here...
Does not directly
depend on the state or
action of a node
down here
36/40
Intuitively..
rd zone
App servers Honeynet
dmz
admin
zone
workflow
Gateway idps
quarantine
zone
alerts
Defender
. . .
Attacker Clients
2
1
3 12
4
5
6
7
8
9
10
11
13
14
15
16
17
18
19
20
21
22 23
24
25
26
27
28
29
30 31
32
33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 65
The optimal
action here...
But they are
not completely
independent either.
How can we
exploit this
structure?
Does not directly
depend on the state
or action of a node
down here
37/40
Scalable Learning through Decomposition
1 2 3 4 5 6 7 8 9 10
2
4
6
8
10
linear
measured
# parallel processes n
|V| = 10
Speedup
S
n
Speedup of best response computation for the decomposed game; Tn
denotes the completion time with n processes; the speedup is calculated
as Sn = T1
Tn
; the error bars indicate standard deviations from 3
measurements.
38/40
Learning Equilibrium Strategies
0 20 40 60 80 100
running time (h)
0
5
b
δ = 0.4
Approximate exploitability b
δ
0 20 40 60 80 100
running time (h)
0.0
0.5
1.0
Defender utility per episode
dfsp simulation dfsp digital twin upper bound oi,t  0 random defense
Learning curves obtained during training of dfsp to find optimal
(equilibrium) strategies in the intrusion response game; red and blue
curves relate to dfsp; black, orange and green curves relate to baselines.
39/40
Comparison with NFSP
0 10 20 30 40 50 60 70 80
running time (h)
0.0
2.5
5.0
7.5
Approximate exploitability
dfsp nfsp
Learning curves obtained during training of dfsp and nfsp to find
optimal (equilibrium) strategies in the intrusion response game; the red
curve relate to dfsp and the purple curve relate to nfsp; all curves
show simulation results.
40/40
Conclusions
I We develop a framework to
automatically learn security strategies.
I We apply the framework to an
intrusion response use case.
I We derive properties of optimal
security strategies.
I We evaluate strategies on a digital
twin.
I Questions → demonstration
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Emulation
Target
System
Model Creation 
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation 
Learning

More Related Content

Similar to Automated Intrusion Response - CDIS Spring Conference 2024

Self-Learning Systems for Cyber Security
Self-Learning Systems for Cyber SecuritySelf-Learning Systems for Cyber Security
Self-Learning Systems for Cyber Security
Kim Hammar
 
AI for Cybersecurity Innovation
AI for Cybersecurity InnovationAI for Cybersecurity Innovation
AI for Cybersecurity Innovation
Pete Burnap
 
Research of Intrusion Preventio System based on Snort
Research of Intrusion Preventio System based on SnortResearch of Intrusion Preventio System based on Snort
Research of Intrusion Preventio System based on Snort
Francis Yang
 
AktaionPPTv5_JZedits
AktaionPPTv5_JZeditsAktaionPPTv5_JZedits
AktaionPPTv5_JZedits
Rod Soto
 
Intrusion Detection with Neural Networks
Intrusion Detection with Neural NetworksIntrusion Detection with Neural Networks
Intrusion Detection with Neural Networks
antoniomorancardenas
 
Security operation center (SOC)
Security operation center (SOC)Security operation center (SOC)
Security operation center (SOC)
Ahmed Ayman
 
Intrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal StoppingIntrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal Stopping
Kim Hammar
 
Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.
Kim Hammar
 
Self-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber DefenseSelf-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber Defense
Kim Hammar
 
A SIMULATION APPROACH TO PREDICATE THE RELIABILITY OF A PERVASIVE SOFTWARE SY...
A SIMULATION APPROACH TO PREDICATE THE RELIABILITY OF A PERVASIVE SOFTWARE SY...A SIMULATION APPROACH TO PREDICATE THE RELIABILITY OF A PERVASIVE SOFTWARE SY...
A SIMULATION APPROACH TO PREDICATE THE RELIABILITY OF A PERVASIVE SOFTWARE SY...
Osama M. Khaled
 
An efficient HIDS using System Call Traces.pptx
An efficient HIDS using System Call Traces.pptxAn efficient HIDS using System Call Traces.pptx
An efficient HIDS using System Call Traces.pptx
Sandeep Maurya
 
01_Metasploit - The Elixir of Network Security
01_Metasploit - The Elixir of Network Security01_Metasploit - The Elixir of Network Security
01_Metasploit - The Elixir of Network Security
Harish Chaudhary
 
Situational awareness for computer network security
Situational awareness for computer network securitySituational awareness for computer network security
Situational awareness for computer network security
mmubashirkhan
 
Automated Vulnerability Testing Using Machine Learning and Metaheuristic Search
Automated Vulnerability Testing Using Machine Learning and Metaheuristic SearchAutomated Vulnerability Testing Using Machine Learning and Metaheuristic Search
Automated Vulnerability Testing Using Machine Learning and Metaheuristic Search
Lionel Briand
 
Understanding Cyber Kill Chain and OODA loop
Understanding Cyber Kill Chain and OODA loopUnderstanding Cyber Kill Chain and OODA loop
Understanding Cyber Kill Chain and OODA loop
David Sweigert
 
CA_Module_2.pdf
CA_Module_2.pdfCA_Module_2.pdf
CA_Module_2.pdf
EhabRushdy1
 
BsidesLVPresso2016_JZeditsv6
BsidesLVPresso2016_JZeditsv6BsidesLVPresso2016_JZeditsv6
BsidesLVPresso2016_JZeditsv6
Rod Soto
 
Ai ml dl_bct and mariners
Ai  ml dl_bct and marinersAi  ml dl_bct and mariners
Ai ml dl_bct and mariners
cmmindia2017
 
Ai ml dl_bct and mariners-1
Ai  ml dl_bct and mariners-1Ai  ml dl_bct and mariners-1
Ai ml dl_bct and mariners-1
cmmindia2017
 
Ai ml dl_bct and mariners
Ai  ml dl_bct and marinersAi  ml dl_bct and mariners
Ai ml dl_bct and mariners
cmmindia2017
 

Similar to Automated Intrusion Response - CDIS Spring Conference 2024 (20)

Self-Learning Systems for Cyber Security
Self-Learning Systems for Cyber SecuritySelf-Learning Systems for Cyber Security
Self-Learning Systems for Cyber Security
 
AI for Cybersecurity Innovation
AI for Cybersecurity InnovationAI for Cybersecurity Innovation
AI for Cybersecurity Innovation
 
Research of Intrusion Preventio System based on Snort
Research of Intrusion Preventio System based on SnortResearch of Intrusion Preventio System based on Snort
Research of Intrusion Preventio System based on Snort
 
AktaionPPTv5_JZedits
AktaionPPTv5_JZeditsAktaionPPTv5_JZedits
AktaionPPTv5_JZedits
 
Intrusion Detection with Neural Networks
Intrusion Detection with Neural NetworksIntrusion Detection with Neural Networks
Intrusion Detection with Neural Networks
 
Security operation center (SOC)
Security operation center (SOC)Security operation center (SOC)
Security operation center (SOC)
 
Intrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal StoppingIntrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal Stopping
 
Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.
 
Self-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber DefenseSelf-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber Defense
 
A SIMULATION APPROACH TO PREDICATE THE RELIABILITY OF A PERVASIVE SOFTWARE SY...
A SIMULATION APPROACH TO PREDICATE THE RELIABILITY OF A PERVASIVE SOFTWARE SY...A SIMULATION APPROACH TO PREDICATE THE RELIABILITY OF A PERVASIVE SOFTWARE SY...
A SIMULATION APPROACH TO PREDICATE THE RELIABILITY OF A PERVASIVE SOFTWARE SY...
 
An efficient HIDS using System Call Traces.pptx
An efficient HIDS using System Call Traces.pptxAn efficient HIDS using System Call Traces.pptx
An efficient HIDS using System Call Traces.pptx
 
01_Metasploit - The Elixir of Network Security
01_Metasploit - The Elixir of Network Security01_Metasploit - The Elixir of Network Security
01_Metasploit - The Elixir of Network Security
 
Situational awareness for computer network security
Situational awareness for computer network securitySituational awareness for computer network security
Situational awareness for computer network security
 
Automated Vulnerability Testing Using Machine Learning and Metaheuristic Search
Automated Vulnerability Testing Using Machine Learning and Metaheuristic SearchAutomated Vulnerability Testing Using Machine Learning and Metaheuristic Search
Automated Vulnerability Testing Using Machine Learning and Metaheuristic Search
 
Understanding Cyber Kill Chain and OODA loop
Understanding Cyber Kill Chain and OODA loopUnderstanding Cyber Kill Chain and OODA loop
Understanding Cyber Kill Chain and OODA loop
 
CA_Module_2.pdf
CA_Module_2.pdfCA_Module_2.pdf
CA_Module_2.pdf
 
BsidesLVPresso2016_JZeditsv6
BsidesLVPresso2016_JZeditsv6BsidesLVPresso2016_JZeditsv6
BsidesLVPresso2016_JZeditsv6
 
Ai ml dl_bct and mariners
Ai  ml dl_bct and marinersAi  ml dl_bct and mariners
Ai ml dl_bct and mariners
 
Ai ml dl_bct and mariners-1
Ai  ml dl_bct and mariners-1Ai  ml dl_bct and mariners-1
Ai ml dl_bct and mariners-1
 
Ai ml dl_bct and mariners
Ai  ml dl_bct and marinersAi  ml dl_bct and mariners
Ai ml dl_bct and mariners
 

More from Kim Hammar

Intrusion Tolerance as a Two-Level Game (Visit to Melbourne University)
Intrusion Tolerance as a Two-Level Game (Visit to Melbourne University)Intrusion Tolerance as a Two-Level Game (Visit to Melbourne University)
Intrusion Tolerance as a Two-Level Game (Visit to Melbourne University)
Kim Hammar
 
Automated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jecturesAutomated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jectures
Kim Hammar
 
Självlärande System för Cybersäkerhet. KTH
Självlärande System för Cybersäkerhet. KTHSjälvlärande System för Cybersäkerhet. KTH
Självlärande System för Cybersäkerhet. KTH
Kim Hammar
 
Intrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback ControlIntrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback Control
Kim Hammar
 
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
Kim Hammar
 
Självlärande system för cyberförsvar.
Självlärande system för cyberförsvar.Självlärande system för cyberförsvar.
Självlärande system för cyberförsvar.
Kim Hammar
 
Intrusion Response through Optimal Stopping
Intrusion Response through Optimal StoppingIntrusion Response through Optimal Stopping
Intrusion Response through Optimal Stopping
Kim Hammar
 
Learning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal StoppingLearning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal Stopping
Kim Hammar
 
Intrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal StoppingIntrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal Stopping
Kim Hammar
 
Intrusion Prevention through Optimal Stopping and Self-Play
Intrusion Prevention through Optimal Stopping and Self-PlayIntrusion Prevention through Optimal Stopping and Self-Play
Intrusion Prevention through Optimal Stopping and Self-Play
Kim Hammar
 
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
Kim Hammar
 
Intrusion Prevention through Optimal Stopping.
Intrusion Prevention through Optimal Stopping.Intrusion Prevention through Optimal Stopping.
Intrusion Prevention through Optimal Stopping.
Kim Hammar
 
A Game Theoretic Analysis of Intrusion Detection in Access Control Systems - ...
A Game Theoretic Analysis of Intrusion Detection in Access Control Systems - ...A Game Theoretic Analysis of Intrusion Detection in Access Control Systems - ...
A Game Theoretic Analysis of Intrusion Detection in Access Control Systems - ...
Kim Hammar
 
Reinforcement Learning Algorithms for Adaptive Cyber Defense against Heartbleed
Reinforcement Learning Algorithms for Adaptive Cyber Defense against HeartbleedReinforcement Learning Algorithms for Adaptive Cyber Defense against Heartbleed
Reinforcement Learning Algorithms for Adaptive Cyber Defense against Heartbleed
Kim Hammar
 
Learning Intrusion Prevention Policies through Optimal Stopping - CNSM2021
Learning Intrusion Prevention Policies through Optimal Stopping - CNSM2021Learning Intrusion Prevention Policies through Optimal Stopping - CNSM2021
Learning Intrusion Prevention Policies through Optimal Stopping - CNSM2021
Kim Hammar
 
Självlärande system för cybersäkerhet
Självlärande system för cybersäkerhetSjälvlärande system för cybersäkerhet
Självlärande system för cybersäkerhet
Kim Hammar
 
Learning Intrusion Prevention Policies Through Optimal Stopping
Learning Intrusion Prevention Policies Through Optimal StoppingLearning Intrusion Prevention Policies Through Optimal Stopping
Learning Intrusion Prevention Policies Through Optimal Stopping
Kim Hammar
 
Learning Intrusion Prevention Policies Through Optimal Stopping
Learning Intrusion Prevention Policies Through Optimal StoppingLearning Intrusion Prevention Policies Through Optimal Stopping
Learning Intrusion Prevention Policies Through Optimal Stopping
Kim Hammar
 
Self-Learning Systems for Cyber Security
Self-Learning Systems for Cyber SecuritySelf-Learning Systems for Cyber Security
Self-Learning Systems for Cyber Security
Kim Hammar
 
MuZero - ML + Security Reading Group
MuZero - ML + Security Reading GroupMuZero - ML + Security Reading Group
MuZero - ML + Security Reading Group
Kim Hammar
 

More from Kim Hammar (20)

Intrusion Tolerance as a Two-Level Game (Visit to Melbourne University)
Intrusion Tolerance as a Two-Level Game (Visit to Melbourne University)Intrusion Tolerance as a Two-Level Game (Visit to Melbourne University)
Intrusion Tolerance as a Two-Level Game (Visit to Melbourne University)
 
Automated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jecturesAutomated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jectures
 
Självlärande System för Cybersäkerhet. KTH
Självlärande System för Cybersäkerhet. KTHSjälvlärande System för Cybersäkerhet. KTH
Självlärande System för Cybersäkerhet. KTH
 
Intrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback ControlIntrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback Control
 
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
 
Självlärande system för cyberförsvar.
Självlärande system för cyberförsvar.Självlärande system för cyberförsvar.
Självlärande system för cyberförsvar.
 
Intrusion Response through Optimal Stopping
Intrusion Response through Optimal StoppingIntrusion Response through Optimal Stopping
Intrusion Response through Optimal Stopping
 
Learning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal StoppingLearning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal Stopping
 
Intrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal StoppingIntrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal Stopping
 
Intrusion Prevention through Optimal Stopping and Self-Play
Intrusion Prevention through Optimal Stopping and Self-PlayIntrusion Prevention through Optimal Stopping and Self-Play
Intrusion Prevention through Optimal Stopping and Self-Play
 
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
 
Intrusion Prevention through Optimal Stopping.
Intrusion Prevention through Optimal Stopping.Intrusion Prevention through Optimal Stopping.
Intrusion Prevention through Optimal Stopping.
 
A Game Theoretic Analysis of Intrusion Detection in Access Control Systems - ...
A Game Theoretic Analysis of Intrusion Detection in Access Control Systems - ...A Game Theoretic Analysis of Intrusion Detection in Access Control Systems - ...
A Game Theoretic Analysis of Intrusion Detection in Access Control Systems - ...
 
Reinforcement Learning Algorithms for Adaptive Cyber Defense against Heartbleed
Reinforcement Learning Algorithms for Adaptive Cyber Defense against HeartbleedReinforcement Learning Algorithms for Adaptive Cyber Defense against Heartbleed
Reinforcement Learning Algorithms for Adaptive Cyber Defense against Heartbleed
 
Learning Intrusion Prevention Policies through Optimal Stopping - CNSM2021
Learning Intrusion Prevention Policies through Optimal Stopping - CNSM2021Learning Intrusion Prevention Policies through Optimal Stopping - CNSM2021
Learning Intrusion Prevention Policies through Optimal Stopping - CNSM2021
 
Självlärande system för cybersäkerhet
Självlärande system för cybersäkerhetSjälvlärande system för cybersäkerhet
Självlärande system för cybersäkerhet
 
Learning Intrusion Prevention Policies Through Optimal Stopping
Learning Intrusion Prevention Policies Through Optimal StoppingLearning Intrusion Prevention Policies Through Optimal Stopping
Learning Intrusion Prevention Policies Through Optimal Stopping
 
Learning Intrusion Prevention Policies Through Optimal Stopping
Learning Intrusion Prevention Policies Through Optimal StoppingLearning Intrusion Prevention Policies Through Optimal Stopping
Learning Intrusion Prevention Policies Through Optimal Stopping
 
Self-Learning Systems for Cyber Security
Self-Learning Systems for Cyber SecuritySelf-Learning Systems for Cyber Security
Self-Learning Systems for Cyber Security
 
MuZero - ML + Security Reading Group
MuZero - ML + Security Reading GroupMuZero - ML + Security Reading Group
MuZero - ML + Security Reading Group
 

Recently uploaded

ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
ScyllaDB
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
Javier Junquera
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
Pablo Gómez Abajo
 
"What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w..."What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w...
Fwdays
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
Safe Software
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
Miro Wengner
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
Edge AI and Vision Alliance
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
zjhamm304
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Pitangent Analytics & Technology Solutions Pvt. Ltd
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
Fwdays
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
DianaGray10
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
c5vrf27qcz
 
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
Fwdays
 
High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024
Vadym Kazulkin
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
christinelarrosa
 

Recently uploaded (20)

ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
 
"What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w..."What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w...
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
 
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
 
High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
 

Automated Intrusion Response - CDIS Spring Conference 2024

  • 1. 0/40 Automated Intrusion Response CDIS Spring Conference Kim Hammar kimham@kth.se Division of Network and Systems Engineering KTH Royal Institute of Technology May 22, 2024
  • 2. 1/40 Use Case: Intrusion Response I A defender owns an infrastructure I Consists of connected components I Components run network services I Defender defends the infrastructure by monitoring and active defense I Has partial observability I An attacker seeks to intrude on the infrastructure I Has a partial view of the infrastructure I Wants to compromise specific components I Attacks by reconnaissance, exploitation and pivoting Attacker Clients . . . Defender 1 IPS 1 alerts Gateway 7 8 9 10 11 6 5 4 3 2 12 13 14 15 16 17 18 19 21 23 20 22 24 25 26 27 28 29 30 31
  • 3. 2/40 Automated Intrusion Response Levels of security automation No automation. Manual detection. Manual prevention. Lack of tools. 1980s 1990s 2000s-Now Research Operator assistance. Audit logs Manual detection. Manual prevention. Partial automation. Manual configuration. Intrusion detection systems. Intrusion prevention systems. High automation. System automatically updates itself.
  • 4. 2/40 Automated Intrusion Response Levels of security automation No automation. Manual detection. Manual prevention. Lack of tools. 1980s 1990s 2000s-Now Research Operator assistance. Audit logs Manual detection. Manual prevention. Partial automation. Manual configuration. Intrusion detection systems. Intrusion prevention systems. High automation. System automatically updates itself. Can we find effective security strategies through decision-theoretic methods?
  • 5. 3/40 Our Framework for Automated Intrusion Response s1,1 s1,2 s1,3 . . . s1,n s2,1 s2,2 s2,3 . . . s2,n . . . . . . . . . . . . . . . Digital Twin Target Infrastructure Model Creation & System Identification Strategy Mapping π Selective Replication Strategy Implementation π Simulation System Reinforcement Learning & Optimization Strategy evaluation & Model estimation Automation & Self-learning systems
  • 6. 3/40 Our Framework for Automated Intrusion Response s1,1 s1,2 s1,3 . . . s1,n s2,1 s2,2 s2,3 . . . s2,n . . . . . . . . . . . . . . . Digital Twin Target Infrastructure Model Creation & System Identification Strategy Mapping π Selective Replication Strategy Implementation π Simulation System Reinforcement Learning & Optimization Strategy evaluation & Model estimation Automation & Self-learning systems
  • 7. 3/40 Our Framework for Automated Intrusion Response s1,1 s1,2 s1,3 . . . s1,n s2,1 s2,2 s2,3 . . . s2,n . . . . . . . . . . . . . . . Digital Twin Target Infrastructure Model Creation & System Identification Strategy Mapping π Selective Replication Strategy Implementation π Simulation System Reinforcement Learning & Optimization Model estimation Automation & Self-learning systems
  • 8. 3/40 Our Framework for Automated Intrusion Response s1,1 s1,2 s1,3 . . . s1,n s2,1 s2,2 s2,3 . . . s2,n . . . . . . . . . . . . . . . Digital Twin Target Infrastructure Model Creation & System Identification Strategy Mapping π Selective Replication Strategy Implementation π Simulation System Reinforcement Learning & Optimization Model estimation Automation & Self-learning systems
  • 9. 3/40 Our Framework for Automated Intrusion Response s1,1 s1,2 s1,3 . . . s1,n s2,1 s2,2 s2,3 . . . s2,n . . . . . . . . . . . . . . . Digital Twin Target Infrastructure Model Creation & System Identification Strategy Mapping π Selective Replication Strategy Implementation π Simulation System Reinforcement Learning & Optimization Strategy evaluation & Model estimation Automation & Self-learning systems
  • 10. 3/40 Our Framework for Automated Intrusion Response s1,1 s1,2 s1,3 . . . s1,n s2,1 s2,2 s2,3 . . . s2,n . . . . . . . . . . . . . . . Digital Twin Target Infrastructure Model Creation & System Identification Strategy Mapping π Selective Replication Strategy Implementation π Simulation System Reinforcement Learning & Optimization Strategy evaluation Automation & Self-learning systems
  • 11. 3/40 Our Framework for Automated Intrusion Response s1,1 s1,2 s1,3 . . . s1,n s2,1 s2,2 s2,3 . . . s2,n . . . . . . . . . . . . . . . Digital Twin Target Infrastructure Model Creation & System Identification Strategy Mapping π Selective Replication Strategy Implementation π Simulation System Reinforcement Learning & Optimization Strategy evaluation & Model estimation Automation & Self-learning systems
  • 12. 3/40 Our Framework for Automated Intrusion Response s1,1 s1,2 s1,3 . . . s1,n s2,1 s2,2 s2,3 . . . s2,n . . . . . . . . . . . . . . . Digital Twin Target Infrastructure Model Creation & System Identification Strategy Mapping π Selective Replication Strategy Implementation π Simulation System Reinforcement Learning & Optimization Strategy evaluation & Model estimation Automation & Self-learning systems
  • 13. 4/40 Creating a Digital Twin of the Target Infrastructure Σ Configuration Space σi * * * 172.18.4.0/24 172.18.19.0/24 172.18.61.0/24 Digital Twins R1 R1 R1 I Given an infrastructure configuration, our framework automates the creation of a digital twin. I The configuration space defines the class of infrastructures that we can emulate.
  • 14. 5/40 Example Infrastructure Configuration I 64 nodes I 24 ovs switches I 3 gateways I 6 honeypots I 8 application servers I 4 administration servers I 15 compute servers I 11 vulnerabilities I cve-2010-0426 I cve-2015-3306 I etc. I Management I 1 sdn controller I 1 Kafka server I 1 elastic server r&d zone App servers Honeynet dmz admin zone workflow Gateway idps quarantine zone alerts Defender . . . Attacker Clients 2 1 3 12 4 5 6 7 8 9 10 11 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
  • 15. 6/40 Emulating Physical Components Containers Physical server Operating system Docker engine Our framework I We emulate physical components with Docker containers I Focus on linux-based systems I Our framework provides the orchestration layer
  • 16. 7/40 Emulating Network Connectivity Management node 1 Emulated IT infrastructure Management node 2 Emulated IT infrastructure Management node n Emulated IT infrastructure VXLAN VXLAN . . . VXLAN IP network I We emulate network connectivity on the same host using network namespaces I Connectivity across physical hosts is achieved using VXLAN tunnels with Docker swarm
  • 17. 8/40 Emulating Network Conditions I Traffic shaping using NetEm I Allows to configure: I Delay I Capacity I Packet Loss I Jitter I Queueing delays I etc. User space . . . Application processes Kernel TCP/UDP IP/Ethernet/802.11 OS TCP/IP stack Queueing discipline Device driver queue (FIFO) NIC Netem config: latency, jitter, etc. Sockets
  • 18. 9/40 Emulating Clients Client population . . . Arrival rate λ Departure Service time µ . . . . . . . . . . . . w1 w2 w|W| Workflows (Markov processes) I Homogeneous client population I Clients arrive according to Po(λ) I Client service times Exp(µ) I Service dependencies (St)t=1,2,... ∼ mc
  • 19. 10/40 Emulating The Attacker and The Defender I API for automated defender and attacker actions I Attacker actions: I Exploits I Reconnaissance I Pivoting I etc. I Defender actions: I Shut downs I Redirect I Isolate I Recover I Migrate I etc. Markov Decision Process s1,1 s1,2 s1,3 . . . s1,4 s2,1 s2,2 s2,3 . . . s2,4 Digital Twin . . . Virtual network Virtual devices Emulated services Emulated actors IT Infrastructure Configuration & change events System traces Verified security policy Optimized security policy
  • 20. 11/40 Software framework Metastore Python libraries Management api rest api cli grpc Management api Digital twins I More details about the software framework I Source code: https://github.com/Limmen/csle I Documentation: http://limmen.dev/csle/ I Demo: https://www.youtube.com/watch?v=iE2KPmtIs2A I Installation: https://www.youtube.com/watch?v=l_g3sRJwwhc
  • 21. 12/40 System Identification s1,1 s1,2 s1,3 . . . s1,n s2,1 s2,2 s2,3 . . . s2,n . . . . . . . . . . . . . . . Digital Twin Target Infrastructure Model Creation & System Identification Strategy Mapping π Selective Replication Strategy Implementation π Simulation System Reinforcement Learning & Generalization Strategy evaluation & Model estimation Automation & Self-learning systems
  • 22. 13/40 System Model H C ∅ Crashed Healthy Compromised Model complexity Static attacker Small set of responses Dynamic attacker Small set of responses Dynamic attacker Large set of responses I Intrusion response can be modeled in many ways I As a parametric optimization problem I As an optimal stopping problem I As a dynamic program I As a game I etc.
  • 23. 14/40 Related Work on Learning Automated Intrusion Response External validity model complexity Goal Georgia et al. 2000. (Next generation intrusion detection: reinforcement learning) Xu et al. 2005. (An RL approach to host-based intrusion detection) Servin et al. 2008. (Multi-agent RL for intrusion detection) Malialis et al. 2013. (Decentralized RL response to DDoS attacks) Zhu et al. 2019. (Adaptive Honeypot engagement) Apruzzese et al. 2020. (Deep RL to evade botnets) Xiao et al. 2021. (RL approach to APT) etc. 2022-2023 Our work 2020-2023
  • 24. 15/40 Intrusion Response through Optimal Stopping I Suppose I The attacker follows a fixed strategy (no adaptation) I We only have one response action, e.g., block the gateway I Formulate intrusion response as optimal stopping Intrusion event time-step t = 1 Intrusion ongoing t t = T Early stopping times Stopping times that affect the intrusion Episode
  • 25. 16/40 Intrusion Response from the Defender’s Perspective 20 40 60 80 100 120 140 160 180 200 0.5 1 t b 20 40 60 80 100 120 140 160 180 200 50 100 150 200 t Alerts 20 40 60 80 100 120 140 160 180 200 5 10 15 t Logins
  • 26. 16/40 Intrusion Response from the Defender’s Perspective 20 40 60 80 100 120 140 160 180 200 0.5 1 t b 20 40 60 80 100 120 140 160 180 200 50 100 150 200 t Alerts 20 40 60 80 100 120 140 160 180 200 5 10 15 t Logins
  • 27. 16/40 Intrusion Response from the Defender’s Perspective 20 40 60 80 100 120 140 160 180 200 0.5 1 t b 20 40 60 80 100 120 140 160 180 200 50 100 150 200 t Alerts 20 40 60 80 100 120 140 160 180 200 5 10 15 t Logins
  • 28. 16/40 Intrusion Response from the Defender’s Perspective 20 40 60 80 100 120 140 160 180 200 0.5 1 t b 20 40 60 80 100 120 140 160 180 200 50 100 150 200 t Alerts 20 40 60 80 100 120 140 160 180 200 5 10 15 t Logins
  • 29. 16/40 Intrusion Response from the Defender’s Perspective 20 40 60 80 100 120 140 160 180 200 0.5 1 t b 20 40 60 80 100 120 140 160 180 200 50 100 150 200 t Alerts 20 40 60 80 100 120 140 160 180 200 5 10 15 t Logins
  • 30. 16/40 Intrusion Response from the Defender’s Perspective 20 40 60 80 100 120 140 160 180 200 0.5 1 t b 20 40 60 80 100 120 140 160 180 200 50 100 150 200 t Alerts 20 40 60 80 100 120 140 160 180 200 5 10 15 t Logins
  • 31. 16/40 Intrusion Response from the Defender’s Perspective 20 40 60 80 100 120 140 160 180 200 0.5 1 t b 20 40 60 80 100 120 140 160 180 200 50 100 150 200 t Alerts 20 40 60 80 100 120 140 160 180 200 5 10 15 t Logins
  • 32. 16/40 Intrusion Response from the Defender’s Perspective 20 40 60 80 100 120 140 160 180 200 0.5 1 t b 20 40 60 80 100 120 140 160 180 200 50 100 150 200 t Alerts 20 40 60 80 100 120 140 160 180 200 5 10 15 t Logins
  • 33. 16/40 Intrusion Response from the Defender’s Perspective 20 40 60 80 100 120 140 160 180 200 0.5 1 t b 20 40 60 80 100 120 140 160 180 200 50 100 150 200 t Alerts 20 40 60 80 100 120 140 160 180 200 5 10 15 t Logins When to take a defensive action?
  • 34. 17/40 The Defender’s Optimal Stopping Problem (1/3) I Infrastructure is a discrete-time dynamical system (st)T t=1 I Defender observes a noisy observation process (ot)T t=1 I Two options at each time t: (C)ontinue and (S)stop I Find the optimal stopping time τ?: τ? ∈ arg max τ Eτ " τ−1 X t=1 γt−1 RC st st+1 + γτ−1 RS sτ sτ # where RS ss0 & RC ss0 are the stop/continue rewards and τ is τ = inf{t : t > 0, at = S} t ot τ∗
  • 35. 18/40 The Defender’s Optimal Stopping Problem (2/3) I Objective: stop the attack as soon as possible I Let the state space be S = {H, C, ∅} H C ∅ Stopped Healthy Compromised
  • 36. 19/40 The Defender’s Optimal Stopping Problem (3/3) I Let the observation process (ot)T t=1 represent ids alerts 0 2000 4000 6000 8000 probability cve-2010-0426 0 2000 4000 6000 8000 cve-2015-3306 0 2000 4000 6000 8000 cve-2015-5602 0 2000 4000 6000 8000 cve-2016-10033 0 2000 4000 6000 8000 cwe-89 0 2000 4000 6000 8000 O probability cve-2017-7494 0 2000 4000 6000 8000 O cve-2014-6271 0 5000 10000 15000 20000 O ftp brute force 0 5000 10000 15000 20000 O ssh brute force 0 5000 10000 15000 20000 O telnet brute force intrusion no intrusion intrusion model normal operation model I Estimate the observation distribution based on M samples from the twin I E.g., compute empirical distribution b Z as estimate of Z I b Z →a.s Z as M → ∞ (Glivenko-Cantelli theorem)
  • 37. 20/40 Optimal Stopping Strategy I The defender can compute the belief bt , P[St = C | b1, o1, o2, . . . ot] I Stopping strategy: π(b) : [0, 1] → {S, C} Defender Belief
  • 38. 20/40 Optimal Threshold Strategy Theorem There exists an optimal defender strategy of the form: π? (b) = S ⇐⇒ b ≥ α? α? ∈ [0, 1] i.e., the stopping set is S = [α?, 1] b 0 1 belief space B = [0, 1] S1 α? 1
  • 39. 21/40 Optimal Multiple Stopping I Suppose the defender can take L ≥ 1 response actions I Find the optimal stopping times τ? L , τ? L−1, . . . , τ? 1 : (τ? l )l=1,...,L ∈ arg max τ1,...,τL Eτ1,...,τL " τL−1 X t=1 γt−1 RC st st+1 + γτL−1 RS sτL sτL + τL−1−1 X t=τL+1 γt−1 RC st st+1 + γτL−1−1 RS sτL−1 sτL−2 + . . . + τ1−1 X t=τ2+1 γt−1 RC st st+1 + γτ1−1 RS sτ1 sτ1 # where τl denotes the stopping time with l stops remaining. t ot τ? L τ? L−1 τ? L−2 . . .
  • 40. 22/40 Optimal Multi-Threshold Strategy Theorem I Stopping sets are nested Sl−1 ⊆ Sl for l = 2, . . . L. I If (ot)t≥1 is totally positive of order 2 (TP2), there exists an optimal defender strategy of the form: π? l (b) = S ⇐⇒ b ≥ α? l , l = 1, . . . , L where α? l ∈ [0, 1] is decreasing in l. b 0 1 belief space B = [0, 1] S1 S2 . . . SL α? 1 α? 2 α? L . . .
  • 41. 23/40 Optimal Stopping Game I Suppose the attacker is dynamic and decides when to start and abort its intrusion. Attacker Defender t = 1 t = T τ1,1 τ1,2 τ1,3 τ2,1 t Stopped Game episode Intrusion I Find the optimal stopping times maximize τD,1,...,τD,L minimize τA,1,τA,2 E[J] where J is the defender’s objective.
  • 42. 24/40 Best-Response Multi-Threshold Strategies (1/2) Theorem I The defender’s best response is of the form: π̃D,l (b) = S ⇐⇒ b ≥ α̃l , l = 1, . . . , L I The attacker’s best response is of the form: π̃A,l (b) = C ⇐⇒ π̃D,l (S | b) ≥ β̃H,l , l = 1, . . . , L, s = H π̃A,l (b) = S ⇐⇒ π̃D,l (S | b) ≥ β̃C,l , l = 1, . . . , L, s = C
  • 43. 25/40 Best-Response Multi-Threshold Strategies (2/2) b Defender 0 1 S (D) 1,π2,l S (D) 2,π2,l . . . S (D) L,π2,l α̃1 α̃2 α̃L . . . b Attacker 0 1 S (A) C,1,πD,l S (A) C,L,πD,l β̃C,1 β̃C,L β̃H,1 . . . β̃H,L . . . S (A) H,1,πD,l S (A) H,L,πD,l
  • 44. 26/40 Efficient Computation of Best Responses Algorithm 1: Threshold Optimization 1 Input: Objective function J, number of thresholds L, parametric optimizer PO 2 Output: A approximate best response strategy π̂θ 3 Algorithm 4 Θ ← [0, 1]L 5 For each θ ∈ Θ, define πθ(bt) as 6 πθ(bt) , ( S if bt ≥ θi C otherwise 7 Jθ ← Eπθ [J] 8 π̂θ ← PO(Θ, Jθ) 9 return π̂θ I Examples of parameteric optimization algorithmns: CEM, BO, CMA-ES, DE, SPSA, etc.
  • 45. 27/40 Threshold-Fictitious Play to Approximate an Equilibrium π̃A ∈ BA(πD) πA πD π̃D ∈ BD(πA) π̃0 A ∈ BA(π0 D) π0 A π0 D π̃0 D ∈ BD(π0 A) . . . π? A ∈ BA(π? D) π? D ∈ BD(π? A) Fictitious play: iterative averaging of best responses. I Learn best response strategies iteratively I Average best responses to approximate the equilibrium
  • 46. 28/40 Comparison against State-of-the-art Algorithms 0 10 20 30 40 50 60 training time (min) 0 50 100 Reward per episode against Novice 0 10 20 30 40 50 60 training time (min) Reward per episode against Experienced 0 10 20 30 40 50 60 training time (min) Reward per episode against Expert PPO ThresholdSPSA Shiryaev’s Algorithm (α = 0.75) HSVI upper bound
  • 47. 29/40 Learning Curves in Simulation and Digital Twin 0 50 100 Novice Reward per episode 0 50 100 Episode length (steps) 0.0 0.5 1.0 P[intrusion interrupted] 0.0 0.5 1.0 1.5 P[early stopping] 5 10 Duration of intrusion −50 0 50 100 experienced 0 50 100 150 0.0 0.5 1.0 0.0 0.5 1.0 1.5 0 5 10 0 20 40 60 training time (min) −50 0 50 100 expert 0 20 40 60 training time (min) 0 50 100 150 0 20 40 60 training time (min) 0.0 0.5 1.0 0 20 40 60 training time (min) 0.0 0.5 1.0 1.5 2.0 0 20 40 60 training time (min) 0 5 10 15 20 πθ,l simulation πθ,l emulation (∆x + ∆y) ≥ 1 baseline Snort IPS upper bound 0 20 40 60 80 100 # training iterations 0 1 2 3 Exploitability 0 20 40 60 80 100 # training iterations −5.0 −2.5 0.0 2.5 5.0 Defender reward per episode 0 20 40 60 80 100 # training iterations 0 1 2 3 Intrusion length (π1,l, π2,l) emulation (π1,l, π2,l) simulation Snort IPS ot ≥ 1 upper bound
  • 48. 29/40 Learning Curves in Simulation and Digital Twin 0 50 100 Novice Reward per episode 0 50 100 Episode length (steps) 0.0 0.5 1.0 P[intrusion interrupted] 0.0 0.5 1.0 1.5 P[early stopping] 5 10 Duration of intrusion −50 0 50 100 experienced 0 50 100 150 0.0 0.5 1.0 0.0 0.5 1.0 1.5 0 5 10 0 20 40 60 training time (min) −50 0 50 100 expert 0 20 40 60 training time (min) 0 50 100 150 0 20 40 60 training time (min) 0.0 0.5 1.0 0 20 40 60 training time (min) 0.0 0.5 1.0 1.5 2.0 0 20 40 60 training time (min) 0 5 10 15 20 πθ,l simulation πθ,l emulation (∆x + ∆y) ≥ 1 baseline Snort IPS upper bound 0 20 40 60 80 100 # training iterations 0 1 2 3 Exploitability 0 20 40 60 80 100 # training iterations −5.0 −2.5 0.0 2.5 5.0 Defender reward per episode 0 20 40 60 80 100 # training iterations 0 1 2 3 Intrusion length (π1,l, π2,l) emulation (π1,l, π2,l) simulation Snort IPS ot ≥ 1 upper bound Stopping is about timing; now we consider timing + action selection
  • 49. 30/40 General Intrusion Response Game I Suppose the defender and the attacker can take L actions per node I G = h{gw} ∪ V, Ei: directed tree representing the virtual infrastructure I V: set of virtual nodes I E: set of node dependencies I Z: set of zones r&d zone App servers Honeynet dmz admin zone workflow Gateway idps quarantine zone alerts Defender . . . Attacker Clients 2 1 3 12 4 5 6 7 8 9 10 11 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
  • 50. 30/40 General Intrusion Response Game I Suppose the defender and the attacker can take L actions per node I G = h{gw} ∪ V, Ei: directed tree representing the virtual infrastructure I V: set of virtual nodes I E: set of node dependencies I Z: set of zones r&d zone App servers Honeynet dmz admin zone workflow Gateway idps quarantine zone alerts Defender . . . Attacker Clients 2 1 3 12 4 5 6 7 8 9 10 11 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
  • 51. 31/40 State Space I Each i ∈ V has a state vi,t = (v (Z) t,i |{z} D , v (I) t,i , v (R) t,i | {z } A ) I System state st = (vt,i )i∈V ∼ St I Markovian time-homogeneous dynamics: st+1 ∼ f (· | St, At) At = (A (A) t , A (D) t ) are the actions. s1 s2 s3 s4 s5 s4 . . . . . . . . .
  • 52. 31/40 State Space I Each i ∈ V has a state vi,t = (v (Z) t,i |{z} D , v (I) t,i , v (R) t,i | {z } A ) I System state st = (vt,i )i∈V ∼ St I Markovian time-homogeneous dynamics: st+1 ∼ f (· | St, At) At = (A (A) t , A (D) t ) are the actions. s1 s2 s3 s4 s5 s4 . . . . . . . . .
  • 53. 31/40 State Space I Each i ∈ V has a state vi,t = (v (Z) t,i |{z} D , v (I) t,i , v (R) t,i | {z } A ) I System state st = (vt,i )i∈V ∼ St I Markovian time-homogeneous dynamics: st+1 ∼ f (· | St, At) At = (A (A) t , A (D) t ) are the actions. s1 s2 s3 s4 s5 s4 . . . . . . . . .
  • 54. 32/40 Observations I idpss inspect network traffic and generate alert vectors: ot , ot,1, . . . , ot,|V| ∈ N |V| 0 ot,i is the number of alerts related to node i ∈ V at time-step t. I ot = (ot,1, . . . , ot,|V|) is a realization of the random vector Ot with joint distribution Z idps idps idps idps alerts Defender . . . Attacker Clients 2 1 3 12 4 5 6 7 8 9 10 11 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
  • 55. 32/40 Observations I idpss inspect network traffic and generate alert vectors: ot , ot,1, . . . , ot,|V| ∈ N |V| 0 ot,i is the number of alerts related to node i ∈ V at time-step t. I ot = (ot,1, . . . , ot,|V|) is a realization of the random vector Ot with joint distribution Z idps idps idps idps alerts Defender . . . Attacker Clients 2 1 3 12 4 5 6 7 8 9 10 11 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
  • 57. 34/40 The (General) Intrusion Response Problem maximizeπD∈ΠD minimizeπA∈ΠA E(πD,πA) [J] E(πD,πA) denotes the expectation of the random vectors (St, Ot, At)t∈{1,...,T} when following the strategy profile (πD, πA)
  • 58. 35/40 The Curse of Dimensionality I Solving the game is computationally intractable. The state, action, and observation spaces of the game grow exponentially with |V|. 1 2 3 4 5 104 105 2 105 |S| |O| |Ai | |V| Growth of |S|, |O|, and |Ai | in function of the number of nodes |V|
  • 59. 35/40 The Curse of Dimensionality I While (1) has a solution (i.e the game Γ has a value (Thm 1)), computing it is intractable since the state, action, and observation spaces of the game grow exponentially with |V|. 1 2 3 4 5 104 105 2 105 |S| |O| |Ai | |V| Growth of |S|, |O|, and |Ai | in function of the number of nodes |V| We tackle the scability challenge with decomposition
  • 60. 36/40 Intuitively.. rd zone App servers Honeynet dmz admin zone workflow Gateway idps quarantine zone alerts Defender . . . Attacker Clients 2 1 3 12 4 5 6 7 8 9 10 11 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 65 The optimal action here... Does not directly depend on the state or action of a node down here
  • 61. 36/40 Intuitively.. rd zone App servers Honeynet dmz admin zone workflow Gateway idps quarantine zone alerts Defender . . . Attacker Clients 2 1 3 12 4 5 6 7 8 9 10 11 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 65 The optimal action here... But they are not completely independent either. How can we exploit this structure? Does not directly depend on the state or action of a node down here
  • 62. 37/40 Scalable Learning through Decomposition 1 2 3 4 5 6 7 8 9 10 2 4 6 8 10 linear measured # parallel processes n |V| = 10 Speedup S n Speedup of best response computation for the decomposed game; Tn denotes the completion time with n processes; the speedup is calculated as Sn = T1 Tn ; the error bars indicate standard deviations from 3 measurements.
  • 63. 38/40 Learning Equilibrium Strategies 0 20 40 60 80 100 running time (h) 0 5 b δ = 0.4 Approximate exploitability b δ 0 20 40 60 80 100 running time (h) 0.0 0.5 1.0 Defender utility per episode dfsp simulation dfsp digital twin upper bound oi,t 0 random defense Learning curves obtained during training of dfsp to find optimal (equilibrium) strategies in the intrusion response game; red and blue curves relate to dfsp; black, orange and green curves relate to baselines.
  • 64. 39/40 Comparison with NFSP 0 10 20 30 40 50 60 70 80 running time (h) 0.0 2.5 5.0 7.5 Approximate exploitability dfsp nfsp Learning curves obtained during training of dfsp and nfsp to find optimal (equilibrium) strategies in the intrusion response game; the red curve relate to dfsp and the purple curve relate to nfsp; all curves show simulation results.
  • 65. 40/40 Conclusions I We develop a framework to automatically learn security strategies. I We apply the framework to an intrusion response use case. I We derive properties of optimal security strategies. I We evaluate strategies on a digital twin. I Questions → demonstration s1,1 s1,2 s1,3 . . . s1,n s2,1 s2,2 s2,3 . . . s2,n . . . . . . . . . . . . . . . Emulation Target System Model Creation System Identification Strategy Mapping π Selective Replication Strategy Implementation π Simulation Learning