Learning Automated Intrusion Response
Ericsson Research
Kim Hammar & Rolf Stadler
kimham@kth.se
Division of Network and Systems Engineering
KTH Royal Institute of Technology
December 8, 2023
1/52
Use Case: Intrusion Response
I A defender owns an infrastructure
I Consists of connected components
I Components run network services
I Defender defends the infrastructure
by monitoring and active defense
I Has partial observability
I An attacker seeks to intrude on the
infrastructure
I Has a partial view of the
infrastructure
I Wants to compromise specific
components
I Attacks by reconnaissance,
exploitation and pivoting
[Figure: target infrastructure with 31 numbered nodes behind a gateway; the attacker and the clients connect through the gateway; an IPS at the gateway sends alerts to the defender]
2/52
Automated Intrusion Response
Levels of security automation
I 1980s: No automation. Manual detection. Manual prevention. Lack of tools.
I 1990s: Operator assistance. Audit logs. Manual detection. Manual prevention.
I 2000s-Now: Partial automation. Manual configuration. Intrusion detection systems. Intrusion prevention systems.
I Research: High automation. System automatically updates itself.
Can we find effective security strategies through decision-theoretic methods?
3/52
Our Framework for Automated Intrusion Response
[Figure: the framework. Components: target infrastructure, digital twin, simulation system (a grid of states s1,1 ... s2,n). Steps: selective replication, model creation & system identification, reinforcement learning & optimization, strategy mapping (π), strategy implementation (π), strategy evaluation & model estimation, automation & self-learning systems]
3/52
Creating a Digital Twin of the Target Infrastructure
[Figure: the framework diagram from the overview slide (target infrastructure, digital twin, simulation system)]
3/52
Learning of Defender Strategies
[Figure: the framework diagram from the overview slide (target infrastructure, digital twin, simulation system)]
4/52
Example Infrastructure Configuration
I 64 nodes
  I 24 OVS switches
  I 3 gateways
  I 6 honeypots
  I 8 application servers
  I 4 administration servers
  I 15 compute servers
I 11 vulnerabilities
  I CVE-2010-0426
  I CVE-2015-3306
  I etc.
I Management
  I 1 SDN controller
  I 1 Kafka server
  I 1 Elastic server
[Figure: example infrastructure with 64 numbered nodes. Labels: R&D zone, app servers, honeynet, DMZ, admin zone, quarantine zone, workflow, gateway with IDPS, alerts to the defender, attacker, clients]
5/52
Emulating Physical Components
[Figure: emulation stack. Layers: physical server, operating system, Docker engine, our framework, containers]
I We emulate physical components with Docker containers
I Focus on Linux-based systems
I Our framework provides the orchestration layer
6/52
Emulating Network Connectivity
[Figure: management nodes 1, ..., n, each hosting a part of the emulated IT infrastructure, connected by VXLAN tunnels over an IP network]
I We emulate network connectivity on the same host using
network namespaces
I Connectivity across physical hosts is achieved using VXLAN
tunnels with Docker swarm
7/52
Emulating Network Conditions
I Traffic shaping using NetEm
I Allows configuring:
I Delay
I Capacity
I Packet Loss
I Jitter
I Queueing delays
I etc.
[Figure: where NetEm sits in the network stack. User space: application processes and sockets. Kernel: TCP/UDP, IP/Ethernet/802.11 (the OS TCP/IP stack), queueing discipline with the NetEm configuration (latency, jitter, etc.), device driver queue (FIFO), NIC]
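The NetEm settings listed above map onto arguments of the Linux `tc` command. The following is a minimal sketch, not the framework's own code: the helper name `netem_command` and the example values (interface `eth0`, 100 ms delay, 10 ms jitter, 1% loss, 10 Mbit/s rate) are assumptions for illustration. It only builds the argument list; applying it requires root privileges in the relevant network namespace.

```python
def netem_command(interface, delay_ms=None, jitter_ms=None, loss_pct=None, rate_mbit=None):
    """Build the argv list for `tc qdisc add dev IFACE root netem ...`.

    Hypothetical helper: it covers only delay, jitter, loss, and rate.
    """
    if jitter_ms is not None and delay_ms is None:
        raise ValueError("netem jitter is given relative to a base delay")
    command = ["tc", "qdisc", "add", "dev", interface, "root", "netem"]
    if delay_ms is not None:
        command += ["delay", f"{delay_ms}ms"]
        if jitter_ms is not None:
            command.append(f"{jitter_ms}ms")  # jitter follows the base delay
    if loss_pct is not None:
        command += ["loss", f"{loss_pct}%"]
    if rate_mbit is not None:
        command += ["rate", f"{rate_mbit}mbit"]
    return command


if __name__ == "__main__":
    # Example values are assumptions, not taken from the slides.
    print(" ".join(netem_command("eth0", delay_ms=100, jitter_ms=10, loss_pct=1, rate_mbit=10)))
```

The list form can be passed to `subprocess.run` without shell quoting.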
8/52
Emulating Clients
[Figure: client population with arrival rate λ, service time µ, and departures; each client follows one of the workflows w1, w2, ..., w|W| (Markov processes)]
I Homogeneous client population
I Clients arrive according to Po(λ)
I Client service times Exp(µ)
I Service dependencies $(S_t)_{t=1,2,\ldots}$ follow a Markov chain (MC)
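The client model above can be sketched as a small simulation. This is an illustrative sketch under two assumptions that the slides do not spell out: arrivals form a Poisson process with rate λ (so inter-arrival times are exponential), and µ is the rate of the exponential service time (mean 1/µ). The function names are hypothetical.

```python
import random


def simulate_clients(arrival_rate, service_rate, horizon, seed=0):
    """Return (arrival_time, departure_time) pairs for clients arriving in [0, horizon)."""
    rng = random.Random(seed)
    clients = []
    t = rng.expovariate(arrival_rate)  # time of the first arrival
    while t < horizon:
        service_time = rng.expovariate(service_rate)  # Exp(mu), mean 1/mu
        clients.append((t, t + service_time))
        t += rng.expovariate(arrival_rate)  # exponential inter-arrival time
    return clients


def active_clients(clients, time):
    """Number of clients present in the system at the given time."""
    return sum(1 for arrival, departure in clients if arrival <= time < departure)


if __name__ == "__main__":
    population = simulate_clients(arrival_rate=2.0, service_rate=0.5, horizon=100.0)
    print(len(population), active_clients(population, 50.0))
```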
9/52
Emulating The Attacker and The Defender
I API for automated
defender and attacker
actions
I Attacker actions:
I Exploits
I Reconnaissance
I Pivoting
I etc.
I Defender actions:
I Shut down
I Redirect
I Isolate
I Recover
I Migrate
I etc.
[Figure: the IT infrastructure, the digital twin (virtual network, virtual devices, emulated services, emulated actors), and a Markov decision process model. Arrow labels: configuration & change events, system traces, verified security policy, optimized security policy]
10/52
Software framework
[Figure: software architecture. Labels: metastore, Python libraries, management API (REST API, CLI, gRPC), digital twins]
I More details about the software framework
I Source code: https://github.com/Limmen/csle
I Documentation: http://limmen.dev/csle/
I Demo: https://www.youtube.com/watch?v=iE2KPmtIs2A
11/52
System Identification
[Figure: the framework diagram from the overview slide; the learning step is labeled "Reinforcement Learning & Generalization" here]
12/52
System Model
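[Figure: state diagram with the states Healthy (H), Compromised (C), and Crashed (∅)]
[Figure: model complexity axis with three levels: static attacker and small set of responses; dynamic attacker and small set of responses; dynamic attacker and large set of responses]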
I Intrusion response can be modeled in many ways
I As a parametric optimization problem
I As an optimal stopping problem
I As a dynamic program
I As a game
I etc.
13/52
Related Work on Learning Automated Intrusion Response
[Figure: related work placed along the axes "external validity" and "model complexity", with an arrow toward the goal]
I Georgia et al. 2000. (Next generation intrusion detection: reinforcement learning)
I Xu et al. 2005. (An RL approach to host-based intrusion detection)
I Servin et al. 2008. (Multi-agent RL for intrusion detection)
I Malialis et al. 2013. (Decentralized RL response to DDoS attacks)
I Zhu et al. 2019. (Adaptive honeypot engagement)
I Apruzzese et al. 2020. (Deep RL to evade botnets)
I Xiao et al. 2021. (RL approach to APT)
I etc. 2022-2023
I Our work 2020-2023
14/52
Intrusion Response through Optimal Stopping
I Suppose
I The attacker follows a fixed strategy (no adaptation)
I We only have one response action, e.g., block the gateway
I Formulate intrusion response as optimal stopping
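[Figure: episode timeline from time-step t = 1 to t = T. The intrusion event marks the start of the ongoing intrusion. Stopping times before the intrusion event are early stopping times; stopping times after it affect the intrusion]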
15/52
Intrusion Response from the Defender’s Perspective
[Figure: three time series over t = 1, ..., 200 as seen by the defender: b (between 0 and 1), number of alerts, and number of logins]
When to take a defensive action?
16/52
The Defender’s Optimal Stopping Problem (1/3)
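I Infrastructure is a discrete-time dynamical system $(s_t)_{t=1}^{T}$
I Defender observes a noisy observation process $(o_t)_{t=1}^{T}$
I Two options at each time t: (C)ontinue and (S)top
I Find the optimal stopping time $\tau^\star$:
$$\tau^\star \in \arg\max_{\tau} \mathbb{E}_{\tau}\left[\sum_{t=1}^{\tau-1} \gamma^{t-1} R^{C}_{s_t s_{t+1}} + \gamma^{\tau-1} R^{S}_{s_\tau s_\tau}\right]$$
where $R^{S}_{ss'}$ and $R^{C}_{ss'}$ are the stop/continue rewards and $\tau$ is
$$\tau = \inf\{t : t > 0, a_t = S\}$$
[Figure: the observation process $o_t$ over time t, with the optimal stopping time $\tau^\star$ marked]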
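To make the objective concrete, the sketch below evaluates the bracketed sum for one fixed state trajectory and one stopping time. It is an illustration only: the reward functions and their values are hypothetical stand-ins, since the slides do not give numeric rewards.

```python
def stopping_return(states, stop_time, reward_continue, reward_stop, gamma):
    """Discounted return of stopping at time `stop_time` (1-indexed).

    states[t - 1] holds s_t, so the trajectory must cover s_1 ... s_tau.
    """
    total = 0.0
    for t in range(1, stop_time):  # continue rewards for t = 1 .. tau - 1
        total += gamma ** (t - 1) * reward_continue(states[t - 1], states[t])
    s_tau = states[stop_time - 1]
    total += gamma ** (stop_time - 1) * reward_stop(s_tau, s_tau)  # stop reward at tau
    return total


if __name__ == "__main__":
    # Hypothetical rewards: continuing is good while healthy (H), stopping is good once compromised (C).
    r_continue = lambda s, s_next: 1.0 if s == "H" else -1.0
    r_stop = lambda s, s_next: 10.0 if s == "C" else -10.0
    trajectory = ["H", "H", "C", "C"]
    for tau in (1, 2, 3, 4):
        print(tau, stopping_return(trajectory, tau, r_continue, r_stop, gamma=0.5))
```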
17/52
The Defender’s Optimal Stopping Problem (2/3)
I Objective: stop the attack as soon as possible
I Let the state space be S = {H, C, ∅}
[Figure: state diagram with the states Healthy (H), Compromised (C), and Stopped (∅)]
18/52
The Defender’s Optimal Stopping Problem (3/3)
I Let the observation process $(o_t)_{t=1}^{T}$ represent IDS alerts
[Figure: estimated distributions of the number of IDS alerts O (x-axis) against probability (y-axis), one panel per attack: CVE-2010-0426, CVE-2015-3306, CVE-2015-5602, CVE-2016-10033, CWE-89, CVE-2017-7494, CVE-2014-6271, FTP brute force, SSH brute force, Telnet brute force. Legend: intrusion, no intrusion, intrusion model, normal operation model]
I Estimate the observation distribution based on M samples from the twin
I E.g., compute the empirical distribution $\hat{Z}$ as an estimate of $Z$
I $\hat{Z} \xrightarrow{\text{a.s.}} Z$ as $M \to \infty$ (Glivenko-Cantelli theorem)
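The estimation step can be sketched as follows. This is a minimal illustration, assuming the samples are integer alert counts; the three-point distribution in the demo is made up and stands in for samples collected from the twin.

```python
import random
from collections import Counter


def empirical_distribution(samples):
    """Return the empirical probability mass function of the samples as a dict."""
    counts = Counter(samples)
    m = len(samples)
    return {value: count / m for value, count in counts.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical "true" distribution Z over alert counts 0, 1, 2.
    values, weights = [0, 1, 2], [0.5, 0.3, 0.2]
    for m in (10, 1000, 100000):
        z_hat = empirical_distribution(rng.choices(values, weights=weights, k=m))
        error = max(abs(z_hat.get(v, 0.0) - w) for v, w in zip(values, weights))
        print(m, round(error, 4))  # the error tends to shrink as M grows
```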
19/52
Optimal Stopping Strategy
I The defender can compute the belief
$$b_t \triangleq \mathbb{P}[S_{i,t} = C \mid b_1, o_1, o_2, \ldots, o_t]$$
I Stopping strategy: $\pi(b) : [0, 1] \to \{S, C\}$
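The belief can be computed recursively with Bayes' rule. The sketch below assumes a simple two-state model that the slides do not specify in detail: a healthy system becomes compromised with probability `p_intrusion` per time-step and stays compromised until the defender stops. The likelihood functions are hypothetical placeholders for the estimated observation distributions.

```python
def update_belief(belief, observation, p_intrusion, likelihood_healthy, likelihood_compromised):
    """One step of the belief recursion b_t -> b_{t+1} for the states {H, C}."""
    # Prediction: probability of being compromised before seeing the new observation.
    predicted = belief + (1.0 - belief) * p_intrusion
    # Correction: weight each state by the likelihood of the observation.
    numerator = predicted * likelihood_compromised(observation)
    denominator = numerator + (1.0 - predicted) * likelihood_healthy(observation)
    if denominator == 0.0:
        raise ValueError("observation has zero probability under both states")
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical likelihoods: many alerts are more likely during an intrusion.
    z_healthy = lambda o: 0.9 if o < 10 else 0.1
    z_compromised = lambda o: 0.3 if o < 10 else 0.7
    b = 0.0
    for alerts in (2, 3, 15, 20, 18):
        b = update_belief(b, alerts, 0.1, z_healthy, z_compromised)
        print(alerts, round(b, 3))
```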
19/52
Optimal Threshold Strategy
Theorem
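There exists an optimal defender strategy of the form:
$$\pi^\star(b) = S \iff b \geq \alpha^\star, \qquad \alpha^\star \in [0, 1]$$
i.e., the stopping set is $\mathscr{S} = [\alpha^\star, 1]$
[Figure: the belief space B = [0, 1] with the stopping set $\mathscr{S}_1$ to the right of the threshold $\alpha^\star_1$]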
20/52
Optimal Multiple Stopping
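I Suppose the defender can take L ≥ 1 response actions
I Find the optimal stopping times $\tau^\star_L, \tau^\star_{L-1}, \ldots, \tau^\star_1$:
$$(\tau^\star_l)_{l=1,\ldots,L} \in \arg\max_{\tau_1,\ldots,\tau_L} \mathbb{E}_{\tau_1,\ldots,\tau_L}\Bigg[\sum_{t=1}^{\tau_L-1} \gamma^{t-1} R^{C}_{s_t s_{t+1}} + \gamma^{\tau_L-1} R^{S}_{s_{\tau_L} s_{\tau_L}} + \sum_{t=\tau_L+1}^{\tau_{L-1}-1} \gamma^{t-1} R^{C}_{s_t s_{t+1}} + \gamma^{\tau_{L-1}-1} R^{S}_{s_{\tau_{L-1}} s_{\tau_{L-1}}} + \ldots + \sum_{t=\tau_2+1}^{\tau_1-1} \gamma^{t-1} R^{C}_{s_t s_{t+1}} + \gamma^{\tau_1-1} R^{S}_{s_{\tau_1} s_{\tau_1}}\Bigg]$$
where $\tau_l$ denotes the stopping time with l stops remaining.
[Figure: the observation process $o_t$ over time t, with the stopping times $\tau^\star_L, \tau^\star_{L-1}, \tau^\star_{L-2}, \ldots$ marked in that order]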
21/52
Optimal Multi-Threshold Strategy
Theorem
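I Stopping sets are nested: $\mathscr{S}_{l-1} \subseteq \mathscr{S}_l$ for $l = 2, \ldots, L$.
I If $(o_t)_{t \geq 1}$ is totally positive of order 2 (TP2), there exists an optimal defender strategy of the form:
$$\pi^\star_l(b) = S \iff b \geq \alpha^\star_l, \qquad l = 1, \ldots, L$$
where $\alpha^\star_l \in [0, 1]$ is decreasing in l.
[Figure: the belief space B = [0, 1] with the nested stopping sets $\mathscr{S}_1, \mathscr{S}_2, \ldots, \mathscr{S}_L$ and the thresholds $\alpha^\star_1, \alpha^\star_2, \ldots, \alpha^\star_L$]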
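A multi-threshold strategy of this form is straightforward to represent in code. The sketch below is illustrative: the threshold values are made up, and the function name is hypothetical. It stores $\alpha_l$ at index l - 1 and checks that the thresholds are decreasing in l, as the theorem states.

```python
def multi_threshold_strategy(thresholds):
    """Build pi_l(b) from thresholds, where thresholds[l - 1] is alpha_l (l stops remaining)."""
    for alpha_next, alpha in zip(thresholds[1:], thresholds[:-1]):
        if alpha_next > alpha:
            raise ValueError("thresholds must be decreasing in l")

    def strategy(belief, stops_remaining):
        # Stop (S) iff the belief reaches the threshold for the current number of stops.
        return "S" if belief >= thresholds[stops_remaining - 1] else "C"

    return strategy


if __name__ == "__main__":
    pi = multi_threshold_strategy([0.9, 0.7, 0.5])  # hypothetical alpha_1, alpha_2, alpha_3
    for stops in (3, 2, 1):
        print(stops, pi(0.6, stops))
```

With more stops remaining the defender stops at a lower belief, which matches the nesting of the stopping sets.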
22/52
Optimal Stopping Game
I Suppose the attacker is dynamic and decides when to start
and abort its intrusion.
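[Figure: timeline of a game episode from t = 1 to t = T with one row for the attacker and one for the defender; it marks the stopping times τ1,1, τ1,2, τ1,3 and τ2,1, the intrusion interval, and the point where the episode is stopped]
I Find the optimal stopping times
$$\underset{\tau_{D,1},\ldots,\tau_{D,L}}{\text{maximize}}\ \ \underset{\tau_{A,1},\tau_{A,2}}{\text{minimize}}\ \ \mathbb{E}[J]$$
where J is the defender's objective.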
23/52
Best-Response Multi-Threshold Strategies (1/2)
Theorem
I The defender’s best response is of the form:
π̃D,l (b) = S ⇐⇒ b ≥ α̃l , l = 1, . . . , L
I The attacker’s best response is of the form:
π̃A,l (b) = C ⇐⇒ π̃D,l (S | b) ≥ β̃H,l , l = 1, . . . , L, s = H
π̃A,l (b) = S ⇐⇒ π̃D,l (S | b) ≥ β̃C,l , l = 1, . . . , L, s = C
24/52
Best-Response Multi-Threshold Strategies (2/2)
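[Figure: best-response stopping sets on the belief space [0, 1]. Defender: nested stopping sets $\mathscr{S}^{(D)}_{1,\pi_{2,l}}, \mathscr{S}^{(D)}_{2,\pi_{2,l}}, \ldots, \mathscr{S}^{(D)}_{L,\pi_{2,l}}$ with thresholds $\tilde{\alpha}_1, \tilde{\alpha}_2, \ldots, \tilde{\alpha}_L$. Attacker: stopping sets $\mathscr{S}^{(A)}_{C,1,\pi_{D,l}}, \ldots, \mathscr{S}^{(A)}_{C,L,\pi_{D,l}}$ and $\mathscr{S}^{(A)}_{H,1,\pi_{D,l}}, \ldots, \mathscr{S}^{(A)}_{H,L,\pi_{D,l}}$ with thresholds $\tilde{\beta}_{C,1}, \ldots, \tilde{\beta}_{C,L}$ and $\tilde{\beta}_{H,1}, \ldots, \tilde{\beta}_{H,L}$]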
25/52
Efficient Computation of Best Responses
Algorithm 1: Threshold Optimization
Input: objective function J, number of thresholds L, parametric optimizer PO
Output: an approximate best response strategy π̂θ
  Θ ← [0, 1]^L
  For each θ ∈ Θ, define πθ(bt) as: πθ(bt) ≜ S if bt ≥ θi, C otherwise
  Jθ ← Eπθ[J]
  π̂θ ← PO(Θ, Jθ)
  return π̂θ
I Examples of parametric optimization algorithms: CEM, BO, CMA-ES, DE, SPSA, etc.
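Algorithm 1 can be sketched with the cross-entropy method (CEM) as the parametric optimizer. This is an illustrative sketch, not the authors' implementation: the function name, the hyperparameters, and the toy objective (a stand-in for the expected return $J_\theta$, which would normally be estimated by simulation) are all assumptions.

```python
import random


def cem_threshold_optimization(objective, num_thresholds, iterations=30,
                               population=40, elite=8, seed=0):
    """Search Theta = [0, 1]^L for thresholds that maximize `objective` using CEM."""
    rng = random.Random(seed)
    mean = [0.5] * num_thresholds
    std = [0.3] * num_thresholds
    best, best_value = list(mean), objective(mean)
    for _ in range(iterations):
        # Sample candidate threshold vectors and clip them to [0, 1].
        samples = [[min(1.0, max(0.0, rng.gauss(m, s))) for m, s in zip(mean, std)]
                   for _ in range(population)]
        scored = sorted(((objective(theta), theta) for theta in samples),
                        key=lambda pair: pair[0], reverse=True)
        if scored[0][0] > best_value:
            best_value, best = scored[0]
        # Refit the sampling distribution to the elite candidates.
        elites = [theta for _, theta in scored[:elite]]
        columns = list(zip(*elites))
        mean = [sum(column) / elite for column in columns]
        std = [max(1e-3, (sum((x - m) ** 2 for x in column) / elite) ** 0.5)
               for column, m in zip(columns, mean)]
    return best, best_value


if __name__ == "__main__":
    # Toy objective (hypothetical): peak at the thresholds (0.8, 0.6, 0.4).
    target = (0.8, 0.6, 0.4)
    toy_objective = lambda theta: -sum((t - a) ** 2 for t, a in zip(theta, target))
    print(cem_threshold_optimization(toy_objective, num_thresholds=3))
```

Any of the other optimizers named above could replace the CEM loop, since the search only needs evaluations of the objective for candidate threshold vectors.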
26/52
Threshold-Fictitious Play to Approximate an Equilibrium
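[Figure: sequence of best-response iterations. First: $\tilde{\pi}_A \in B_A(\pi_D)$ and $\tilde{\pi}_D \in B_D(\pi_A)$. Next: $\tilde{\pi}'_A \in B_A(\pi'_D)$ and $\tilde{\pi}'_D \in B_D(\pi'_A)$. In the limit: $\pi^\star_A \in B_A(\pi^\star_D)$ and $\pi^\star_D \in B_D(\pi^\star_A)$]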
Fictitious play: iterative averaging of best responses.
I Learn best response strategies iteratively
I Average best responses to approximate the equilibrium
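The averaging idea can be illustrated with classical fictitious play on a small zero-sum matrix game. This is a generic textbook sketch, not the threshold-based variant from the slides: here the best responses are computed exactly from a payoff matrix, whereas the slides learn them through threshold optimization. Matching pennies is used as a hypothetical example game.

```python
def fictitious_play(payoff, iterations=10000):
    """Fictitious play for a zero-sum matrix game.

    payoff[i][j] is the payoff to the row player (maximizer); the column player minimizes.
    Returns the empirical (averaged) strategies of both players.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    row_counts[0] += 1  # arbitrary initial actions
    col_counts[0] += 1
    for _ in range(iterations):
        # Each player best-responds to the opponent's empirical action counts.
        row_values = [sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
                      for i in range(n_rows)]
        col_values = [sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
                      for j in range(n_cols)]
        best_row = max(range(n_rows), key=row_values.__getitem__)
        best_col = min(range(n_cols), key=col_values.__getitem__)
        row_counts[best_row] += 1
        col_counts[best_col] += 1
    total = sum(row_counts)
    return [c / total for c in row_counts], [c / total for c in col_counts]


if __name__ == "__main__":
    matching_pennies = [[1, -1], [-1, 1]]
    print(fictitious_play(matching_pennies))
```

For matching pennies the averaged strategies approach the uniform equilibrium, even though the individual best responses keep cycling.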
27/52
Comparison against State-of-the-art Algorithms
[Figure: reward per episode against training time (0 to 60 min) for three attacker types: Novice, Experienced, Expert. Curves: PPO, ThresholdSPSA, Shiryaev's Algorithm (α = 0.75), HSVI upper bound]
28/52
Learning Curves in Simulation and Digital Twin
[Figure: learning curves against the Novice, Experienced, and Expert attackers over training time (0 to 60 min). Columns: reward per episode, episode length (steps), P[intrusion interrupted], P[early stopping], duration of intrusion. Curves: πθ,l simulation, πθ,l emulation, (∆x + ∆y) ≥ 1 baseline, Snort IPS, upper bound]
[Figure: exploitability, defender reward per episode, and intrusion length over # training iterations (0 to 100). Curves: (π1,l, π2,l) emulation, (π1,l, π2,l) simulation, Snort IPS, ot ≥ 1, upper bound]
Stopping is about timing; now we consider timing + action selection
29/52
General Intrusion Response Game
I Suppose the defender and the attacker can take L actions per node
I $G = \langle \{\mathrm{gw}\} \cup V, E \rangle$: directed tree representing the virtual infrastructure
I V: set of virtual nodes
I E: set of node dependencies
I Z: set of zones
[Figure: the example infrastructure with 64 numbered nodes. Labels: R&D zone, app servers, honeynet, DMZ, admin zone, quarantine zone, workflow, gateway with IDPS, alerts to the defender, attacker, clients]
30/52
State Space
I Each i ∈ V has a state
$$v_{t,i} = (\underbrace{v^{(Z)}_{t,i}}_{D}, \underbrace{v^{(I)}_{t,i}, v^{(R)}_{t,i}}_{A})$$
I System state $s_t = (v_{t,i})_{i \in V} \sim S_t$
I Markovian time-homogeneous dynamics:
$$s_{t+1} \sim f(\cdot \mid S_t, A_t)$$
$A_t = (A^{(A)}_t, A^{(D)}_t)$ are the actions.
[Figure: state-transition diagram over the states s1, s2, s3, s4, s5, ...]
31/52
Workflows
I Services are connected into workflows W = {w1, . . . , w|W|}.
[Figure: an example workflow connecting gw, fw, idps, lb, http servers, auth server, search engine, db, and cache]
32/52
Workflows
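I Services are connected into workflows $W = \{w_1, \ldots, w_{|W|}\}$.
I Each $w \in W$ is realized as a subtree $G_w = \langle \{\mathrm{gw}\} \cup V_w, E_w \rangle$ of G
I $W = \{w_1, \ldots, w_{|W|}\}$ induces a partitioning
$$V = \bigcup_{w_i \in W} V_{w_i} \quad \text{such that} \quad i \neq j \implies V_{w_i} \cap V_{w_j} = \emptyset$$
[Figure: a workflow tree rooted at gw with nodes 1 to 7 spread over zones a, b, and c]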
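The partitioning condition can be checked mechanically. The sketch below is illustrative: the assignment of nodes 1 to 7 to three workflows is hypothetical and not read off the figure.

```python
def is_partition(nodes, workflows):
    """Check that the workflow node sets are pairwise disjoint and cover all nodes."""
    seen = set()
    for workflow_nodes in workflows.values():
        workflow_nodes = set(workflow_nodes)
        if seen & workflow_nodes:  # a node appears in two workflows
            return False
        seen |= workflow_nodes
    return seen == set(nodes)


if __name__ == "__main__":
    V = {1, 2, 3, 4, 5, 6, 7}
    W = {"w1": {1, 4, 7}, "w2": {2, 5}, "w3": {3, 6}}  # hypothetical assignment
    print(is_partition(V, W))
```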
33/52
Observations
I IDPSs inspect network traffic and generate alert vectors:
$$o_t \triangleq (o_{t,1}, \ldots, o_{t,|V|}) \in \mathbb{N}_0^{|V|}$$
$o_{t,i}$ is the number of alerts related to node $i \in V$ at time-step t.
I $o_t = (o_{t,1}, \ldots, o_{t,|V|})$ is a realization of the random vector $O_t$ with joint distribution Z
[Figure: the example infrastructure with 64 numbered nodes, where several IDPSs send alerts to the defender; the attacker and the clients connect from outside]
34/52
[Figure: distributions of the number of alerts weighted by priority, $Z_{O_i}(O_i \mid S^{(D)}_i, A^{(A)}_i)$, per node $i \in V$; 64 panels ($Z_{O_1}$ to $Z_{O_{64}}$); x-axis: O; y-axis: probability; legend: no intrusion, intrusion]
35/52
Defender
I Defender action: $a^{(D)}_t \in \{0, 1, 2, 3, 4\}^{|V|}$
I 0 means do nothing. 1-4 correspond to defensive actions (see fig)
I A defender strategy is a function $\pi_D \in \Pi_D : H_D \to \Delta(A_D)$, where
$$h^{(D)}_t = (s^{(D)}_1, a^{(D)}_1, o_1, \ldots, a^{(D)}_{t-1}, s^{(D)}_t, o_t) \in H_D$$
I Objective: (i) maintain workflows; and (ii) stop a possible intrusion:
$$J \triangleq \sum_{t=1}^{T} \gamma^{t-1} \Bigg(\eta \underbrace{\sum_{i=1}^{|W|} u_W(w_i, s_t)}_{\text{workflows utility}} - (1 - \eta) \underbrace{\sum_{j=1}^{|V|} c_I(s_{t,j}, a_{t,j})}_{\text{intrusion and defense costs}}\Bigg)$$
[Figure: the four defensive actions: 1) node migration, 2) flow migration and blocking, 3) shut down node, 4) access control. Labels: DMZ, R&D zone, admin zone, old path, new path, honeypot, app node, defender, revoke certificates, blacklist IP]
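The objective J can be evaluated for a recorded trajectory as follows. This is a minimal sketch: the per-step workflow utilities and node costs are passed in as plain numbers, and the values in the demo are hypothetical.

```python
def defender_objective(workflow_utilities, node_costs, eta, gamma):
    """Compute J for one trajectory.

    workflow_utilities[t][i] stands for u_W(w_i, s_t) and node_costs[t][j] for
    c_I(s_{t,j}, a_{t,j}); list index 0 corresponds to time-step t = 1.
    """
    total = 0.0
    for step, (utilities, costs) in enumerate(zip(workflow_utilities, node_costs)):
        # gamma ** step equals gamma^(t-1) because step = t - 1.
        total += gamma ** step * (eta * sum(utilities) - (1.0 - eta) * sum(costs))
    return total


if __name__ == "__main__":
    # Hypothetical trajectory with two workflows and three nodes over two time-steps.
    utilities = [[1.0, 1.0], [1.0, 0.0]]
    costs = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
    print(defender_objective(utilities, costs, eta=0.5, gamma=0.5))
```

The weight η trades off service availability against intrusion and defense costs: η = 1 ignores costs, η = 0 ignores workflow utility.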
36/52
Attacker
I Attacker action: $a^{(A)}_t \in \{0, 1, 2, 3\}^{|V|}$
I 0 means do nothing. 1-3 correspond to attacks (see fig)
I An attacker strategy is a function $\pi_A \in \Pi_A : H_A \to \Delta(A_A)$, where $H_A$ is the space of all possible attacker histories
$$h^{(A)}_t = (s^{(A)}_1, a^{(A)}_1, o_1, \ldots, a^{(A)}_{t-1}, s^{(A)}_t, o_t) \in H_A$$
I Objective: (i) disrupt workflows; and (ii) compromise nodes: $-J$
[Figure: the three attack types: 1) reconnaissance (attacker and server exchange TCP SYN and TCP SYN ACK, port open), 2) brute-force (the attacker configures an automated system that sends login attempts to the server), 3) code execution (the attacker sends a malicious request to a service, which injects code for execution on the server)]
37/52
The Intrusion Response Problem
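$$\begin{aligned}
\underset{\pi_D \in \Pi_D}{\text{maximize}}\ \ \underset{\pi_A \in \Pi_A}{\text{minimize}} \quad & \mathbb{E}_{(\pi_D, \pi_A)}[J] && (1) \\
\text{subject to} \quad & s^{(D)}_{t+1} \sim f_D(\cdot \mid S^{(D)}_t, A^{(D)}_t) && \forall t \\
& s^{(A)}_{t+1} \sim f_A(\cdot \mid S^{(A)}_t, A_t) && \forall t \\
& o_{t+1} \sim Z(\cdot \mid S^{(D)}_{t+1}, A^{(A)}_t) && \forall t \\
& a^{(A)}_t \sim \pi_A(\cdot \mid H^{(A)}_t), \quad a^{(A)}_t \in A_A(s_t) && \forall t \\
& a^{(D)}_t \sim \pi_D(\cdot \mid H^{(D)}_t), \quad a^{(D)}_t \in A_D && \forall t
\end{aligned}$$
$\mathbb{E}_{(\pi_D, \pi_A)}$ denotes the expectation of the random vectors $(S_t, O_t, A_t)_{t \in \{1, \ldots, T\}}$ when following the strategy profile $(\pi_D, \pi_A)$.
Problem (1) can be formulated as a zero-sum Partially Observed Stochastic Game with Public Observations (a PO-POSG):
$$\Gamma = \langle N, (S_i)_{i \in N}, (A_i)_{i \in N}, (f_i)_{i \in N}, u, \gamma, (b^{(i)}_1)_{i \in N}, O, Z \rangle$$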
38/52
Existence of a Solution
Theorem
Given the PO-POSG Γ, the following holds:
(A) Γ has a mixed Nash equilibrium and a value function $V^\star : \mathcal{B}_D \times \mathcal{B}_A \to \mathbb{R}$.
(B) For each strategy pair $(\pi_A, \pi_D) \in \Pi_A \times \Pi_D$, the best response sets $B_D(\pi_A)$ and $B_A(\pi_D)$ are non-empty.
39/52
The Curse of Dimensionality
I While Γ has a value, computing it is intractable. The state,
action, and observation spaces of the game grow
exponentially with |V|.
[Figure: growth of |S|, |O|, and |Ai| as a function of the number of nodes |V| (1 to 5)]
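The exponential growth can be made concrete for the action spaces, whose per-node sizes follow from the action definitions in the earlier slides (5 defender actions and 4 attacker actions per node). The sketch below is a simple illustration; per-node state and observation sizes are not given in the slides and are therefore left out.

```python
def joint_space_size(per_node_size, num_nodes):
    """Size of a product space with `per_node_size` options for each of `num_nodes` nodes."""
    return per_node_size ** num_nodes


if __name__ == "__main__":
    for num_nodes in (1, 2, 3, 4, 5, 64):
        defender = joint_space_size(5, num_nodes)  # actions {0, 1, 2, 3, 4} per node
        attacker = joint_space_size(4, num_nodes)  # actions {0, 1, 2, 3} per node
        print(num_nodes, defender, attacker)
```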
We tackle the scalability challenge with decomposition
40/52
Intuitively...
[Figure: the example infrastructure with 64 numbered nodes (zones, gateway, IDPS, defender, attacker, clients), with two nodes in different parts of the tree annotated]
The optimal action here... does not directly depend on the state or action of a node down here.
But they are not completely independent either. How can we exploit this structure?
41/52
Our Approach: System Decomposition
To avoid explicitly enumerating the very large state, observation,
and action spaces of Γ, we exploit three structural properties.
1. Additive structure across workflows.
I The game decomposes into additive subgames on the
workflow-level
2. Optimal substructure within a workflow.
I The subgame for each workflow decomposes into subgames on
the node-level with optimal substructure
3. Threshold properties of local defender strategies.
I Optimal node-level strategies exhibit threshold structures
42/52
Additive Structure Across Workflows (Intuition)
I If there is no path between i and j in G, then i and j are
independent in the following sense:
I Compromising i has no effect on the state of j.
I Compromising i does not make it harder or easier to
compromise j.
I Compromising i does not affect the service provided by j.
I Defending i does not affect the state of j.
I Defending i does not affect the service provided by j.
43/52
Additive Structure Across Workflows
Definition (Transition independence)
A set of nodes Q are transition independent iff the transition probabilities factorize as
$$f(S_{t+1} \mid S_t, A_t) = \prod_{i \in Q} f(S_{t+1,i} \mid S_{t,i}, A_{t,i})$$
Definition (Utility independence)
A set of nodes Q are utility independent iff there exist functions $u_1, \ldots, u_{|Q|}$ such that the utility function u decomposes as
$$u(S_t, A_t) = f(u_1(S_{t,1}, A_{t,1}), \ldots, u_{|Q|}(S_{t,|Q|}, A_{t,|Q|}))$$
and
$$u_i \leq u'_i \iff f(u_1, \ldots, u_i, \ldots, u_{|Q|}) \leq f(u_1, \ldots, u'_i, \ldots, u_{|Q|})$$
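Transition independence means the joint transition probability is a product of per-node terms. The sketch below illustrates this with a hypothetical two-state local model (healthy H, compromised C) in which an attack (action 1) succeeds with probability 0.5; none of these numbers come from the slides.

```python
from itertools import product


def joint_transition_probability(next_states, states, actions, local_transition):
    """Probability of the joint transition as the product of per-node probabilities."""
    probability = 1.0
    for s_next, s, a in zip(next_states, states, actions):
        probability *= local_transition(s_next, s, a)
    return probability


def toy_local_transition(s_next, s, a):
    """Hypothetical per-node model: C is absorbing; an attack compromises H w.p. 0.5."""
    if s == "C":
        return 1.0 if s_next == "C" else 0.0
    p_compromise = 0.5 if a == 1 else 0.0
    return p_compromise if s_next == "C" else 1.0 - p_compromise


if __name__ == "__main__":
    states, actions = ("H", "H"), (1, 1)
    for next_states in product("HC", repeat=2):
        print(next_states, joint_transition_probability(
            next_states, states, actions, toy_local_transition))
```

Because the joint distribution factorizes, a model only has to store one small transition table per node instead of one table over the joint state space.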
44/52
Additive Structure Across Workflows
Theorem (Node independencies)
(A) All nodes V in the game Γ are transition independent.
(B) If there is no path between i and j in the topology graph G,
then i and j are utility independent.
Corollary (Additive structure across workflows)
Γ decomposes into |W| additive subproblems that can be solved
independently and in parallel.
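[Figure: the workflow strategies $\pi^{(w_1)}_k, \pi^{(w_2)}_k, \ldots, \pi^{(w_{|W|})}_k$ take the observations $o_{t,w_1}, o_{t,w_2}, \ldots, o_{t,w_{|W|}}$ and produce the actions $a^{(k)}_{w_1}, a^{(k)}_{w_2}, \ldots, a^{(k)}_{w_{|W|}}$, which are combined ($\oplus$) into the action $a^{(k)}_t$]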
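Since the workflow subproblems are independent, they can be dispatched to a worker pool. The sketch below only shows the dispatch pattern: `solve` is a hypothetical placeholder for a best-response or equilibrium solver, and the subgames are dummy data. Threads are used to keep the example simple; for CPU-bound solvers in CPython a process pool is the usual choice.

```python
from concurrent.futures import ThreadPoolExecutor


def solve_subgames(subgames, solve, max_workers=4):
    """Solve each workflow subgame independently and collect the results by name."""
    names = list(subgames)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda name: solve(subgames[name]), names))
    return dict(zip(names, results))


if __name__ == "__main__":
    # Dummy subgames: each is a list of candidate values, "solving" picks the maximum.
    dummy_subgames = {"w1": [0.2, 0.9, 0.4], "w2": [0.7, 0.1], "w3": [0.3]}
    print(solve_subgames(dummy_subgames, solve=max))
```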
45/52
Optimal Substructure Within a Workflow
I Nodes in the same workflow are utility dependent.
I $\implies$ Adding locally-optimal strategies does not yield an optimal workflow strategy.
I However, the locally-optimal strategies satisfy the optimal substructure property.
I $\implies$ there exists an algorithm for constructing an optimal workflow strategy from locally-optimal strategies for each node.
[Figure: an example IT infrastructure with V = {1, 2, 3}, E = {(1, 2)}, W = {w1, w2}, Z = {1, 2} (nodes 1 to 3 behind gw in zones 1 and 2), next to its utility dependency graph, which links $S_t, A_t$ to $U_t$ through the per-node variables $S^{(D)}_{t,i}$, $S^{(A)}_{t,i}$, $A^{(D)}_{t,i}$, and $U_{t,i}$ for i = 1, 2, 3]
46/52
Scalable Learning through Decomposition
[Figure: speedup S_n versus number of parallel processes n (1 to 10) for |V| = 10; curves: linear, measured.]
Speedup of best response computation for the decomposed game; T_n
denotes the completion time with n processes; the speedup is calculated
as S_n = T_1 / T_n; the error bars indicate standard deviations from 3
measurements.
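A small worked example of the speedup formula S_n = T_1 / T_n, using made-up completion times (three runs per process count, mirroring the three measurements mentioned in the caption). The timings, the efficiency column and the choice to report the standard deviation of the raw times are assumptions for illustration, not values from the slides.

```python
from statistics import mean, stdev

# Hypothetical completion times in seconds, 3 runs per process count n.
completion_times = {
    1: [100.0, 102.0, 98.0],
    2: [52.0, 51.0, 53.0],
    4: [27.0, 28.0, 29.0],
    8: [16.0, 15.0, 17.0],
}

def speedup_table(times):
    """Compute S_n = T_1 / T_n from mean completion times."""
    t1 = mean(times[1])
    table = {}
    for n, runs in sorted(times.items()):
        tn = mean(runs)
        table[n] = {
            "speedup": t1 / tn,          # S_n
            "efficiency": t1 / tn / n,   # S_n / n, equals 1.0 for linear speedup
            "time_std": stdev(runs),     # spread of the raw timings
        }
    return table

table = speedup_table(completion_times)
for n, row in table.items():
    print(n, round(row["speedup"], 2), round(row["efficiency"], 2))
```

A measured curve below the linear reference corresponds to an efficiency below 1.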
47/52
Threshold Properties of Local Defender Strategies.
[Figure: belief space B^{(j)}_D drawn as a simplex with corners (1, 0, 0) "j healthy", (0, 1, 0) "j discovered" and (0, 0, 1) "j compromised"; a switching curve Υ separates the continuation set C from the stopping set S.]
I A node can be in three attack states s^{(A)}_t: Healthy, Discovered, Compromised.
I The defender has a belief state b^{(D)}_t
48/52
Proof Sketch (Threshold Properties)
I Let L(e_1, b̂) denote the line segment that starts at the belief state e_1 = (1, 0, 0) and ends at b̂, where b̂ is in the sub-simplex that joins e_2 and e_3.
I All beliefs on L(e_1, b̂) are totally ordered according to the Monotone Likelihood Ratio (MLR) order. ⟹ A threshold belief state α_{b̂} ∈ L(e_1, b̂) exists where the optimal strategy switches from C to S.
I Since the entire belief space can be covered by the union of lines L(e_1, b̂), the threshold belief states α_{b̂_1}, α_{b̂_2}, ... yield a switching curve Υ.
[Figure: belief space B^{(j)}_D (the 2-dimensional unit simplex) with corners e_1 = (1, 0, 0), e_2 = (0, 1, 0), e_3 = (0, 0, 1); the sub-simplex B^{(j)}_{D,e_1} joining e_2 and e_3 contains the points b̂_1, ..., b̂_9; the line segment L(e_1, b̂_5), the threshold belief state α_{b̂_9} and the switching curve Υ are marked.]
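A minimal sketch of a threshold strategy over the belief simplex, following the construction in the proof sketch: a belief is written as a point on the segment from e_1 to some b̂ on the edge joining e_2 and e_3, and the action switches from C (continue) to S (stop) once the position along that segment passes a threshold. The threshold function below is purely hypothetical; the slides only state that a threshold exists on each segment, not what it is.

```python
def point_on_segment(lam, b_hat):
    """Belief (1 - lam) * e1 + lam * b_hat, with e1 = (1, 0, 0)."""
    return (1.0 - lam, lam * b_hat[1], lam * b_hat[2])

def decompose(belief):
    """Return (lam, b_hat) such that belief = (1 - lam) * e1 + lam * b_hat."""
    lam = 1.0 - belief[0]
    if lam == 0.0:
        return 0.0, None  # the belief is e1 itself
    return lam, (0.0, belief[1] / lam, belief[2] / lam)

def hypothetical_threshold(b_hat):
    """Made-up threshold position on L(e1, b_hat); not taken from the slides."""
    return 0.8 - 0.3 * b_hat[2]

def threshold_strategy(belief, threshold=hypothetical_threshold):
    """Return "C" (continue) before the threshold and "S" (stop) after it."""
    lam, b_hat = decompose(belief)
    if b_hat is None:
        return "C"
    return "S" if lam >= threshold(b_hat) else "C"

# Walk along one segment from e1 to b_hat = (0, 0.5, 0.5): one switch from C to S.
actions = [
    threshold_strategy(point_on_segment(step / 10, (0.0, 0.5, 0.5)))
    for step in range(11)
]
```

Evaluating the threshold for every b̂ on the edge traces out a curve in the simplex, which plays the role of the switching curve Υ in this sketch.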
49/52
Decompositional Fictitious Play (DFSP)
[Figure: fictitious play iterations. From strategies (π_A, π_D) with best responses π̃_A ∈ B_A(π_D) and π̃_D ∈ B_D(π_A), to (π'_A, π'_D) with π̃'_A ∈ B_A(π'_D) and π̃'_D ∈ B_D(π'_A), and so on, until π*_A ∈ B_A(π*_D) and π*_D ∈ B_D(π*_A).]
Fictitious play: iterative averaging of best responses.
I Learn best response strategies iteratively through the parallel solving of subgames in the decomposition
I Average best responses to approximate the equilibrium
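A minimal sketch of classical fictitious play on a small zero-sum matrix game, to illustrate "iterative averaging of best responses". This is not DFSP: there is no decomposition, no parallel subgame solving and no learning of best responses, and the matching-pennies payoff matrix is an assumption chosen because its equilibrium (each action with probability 0.5) is known.

```python
def best_response(payoffs, opponent_mix):
    """Index of the pure action with the highest expected payoff."""
    values = [sum(p * q for p, q in zip(row, opponent_mix)) for row in payoffs]
    return max(range(len(values)), key=values.__getitem__)

def fictitious_play(A, iterations=20000):
    """A[i][j] is the row player's payoff; the column player receives -A[i][j]."""
    n_rows, n_cols = len(A), len(A[0])
    # Column player's payoffs, indexed [own action][opponent action].
    B = [[-A[i][j] for i in range(n_rows)] for j in range(n_cols)]
    row_counts = [1] + [0] * (n_rows - 1)  # arbitrary initial play
    col_counts = [1] + [0] * (n_cols - 1)
    for t in range(1, iterations + 1):
        # Empirical (average) strategies observed so far.
        row_mix = [c / t for c in row_counts]
        col_mix = [c / t for c in col_counts]
        # Each player best-responds to the opponent's average strategy.
        i = best_response(A, col_mix)
        j = best_response(B, row_mix)
        row_counts[i] += 1
        col_counts[j] += 1
    total = iterations + 1
    return [c / total for c in row_counts], [c / total for c in col_counts]

def exploitability(A, row_mix, col_mix):
    """Sum of both players' gains from deviating to a best response."""
    row_best = max(sum(a * q for a, q in zip(row, col_mix)) for row in A)
    col_best = max(
        -sum(A[i][j] * row_mix[i] for i in range(len(A)))
        for j in range(len(A[0]))
    )
    return row_best + col_best

matching_pennies = [[1.0, -1.0], [-1.0, 1.0]]
row_mix, col_mix = fictitious_play(matching_pennies)
gap = exploitability(matching_pennies, row_mix, col_mix)
```

The exploitability computed here is zero exactly at an equilibrium of a zero-sum game, so it can serve as a convergence measure for the averaged strategies.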
50/52
Learning Equilibrium Strategies
[Figure: two panels versus running time (h), 0 to 100. Left: approximate exploitability δ̂, with δ̂ = 0.4 marked. Right: defender utility per episode. Legend: DFSP simulation, DFSP digital twin, upper bound, "oi,t 0", random defense.]
Learning curves obtained during training of DFSP to find optimal
(equilibrium) strategies in the intrusion response game; red and blue
curves relate to DFSP; black, orange and green curves relate to baselines.
51/52
Comparison with NFSP
[Figure: approximate exploitability versus running time (h), 0 to 80, for DFSP and NFSP.]
Learning curves obtained during training of DFSP and NFSP to find
optimal (equilibrium) strategies in the intrusion response game; the red
curve relates to DFSP and the purple curve relates to NFSP; all curves
show simulation results.
52/52
Conclusions
I We develop a framework to automatically learn security strategies.
I We apply the framework to an intrusion response use case.
I We derive properties of optimal security strategies.
I We evaluate strategies on a digital twin.
I Questions → demonstration
[Figure: framework overview with the components Emulation, Target System, Model Creation & System Identification, Strategy Mapping π, Selective Replication, Strategy Implementation π, Simulation & Learning.]

Learning Automated Intrusion Response

  • 1. 0/52 Learning Automated Intrusion Response Ericsson Research Kim Hammar & Rolf Stadler kimham@kth.se Division of Network and Systems Engineering KTH Royal Institute of Technology December 8, 2023
  • 2. 1/52 Use Case: Intrusion Response I A defender owns an infrastructure I Consists of connected components I Components run network services I Defender defends the infrastructure by monitoring and active defense I Has partial observability I An attacker seeks to intrude on the infrastructure I Has a partial view of the infrastructure I Wants to compromise specific components I Attacks by reconnaissance, exploitation and pivoting Attacker Clients . . . Defender 1 IPS 1 alerts Gateway 7 8 9 10 11 6 5 4 3 2 12 13 14 15 16 17 18 19 21 23 20 22 24 25 26 27 28 29 30 31
  • 3. 2/52 Automated Intrusion Response Levels of security automation No automation. Manual detection. Manual prevention. Lack of tools. 1980s 1990s 2000s-Now Research Operator assistance. Audit logs Manual detection. Manual prevention. Partial automation. Manual configuration. Intrusion detection systems. Intrusion prevention systems. High automation. System automatically updates itself.
  • 4. 2/52 Automated Intrusion Response
    Levels of security automation:
    - 1980s: No automation. Manual detection. Manual prevention. Lack of tools.
    - 1990s: Operator assistance. Audit logs. Manual detection. Manual prevention.
    - 2000s-Now: Partial automation. Manual configuration. Intrusion detection systems. Intrusion prevention systems.
    - Research: High automation. System automatically updates itself.
    Can we find effective security strategies through decision-theoretic methods?
  • 5. 3/52 Our Framework for Automated Intrusion Response
    [Figure: framework overview. Components: Target Infrastructure, Digital Twin, Simulation System (state grid $s_{1,1}, \ldots, s_{1,n}$; $s_{2,1}, \ldots, s_{2,n}$; ...), Reinforcement Learning & Optimization. Labelled steps: Selective Replication, Model Creation & System Identification, Strategy Mapping $\pi$, Strategy Implementation $\pi$. Annotations: Strategy evaluation & Model estimation; Automation & Self-learning systems.]
  • 13. 3/52 Creating a Digital Twin of the Target Infrastructure
    [Figure: framework overview, the same diagram as on the slide "Our Framework for Automated Intrusion Response".]
  • 14. 3/52 Learning of Defender Strategies
    [Figure: framework overview, the same diagram as on the slide "Our Framework for Automated Intrusion Response".]
  • 15. 4/52 Example Infrastructure Configuration
    - 64 nodes
      - 24 OVS switches
      - 3 gateways
      - 6 honeypots
      - 8 application servers
      - 4 administration servers
      - 15 compute servers
    - 11 vulnerabilities
      - CVE-2010-0426
      - CVE-2015-3306
      - etc.
    - Management
      - 1 SDN controller
      - 1 Kafka server
      - 1 Elastic server
    [Figure: topology with nodes 1 to 64 organized into an R&D zone, application servers, a honeynet, a DMZ, an admin zone and a quarantine zone; a workflow is highlighted; the attacker and clients connect through the gateway, and the IDPS sends alerts to the defender.]
  • 16. 5/52 Emulating Physical Components
    - We emulate physical components with Docker containers
    - Focus on Linux-based systems
    - Our framework provides the orchestration layer
    [Figure: layered stack with labels: Containers, Physical server, Operating system, Docker engine, Our framework.]
  • 17. 6/52 Emulating Network Connectivity
    - We emulate network connectivity on the same host using network namespaces
    - Connectivity across physical hosts is achieved using VXLAN tunnels with Docker Swarm
    [Figure: management nodes 1, 2, ..., n, each hosting an emulated IT infrastructure, connected by VXLAN tunnels over an IP network.]
  • 18. 7/52 Emulating Network Conditions
    - Traffic shaping using NetEm
    - Allows configuring:
      - Delay
      - Capacity
      - Packet loss
      - Jitter
      - Queueing delays
      - etc.
    [Figure: path of a packet from application processes in user space, through sockets, into the kernel (TCP/UDP, IP/Ethernet/802.11 in the OS TCP/IP stack), the queueing discipline with the NetEm configuration (latency, jitter, etc.), the device driver queue (FIFO) and the NIC.]
  • 19. 8/52 Emulating Clients
    - Homogeneous client population
    - Clients arrive according to $\mathrm{Po}(\lambda)$
    - Client service times $\mathrm{Exp}(\mu)$
    - Service dependencies $(S_t)_{t=1,2,\ldots} \sim \mathrm{MC}$
    [Figure: client population with arrival rate $\lambda$, service time $\mu$ and departures; each client follows one of the workflows $w_1, w_2, \ldots, w_{|\mathcal{W}|}$, which are Markov processes.]
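
The client model above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the emulation code of the framework: time is discretized into unit steps, $\mu$ is treated as the rate parameter of the exponential distribution, and the function names `sample_poisson` and `simulate_clients` are hypothetical.

```python
import math
import random


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) sample with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_clients(lam: float, mu: float, steps: int, seed: int = 0) -> list:
    """Return the number of active clients after each time step.

    Assumption: mu is the rate of the exponential service time,
    so the mean service time is 1 / mu time steps.
    """
    rng = random.Random(seed)
    remaining = []  # remaining service time of each active client
    trace = []
    for _ in range(steps):
        # Age the active clients by one step and drop those that are done.
        remaining = [r - 1.0 for r in remaining if r - 1.0 > 0.0]
        # New clients arrive according to Po(lam) with Exp(mu) service times.
        for _ in range(sample_poisson(lam, rng)):
            remaining.append(rng.expovariate(mu))
        trace.append(len(remaining))
    return trace


if __name__ == "__main__":
    print(simulate_clients(lam=2.0, mu=0.5, steps=10))
```

The Markov-chain service dependencies of the slide are left out here; each client could additionally carry a workflow state that is advanced with a transition matrix at every step.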
  • 20. 9/52 Emulating The Attacker and The Defender
    - API for automated defender and attacker actions
    - Attacker actions:
      - Exploits
      - Reconnaissance
      - Pivoting
      - etc.
    - Defender actions:
      - Shutdowns
      - Redirect
      - Isolate
      - Recover
      - Migrate
      - etc.
    [Figure: IT Infrastructure, Digital Twin (virtual network, virtual devices, emulated services, emulated actors) and Markov Decision Process (state grid $s_{1,1}, s_{1,2}, \ldots$), linked by the labels: Configuration & change events, System traces, Optimized security policy, Verified security policy.]
  • 21. 10/52 Software framework
    - More details about the software framework
      - Source code: https://github.com/Limmen/csle
      - Documentation: http://limmen.dev/csle/
      - Demo: https://www.youtube.com/watch?v=iE2KPmtIs2A
    [Figure: software architecture with labels: Metastore, Python libraries, Management API, REST API, CLI, gRPC, Digital twins.]
  • 22. 11/52 System Identification
    [Figure: framework overview, the same diagram as on the slide "Our Framework for Automated Intrusion Response", with the learning component labelled Reinforcement Learning & Generalization.]
  • 23. 12/52 System Model
    - Intrusion response can be modeled in many ways
      - As a parametric optimization problem
      - As an optimal stopping problem
      - As a dynamic program
      - As a game
      - etc.
    [Figure: state diagram with states H (Healthy), C (Compromised) and ∅ (Crashed); axis of increasing model complexity: static attacker with a small set of responses, dynamic attacker with a small set of responses, dynamic attacker with a large set of responses.]
  • 24. 13/52 Related Work on Learning Automated Intrusion Response
    - Georgia et al. 2000. (Next generation intrusion detection: reinforcement learning)
    - Xu et al. 2005. (An RL approach to host-based intrusion detection)
    - Servin et al. 2008. (Multi-agent RL for intrusion detection)
    - Malialis et al. 2013. (Decentralized RL response to DDoS attacks)
    - Zhu et al. 2019. (Adaptive Honeypot engagement)
    - Apruzzese et al. 2020. (Deep RL to evade botnets)
    - Xiao et al. 2021. (RL approach to APT)
    - etc. 2022-2023
    - Our work 2020-2023
    [Figure: the works above placed on axes of external validity and model complexity, with a marker for the goal.]
  • 25. 14/52 Intrusion Response through Optimal Stopping
    - Suppose
      - The attacker follows a fixed strategy (no adaptation)
      - We only have one response action, e.g., block the gateway
    - Formulate intrusion response as optimal stopping
    [Figure: timeline of an episode from t = 1 to t = T, marking the intrusion event, the period where the intrusion is ongoing, early stopping times, and stopping times that affect the intrusion.]
  • 34. 15/52 Intrusion Response from the Defender’s Perspective
    [Figure: three time series over t = 1, ..., 200: b (range 0 to 1), Alerts (range 0 to 200) and Logins (range 0 to 15).]
    When to take a defensive action?
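  • 35. 16/52 The Defender’s Optimal Stopping Problem (1/3)
    - Infrastructure is a discrete-time dynamical system $(s_t)_{t=1}^{T}$
    - Defender observes a noisy observation process $(o_t)_{t=1}^{T}$
    - Two options at each time $t$: (C)ontinue and (S)top
    - Find the optimal stopping time $\tau^\star$:
      $$\tau^\star \in \arg\max_{\tau} \mathbb{E}_{\tau}\left[\sum_{t=1}^{\tau-1} \gamma^{t-1} R^{C}_{s_t s_{t+1}} + \gamma^{\tau-1} R^{S}_{s_\tau s_\tau}\right]$$
      where $R^{S}_{ss'}$ and $R^{C}_{ss'}$ are the stop/continue rewards and $\tau$ is
      $$\tau = \inf\{t : t > 0, a_t = S\}$$
    [Figure: observation process $o_t$ over time $t$ with the stopping time $\tau^\star$ marked.]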
  • 36. 17/52 The Defender’s Optimal Stopping Problem (2/3)
    - Objective: stop the attack as soon as possible
    - Let the state space be $\mathcal{S} = \{H, C, \emptyset\}$
    [Figure: state transition diagram with states H (Healthy), C (Compromised) and ∅ (Stopped).]
  • 37. 18/52 The Defender’s Optimal Stopping Problem (3/3)
    - Let the observation process $(o_t)_{t=1}^{T}$ represent IDS alerts
    [Figure: ten panels of alert-count distributions (x-axis: O, y-axis: probability), one per attack: CVE-2010-0426, CVE-2015-3306, CVE-2015-5602, CVE-2016-10033, CWE-89, CVE-2017-7494, CVE-2014-6271, FTP brute force, SSH brute force, Telnet brute force. Legend: intrusion, no intrusion, intrusion model, normal operation model.]
    - Estimate the observation distribution based on $M$ samples from the twin
      - E.g., compute the empirical distribution $\hat{Z}$ as an estimate of $Z$
      - $\hat{Z} \xrightarrow{\text{a.s.}} Z$ as $M \to \infty$ (Glivenko-Cantelli theorem)
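
The estimation step above can be sketched as follows. This is an illustrative example, not code from the framework: it assumes the samples are integer alert counts collected separately under intrusion and under normal operation, and `empirical_pmf` is a hypothetical helper name. The sample values are made up for the demonstration.

```python
from collections import Counter


def empirical_pmf(samples):
    """Estimate a discrete distribution by relative frequencies of the samples."""
    if not samples:
        raise ValueError("need at least one sample")
    counts = Counter(samples)
    m = len(samples)
    return {value: count / m for value, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical alert counts measured on the twin (illustrative values only).
    alerts_no_intrusion = [0, 1, 0, 2, 1, 0, 0, 1]
    alerts_intrusion = [5, 7, 6, 5, 9, 7, 5, 6]
    z_hat = {
        "no intrusion": empirical_pmf(alerts_no_intrusion),
        "intrusion": empirical_pmf(alerts_intrusion),
    }
    print(z_hat)
```

As $M$ grows, the relative frequencies approach the true probabilities, which is the convergence that the slide attributes to the Glivenko-Cantelli theorem.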
  • 38. 19/52 Optimal Stopping Strategy
    - The defender can compute the belief $b_t \triangleq \mathbb{P}[S_{i,t} = C \mid b_1, o_1, o_2, \ldots, o_t]$
    - Stopping strategy: $\pi(b) : [0, 1] \to \{S, C\}$
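
A belief of this kind can be computed recursively with a Bayes filter. The sketch below is an assumption-laden illustration rather than the authors' implementation: it assumes that a healthy system becomes compromised with a fixed probability `p_intrusion` per step, that a compromised system stays compromised while the defender continues, and that the observation distributions are given as dictionaries. The name `update_belief` and all parameter names are hypothetical.

```python
def update_belief(b, o, z_healthy, z_compromised, p_intrusion):
    """One step of the filter for b = P[state = C | history].

    b: current belief that the system is compromised
    o: new observation (e.g., number of alerts)
    z_healthy, z_compromised: dicts mapping observations to probabilities
    p_intrusion: assumed per-step probability that an intrusion starts
    """
    # Prediction: compromised stays compromised, healthy may become compromised.
    prior_c = b + (1.0 - b) * p_intrusion
    prior_h = (1.0 - b) * (1.0 - p_intrusion)
    # Correction: weight each state by the likelihood of the observation.
    numerator = z_compromised.get(o, 0.0) * prior_c
    denominator = numerator + z_healthy.get(o, 0.0) * prior_h
    if denominator == 0.0:
        # Observation impossible under both models: keep the prediction.
        return prior_c
    return numerator / denominator


if __name__ == "__main__":
    z_h = {0: 0.8, 1: 0.2}
    z_c = {0: 0.3, 1: 0.7}
    b = 0.0
    for o in [0, 1, 1, 1]:
        b = update_belief(b, o, z_h, z_c, p_intrusion=0.1)
        print(round(b, 3))
```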
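  • 39. 19/52 Optimal Threshold Strategy
    Theorem. There exists an optimal defender strategy of the form:
      $$\pi^\star(b) = S \iff b \geq \alpha^\star, \qquad \alpha^\star \in [0, 1]$$
    i.e., the stopping set is $\mathscr{S} = [\alpha^\star, 1]$.
    [Figure: belief space $\mathcal{B} = [0, 1]$ with the threshold $\alpha^\star$ and the stopping set $\mathscr{S}_1$ to its right.]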
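  • 40. 20/52 Optimal Multiple Stopping
    - Suppose the defender can take $L \geq 1$ response actions
    - Find the optimal stopping times $\tau^\star_L, \tau^\star_{L-1}, \ldots, \tau^\star_1$:
      $$(\tau^\star_l)_{l=1,\ldots,L} \in \arg\max_{\tau_1,\ldots,\tau_L} \mathbb{E}_{\tau_1,\ldots,\tau_L}\Bigg[\sum_{t=1}^{\tau_L-1} \gamma^{t-1} R^{C}_{s_t s_{t+1}} + \gamma^{\tau_L-1} R^{S}_{s_{\tau_L} s_{\tau_L}} + \sum_{t=\tau_L+1}^{\tau_{L-1}-1} \gamma^{t-1} R^{C}_{s_t s_{t+1}} + \gamma^{\tau_{L-1}-1} R^{S}_{s_{\tau_{L-1}} s_{\tau_{L-1}}} + \ldots + \sum_{t=\tau_2+1}^{\tau_1-1} \gamma^{t-1} R^{C}_{s_t s_{t+1}} + \gamma^{\tau_1-1} R^{S}_{s_{\tau_1} s_{\tau_1}}\Bigg]$$
      where $\tau_l$ denotes the stopping time with $l$ stops remaining.
    [Figure: observation process $o_t$ over time $t$ with the stopping times $\tau^\star_L, \tau^\star_{L-1}, \tau^\star_{L-2}, \ldots$ marked.]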
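  • 41. 21/52 Optimal Multi-Threshold Strategy
    Theorem.
    - Stopping sets are nested: $\mathscr{S}_{l-1} \subseteq \mathscr{S}_l$ for $l = 2, \ldots, L$.
    - If $(o_t)_{t \geq 1}$ is totally positive of order 2 (TP2), there exists an optimal defender strategy of the form:
      $$\pi^\star_l(b) = S \iff b \geq \alpha^\star_l, \qquad l = 1, \ldots, L$$
      where $\alpha^\star_l \in [0, 1]$ is decreasing in $l$.
    [Figure: belief space $\mathcal{B} = [0, 1]$ with thresholds $\alpha^\star_1, \alpha^\star_2, \ldots, \alpha^\star_L$ and the nested stopping sets $\mathscr{S}_1, \mathscr{S}_2, \ldots, \mathscr{S}_L$.]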
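
The multi-threshold form of the strategy is simple to execute once the thresholds are known. The following sketch assumes that a belief sequence is already available (in practice it would come from a belief update) and uses hypothetical names and illustrative threshold values; `thresholds[l - 1]` stores $\alpha_l$, the threshold used when $l$ stops remain.

```python
def multi_threshold_action(belief, stops_remaining, thresholds):
    """Return 'S' (stop) if the belief reaches alpha_l, else 'C' (continue)."""
    alpha = thresholds[stops_remaining - 1]
    return "S" if belief >= alpha else "C"


def stopping_times(beliefs, thresholds):
    """Apply the strategy along a belief sequence; time steps start at 1."""
    stops_remaining = len(thresholds)
    times = []
    for t, b in enumerate(beliefs, start=1):
        if stops_remaining == 0:
            break
        if multi_threshold_action(b, stops_remaining, thresholds) == "S":
            times.append(t)
            stops_remaining -= 1
    return times


if __name__ == "__main__":
    # alpha_1 >= alpha_2 >= alpha_3: the thresholds decrease in l.
    alphas = [0.9, 0.6, 0.3]
    print(stopping_times([0.1, 0.35, 0.5, 0.7, 0.95], alphas))
```

Because the thresholds decrease in $l$, the first stop (with all $L$ stops remaining) is triggered at the lowest belief, and the last stop requires the highest belief.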
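  • 42. 22/52 Optimal Stopping Game
    - Suppose the attacker is dynamic and decides when to start and abort its intrusion.
    - Find the optimal stopping times
      $$\underset{\tau_{D,1},\ldots,\tau_{D,L}}{\text{maximize}} \;\; \underset{\tau_{A,1},\tau_{A,2}}{\text{minimize}} \;\; \mathbb{E}[J]$$
      where $J$ is the defender’s objective.
    [Figure: timeline of a game episode from t = 1 to t = T with the defender's stopping times $\tau_{1,1}, \tau_{1,2}, \tau_{1,3}$, the attacker's stopping time $\tau_{2,1}$, the intrusion period, and the point where the game is stopped.]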
  • 43. 23/52 Best-Response Multi-Threshold Strategies (1/2)
    Theorem.
    - The defender’s best response is of the form:
      $$\tilde{\pi}_{D,l}(b) = S \iff b \geq \tilde{\alpha}_l, \qquad l = 1, \ldots, L$$
    - The attacker’s best response is of the form:
      $$\tilde{\pi}_{A,l}(b) = C \iff \tilde{\pi}_{D,l}(S \mid b) \geq \tilde{\beta}_{H,l}, \qquad l = 1, \ldots, L, \; s = H$$
      $$\tilde{\pi}_{A,l}(b) = S \iff \tilde{\pi}_{D,l}(S \mid b) \geq \tilde{\beta}_{C,l}, \qquad l = 1, \ldots, L, \; s = C$$
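  • 44. 24/52 Best-Response Multi-Threshold Strategies (2/2)
    [Figure: two belief axes over $[0, 1]$. Defender: thresholds $\tilde{\alpha}_1, \tilde{\alpha}_2, \ldots, \tilde{\alpha}_L$ with stopping sets $\mathscr{S}^{(D)}_{1,\pi_{2,l}}, \mathscr{S}^{(D)}_{2,\pi_{2,l}}, \ldots, \mathscr{S}^{(D)}_{L,\pi_{2,l}}$. Attacker: thresholds $\tilde{\beta}_{C,1}, \ldots, \tilde{\beta}_{C,L}$ and $\tilde{\beta}_{H,1}, \ldots, \tilde{\beta}_{H,L}$ with stopping sets $\mathscr{S}^{(A)}_{C,1,\pi_{D,l}}, \ldots, \mathscr{S}^{(A)}_{C,L,\pi_{D,l}}$ and $\mathscr{S}^{(A)}_{H,1,\pi_{D,l}}, \ldots, \mathscr{S}^{(A)}_{H,L,\pi_{D,l}}$.]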
  • 45. 25/52 Efficient Computation of Best Responses
    Algorithm 1: Threshold Optimization
      Input: objective function $J$, number of thresholds $L$, parametric optimizer PO
      Output: an approximate best response strategy $\hat{\pi}_\theta$
      $\Theta \leftarrow [0, 1]^L$
      For each $\theta \in \Theta$, define $\pi_\theta(b_t) \triangleq S$ if $b_t \geq \theta_i$, and $C$ otherwise
      $J_\theta \leftarrow \mathbb{E}_{\pi_\theta}[J]$
      $\hat{\pi}_\theta \leftarrow \mathrm{PO}(\Theta, J_\theta)$
      return $\hat{\pi}_\theta$
    - Examples of parametric optimization algorithms: CEM, BO, CMA-ES, DE, SPSA, etc.
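
As one possible instance of the parametric optimizer PO, the sketch below uses a basic cross-entropy method (CEM) over the threshold vector $\theta \in [0, 1]^L$. It is not the authors' implementation: the function name, the hyperparameters and the toy objective are assumptions made for illustration, and in the real setting `objective(theta)` would estimate $J_\theta$ by running episodes with the threshold strategy $\pi_\theta$.

```python
import random
import statistics


def cem_thresholds(objective, num_thresholds, iterations=30,
                   population=50, elite_frac=0.2, seed=0):
    """Maximize objective(theta) over theta in [0, 1]^L with a simple CEM."""
    rng = random.Random(seed)
    mean = [0.5] * num_thresholds
    std = [0.3] * num_thresholds
    n_elite = max(2, int(population * elite_frac))
    for _ in range(iterations):
        # Sample candidate threshold vectors and clip them to [0, 1].
        candidates = [
            [min(1.0, max(0.0, rng.gauss(m, s))) for m, s in zip(mean, std)]
            for _ in range(population)
        ]
        # Keep the candidates with the highest objective value.
        candidates.sort(key=objective, reverse=True)
        elite = candidates[:n_elite]
        # Refit the sampling distribution to the elite set.
        mean = [statistics.fmean(c[i] for c in elite)
                for i in range(num_thresholds)]
        std = [statistics.pstdev([c[i] for c in elite]) + 1e-3
               for i in range(num_thresholds)]
    return mean


if __name__ == "__main__":
    # Toy objective with a known maximizer, standing in for E[J] under pi_theta.
    target = [0.8, 0.4]

    def toy_objective(theta):
        return -sum((a - b) ** 2 for a, b in zip(theta, target))

    print(cem_thresholds(toy_objective, num_thresholds=2))
```

The appeal of the threshold parameterization is that the search space has only $L$ dimensions, regardless of how large the underlying state and observation spaces are.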
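  • 46. 26/52 Threshold-Fictitious Play to Approximate an Equilibrium
    Fictitious play: iterative averaging of best responses.
    - Learn best response strategies iteratively
    - Average best responses to approximate the equilibrium
    [Figure: sequence of strategy pairs. From $(\pi_A, \pi_D)$ the players compute best responses $\tilde{\pi}_A \in B_A(\pi_D)$ and $\tilde{\pi}_D \in B_D(\pi_A)$; from $(\pi'_A, \pi'_D)$ they compute $\tilde{\pi}'_A \in B_A(\pi'_D)$ and $\tilde{\pi}'_D \in B_D(\pi'_A)$; and so on, until $\pi^\star_A \in B_A(\pi^\star_D)$ and $\pi^\star_D \in B_D(\pi^\star_A)$.]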
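
The averaging idea behind fictitious play can be shown on a small zero-sum matrix game. This is classical fictitious play on a normal-form game, given only as an illustration of the principle; it is not the threshold-based variant of the slides, and the function name and the matching-pennies payoff matrix are chosen for the example.

```python
def fictitious_play(payoff, iterations=10000):
    """Classical fictitious play for a zero-sum matrix game.

    payoff[i][j] is the payoff to the row player (maximizer);
    the column player minimizes it. Returns the empirical action
    frequencies of both players.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    # Arbitrary initial actions.
    row_counts[0] += 1
    col_counts[0] += 1
    for _ in range(iterations - 1):
        # Each player best-responds to the opponent's empirical play so far.
        row_values = [sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
                      for i in range(n_rows)]
        col_values = [sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
                      for j in range(n_cols)]
        best_row = max(range(n_rows), key=row_values.__getitem__)
        best_col = min(range(n_cols), key=col_values.__getitem__)
        row_counts[best_row] += 1
        col_counts[best_col] += 1
    total = float(iterations)
    return ([c / total for c in row_counts],
            [c / total for c in col_counts])


if __name__ == "__main__":
    matching_pennies = [[1, -1], [-1, 1]]
    print(fictitious_play(matching_pennies))
```

In matching pennies the unique equilibrium mixes both actions with probability 0.5, and the empirical frequencies drift toward that point as the number of iterations grows.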
  • 47. 27/52 Comparison against State-of-the-art Algorithms
    [Figure: three panels of reward per episode versus training time (0 to 60 min), against the Novice, Experienced and Expert attackers. Curves: PPO, ThresholdSPSA, Shiryaev’s Algorithm (α = 0.75), HSVI upper bound.]
  • 49. 28/52 Learning Curves in Simulation and Digital Twin
    [Figure, top: learning curves against the novice, experienced and expert attackers versus training time (0 to 60 min), with one column per metric: reward per episode, episode length (steps), P[intrusion interrupted], P[early stopping], duration of intrusion. Curves: $\pi_{\theta,l}$ simulation, $\pi_{\theta,l}$ emulation, $(\Delta x + \Delta y) \geq 1$ baseline, Snort IPS, upper bound.]
    [Figure, bottom: exploitability, defender reward per episode and intrusion length versus number of training iterations (0 to 100). Curves: $(\pi_{1,l}, \pi_{2,l})$ emulation, $(\pi_{1,l}, \pi_{2,l})$ simulation, Snort IPS, $o_t \geq 1$, upper bound.]
    Stopping is about timing; now we consider timing + action selection
  • 50. 29/52 General Intrusion Response Game
    - Suppose the defender and the attacker can take $L$ actions per node
    - $\mathcal{G} = \langle \{\mathrm{gw}\} \cup \mathcal{V}, \mathcal{E} \rangle$: directed tree representing the virtual infrastructure
    - $\mathcal{V}$: set of virtual nodes
    - $\mathcal{E}$: set of node dependencies
    - $\mathcal{Z}$: set of zones
    [Figure: the example infrastructure with nodes 1 to 64, organized into an R&D zone, application servers, a honeynet, a DMZ, an admin zone and a quarantine zone, with a highlighted workflow, the gateway, the IDPS sending alerts to the defender, and the attacker and clients.]
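  • 52. 30/52 State Space
    - Each $i \in \mathcal{V}$ has a state
      $$v_{t,i} = \big(\underbrace{v^{(Z)}_{t,i}}_{D}, \underbrace{v^{(I)}_{t,i}, v^{(R)}_{t,i}}_{A}\big)$$
    - System state $s_t = (v_{t,i})_{i \in \mathcal{V}} \sim S_t$
    - Markovian time-homogeneous dynamics:
      $$s_{t+1} \sim f(\cdot \mid S_t, A_t)$$
      where $A_t = (A^{(A)}_t, A^{(D)}_t)$ are the actions.
    [Figure: state transition diagram over states $s_1, s_2, s_3, s_4, s_5, \ldots$]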
  • 56. 31/52 Workflows
    - Services are connected into workflows $\mathcal{W} = \{w_1, \ldots, w_{|\mathcal{W}|}\}$.
    [Figure: example workflow with the services gw, fw, idps, lb, http servers, auth server, search engine, db and cache.]
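  • 57. 32/52 Workflows
    - Services are connected into workflows $\mathcal{W} = \{w_1, \ldots, w_{|\mathcal{W}|}\}$.
    - Each $w \in \mathcal{W}$ is realized as a subtree $\mathcal{G}_w = \langle \{\mathrm{gw}\} \cup \mathcal{V}_w, \mathcal{E}_w \rangle$ of $\mathcal{G}$
    - $\mathcal{W} = \{w_1, \ldots, w_{|\mathcal{W}|}\}$ induces a partitioning
      $$\mathcal{V} = \bigcup_{w_i \in \mathcal{W}} \mathcal{V}_{w_i} \quad \text{such that} \quad i \neq j \implies \mathcal{V}_{w_i} \cap \mathcal{V}_{w_j} = \emptyset$$
    [Figure: a workflow tree rooted at gw with nodes 1 to 7 spread over Zone a, Zone b and Zone c.]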
  • 59. 33/52 Observations
    - IDPSs inspect network traffic and generate alert vectors:
      $$o_t \triangleq (o_{t,1}, \ldots, o_{t,|\mathcal{V}|}) \in \mathbb{N}_0^{|\mathcal{V}|}$$
      where $o_{t,i}$ is the number of alerts related to node $i \in \mathcal{V}$ at time-step $t$.
    - $o_t = (o_{t,1}, \ldots, o_{t,|\mathcal{V}|})$ is a realization of the random vector $O_t$ with joint distribution $Z$
    [Figure: the example infrastructure with nodes 1 to 64 and four IDPSs that send alerts to the defender; the attacker and clients connect from outside.]
  • 62. 35/52 Defender
    - Defender action: $a^{(D)}_t \in \{0, 1, 2, 3, 4\}^{|\mathcal{V}|}$
    - 0 means do nothing. 1 to 4 correspond to defensive actions (see fig)
    - A defender strategy is a function $\pi_D \in \Pi_D : \mathcal{H}_D \to \Delta(\mathcal{A}_D)$, where
      $$h^{(D)}_t = (s^{(D)}_1, a^{(D)}_1, o_1, \ldots, a^{(D)}_{t-1}, s^{(D)}_t, o_t) \in \mathcal{H}_D$$
    - Objective: (i) maintain workflows; and (ii) stop a possible intrusion:
      $$J \triangleq \sum_{t=1}^{T} \gamma^{t-1} \Bigg( \eta \underbrace{\sum_{i=1}^{|\mathcal{W}|} u_W(w_i, s_t)}_{\text{workflows utility}} - (1 - \eta) \underbrace{\sum_{j=1}^{|\mathcal{V}|} c_I(s_{t,j}, a_{t,j})}_{\text{intrusion and defense costs}} \Bigg)$$
    [Figure: the four defensive actions: 1) node migration (between the DMZ, R&D zone and admin zone), 2) flow migration and blocking (old path, new path, honeypot, app node), 3) shut down node, 4) access control (revoke certificates, blacklist IP).]
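
The defender objective $J$ can be evaluated numerically once the per-step workflow utilities and node costs are known. The sketch below is an illustration with made-up numbers: it assumes the values $u_W(w_i, s_t)$ and $c_I(s_{t,j}, a_{t,j})$ have already been computed for each time step, and the function name `defender_objective` is hypothetical.

```python
def defender_objective(workflow_utils, node_costs, gamma, eta):
    """Discounted defender objective J.

    workflow_utils[t][i]: utility u_W(w_i, s_t) of workflow i at step t+1
    node_costs[t][j]:     cost c_I(s_{t,j}, a_{t,j}) of node j at step t+1
    gamma:                discount factor
    eta:                  weight of the workflow utility versus the costs
    """
    total = 0.0
    for t, (utils, costs) in enumerate(zip(workflow_utils, node_costs)):
        # enumerate starts at 0, which matches the exponent t - 1 for t = 1.
        step_value = eta * sum(utils) - (1.0 - eta) * sum(costs)
        total += gamma ** t * step_value
    return total


if __name__ == "__main__":
    # Two workflows and three nodes over two time steps (illustrative values).
    utils = [[1.0, 1.0], [1.0, 0.0]]
    costs = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
    print(defender_objective(utils, costs, gamma=0.5, eta=0.5))
```

With these numbers the first step contributes $0.5 \cdot 2 = 1.0$ and the second contributes $0.5 \cdot (0.5 \cdot 1 - 0.5 \cdot 2) = -0.25$, so the script prints 0.75.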
  • 65. 36/52 Attacker
    - Attacker action: $a^{(A)}_t \in \{0, 1, 2, 3\}^{|\mathcal{V}|}$
    - 0 means do nothing. 1 to 3 correspond to attacks (see fig)
    - An attacker strategy is a function $\pi_A \in \Pi_A : \mathcal{H}_A \to \Delta(\mathcal{A}_A)$, where $\mathcal{H}_A$ is the space of all possible attacker histories
      $$h^{(A)}_t = (s^{(A)}_1, a^{(A)}_1, o_1, \ldots, a^{(A)}_{t-1}, s^{(A)}_t, o_t) \in \mathcal{H}_A$$
    - Objective: (i) disrupt workflows; and (ii) compromise nodes: $-J$
    [Figure: the three attacks: 1) reconnaissance (the attacker sends TCP SYN, the server answers TCP SYN ACK, port open), 2) brute-force (an automated system configured by the attacker makes login attempts against a server), 3) code execution (the attacker sends a malicious request to a service and injects code that is executed on the server).]
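  • 68. 37/52 The Intrusion Response Problem
      $$\underset{\pi_D \in \Pi_D}{\text{maximize}} \;\; \underset{\pi_A \in \Pi_A}{\text{minimize}} \;\; \mathbb{E}_{(\pi_D, \pi_A)}[J] \qquad (1)$$
      subject to
      $$s^{(D)}_{t+1} \sim f_D\big(\cdot \mid S^{(D)}_t, A^{(D)}_t\big) \quad \forall t$$
      $$s^{(A)}_{t+1} \sim f_A\big(\cdot \mid S^{(A)}_t, A_t\big) \quad \forall t$$
      $$o_{t+1} \sim Z\big(\cdot \mid S^{(D)}_{t+1}, A^{(A)}_t\big) \quad \forall t$$
      $$a^{(A)}_t \sim \pi_A\big(\cdot \mid H^{(A)}_t\big), \quad a^{(A)}_t \in \mathcal{A}_A(s_t) \quad \forall t$$
      $$a^{(D)}_t \sim \pi_D\big(\cdot \mid H^{(D)}_t\big), \quad a^{(D)}_t \in \mathcal{A}_D \quad \forall t$$
    - $\mathbb{E}_{(\pi_D, \pi_A)}$ denotes the expectation of the random vectors $(S_t, O_t, A_t)_{t \in \{1, \ldots, T\}}$ when following the strategy profile $(\pi_D, \pi_A)$
    - (1) can be formulated as a zero-sum Partially Observed Stochastic Game with Public Observations (a PO-POSG):
      $$\Gamma = \langle \mathcal{N}, (\mathcal{S}_i)_{i \in \mathcal{N}}, (\mathcal{A}_i)_{i \in \mathcal{N}}, (f_i)_{i \in \mathcal{N}}, u, \gamma, (b^{(i)}_1)_{i \in \mathcal{N}}, \mathcal{O}, Z \rangle$$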
  • 69. 38/52 Existence of a Solution
    Theorem. Given the PO-POSG $\Gamma$, the following holds:
    (A) $\Gamma$ has a mixed Nash equilibrium and a value function $V^\star : \mathcal{B}_D \times \mathcal{B}_A \to \mathbb{R}$.
    (B) For each strategy pair $(\pi_A, \pi_D) \in \Pi_A \times \Pi_D$, the best response sets $B_D(\pi_A)$ and $B_A(\pi_D)$ are non-empty.
  • 71. 39/52 The Curse of Dimensionality
    - While (1) has a solution (i.e., the game $\Gamma$ has a value (Thm 1)), computing it is intractable since the state, action, and observation spaces of the game grow exponentially with $|\mathcal{V}|$.
    [Figure: growth of $|\mathcal{S}|$, $|\mathcal{O}|$ and $|\mathcal{A}_i|$ as a function of the number of nodes $|\mathcal{V}|$ (1 to 5), with the vertical axis reaching the order of $10^5$.]
    We tackle the scalability challenge with decomposition
  • 73. 40/52 Intuitively..
    [Figure: the example infrastructure (R&D zone, application servers, honeynet, DMZ, admin zone, quarantine zone, gateway, IDPS, defender, attacker and clients) with two nodes in different parts of the topology highlighted.]
    - The optimal action at one highlighted node does not directly depend on the state or action of the other highlighted node.
    - But they are not completely independent either. How can we exploit this structure?
  • 74. 41/52 Our Approach: System Decomposition
    To avoid explicitly enumerating the very large state, observation, and action spaces of $\Gamma$, we exploit three structural properties.
    1. Additive structure across workflows.
       - The game decomposes into additive subgames on the workflow-level
    2. Optimal substructure within a workflow.
       - The subgame for each workflow decomposes into subgames on the node-level with optimal substructure
    3. Threshold properties of local defender strategies.
       - Optimal node-level strategies exhibit threshold structures
  • 75. 42/52 Additive Structure Across Workflows (Intuition)
    - If there is no path between $i$ and $j$ in $\mathcal{G}$, then $i$ and $j$ are independent in the following sense:
      - Compromising $i$ has no effect on the state of $j$.
      - Compromising $i$ does not make it harder or easier to compromise $j$.
      - Compromising $i$ does not affect the service provided by $j$.
      - Defending $i$ does not affect the state of $j$.
      - Defending $i$ does not affect the service provided by $j$.
    [Figure: the infrastructure shown as "equal" to a collection of independent parts.]
  • 76. 43/52 Additive Structure Across Workflows
    Definition (Transition independence). A set of nodes $\mathcal{Q}$ are transition independent iff the transition probabilities factorize as
      $$f(S_{t+1} \mid S_t, A_t) = \prod_{i \in \mathcal{Q}} f(S_{t+1,i} \mid S_{t,i}, A_{t,i})$$
    Definition (Utility independence). A set of nodes $\mathcal{Q}$ are utility independent iff there exist functions $u_1, \ldots, u_{|\mathcal{Q}|}$ such that the utility function $u$ decomposes as
      $$u(S_t, A_t) = f\big(u_1(S_{t,1}, A_{t,1}), \ldots, u_{|\mathcal{Q}|}(S_{t,|\mathcal{Q}|}, A_{t,|\mathcal{Q}|})\big)$$
    and
      $$u_i \leq u'_i \iff f(u_1, \ldots, u_i, \ldots, u_{|\mathcal{Q}|}) \leq f(u_1, \ldots, u'_i, \ldots, u_{|\mathcal{Q}|})$$
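
Transition independence means that the joint transition probability is a product of per-node terms. The sketch below illustrates the factorization with a made-up two-state node model; the function names and the transition probabilities are assumptions for the example, not values from the slides.

```python
import math


def joint_transition_prob(next_states, states, actions, node_kernels):
    """Joint probability f(S' | S, A) as a product of per-node kernels.

    node_kernels[i](s_next, s, a) returns f(S'_i = s_next | S_i = s, A_i = a).
    """
    return math.prod(
        kernel(s_next, s, a)
        for kernel, s_next, s, a in zip(node_kernels, next_states, states, actions)
    )


def example_kernel(s_next, s, a):
    """Hypothetical node model with states 'H' (healthy) and 'C' (compromised).

    Action 1 (attack) compromises a healthy node with probability 0.3;
    action 0 leaves the state unchanged.
    """
    if s == "C":
        return 1.0 if s_next == "C" else 0.0
    p_compromise = 0.3 if a == 1 else 0.0
    return p_compromise if s_next == "C" else 1.0 - p_compromise


if __name__ == "__main__":
    kernels = [example_kernel, example_kernel]
    p = joint_transition_prob(("C", "H"), ("H", "H"), (1, 1), kernels)
    print(p)
```

The practical benefit is that a model with $|\mathcal{Q}|$ nodes only needs the small per-node kernels instead of one table over the exponentially large joint state space.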
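  • 77. 44/52 Additive Structure Across Workflows
    Theorem (Node independencies).
    (A) All nodes $\mathcal{V}$ in the game $\Gamma$ are transition independent.
    (B) If there is no path between $i$ and $j$ in the topology graph $\mathcal{G}$, then $i$ and $j$ are utility independent.
    Corollary (Additive structure across workflows). $\Gamma$ decomposes into $|\mathcal{W}|$ additive subproblems that can be solved independently and in parallel.
    [Figure: per-workflow strategies $\pi^{(w_1)}_k, \pi^{(w_2)}_k, \ldots, \pi^{(w_{|\mathcal{W}|})}_k$ map the observations $o_{t,w_1}, o_{t,w_2}, \ldots, o_{t,w_{|\mathcal{W}|}}$ to actions $a^{(k)}_{w_1}, a^{(k)}_{w_2}, \ldots, a^{(k)}_{w_{|\mathcal{W}|}}$, which are combined ($\oplus$) into the action $a^{(k)}_t$.]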
  • 78. 45/52 Optimal Substructure Within a Workflow
    - Nodes in the same workflow are utility dependent.
    - $\implies$ Adding locally-optimal strategies does not yield an optimal workflow strategy.
    - However, the locally-optimal strategies satisfy the optimal substructure property.
    - $\implies$ there exists an algorithm for constructing an optimal workflow strategy from locally-optimal strategies for each node.
    [Figure, left: IT infrastructure with $\mathcal{V} = \{1, 2, 3\}$, $\mathcal{E} = \{(1, 2)\}$, $\mathcal{W} = \{w_1, w_2\}$, $\mathcal{Z} = \{1, 2\}$; nodes 1 to 3 are placed in Zone 1 and Zone 2 behind gw. Right: utility dependencies between the global variables $S_t, A_t, U_t$ and the per-node variables $S^{(D)}_{t,i}, A^{(D)}_{t,i}, S^{(A)}_{t,i}, U_{t,i}$ for $i = 1, 2, 3$.]
  • 80. 46/52 Scalable Learning through Decomposition
    [Figure: speedup $S_n$ versus the number of parallel processes $n$ (1 to 10) for $|\mathcal{V}| = 10$, comparing linear and measured speedup.]
    Speedup of best response computation for the decomposed game; $T_n$ denotes the completion time with $n$ processes; the speedup is calculated as $S_n = \frac{T_1}{T_n}$; the error bars indicate standard deviations from 3 measurements.
  • 81. 47/52 Threshold Properties of Local Defender Strategies.
    - A node can be in three attack states $s^{(A)}_t$: Healthy, Discovered, Compromised.
    - The defender has a belief state $b^{(D)}_t$
    [Figure: belief space $\mathcal{B}^{(j)}_D$, a simplex with corners $(1, 0, 0)$ ($j$ healthy), $(0, 1, 0)$ ($j$ discovered) and $(0, 0, 1)$ ($j$ compromised), divided by a switching curve $\Upsilon$ into a continuation set $\mathscr{C}$ and a stopping set $\mathscr{S}$.]
  • 82. 48/52 Proof Sketch (Threshold Properties)
    - Let $\mathcal{L}(e_1, \hat{b})$ denote the line segment that starts at the belief state $e_1 = (1, 0, 0)$ and ends at $\hat{b}$, where $\hat{b}$ is in the sub-simplex that joins $e_2$ and $e_3$.
    - All beliefs on $\mathcal{L}(e_1, \hat{b})$ are totally ordered according to the Monotone Likelihood Ratio (MLR) order. $\implies$ a threshold belief state $\alpha_{\hat{b}} \in \mathcal{L}(e_1, \hat{b})$ exists where the optimal strategy switches from C to S.
    - Since the entire belief space can be covered by the union of lines $\mathcal{L}(e_1, \hat{b})$, the threshold belief states $\alpha_{\hat{b}_1}, \alpha_{\hat{b}_2}, \ldots$ yield a switching curve $\Upsilon$.
    [Figure: belief space $\mathcal{B}^{(j)}_D$ (the 2-dimensional unit simplex) with corners $e_1 = (1, 0, 0)$, $e_2 = (0, 1, 0)$, $e_3 = (0, 0, 1)$; the sub-simplex $\mathcal{B}^{(j)}_{D,e_1}$ joining $e_2$ and $e_3$ with points $\hat{b}_1, \ldots, \hat{b}_9$; the line $\mathcal{L}(e_1, \hat{b}_5)$; the threshold belief state $\alpha_{\hat{b}_9}$; and the switching curve $\Upsilon$.]
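  • 83. 49/52 Decompositional Fictitious Play (DFSP)
    Fictitious play: iterative averaging of best responses.
    - Learn best response strategies iteratively through the parallel solving of subgames in the decomposition
    - Average best responses to approximate the equilibrium
    [Figure: sequence of strategy pairs. From $(\pi_A, \pi_D)$ the players compute best responses $\tilde{\pi}_A \in B_A(\pi_D)$ and $\tilde{\pi}_D \in B_D(\pi_A)$; from $(\pi'_A, \pi'_D)$ they compute $\tilde{\pi}'_A \in B_A(\pi'_D)$ and $\tilde{\pi}'_D \in B_D(\pi'_A)$; and so on, until $\pi^\star_A \in B_A(\pi^\star_D)$ and $\pi^\star_D \in B_D(\pi^\star_A)$.]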
  • 84. 50/52 Learning Equilibrium Strategies
    [Figure: two panels versus running time (0 to 100 h): approximate exploitability $\hat{\delta}$ (with the level $\hat{\delta} = 0.4$ marked) and defender utility per episode. Curves: DFSP simulation, DFSP digital twin, upper bound, $o_{i,t} > 0$, random defense.]
    Learning curves obtained during training of DFSP to find optimal (equilibrium) strategies in the intrusion response game; red and blue curves relate to DFSP; black, orange and green curves relate to baselines.
  • 85. 51/52 Comparison with NFSP
    [Figure: approximate exploitability versus running time (0 to 80 h) for DFSP and NFSP.]
    Learning curves obtained during training of DFSP and NFSP to find optimal (equilibrium) strategies in the intrusion response game; the red curve relates to DFSP and the purple curve relates to NFSP; all curves show simulation results.
  • 86. 52/52 Conclusions
    - We develop a framework to automatically learn security strategies.
    - We apply the framework to an intrusion response use case.
    - We derive properties of optimal security strategies.
    - We evaluate strategies on a digital twin.
    - Questions → demonstration
    [Figure: framework overview with labels: Target System, Emulation, Simulation (state grid $s_{1,1}, \ldots, s_{2,n}$), Learning, Selective Replication, Model Creation, System Identification, Strategy Mapping $\pi$, Strategy Implementation $\pi$.]