4. 3/27
Goal: Automation and Learning
I Challenges
I Evolving & automated attacks
I Complex infrastructures
I Our Goal:
I Automate security tasks
I Adapt to changing attack methods
[Figure: the target IT infrastructure — an attacker and clients (. . .) connect through a gateway with an IPS that raises alerts to the defender; internal hosts numbered 1-31.]
5. 3/27
Approach: Self-Learning Security Systems
I Our Approach: Self-Learning Systems:
I real-time telemetry
I stream processing
I theories from control/game/decision theory
I computational methods (e.g., dynamic programming & reinforcement learning)
I automated network management (SDN, NFV, etc.)
6. 4/27
Example Use Case: Intrusion Prevention
[Plots: time series over t ∈ [0, 200] — belief b ∈ [0, 1], # IPS alerts, and # logins.]
13. 4/27
The Intrusion Prevention Problem
[Plots: time series over t ∈ [0, 200] — belief b ∈ [0, 1], # IPS alerts, and # logins.]
When to take a defensive action?
Which action to take?
15. 5/27
Outline
I Use Case & Motivation:
I Use case: Intrusion prevention
I Self-learning security systems: current landscape
I Our Approach
I Network emulation and digital twin
I Stochastic game simulation and reinforcement learning
I Summary of results so far
I Comparison with related work
I Intrusion prevention through optimal multiple stopping
I Dynkin games and learning in dynamic environments
I System for policy validation
I Outlook on future work
I Extend use case
I Rollout-based methods
I Conclusions
I Takeaways
20. 6/27
Our Approach for Automated Network Security
[Diagram: our method — the target infrastructure is selectively replicated in an emulation system; strategy evaluation & model estimation drive model creation & system identification; a simulation system (state matrix s_{i,j}) runs reinforcement learning & generalization; learned strategies π are mapped back and implemented in the target infrastructure, yielding automation & self-learning systems.]
29. 8/27
Creating a Digital Twin of the Target Infrastructure
I Emulate hosts with Docker containers
I Emulate the IPS and vulnerabilities with software
I Network isolation and traffic shaping through NetEm in the Linux kernel
I Enforce resource constraints using cgroups
I Emulate client arrivals with a Poisson process
I Internal connections are full-duplex & lossless with bit capacities of 1000 Mbit/s
I External connections are full-duplex with bit capacities of 100 Mbit/s, 0.1% packet loss in normal operation, and random bursts of 1% packet loss
[Figure: the target IT infrastructure — an attacker and clients (. . .) connect through a gateway with an IPS that raises alerts to the defender; internal hosts numbered 1-31.]
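The Poisson client-arrival process can be sketched in a few lines; a minimal simulation assuming a homogeneous arrival rate λ (the rate and horizon below are illustrative, not the emulation's actual parameters):

```python
import random

def poisson_arrivals(rate, horizon, seed=0):
    """Sample client arrival times in [0, horizon) from a Poisson
    process with intensity `rate` (arrivals per time unit) by
    accumulating exponential inter-arrival times."""
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate)  # exponential gap with mean 1/rate
        if t >= horizon:
            return arrivals
        arrivals.append(t)

# Example: a hypothetical λ = 20 clients/minute over a 60-minute window.
arrivals = poisson_arrivals(rate=20.0, horizon=60.0)
```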
37. 9/27
System Model
I We model the evolution of the system with a discrete-time dynamical system.
I We assume a Markovian system with stochastic dynamics and partial observability.
I A Partially Observed Markov Decision Process (POMDP) if the attacker is static.
I A Partially Observed Stochastic Game (POSG) if the attacker is dynamic.
[Diagram: control loop — a stochastic system (Markov) in state s_t is acted on by the defender (action a^(1)_t) and the attacker (action a^(2)_t); a noisy sensor emits observation o_t, from which an optimal filter computes the belief b_t for the controller.]
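For a two-state model, the optimal filter in the loop above reduces to a Bayes update of the belief b_t; a minimal sketch in which the transition and observation probabilities are assumed for illustration (in the framework they are estimated from the infrastructure):

```python
def belief_update(b, o, trans, obs):
    """Bayes filter for a two-state hidden Markov model.
    b: current P[s_t = 1]; o: observation index;
    trans[s][s2]: transition probabilities; obs[s][o]: P[o | s].
    Returns the posterior P[s_{t+1} = 1 | o]."""
    pred1 = b * trans[1][1] + (1 - b) * trans[0][1]   # predict step
    pred0 = 1 - pred1
    num = obs[1][o] * pred1                           # correct step
    den = obs[0][o] * pred0 + num
    return num / den if den > 0 else b

# Hypothetical dynamics: an intrusion starts w.p. 0.1 and persists;
# observation 1 ("many alerts") is likelier during an intrusion.
trans = [[0.9, 0.1], [0.0, 1.0]]
obs = [[0.8, 0.2], [0.3, 0.7]]
b = 0.0
for o in [1, 1, 1]:          # three consecutive high-alert observations
    b = belief_update(b, o, trans, obs)
```

After three high-alert observations the belief rises close to 1, which is exactly the quantity the stopping strategies below threshold on.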
39. 10/27
System Identification
[Plot: probability distribution of # IPS alerts weighted by priority, o_t ∈ [0, 9000]; fitted models f̂_O(o_t|0) and f̂_O(o_t|1) against the empirical distributions for s_t = 0 and s_t = 1.]
I The distribution f_O of defender observations (system metrics) is unknown.
I We fit a Gaussian mixture distribution f̂_O as an estimate of f_O in the target infrastructure.
I For each state s, we obtain the conditional distribution f̂_{O|s} through expectation-maximization.
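Expectation-maximization for a univariate mixture fits in plain Python; a toy two-component version on synthetic alert counts (the actual system fits per-state mixtures to measured metrics; the data below is invented):

```python
import math
import random

def em_gmm_1d(xs, iters=50):
    """Fit a two-component univariate Gaussian mixture to xs with
    expectation-maximization. Returns (weights, means, variances)."""
    def pdf(x, mu, var):
        return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

    def moments(h):
        m = sum(h) / len(h)
        v = sum((x - m) ** 2 for x in h) / len(h)
        return m, max(v, 1e-6)

    # Crude initialization from the two halves of the sorted sample.
    xs = sorted(xs)
    half = len(xs) // 2
    (m0, v0), (m1, v1) = moments(xs[:half]), moments(xs[half:])
    mu, var, w = [m0, m1], [v0, v1], [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of component 1 for each point.
        r = []
        for x in xs:
            p0 = w[0] * pdf(x, mu[0], var[0])
            p1 = w[1] * pdf(x, mu[1], var[1])
            r.append(p1 / (p0 + p1) if p0 + p1 > 0 else 0.5)
        # M-step: re-estimate weights, means, and variances.
        n1 = sum(r)
        n0 = len(xs) - n1
        mu = [sum((1 - ri) * x for ri, x in zip(r, xs)) / n0,
              sum(ri * x for ri, x in zip(r, xs)) / n1]
        var = [max(1e-6, sum((1 - ri) * (x - mu[0]) ** 2 for ri, x in zip(r, xs)) / n0),
               max(1e-6, sum(ri * (x - mu[1]) ** 2 for ri, x in zip(r, xs)) / n1)]
        w = [n0 / len(xs), n1 / len(xs)]
    return w, mu, var

# Synthetic "alert counts": a quiet mode and a high-alert mode.
rng = random.Random(1)
data = [rng.gauss(50, 10) for _ in range(300)] + [rng.gauss(400, 40) for _ in range(300)]
w, mu, var = em_gmm_1d(data)
```

On this well-separated data the estimated means land near the true modes (≈ 50 and ≈ 400) with roughly equal mixing weights.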
40. 11/27
The Simulation System
[Diagram: the simulation system — a state matrix s_{i,j}; reinforcement learning & numerical methods.]
I Simulations:
I Markov decision processes
I Stochastic games
I Learning/computing defender strategies:
I Reinforcement learning
I Stochastic approximation
I Computational game theory
I Dynamic programming
I Optimization
43. 12/27
Related Work on Self-Learning Security Systems
[Chart: related work plotted by external validity (x-axis) against use case/control/learning complexity (y-axis); the goal is high on both. Annotation on prior work: "only 2 actions, 3 states!"]
I Georgia et al. 2000 (Next generation intrusion detection: reinforcement learning)
I Xu et al. 2005 (An RL approach to host-based intrusion detection)
I Servin et al. 2008 (Multi-agent RL for intrusion detection)
I Malialis et al. 2013 (Decentralized RL response to DDoS attacks)
I Zhu et al. 2019 (Adaptive honeypot engagement)
I Apruzzese et al. 2020 (Deep RL to evade botnets)
I Xiao et al. 2021 (RL approach to APT), etc. 2022
I Our work 2020-2022 and planned work
49. 13/27
1: Intrusion Prevention through Optimal Stopping¹
I Intrusion Prevention as an Optimal Stopping Problem:
I A stochastic process (s_t)_{t=1}^T is observed sequentially
I Two options per t: (i) continue to observe; or (ii) stop
I Find the optimal stopping time τ*:

$$\tau^* = \arg\max_{\tau} \mathbb{E}_{\tau}\left[\sum_{t=1}^{\tau-1} \gamma^{t-1} R^C_{s_t s_{t+1}} + \gamma^{\tau-1} R^S_{s_\tau s_\tau}\right]$$

where $R^S_{ss'}$ & $R^C_{ss'}$ are the stop/continue rewards
I Stop action = Defensive action
[Timeline: an episode from t = 1 to t = T; the intrusion starts at some time step; stopping times before the intrusion event are early, later stopping times interrupt the intrusion.]
¹Kim Hammar and Rolf Stadler. "Learning Intrusion Prevention Policies through Optimal Stopping". In: International Conference on Network and Service Management (CNSM 2021). http://dl.ifip.org/db/conf/cnsm/cnsm2021/1570732932.pdf. Izmir, Turkey, 2021.
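For a finite horizon and known dynamics, the optimal stopping time can be computed by backward induction on the value function; a minimal sketch with a hypothetical two-state intrusion model and illustrative rewards (not the paper's reward function):

```python
def optimal_stopping(T, states, trans, r_cont, r_stop, gamma=1.0):
    """Backward induction for a finite-horizon optimal stopping problem.
    V[t][s]: value at time t in state s; stop[t][s]: True when the
    stopping reward beats the expected continuation value (forced
    stop at the horizon t = T - 1)."""
    n = len(states)
    V = [[0.0] * n for _ in range(T)]
    stop = [[True] * n for _ in range(T)]
    for s in states:
        V[T - 1][s] = r_stop[s]
    for t in range(T - 2, -1, -1):
        for s in states:
            cont = r_cont[s] + gamma * sum(
                trans[s][s2] * V[t + 1][s2] for s2 in states)
            stop[t][s] = r_stop[s] >= cont
            V[t][s] = max(r_stop[s], cont)
    return V, stop

# Two states: 0 = no intrusion, 1 = intrusion (hypothetical numbers:
# continuing earns service reward, stopping early costs, prevention pays).
trans = [[0.9, 0.1], [0.0, 1.0]]
V, stop = optimal_stopping(T=20, states=[0, 1], trans=trans,
                           r_cont=[1.0, -2.0], r_stop=[-5.0, 10.0],
                           gamma=0.9)
```

With these numbers the computed policy is a threshold policy of the expected form: continue while no intrusion is underway, stop as soon as the state indicates an intrusion.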
50. 14/27
1: Intrusion Prevention through Optimal Stopping²
[Diagram: a stochastic system (Markov) in state s_t → a noisy sensor emits IPS alerts o_t → an optimal filter computes the belief b_t ∈ [0, 1] → the stopping strategy selects a_t ∈ {S, C}.]
I States: intrusion s_t ∈ {0, 1}, terminal ∅.
I Observations: number of IPS alerts o_t ∈ O; o_t is drawn from the r.v. O ∼ f_O(·|s_t).
I Based on the history h_t of observations, the defender can compute the belief b_t(s_t) = P[s_t | h_t].
I Actions: A_1 = A_2 = {S, C}
I Rewards: security and service.
I Transition probabilities: follow from the game dynamics.
²Kim Hammar and Rolf Stadler. "Learning Intrusion Prevention Policies through Optimal Stopping". In: International Conference on Network and Service Management (CNSM 2021). http://dl.ifip.org/db/conf/cnsm/cnsm2021/1570732932.pdf. Izmir, Turkey, 2021.
54. 16/27
Learning Curves in Simulation and Emulation
[Plots: learning curves against novice, experienced, and expert attackers. Per attacker, metrics vs. training time (min): reward per episode, episode length (steps), P[intrusion interrupted], P[early stopping], and duration of intrusion. Curves: π_{θ,l} simulation, π_{θ,l} emulation, the (Δx + Δy) ≥ 1 baseline, the Snort IPS baseline, and an upper bound.]
55. 17/27
2: Intrusion Prevention through Optimal Multiple Stopping³
I Intrusion Prevention through Multiple Optimal Stopping:
I Maximize the reward of the stopping times τ_L, τ_{L-1}, . . . , τ_1:

$$\pi_l^* \in \arg\max_{\pi_l} \mathbb{E}_{\pi_l}\left[\sum_{t=1}^{\tau_L-1} \gamma^{t-1} R^C_{s_t,s_{t+1},L} + \gamma^{\tau_L-1} R^S_{s_{\tau_L},s_{\tau_L+1},L} + \ldots + \sum_{t=\tau_2+1}^{\tau_1-1} \gamma^{t-1} R^C_{s_t,s_{t+1},1} + \gamma^{\tau_1-1} R^S_{s_{\tau_1},s_{\tau_1+1},1}\right]$$

I Each stopping time = one defensive action

[State diagram: states 0 (no intrusion), 1 (intrusion), and ∅ (terminal); the intrusion starts when Q_t = 1; the episode ends at the final stop (l_t = 0) or when the intrusion is prevented (l_t = 0); self-transitions while t ≥ 1 resp. t ≥ 2 and l_t > 0.]
³Kim Hammar and Rolf Stadler. "Intrusion Prevention Through Optimal Stopping". In: IEEE Transactions on Network and Service Management 19.3 (2022), pp. 2333–2348. doi: 10.1109/TNSM.2022.3176781.
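The multiple-stopping objective extends the single-stop recursion with the number of remaining stops l as an extra state variable; a toy backward induction under the simplifying assumption that stopping does not change the dynamics (the rewards and transitions are hypothetical):

```python
def multi_stopping(T, L, states, trans, r_cont, r_stop, gamma=1.0):
    """Backward induction with a budget of L stops. V[t][s][l] is the
    value at time t in state s with l stops remaining; each stop
    consumes one unit of the budget, and the terminal value is zero."""
    n = len(states)
    V = [[[0.0] * (L + 1) for _ in range(n)] for _ in range(T + 1)]
    for t in range(T - 1, -1, -1):
        for s in states:
            for l in range(L + 1):
                cont = r_cont[s] + gamma * sum(
                    trans[s][s2] * V[t + 1][s2][l] for s2 in states)
                if l == 0:
                    V[t][s][l] = cont         # no stops left: continue
                else:
                    # Simplification: stopping yields r_stop but does not
                    # alter the dynamics (unlike the actual use case).
                    stp = r_stop[s] + gamma * sum(
                        trans[s][s2] * V[t + 1][s2][l - 1] for s2 in states)
                    V[t][s][l] = max(cont, stp)
    return V

trans = [[0.9, 0.1], [0.0, 1.0]]              # state 1 (intrusion) absorbing
V = multi_stopping(T=10, L=3, states=[0, 1], trans=trans,
                   r_cont=[1.0, -2.0], r_stop=[-5.0, 5.0], gamma=0.9)
```

A sanity property of this recursion is that the value is monotone in the stop budget: extra stops can only help, since the policy may always leave them unused.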
60. 19/27
Comparison against State-of-the-art Algorithms
[Plots: reward per episode vs. training time (min) against the novice, experienced, and expert attackers. Curves: PPO, ThresholdSPSA, Shiryaev's algorithm (α = 0.75), HSVI, and an upper bound.]
61. 20/27
3: Intrusion Prevention through Optimal Multiple Stopping and Game-Play⁴
I Optimal stopping (Dynkin) game:
I Dynamic attacker
I Stop actions of the defender determine when to take defensive actions
I Goal: find Nash Equilibrium (NE) strategies and the game value

$$J_1(\pi_{1,l}, \pi_{2,l}) = \mathbb{E}_{(\pi_{1,l},\pi_{2,l})}\left[\sum_{t=1}^{T} \gamma^{t-1} R_{l_t}(s_t, a_t)\right]$$
$$B_1(\pi_{2,l}) = \arg\max_{\pi_{1,l} \in \Pi_1} J_1(\pi_{1,l}, \pi_{2,l})$$
$$B_2(\pi_{1,l}) = \arg\min_{\pi_{2,l} \in \Pi_2} J_1(\pi_{1,l}, \pi_{2,l})$$
$$(\pi^*_{1,l}, \pi^*_{2,l}) \in B_1(\pi^*_{2,l}) \times B_2(\pi^*_{1,l}) \quad \text{(NE)}$$

[Diagram: iterated best responses — π̃_{1,l} ∈ B_1(π_{2,l}) and π̃_{2,l} ∈ B_2(π_{1,l}) alternate over successive strategy pairs, converging to the NE (π*_{1,l}, π*_{2,l}).]
⁴Kim Hammar and Rolf Stadler. "Learning Security Strategies through Game Play and Optimal Stopping". In: Proceedings of the ML4Cyber workshop, ICML 2022, Baltimore, USA, July 17-23, 2022. PMLR, 2022.
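The iterated best responses above are the idea behind fictitious play; a self-contained sketch on a tiny zero-sum matrix game (matching pennies as an assumed stand-in for the stopping game — its unique NE mixes uniformly):

```python
def fictitious_play(A, iters=5000):
    """Fictitious play on a zero-sum game with payoff matrix A
    (row player maximizes). Each player best-responds to the
    empirical mixture of the opponent's past actions."""
    m, n = len(A), len(A[0])
    row_counts, col_counts = [0] * m, [0] * n
    row_counts[0] = col_counts[0] = 1            # arbitrary first actions
    for _ in range(iters):
        # Row best response to the column player's empirical mixture.
        t = sum(col_counts)
        row_payoffs = [sum(A[i][j] * col_counts[j] for j in range(n)) / t
                       for i in range(m)]
        row_counts[max(range(m), key=lambda i: row_payoffs[i])] += 1
        # Column best response (minimizes the row player's payoff).
        t = sum(row_counts)
        col_payoffs = [sum(A[i][j] * row_counts[i] for i in range(m)) / t
                       for j in range(n)]
        col_counts[min(range(n), key=lambda j: col_payoffs[j])] += 1
    tr, tc = sum(row_counts), sum(col_counts)
    return [c / tr for c in row_counts], [c / tc for c in col_counts]

# Matching pennies: the row player wants to match, the column to mismatch.
A = [[1.0, -1.0], [-1.0, 1.0]]
x, y = fictitious_play(A)
```

In zero-sum games the empirical mixtures of fictitious play converge to an NE (Robinson 1951); here both players approach the uniform mixture.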
68. 23/27
4: Learning in Dynamic IT Environments⁵
[Diagram: online framework — a policy-learning agent interacts with its environment; the target system's configuration I and change events update a digital twin with attack scenarios; the twin performs policy evaluation & data collection, emitting traces h_1, h_2, . . . to system identification, which estimates a model M for policy learning; the learned policy π is sent back to the twin and deployed as an automated security policy.]
⁵Kim Hammar and Rolf Stadler. "An Online Framework for Adapting Security Policies in Dynamic IT Environments". In: International Conference on Network and Service Management (CNSM 2022). Thessaloniki, Greece, 2022.
69. 24/27
4: Learning in Dynamic IT Environments⁶
Algorithm 1: High-level execution of the framework
Input: emulator: method to create digital twin
       ϕ: system identification algorithm
       φ: policy learning algorithm

Algorithm(emulator, ϕ, φ):
    do in parallel
        DigitalTwin(emulator)
        SystemIdProcess(ϕ)
        LearningProcess(φ)

Procedure DigitalTwin(emulator):
    loop
        π ← ReceiveFromLearningProcess()
        h_t ← CollectTrace(π)
        SendToSystemIdProcess(h_t)
        UpdateDigitalTwin(emulator)

Procedure SystemIdProcess(ϕ):
    loop
        h_1, h_2, . . . ← ReceiveFromDigitalTwin()
        M ← ϕ(h_1, h_2, . . .)        // estimate model
        SendToLearningProcess(M)

Procedure LearningProcess(φ):
    loop
        M ← ReceiveFromSystemIdProcess()
        π ← φ(M)                      // learn policy π
        SendToDigitalTwin(π)
⁶Kim Hammar and Rolf Stadler. "An Online Framework for Adapting Security Policies in Dynamic IT Environments". In: International Conference on Network and Service Management (CNSM 2022). Thessaloniki, Greece, 2022.
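The three concurrent loops of Algorithm 1 can be skeletonized with threads and blocking queues; `collect_trace`, `identify`, and `learn` below are toy stand-ins for the framework's actual components:

```python
import queue
import threading

def run_framework(collect_trace, identify, learn, initial_policy, rounds=3):
    """Three concurrent loops exchanging traces, models, and policies,
    mirroring DigitalTwin / SystemIdProcess / LearningProcess."""
    traces, models, policies = queue.Queue(), queue.Queue(), queue.Queue()
    policies.put(initial_policy)

    def digital_twin():
        for _ in range(rounds):
            pi = policies.get()              # receive policy from learner
            traces.put(collect_trace(pi))    # run an episode, emit trace

    def system_id():
        for _ in range(rounds):
            models.put(identify(traces.get()))  # estimate model M

    def learner():
        for _ in range(rounds):
            policies.put(learn(models.get()))   # learn policy pi

    threads = [threading.Thread(target=f)
               for f in (digital_twin, system_id, learner)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return policies.get()                    # final policy

# Toy stand-ins: the "policy" is just a number the pipeline refines.
final = run_framework(collect_trace=lambda pi: pi + 1,
                      identify=lambda h: h * 2,
                      learn=lambda M: M / 2,
                      initial_policy=0.0)
```

Blocking queues make the data flow deterministic regardless of thread scheduling: each policy flows twin → traces → model → new policy, exactly as in the algorithm.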
70. 25/27
Learning in Dynamic IT Environments⁷
[Plots: # clients, estimated observation-model parameters E[Ẑ_{t,O|0}] and E[Ẑ_{t,O|1}] (our framework vs. [10]), and average reward vs. execution time (hours), with an upper bound.]
Results from running our framework for 50 hours in the digital twin/emulation.
⁷Kim Hammar and Rolf Stadler. "An Online Framework for Adapting Security Policies in Dynamic IT Environments". In: International Conference on Network and Service Management (CNSM 2022). Thessaloniki, Greece, 2022.
71. 26/27
Current and Future Work
[Diagram: a trajectory over time — states s_t, s_{t+1}, s_{t+2}, s_{t+3}, . . . with rewards r_{t+1}, r_{t+2}, r_{t+3}, . . . , r_T.]
1. Closing the gap to reality
I Additional defender actions
I Utilize SDN controller and NFV-based defenses
I Increase observation space and attacker model
I More heterogeneous client population
2. Extend solution framework
I Model-predictive control
I Rollout-based techniques
I Extend system identification algorithm
3. Extend theoretical results
I Exploit symmetries and causal structure
I Utilize theory to improve sample efficiency
I Decompose solution framework hierarchically
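A rollout scheme of the kind planned above improves a base policy by one-step lookahead with Monte Carlo simulation; a minimal sketch with a hypothetical two-state chain and illustrative rewards (not tied to the paper's model):

```python
import random

def rollout_action(state, actions, step, base_policy, horizon=20,
                   samples=100, gamma=0.95, seed=0):
    """One-step rollout: for each candidate action, simulate the base
    policy for `horizon` steps and pick the action with the best Monte
    Carlo estimate of the return. `step(s, a, rng) -> (s2, r)`."""
    rng = random.Random(seed)
    best_a, best_val = None, float("-inf")
    for a in actions:
        total = 0.0
        for _ in range(samples):
            s, r = step(state, a, rng)     # take the candidate action once
            ret, discount = r, 1.0
            for _ in range(horizon):       # then follow the base policy
                discount *= gamma
                s, r = step(s, base_policy(s), rng)
                ret += discount * r
            total += ret
        if total / samples > best_val:
            best_a, best_val = a, total / samples
    return best_a

# Toy chain: action 1 ("stop") pays off only during an intrusion
# (hypothetical dynamics; state 1 is absorbing under "continue").
def step(s, a, rng):
    if a == 1:
        return 0, (10.0 if s == 1 else -5.0)
    s2 = 1 if (s == 1 or rng.random() < 0.1) else 0
    return s2, (1.0 if s == 0 else -2.0)

base = lambda s: 0                         # base policy: always continue
act = rollout_action(1, [0, 1], step, base)
```

Even with a trivial base policy, the lookahead correctly prefers stopping when an intrusion is underway, which is the improvement property that makes rollout attractive here.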
72. 27/27
Conclusions
I We develop a method to automatically learn security strategies.
I We apply the method to an intrusion prevention use case.
I We show numerical results in a realistic emulation environment.
I We design a solution framework guided by the theory of optimal stopping.
I We present several theoretical results on the structure of the optimal solution.
[Diagram: the method loop — the target system is selectively replicated into an emulation; model creation & system identification feed simulation & learning; the learned strategy π is mapped and implemented back in the target system.]