CDIS Fall Retreat 2022
Oct 27-28
Self-Learning Systems for Cyber Defense
Kim Hammar, Rolf Stadler
Self-Learning Security Systems: Current Landscape

Levels of security automation:
- No automation (1980s): manual detection and prevention; no alerts, no automatic responses, lack of tools.
- Operator assistance (1990s): manual detection and prevention, supported by audit logs and security tools.
- Partial automation (2000s-now): the system has automated functions for detection/prevention (intrusion detection systems, intrusion prevention systems) but requires manual updating and configuration.
- High automation (research): the system automatically updates itself; automated attack detection and mitigation.
Approach: Self-Learning Security Systems
- Challenges:
  - Evolving & automated attacks
  - Complex infrastructures
- Our goal:
  - Automate security tasks
  - Adapt to changing attack methods
- Our approach: self-learning systems built on:
  - Real-time telemetry
  - Stream processing
  - Theories from control/game/decision theory
  - Computational methods (e.g., dynamic programming & reinforcement learning)
  - Automated network management (SDN, NFV, etc.)
[Figure: target infrastructure with an attacker, clients, a defender, a gateway, an IPS that generates alerts, and 31 internal hosts.]
Example Use Case: Intrusion Prevention
[Plots: belief b, number of IPS alerts, and number of logins over time-steps t = 1, ..., 200.]
The Intrusion Prevention Problem
[Plots: belief b, number of IPS alerts, and number of logins over time-steps t = 1, ..., 200.]
When should the defender take a defensive action, and which action should it take?
Outline
- Use Case & Motivation:
  - Use case: intrusion prevention
  - Self-learning security systems: current landscape
- Our Approach:
  - Network emulation and digital twin
  - Stochastic game simulation and reinforcement learning
- Summary of results so far:
  - Comparison with related work
  - Intrusion prevention through optimal multiple stopping
  - Dynkin games and learning in dynamic environments
  - System for policy validation
- Outlook on future work:
  - Extend use case
  - Rollout-based methods
- Conclusions:
  - Takeaways
Our Approach for Automated Network Security
[Diagram: the target infrastructure is selectively replicated in an emulation system (digital twin), which supports model creation and system identification; a simulation system performs reinforcement learning and generalization; learned strategies π are mapped back and implemented in the target infrastructure. This cycle of strategy evaluation, model estimation, and strategy implementation enables automation and self-learning systems.]
Creating a Digital Twin of the Target Infrastructure
[Diagram: the approach diagram, highlighting the emulation system and the selective replication of the target infrastructure.]
Creating a Digital Twin of the Target Infrastructure
- Emulate hosts with Docker containers.
- Emulate the IPS and vulnerabilities in software.
- Achieve network isolation and traffic shaping through NetEm in the Linux kernel.
- Enforce resource constraints using cgroups.
- Emulate client arrivals with a Poisson process.
- Internal connections are full-duplex and loss-less with bit capacities of 1000 Mbit/s.
- External connections are full-duplex with bit capacities of 100 Mbit/s, 0.1% packet loss in normal operation, and random bursts of 1% packet loss.
[Figure: emulated infrastructure with an attacker, clients, a defender, a gateway, and an IPS that generates alerts.]
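Emulating client arrivals with a Poisson process amounts to sampling exponentially distributed inter-arrival times. A minimal sketch, assuming a hypothetical arrival rate of 20 clients per minute (the deck does not state the rate):

```python
import random

def poisson_arrivals(rate_per_min: float, horizon_min: float, seed: int = 1):
    """Sample client arrival times on [0, horizon_min] from a Poisson
    process, via exponentially distributed inter-arrival times."""
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate_per_min)  # exponential gap => Poisson process
        if t > horizon_min:
            return arrivals
        arrivals.append(t)

# The average number of arrivals per minute should be close to the rate.
times = poisson_arrivals(rate_per_min=20.0, horizon_min=1000.0)
print(len(times) / 1000.0)  # approximately 20
```

In the digital twin, each sampled arrival would correspond to starting a client session against the emulated services.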
System Model
- We model the evolution of the system as a discrete-time dynamical system.
- We assume a Markovian system with stochastic dynamics and partial observability.
- If the attacker is static: a Partially Observed Markov Decision Process (POMDP).
- If the attacker is dynamic: a Partially Observed Stochastic Game (POSG).
[Diagram: a stochastic system (Markov) in state s_t is observed through a noisy sensor producing observation o_t; an optimal filter computes the belief b_t; the controller (defender) selects action a_t^(1) and the attacker selects action a_t^(2).]
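The optimal filter in the diagram can be realized as a recursive Bayesian belief update. A minimal sketch for a two-state system; the transition and observation probabilities below are illustrative, not those of the deck's model:

```python
def belief_update(b: float, o: int, P, Z) -> float:
    """One step of the Bayes filter for a 2-state Markov system.
    b = P[s_t = 1 | h_t], P[i][j] = transition prob., Z[s][o] = obs. prob."""
    # Predict: push the belief through the transition matrix.
    pred = [b * P[1][j] + (1 - b) * P[0][j] for j in (0, 1)]
    # Correct: weight each state by the likelihood of the observation o.
    num = Z[1][o] * pred[1]
    den = Z[0][o] * pred[0] + Z[1][o] * pred[1]
    return num / den

# Illustrative model: intrusions start w.p. 0.1 and persist; the intrusion
# state (s = 1) makes high-alert observations (o = 1) more likely.
P = [[0.9, 0.1], [0.0, 1.0]]
Z = [[0.8, 0.2], [0.3, 0.7]]
b = 0.0
for o in (1, 1, 1):          # a run of high-alert observations
    b = belief_update(b, o, P, Z)
print(round(b, 3))           # the belief rises towards 1
```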
System Identification
- The distribution f_O of defender observations (system metrics) is unknown.
- We fit a Gaussian mixture distribution f̂_O as an estimate of f_O in the target infrastructure.
- For each state s, we obtain the conditional distribution f̂_O|s through expectation-maximization.
[Figure: fitted models of the probability distributions f̂_O(o_t | s_t = 0) and f̂_O(o_t | s_t = 1) of the number of IPS alerts weighted by priority, compared against the empirical distributions.]
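Expectation-maximization for a Gaussian mixture can be sketched as follows: a minimal 1-D, two-component version in pure Python, fit on synthetic alert counts (the data, initialization, and component count are illustrative; the deck does not specify them):

```python
import math
import random

def em_gmm_1d(xs, iters=100):
    """Fit a two-component 1-D Gaussian mixture to xs via EM."""
    def pdf(x, mu, var):
        return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    mu, var, w = [min(xs), max(xs)], [v, v], [0.5, 0.5]  # crude initialization
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        r = [[w[k] * pdf(x, mu[k], var[k]) for x in xs] for k in (0, 1)]
        for i in range(len(xs)):
            s = r[0][i] + r[1][i]
            r[0][i] /= s
            r[1][i] /= s
        # M-step: re-estimate weights, means, and variances.
        for k in (0, 1):
            nk = sum(r[k])
            w[k] = nk / len(xs)
            mu[k] = sum(ri * x for ri, x in zip(r[k], xs)) / nk
            var[k] = max(sum(ri * (x - mu[k]) ** 2
                             for ri, x in zip(r[k], xs)) / nk, 1e-6)
    return w, mu, var

rng = random.Random(0)
# Synthetic alert counts: low under normal operation, high during an intrusion.
xs = [rng.gauss(10, 2) for _ in range(300)] + [rng.gauss(50, 5) for _ in range(300)]
w, mu, var = em_gmm_1d(xs)
print([round(m, 1) for m in sorted(mu)])  # means close to 10 and 50
```

The fitted per-state densities f̂_O|s then serve as the observation model of the POMDP/POSG.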
The Simulation System
[Diagram: the simulation system, based on reinforcement learning and numerical methods.]
- Simulations:
  - Markov decision processes
  - Stochastic games
- Learning/computing defender strategies:
  - Reinforcement learning
  - Stochastic approximation
  - Computational game theory
  - Dynamic programming
  - Optimization
Related Work on Self-Learning Security Systems
[Chart: related work plotted by external validity vs. use-case/control/learning complexity, with the goal in the upper-right corner.]
- Georgia et al. 2000 (Next generation intrusion detection: reinforcement learning)
- Xu et al. 2005 (An RL approach to host-based intrusion detection)
- Servin et al. 2008 (Multi-agent RL for intrusion detection)
- Malialis et al. 2013 (Decentralized RL response to DDoS attacks)
- Zhu et al. 2019 (Adaptive honeypot engagement)
- Apruzzese et al. 2020 (Deep RL to evade botnets)
- Xiao et al. 2021 (RL approach to APT), etc. 2022
- Our work 2020-2022 and planned work. (Much prior work uses small models, e.g., only 2 actions and 3 states.)
1: Intrusion Prevention through Optimal Stopping1
- Intrusion prevention as an optimal stopping problem:
  - A stochastic process (s_t)_{t=1}^T is observed sequentially.
  - Two options per time-step t: (i) continue to observe; or (ii) stop.
  - Find the optimal stopping time τ*:
    τ* = arg max_τ E_τ[ Σ_{t=1}^{τ-1} γ^{t-1} R^C_{s_t s_{t+1}} + γ^{τ-1} R^S_{s_τ s_τ} ]
    where R^S_{ss'} and R^C_{ss'} are the stop/continue rewards.
  - Stop action = defensive action.
[Timeline: an episode from t = 1 to t = T with an intrusion event; stopping times before the event are early, while stopping times during the intrusion interrupt it.]

1 Kim Hammar and Rolf Stadler. "Learning Intrusion Prevention Policies through Optimal Stopping". In: International Conference on Network and Service Management (CNSM 2021). http://dl.ifip.org/db/conf/cnsm/cnsm2021/1570732932.pdf. Izmir, Turkey, 2021.
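For a fully observed chain with known dynamics, the optimal stopping time can be computed by backward induction on the Bellman equation V_t(s) = max(R^S_s, R^C_s + γ Σ_{s'} P[s'|s] V_{t+1}(s')). A toy sketch with illustrative two-state dynamics and rewards (not the deck's model, which is partially observed):

```python
def solve_stopping(P, R_stop, R_cont, gamma, T):
    """Backward induction for a finite-horizon optimal stopping problem.
    Returns the value function and the stop/continue decision per (t, s)."""
    n = len(P)
    V = [R_stop[:]]                      # at t = T the process must stop
    policy = [["stop"] * n]
    for _ in range(T - 1):
        Vn = V[0]
        row, dec = [], []
        for s in range(n):
            cont = R_cont[s] + gamma * sum(P[s][s2] * Vn[s2] for s2 in range(n))
            stop = R_stop[s]
            row.append(max(stop, cont))
            dec.append("stop" if stop >= cont else "continue")
        V.insert(0, row)
        policy.insert(0, dec)
    return V, policy

# Illustrative model: state 0 = no intrusion, state 1 = intrusion.
P = [[0.9, 0.1], [0.0, 1.0]]   # intrusions start w.p. 0.1 and persist
R_stop = [-10.0, 20.0]         # stopping early is costly; stopping an intrusion pays
R_cont = [1.0, -5.0]           # service reward while healthy, loss while intruded
V, policy = solve_stopping(P, R_stop, R_cont, gamma=0.95, T=50)
print(policy[0])               # continue in state 0, stop in state 1
```

Under partial observability the same recursion runs over beliefs instead of states, which is what yields the threshold structure shown next.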
1: Intrusion Prevention through Optimal Stopping2
[Diagram: the Markov system is observed through a noisy sensor (IPS alerts o_t); an optimal filter computes the belief b_t ∈ [0, 1], and the stopping strategy selects a_t ∈ {S, C}.]
- States: intrusion state s_t ∈ {0, 1}; terminal state ∅.
- Observations: number of IPS alerts o_t ∈ O, drawn from the random variable O ∼ f_O(·|s_t).
- Based on the history h_t of observations, the defender computes the belief b_t(s_t) = P[s_t | h_t].
- Actions: A_1 = A_2 = {S, C}.
- Rewards: balance security and service.
- Transition probabilities: follow from the game dynamics.

2 Kim Hammar and Rolf Stadler. "Learning Intrusion Prevention Policies through Optimal Stopping". In: International Conference on Network and Service Management (CNSM 2021). http://dl.ifip.org/db/conf/cnsm/cnsm2021/1570732932.pdf. Izmir, Turkey, 2021.
Convex Stopping Set with Threshold α*_1 ∈ B
[Figure: the belief space B = [0, 1]; the stopping set S_1 is the interval [α*_1, 1].]
Bang-Bang Controller with Threshold α*_1 ∈ B
[Figure: the optimal action a_k = π*_k(b) as a function of the belief b: continue while b is below the threshold b*_1, stop once b exceeds it.]
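The resulting controller is trivially cheap to execute at run time: compare the current belief against the learned threshold. A minimal sketch (the threshold value 0.75 is illustrative, not a learned value):

```python
def stopping_policy(b: float, alpha: float = 0.75) -> str:
    """Bang-bang stopping rule: stop iff the belief meets the threshold."""
    return "stop" if b >= alpha else "continue"

# The decision switches exactly once as the belief sweeps [0, 1].
decisions = [stopping_policy(b / 100) for b in range(101)]
print(decisions.count("stop"))  # beliefs 0.75 ... 1.00 trigger a stop
```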
Learning Curves in Simulation and Emulation
[Plots: reward per episode, episode length (steps), P[intrusion interrupted], P[early stopping], and duration of intrusion vs. training time (min), against novice, experienced, and expert attackers; curves compare π_θ,l in simulation and in emulation against a (Δx + Δy) ≥ 1 baseline, the Snort IPS, and an upper bound.]
2: Intrusion Prevention through Optimal Multiple Stopping3
- Intrusion prevention through multiple optimal stopping:
  - Maximize the reward of the stopping times τ_L, τ_{L-1}, ..., τ_1:
    π*_l ∈ arg max_{π_l} E_{π_l}[ Σ_{t=1}^{τ_L - 1} γ^{t-1} R^C_{s_t,s_{t+1},L} + γ^{τ_L - 1} R^S_{s_{τ_L},s_{τ_L+1},L} + ... + Σ_{t=τ_2+1}^{τ_1 - 1} γ^{t-1} R^C_{s_t,s_{t+1},1} + γ^{τ_1 - 1} R^S_{s_{τ_1},s_{τ_1+1},1} ]
  - Each stopping time = one defensive action.
[State diagram: states 0 (no intrusion), 1 (intrusion), and terminal ∅; the intrusion starts when Q_t = 1; the episode ends with the final stop (l_t = 0) or when the intrusion is prevented.]

3 Kim Hammar and Rolf Stadler. "Intrusion Prevention Through Optimal Stopping". In: IEEE Transactions on Network and Service Management 19.3 (2022), pp. 2333-2348. doi: 10.1109/TNSM.2022.3176781.
Structural Result: Optimal Multi-Threshold Policy & Nested Stopping Sets
[Figure: the belief space B = [0, 1] with nested stopping sets S_1 ⊆ S_2 ⊆ ... ⊆ S_L and corresponding thresholds α*_1 ≥ α*_2 ≥ ... ≥ α*_L.]
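With L stops available, the policy is a vector of thresholds, one per number of remaining stops: the fewer stops remain, the higher (more conservative) the threshold. A sketch with illustrative threshold values satisfying the nesting above:

```python
def multi_stop_policy(b: float, l: int, alphas) -> str:
    """Multi-threshold stopping rule: with l stops remaining, stop iff the
    belief b meets alpha_l. alphas[0] is the threshold for l = 1 (last stop)."""
    return "stop" if b >= alphas[l - 1] else "continue"

# Illustrative thresholds: alpha_1 >= alpha_2 >= ... >= alpha_L, so the
# stopping sets are nested, S_1 ⊆ S_2 ⊆ ... ⊆ S_L.
alphas = [0.9, 0.7, 0.5]   # L = 3 stops
decisions = [multi_stop_policy(0.75, l, alphas) for l in (3, 2, 1)]
print(decisions)           # the last stop is used more conservatively
```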
Comparison against State-of-the-art Algorithms
[Plots: reward per episode vs. training time (min) against novice, experienced, and expert attackers, comparing PPO, Threshold-SPSA, Shiryaev's algorithm (α = 0.75), and the HSVI upper bound.]
3: Intrusion Prevention through Optimal Multiple Stopping and Game-Play4
- Optimal stopping (Dynkin) game:
  - Dynamic attacker.
  - The stop actions of the defender determine when to take defensive actions.
- Goal: find Nash equilibrium (NE) strategies and the game value:
  J_1(π_{1,l}, π_{2,l}) = E_{(π_{1,l}, π_{2,l})}[ Σ_{t=1}^T γ^{t-1} R_{l_t}(s_t, a_t) ]
  B_1(π_{2,l}) = arg max_{π_{1,l} ∈ Π_1} J_1(π_{1,l}, π_{2,l})
  B_2(π_{1,l}) = arg min_{π_{2,l} ∈ Π_2} J_1(π_{1,l}, π_{2,l})
  (π*_{1,l}, π*_{2,l}) ∈ B_1(π*_{2,l}) × B_2(π*_{1,l})   (NE)
[Diagram: self-play iteration — each player alternately computes a best response π̃ to the other's current strategy, converging to the equilibrium pair (π*_{1,l}, π*_{2,l}).]

4 Kim Hammar and Rolf Stadler. "Learning Security Strategies through Game Play and Optimal Stopping". In: Proceedings of the ML4Cyber workshop, ICML 2022, Baltimore, USA, July 17-23, 2022. PMLR, 2022.
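The self-play scheme in the diagram resembles fictitious play: each player repeatedly best-responds to the opponent's empirical strategy, and in zero-sum games the empirical averages converge to a NE. A toy sketch on matching pennies, whose unique NE plays each action with probability 1/2 (the payoff matrix is illustrative; the deck's T-FP algorithm operates on the stopping game, not a matrix game):

```python
def fictitious_play(A, iters=20000):
    """Fictitious play for a zero-sum matrix game with payoff matrix A
    (row player maximizes). Returns the empirical mixed strategies."""
    n, m = len(A), len(A[0])
    row_counts, col_counts = [0] * n, [0] * m
    r, c = 0, 0                          # arbitrary initial actions
    for _ in range(iters):
        row_counts[r] += 1
        col_counts[c] += 1
        # Best response of the row player to the column player's empirical mix.
        r = max(range(n), key=lambda i: sum(col_counts[j] * A[i][j] for j in range(m)))
        # Best response of the column player (minimizer) to the row player's mix.
        c = min(range(m), key=lambda j: sum(row_counts[i] * A[i][j] for i in range(n)))
    total = sum(row_counts)
    return [x / total for x in row_counts], [x / total for x in col_counts]

A = [[1, -1], [-1, 1]]                   # matching pennies
p, q = fictitious_play(A)
print([round(x, 2) for x in p], [round(x, 2) for x in q])  # both near [0.5, 0.5]
```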
Structure of Best Response Strategies
[Figure: in the belief space, the defender's best response against π_{2,l} has nested stopping sets S^(1)_{1,π_{2,l}} ⊆ ... ⊆ S^(1)_{L,π_{2,l}} with thresholds α̃_1 ≥ ... ≥ α̃_L; the attacker's best response against π_{1,l} has stopping sets S^(2)_{0,l,π_{1,l}} and S^(2)_{1,l,π_{1,l}} with thresholds β̃_{0,l} and β̃_{1,l} that depend on whether an intrusion is ongoing.]
Convergence Rates and Comparison with State-of-the-art Algorithms
[Plots: exploitability and approximation error (gap) vs. running time (min), comparing T-FP, NFSP, and HSVI.]
4: Learning in Dynamic IT Environments5
[Diagram: the target system, its configuration I and change events, and attack scenarios are replicated in a digital twin; the twin produces traces h_1, h_2, ... for system identification, which estimates a model M; policy learning computes a policy π from M; π is evaluated in the digital twin and deployed to the target system as an automated security policy.]

5 Kim Hammar and Rolf Stadler. "An Online Framework for Adapting Security Policies in Dynamic IT Environments". In: International Conference on Network and Service Management (CNSM 2022). Thessaloniki, Greece, 2022.
4: Learning in Dynamic IT Environments6

Algorithm 1: High-level execution of the framework
Input: emulator: method to create the digital twin
       ϕ: system identification algorithm
       φ: policy learning algorithm

Algorithm(emulator, ϕ, φ):
    do in parallel:
        DigitalTwin(emulator)
        SystemIdProcess(ϕ)
        LearningProcess(φ)

Procedure DigitalTwin(emulator):
    loop:
        π ← ReceiveFromLearningProcess()
        h_t ← CollectTrace(π)
        SendToSystemIdProcess(h_t)
        UpdateDigitalTwin(emulator)

Procedure SystemIdProcess(ϕ):
    loop:
        h_1, h_2, ... ← ReceiveFromDigitalTwin()
        M ← ϕ(h_1, h_2, ...)        // estimate model
        SendToLearningProcess(M)

Procedure LearningProcess(φ):
    loop:
        M ← ReceiveFromSystemIdProcess()
        π ← φ(M)                    // learn policy π
        SendToDigitalTwin(π)

6 Kim Hammar and Rolf Stadler. "An Online Framework for Adapting Security Policies in Dynamic IT Environments". In: International Conference on Network and Service Management (CNSM 2022). Thessaloniki, Greece, 2022.
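Run sequentially, Algorithm 1's three processes reduce to one loop: collect a trace with the current policy, re-estimate the model, re-learn the policy. A highly simplified sketch in which the "model" is just an estimated intrusion probability and the "policy" a threshold; all stubs below are hypothetical stand-ins, since the real framework runs these steps as parallel processes against the digital twin:

```python
import random

def collect_trace(policy_threshold, rng, n=500):
    """Stub for CollectTrace: record whether each step saw an intrusion."""
    return [1 if rng.random() < 0.2 else 0 for _ in range(n)]

def identify(traces):
    """Stub for the system identification algorithm ϕ: estimate P[intrusion]."""
    flat = [x for h in traces for x in h]
    return sum(flat) / len(flat)

def learn(model):
    """Stub for the policy learning algorithm φ: derive a threshold from M.
    A higher intrusion rate yields a lower (more eager) stopping threshold."""
    return 1.0 - model

rng = random.Random(0)
policy, traces = 0.5, []
for _ in range(5):              # five rounds of the collect/identify/learn loop
    traces.append(collect_trace(policy, rng))
    model = identify(traces)
    policy = learn(model)
print(round(model, 2), round(policy, 2))  # model near 0.2, policy near 0.8
```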
Learning in Dynamic IT Environments7
[Plots: results from running our framework for 50 hours in the digital twin/emulation — number of clients, the estimated observation distributions E[Ẑ_{t,O|0}] and E[Ẑ_{t,O|1}] (our framework vs. [10]), and average reward vs. execution time (hours), compared against an upper bound.]

7 Kim Hammar and Rolf Stadler. "An Online Framework for Adapting Security Policies in Dynamic IT Environments". In: International Conference on Network and Service Management (CNSM 2022). Thessaloniki, Greece, 2022.
Current and Future Work
[Diagram: trajectory s_t → s_{t+1} → s_{t+2} → ... with rewards r_{t+1}, r_{t+2}, ..., r_T.]
1. Closing the gap to reality
   - Additional defender actions
   - Utilize SDN controllers and NFV-based defenses
   - Increase observation space and attacker model
   - More heterogeneous client population
2. Extend solution framework
   - Model-predictive control
   - Rollout-based techniques
   - Extend system identification algorithm
3. Extend theoretical results
   - Exploit symmetries and causal structure
   - Utilize theory to improve sample efficiency
   - Decompose solution framework hierarchically
Conclusions
- We develop a method to automatically learn security strategies.
- We apply the method to an intrusion prevention use case.
- We show numerical results in a realistic emulation environment.
- We design a solution framework guided by the theory of optimal stopping.
- We present several theoretical results on the structure of the optimal solution.
[Diagram: the emulation-simulation cycle — selective replication of the target system, model creation and system identification, simulation and learning, and strategy mapping/implementation of π.]
More Related Content

Similar to Self-Learning Systems for Cyber Defense

Self-Learning Systems for Cyber Security
Self-Learning Systems for Cyber SecuritySelf-Learning Systems for Cyber Security
Self-Learning Systems for Cyber SecurityKim Hammar
 
Digital Twins for Security Automation
Digital Twins for Security AutomationDigital Twins for Security Automation
Self-Learning Systems for Cyber Defense

  • 1. CDIS Fall Retreat 2022 Oct 27-28 Self-Learning Systems for Cyber Defense Kim Hammar, Rolf Stadler 1/27
  • 2. 2/27 Self-Learning Security Systems: Current Landscape. Levels of security automation: 1980s, no automation (manual detection and prevention, no alerts, no automatic responses, lack of tools); 1990s, operator assistance (manual detection and prevention, audit logs, security tools); 2000s–now, partial automation (the system has automated functions for detection/prevention but requires manual updating and configuration; intrusion detection systems, intrusion prevention systems); research, high automation (the system automatically updates itself; automated attack detection and mitigation).
  • 3. 3/27 Challenges: Evolving and Automated Attacks. Challenges: evolving & automated attacks; complex infrastructures. [Figure: target infrastructure topology with attacker, clients, defender, IPS and alerts, gateway, and numbered network nodes]
  • 4. 3/27 Goal: Automation and Learning. Challenges: evolving & automated attacks; complex infrastructures. Our goal: automate security tasks; adapt to changing attack methods. [Figure: same infrastructure topology]
  • 5. 3/27 Approach: Self-Learning Security Systems. Challenges and goal as above. Our approach, self-learning systems: real-time telemetry; stream processing; concepts from control, game, and decision theory; computational methods (e.g., dynamic programming & reinforcement learning); automated network management (SDN, NFV, etc.). [Figure: same infrastructure topology]
  • 6.–12. 4/27 Example Use Case: Intrusion Prevention (identical content repeated on slides 6–12). [Figure: time series over t = 20–200 of intrusion belief b (0–1), IDS alert counts, and login counts]
  • 13.–14. 4/27 The Intrusion Prevention Problem. [Figure: same time series of b, alerts, and logins] When to take a defensive action? Which action to take?
  • 15.–19. 5/27 Outline (identical content repeated on slides 15–19). Use case & motivation: use case, intrusion prevention; self-learning security systems, current landscape. Our approach: network emulation and digital twin; stochastic game simulation and reinforcement learning. Summary of results so far: comparison with related work; intrusion prevention through optimal multiple stopping; Dynkin games and learning in dynamic environments; system for policy validation. Outlook on future work: extend use case; rollout-based methods. Conclusions: takeaways.
  • 20.–27. 6/27 Our Approach for Automated Network Security (identical content repeated on slides 20–27). [Figure: architecture diagram linking the target infrastructure to an emulation system via selective replication and to a simulation system via model creation & system identification, with reinforcement learning & generalization, strategy evaluation & model estimation, strategy mapping π, and strategy implementation π, yielding automation & self-learning systems]
  • 28. 7/27 Creating a Digital Twin of the Target Infrastructure [Diagram: the target infrastructure is selectively replicated into the emulation system (digital twin), which feeds model creation & system identification for the simulation system]
  • 29. 8/27 Creating a Digital Twin of the Target Infrastructure I Emulate hosts with docker containers I Emulate IPS and vulnerabilities with software I Network isolation and traffic shaping through NetEm in the Linux kernel I Enforce resource constraints using cgroups I Emulate client arrivals with a Poisson process I Internal connections are full-duplex & loss-less with bit capacities of 1000 Mbit/s I External connections are full-duplex with bit capacities of 100 Mbit/s & 0.1% packet loss in normal operation and random bursts of 1% packet loss [Figure: the emulated topology with the attacker, the clients, the defender's IPS and alerts at the gateway, and internal hosts 2-31]
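The Poisson client-arrival model mentioned above can be sketched in a few lines. The following is a minimal illustration only (the function name, rate, and horizon are our own choices, not taken from the actual emulation system): arrival times are generated by accumulating exponentially distributed inter-arrival gaps.

```python
import random

def poisson_arrivals(rate, horizon, seed=None):
    """Sample client arrival times in [0, horizon) from a Poisson
    process with the given rate (arrivals per time unit), using the
    fact that inter-arrival times are Exp(rate)-distributed."""
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate)  # next inter-arrival gap
        if t >= horizon:
            return arrivals
        arrivals.append(t)

# With a rate of 2 clients/second over 100 seconds we expect ~200 arrivals.
times = poisson_arrivals(rate=2.0, horizon=100.0, seed=42)
```

In the digital twin, each sampled arrival time would trigger a client session against the emulated services; here the times are simply returned.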
  • 37. 9/27 System Model I We model the evolution of the system with a discrete-time dynamical system. I We assume a Markovian system with stochastic dynamics and partial observability. I A Partially Observed Markov Decision Process (POMDP) if the attacker is static. I A Partially Observed Stochastic Game (POSG) if the attacker is dynamic. [Diagram: a Markovian stochastic system observed through a noisy sensor; an optimal filter computes the belief b_t from observations o_t, and the controller and the attacker select actions a^(1)_t and a^(2)_t]
  • 39. 10/27 System Identification [Figure: fitted probability distributions f̂_O(o_t | 0) and f̂_O(o_t | 1) of the number of IPS alerts weighted by priority, for o_t ranging from 0 to 9000, comparing the fitted model with the empirical distributions for s_t = 0 and s_t = 1] I The distribution f_O of defender observations (system metrics) is unknown. I We fit a Gaussian mixture distribution f̂_O as an estimate of f_O in the target infrastructure. I For each state s, we obtain the conditional distribution f̂_O|s through expectation-maximization.
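The expectation-maximization fit can be illustrated with a small self-contained sketch. The data below are synthetic stand-ins for the alert-count metric (a low mode under normal operation, a high mode during intrusion; all numbers are made up), and the loop is a textbook 1-D EM implementation, not the authors' actual estimation code.

```python
import numpy as np

def fit_gmm_1d(x, k=2, iters=200):
    """Fit a 1-D Gaussian mixture with k components via
    expectation-maximization; returns (weights, means, stds)."""
    # Deterministic init: spread the means across the data range.
    mu = np.linspace(x.min(), x.max(), k)
    sig = np.full(k, x.std() + 1e-9)
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        dens = w * np.exp(-((x[:, None] - mu) ** 2) / (2 * sig**2)) \
               / (sig * np.sqrt(2 * np.pi))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and stds from responsibilities.
        n = r.sum(axis=0)
        w = n / len(x)
        mu = (r * x[:, None]).sum(axis=0) / n
        sig = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / n) + 1e-9
    return w, mu, sig

# Synthetic "IPS alert count" samples: 70% from a no-intrusion mode,
# 30% from an intrusion mode (hypothetical parameters).
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(200, 50, 700), rng.normal(4000, 300, 300)])
w, mu, sig = fit_gmm_1d(x)
```

The fitted component conditioned on each state would then play the role of f̂_O|s in the POMDP model.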
  • 40. 11/27 The Simulation System [Diagram: the simulation system, driven by reinforcement learning & numerical methods] I Simulations: I Markov decision processes I Stochastic games I Learning/computing defender strategies: I Reinforcement learning I Stochastic approximation I Computational game theory I Dynamic programming I Optimization
  • 48. 12/27 Related Work on Self-Learning Security Systems [Chart: related work plotted by external validity versus use case/control/learning complexity, with the goal in the upper-right corner; prior work uses small models, e.g. only 2 actions and 3 states] Georgia et al. 2000 (Next generation intrusion detection: reinforcement learning); Xu et al. 2005 (An RL approach to host-based intrusion detection); Servin et al. 2008 (Multi-agent RL for intrusion detection); Malialis et al. 2013 (Decentralized RL response to DDoS attacks); Zhu et al. 2019 (Adaptive Honeypot engagement); Apruzzese et al. 2020 (Deep RL to evade botnets); Xiao et al. 2021 (RL approach to APT); etc. 2022; our work 2020-2022; planned work.
  • 49. 13/27 1: Intrusion Prevention through Optimal Stopping1 I Intrusion Prevention as an Optimal Stopping Problem: I A stochastic process (s_t)_{t=1}^T is observed sequentially I Two options per t: (i) continue to observe; or (ii) stop I Find the optimal stopping time τ∗: τ∗ = arg max_τ E_τ [ Σ_{t=1}^{τ−1} γ^{t−1} R^C_{s_t s_{t+1}} + γ^{τ−1} R^S_{s_τ s_τ} ] where R^S_{ss′} & R^C_{ss′} are the stop/continue rewards I Stop action = Defensive action [Figure: episode timeline from t = 1 to t = T, marking the intrusion event, early stopping times, and stopping times that interrupt the intrusion] 1 Kim Hammar and Rolf Stadler. “Learning Intrusion Prevention Policies through Optimal Stopping”. In: International Conference on Network and Service Management (CNSM 2021). http://dl.ifip.org/db/conf/cnsm/cnsm2021/1570732932.pdf. Izmir, Turkey, 2021.
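For intuition, the stop-versus-continue trade-off can be computed by value iteration on a toy, fully observed two-state chain. The source models partial observability; this sketch assumes the state is directly observed, and the transition probabilities and rewards below are hypothetical illustrations, not the paper's parameters.

```python
def stopping_values(P, R_C, R_S, gamma=0.95, iters=1000):
    """Value iteration for a stop/continue MDP where stopping is terminal:
    V(s) = max(R_S[s], R_C[s] + gamma * sum_s' P[s][s'] * V(s'))."""
    n = len(P)
    V = [0.0] * n
    for _ in range(iters):
        V = [max(R_S[s],
                 R_C[s] + gamma * sum(P[s][sp] * V[sp] for sp in range(n)))
             for s in range(n)]
    cont = [R_C[s] + gamma * sum(P[s][sp] * V[sp] for sp in range(n))
            for s in range(n)]
    policy = ["stop" if R_S[s] >= cont[s] else "continue" for s in range(n)]
    return V, policy

# s=0: no intrusion, s=1: ongoing intrusion (absorbing). All numbers invented.
P = [[0.9, 0.1], [0.0, 1.0]]
R_C = [1.0, -2.0]   # continue: service reward vs. accumulating intrusion damage
R_S = [-5.0, 10.0]  # stop: false-alarm cost vs. prevented intrusion
V, policy = stopping_values(P, R_C, R_S)
```

Even in this toy, the optimal policy has the stopping structure the slide describes: wait while no intrusion is believed to be ongoing, stop as soon as it is.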
  • 50. 14/27 1: Intrusion Prevention through Optimal Stopping2 [Diagram: a Markovian stochastic system observed through a noisy sensor; an optimal filter computes the belief b_t ∈ [0, 1] from IPS alerts o_t, and the stopping strategy selects a_t ∈ {S, C}] I States: Intrusion s_t ∈ {0, 1}, terminal ∅. I Observations: the number of IPS alerts o_t ∈ O, drawn from the r.v. O ∼ f_O(·|s_t). I Based on the history h_t of observations, the defender can compute the belief b_t(s_t) = P[s_t | h_t]. I Actions: A_1 = A_2 = {S, C} I Rewards: security and service. I Transition probabilities: follow from the game dynamics. 2 Kim Hammar and Rolf Stadler. “Learning Intrusion Prevention Policies through Optimal Stopping”. In: International Conference on Network and Service Management (CNSM 2021). http://dl.ifip.org/db/conf/cnsm/cnsm2021/1570732932.pdf. Izmir, Turkey, 2021.
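The belief b_t(s_t) = P[s_t | h_t] can be maintained recursively with a Bayesian filter: predict one step with the transition matrix, then reweight by the observation likelihood and normalize. A minimal sketch for the two-state model follows; the observation likelihoods and the transition matrix here are invented for illustration.

```python
def belief_update(b, o, f_O, P):
    """One step of the Bayesian filter for a two-state model:
    b'(s') proportional to f_O[s'](o) * sum_s P[s][s'] * b(s)."""
    pred = [sum(P[s][sp] * b[s] for s in range(2)) for sp in range(2)]
    post = [f_O[sp](o) * pred[sp] for sp in range(2)]
    z = sum(post)  # normalizing constant
    return [p / z for p in post]

# Hypothetical observation model: high alert counts (>= 100) are likelier
# under intrusion (s=1) than under normal operation (s=0).
f_O = {0: lambda o: 0.8 if o < 100 else 0.2,
       1: lambda o: 0.3 if o < 100 else 0.7}
P = [[0.9, 0.1], [0.0, 1.0]]  # the intrusion state is absorbing

b = [1.0, 0.0]                       # start certain there is no intrusion
b = belief_update(b, 500, f_O, P)    # a high alert count raises b(1)
```

Repeated high observations push b(1) toward 1, which is exactly the quantity the threshold policies on the following slides act on.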
  • 52. 14/27 Convex Stopping Set with Threshold α∗_1 ∈ B [Figure: the belief space B = [0, 1] with the stopping set S_1 delimited by the threshold α∗_1]
  • 53. 15/27 Bang-Bang Controller with Threshold α∗_1 ∈ B [Figure: the policy a_k = π∗_k(b) as a step function of the belief b: continue below the threshold b∗_1, stop above it]
  • 54. 16/27 Learning Curves in Simulation and Emulation [Figure: learning curves against novice, experienced, and expert attackers, plotting reward per episode, episode length (steps), P[intrusion interrupted], P[early stopping], and duration of intrusion over training time (min); the learned policy π_θ,l in simulation and emulation is compared with a (∆x + ∆y) ≥ 1 baseline, the Snort IPS, and an upper bound]
  • 55. 17/27 2: Intrusion Prevention through Optimal Multiple Stopping3 I Intrusion Prevention through Multiple Optimal Stopping: I Maximize the reward of the stopping times τ_L, τ_{L−1}, . . . , τ_1: π∗_l ∈ arg max_{π_l} E_{π_l} [ Σ_{t=1}^{τ_L−1} γ^{t−1} R^C_{s_t,s_{t+1},L} + γ^{τ_L−1} R^S_{s_{τ_L},s_{τ_L+1},L} + · · · + Σ_{t=τ_2+1}^{τ_1−1} γ^{t−1} R^C_{s_t,s_{t+1},1} + γ^{τ_1−1} R^S_{s_{τ_1},s_{τ_1+1},1} ] I Each stopping time = one defensive action [Figure: state machine over the states 0, 1, ∅: an intrusion starts when Q_t = 1; each stop decrements the stops-remaining counter l_t, and the episode terminates when l_t = 0, with the intrusion either prevented or not] 3 Kim Hammar and Rolf Stadler. “Intrusion Prevention Through Optimal Stopping”. In: IEEE Transactions on Network and Service Management 19.3 (2022), pp. 2333–2348. doi: 10.1109/TNSM.2022.3176781.
  • 59. 18/27 Structural Result: Optimal Multi-Threshold Policy & Nested Stopping Sets [Figure: the belief space B = [0, 1] with nested stopping sets S_1, S_2, . . . , S_L and the corresponding thresholds α∗_1, α∗_2, . . . , α∗_L]
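The structural result reduces the defender's strategy to a table of L thresholds on the belief. A minimal sketch of the induced policy follows; the threshold values are hypothetical, and we assume (as an illustrative convention, not a claim about the paper) that the final, decisive stop requires the highest belief.

```python
def multi_threshold_policy(alphas):
    """Multi-threshold stopping policy induced by nested stopping sets:
    with l stops remaining, the defender stops iff b(1) >= alphas[l-1].
    Here alphas[0] (the last, decisive stop) is the highest threshold."""
    def act(belief, stops_remaining):
        return "stop" if belief >= alphas[stops_remaining - 1] else "continue"
    return act

# Hypothetical thresholds for L = 3 defensive actions: the earliest stop
# fires at belief 0.2, the final one only at belief 0.8.
act = multi_threshold_policy([0.8, 0.5, 0.2])
```

The whole learned strategy is thus just L numbers, which is what makes the threshold parameterization (e.g. via SPSA on the next slide) so sample-efficient.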
  • 60. 19/27 Comparison against State-of-the-art Algorithms [Figure: reward per episode over training time (min) against the novice, experienced, and expert attackers, comparing PPO, ThresholdSPSA, Shiryaev’s Algorithm (α = 0.75), and the HSVI upper bound]
  • 61. 20/27 3: Intrusion Prevention through Optimal Multiple Stopping and Game-Play4 I Optimal stopping (Dynkin) game: I Dynamic attacker I Stop actions of the defender determine when to take defensive actions I Goal: find Nash equilibrium (NE) strategies and the game value: J_1(π_{1,l}, π_{2,l}) = E_{(π_{1,l},π_{2,l})} [ Σ_{t=1}^T γ^{t−1} R_{l_t}(s_t, a_t) ], B_1(π_{2,l}) = arg max_{π_{1,l} ∈ Π_1} J_1(π_{1,l}, π_{2,l}), B_2(π_{1,l}) = arg min_{π_{2,l} ∈ Π_2} J_1(π_{1,l}, π_{2,l}), and an NE is a pair (π∗_{1,l}, π∗_{2,l}) ∈ B_1(π∗_{2,l}) × B_2(π∗_{1,l}) [Figure: iterated best responses, where each player best-responds to the other's current strategy, converging to the NE (π∗_{1,l}, π∗_{2,l})] 4 Kim Hammar and Rolf Stadler. “Learning Security Strategies through Game Play and Optimal Stopping”. In: Proceedings of the ML4Cyber workshop, ICML 2022, Baltimore, USA, July 17-23, 2022. PMLR, 2022.
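The best-response iteration can be illustrated with fictitious play on a tiny zero-sum matrix game. This toy (matching pennies, not the authors' stopping game) shows the empirical strategies converging toward the NE and the exploitability gap shrinking; all names and numbers are ours.

```python
def best_response(A, opp_mix, maximizer):
    """Pure best response (action, payoff) against a mixed strategy in the
    zero-sum matrix game A, where the row player maximizes."""
    if maximizer:  # row player: maximize expected payoff over rows
        vals = [sum(A[i][j] * opp_mix[j] for j in range(len(A[0])))
                for i in range(len(A))]
        return max(range(len(vals)), key=vals.__getitem__), max(vals)
    # column player: minimize the row player's expected payoff over columns
    vals = [sum(A[i][j] * opp_mix[i] for i in range(len(A)))
            for j in range(len(A[0]))]
    return min(range(len(vals)), key=vals.__getitem__), min(vals)

def fictitious_play(A, iters=5000):
    """Both players repeatedly best-respond to the opponent's empirical
    strategy; in zero-sum games the empirical mixes approach an NE."""
    m, n = len(A), len(A[0])
    row_counts, col_counts = [1] * m, [1] * n  # uniform prior counts
    for _ in range(iters):
        row_mix = [c / sum(row_counts) for c in row_counts]
        col_mix = [c / sum(col_counts) for c in col_counts]
        i, _ = best_response(A, col_mix, maximizer=True)
        j, _ = best_response(A, row_mix, maximizer=False)
        row_counts[i] += 1
        col_counts[j] += 1
    row_mix = [c / sum(row_counts) for c in row_counts]
    col_mix = [c / sum(col_counts) for c in col_counts]
    # Exploitability: the gap each player could gain by deviating; 0 at an NE.
    _, v_row = best_response(A, col_mix, maximizer=True)
    _, v_col = best_response(A, row_mix, maximizer=False)
    return row_mix, col_mix, v_row - v_col

# Matching pennies: the unique NE is (1/2, 1/2) for both players, value 0.
A = [[1, -1], [-1, 1]]
row, col, expl = fictitious_play(A)
```

The T-FP algorithm in the paper plays the same role, with threshold-parameterized stopping strategies in place of rows and columns.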
  • 66. 21/27 Structure of Best Response Strategies [Figure: on the belief space [0, 1], the defender’s best response has nested stopping sets S^(1)_{1,π_{2,l}}, S^(1)_{2,π_{2,l}}, . . . , S^(1)_{L,π_{2,l}} with thresholds α̃_1, α̃_2, . . . , α̃_L; the attacker’s best response has stopping sets S^(2)_{0,l,π_{1,l}} and S^(2)_{1,l,π_{1,l}} with thresholds β̃_{0,l} and β̃_{1,l}]
  • 67. 22/27 Convergence Rates and Comparison with State-of-the-art Algorithms [Figure: exploitability and approximation error (gap) over running time (min), comparing T-FP, NFSP, and HSVI]
  • 68. 23/27 4: Learning in Dynamic IT Environments5 [Diagram: a closed loop between the target system, its digital twin with attack scenarios (updated by the configuration I and change events), a system-identification process that estimates a model M from traces h_1, h_2, . . ., and a policy-learning agent that produces the automated security policy π, which is evaluated on the digital twin] 5 Kim Hammar and Rolf Stadler. “An Online Framework for Adapting Security Policies in Dynamic IT Environments”. In: International Conference on Network and Service Management (CNSM 2022). Thessaloniki, Greece, 2022.
  • 69. 24/27 4: Learning in Dynamic IT Environments6
  Algorithm 1: High-level execution of the framework. Input: emulator: method to create the digital twin; ϕ: system identification algorithm; φ: policy learning algorithm.
  Algorithm(emulator, ϕ, φ): do in parallel: DigitalTwin(emulator); SystemIdProcess(ϕ); LearningProcess(φ).
  Procedure DigitalTwin(emulator): loop: π ← ReceiveFromLearningProcess(); h_t ← CollectTrace(π); SendToSystemIdProcess(h_t); UpdateDigitalTwin(emulator).
  Procedure SystemIdProcess(ϕ): loop: h_1, h_2, . . . ← ReceiveFromDigitalTwin(); M ← ϕ(h_1, h_2, . . .) // estimate model; SendToLearningProcess(M).
  Procedure LearningProcess(φ): loop: M ← ReceiveFromSystemIdProcess(); π ← φ(M) // learn policy π; SendToDigitalTwin(π).
  6 Kim Hammar and Rolf Stadler. “An Online Framework for Adapting Security Policies in Dynamic IT Environments”. In: International Conference on Network and Service Management (CNSM 2022). Thessaloniki, Greece, 2022.
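The three parallel procedures of Algorithm 1 can be mimicked with threads and queues. The sketch below substitutes string placeholders for traces, models, and policies (all names are ours, and the real emulator, ϕ, and φ are replaced by trivial stand-ins) and runs a fixed number of iterations so it terminates.

```python
import queue
import threading

def run_framework(iterations=3):
    """Toy version of the framework's three concurrent processes: the
    digital twin collects a trace under the current policy, the system-id
    process estimates a model from the trace, and the learning process
    derives a new policy from the model."""
    traces_q, model_q, policy_q = queue.Queue(), queue.Queue(), queue.Queue()
    log = []  # models seen by the learner, in order

    def digital_twin():
        for _ in range(iterations):
            policy = policy_q.get()             # latest policy from learner
            traces_q.put(f"trace-under-{policy}")  # stand-in for CollectTrace

    def system_id():
        for _ in range(iterations):
            trace = traces_q.get()
            model_q.put(f"model-from-{trace}")  # stand-in for the phi estimator

    def learner():
        policy_q.put("initial-policy")
        for i in range(iterations):
            model = model_q.get()
            log.append(model)
            if i + 1 < iterations:
                policy_q.put(f"policy-{i+1}")   # stand-in for the learned pi

    threads = [threading.Thread(target=f)
               for f in (digital_twin, system_id, learner)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

In the real framework each loop runs indefinitely and the digital twin is also updated when the target configuration changes; the queues here just make the producer/consumer structure explicit.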
  • 70. 25/27 Learning in Dynamic IT Environments7 [Figure: over 50 hours of execution time, the number of clients (200-600), the estimated alert statistics E[Ẑ_{t,O|1}] and E[Ẑ_{t,O|0}], and the average reward of our framework compared with the baseline [10] and the upper bound] Results from running our framework for 50 hours in the digital twin/emulation. 7 Kim Hammar and Rolf Stadler. “An Online Framework for Adapting Security Policies in Dynamic IT Environments”. In: International Conference on Network and Service Management (CNSM 2022). Thessaloniki, Greece, 2022.
  • 71. 26/27 Current and Future Work [Diagram: a trajectory of states s_t, s_{t+1}, s_{t+2}, s_{t+3}, . . . and rewards r_{t+1}, r_{t+2}, r_{t+3}, . . . , r_T over time] 1. Closing the gap to reality I Additional defender actions I Utilize SDN controller and NFV-based defenses I Increase observation space and attacker model I More heterogeneous client population 2. Extend solution framework I Model-predictive control I Rollout-based techniques I Extend system identification algorithm 3. Extend theoretical results I Exploit symmetries and causal structure I Utilize theory to improve sample efficiency I Decompose solution framework hierarchically
  • 72. 27/27 Conclusions I We develop a method to automatically learn security strategies. I We apply the method to an intrusion prevention use case. I We show numerical results in a realistic emulation environment. I We design a solution framework guided by the theory of optimal stopping. I We present several theoretical results on the structure of the optimal solution. [Diagram: the closed loop between the target system, the emulation (via selective replication and strategy implementation π), and simulation & learning (via model creation & system identification and strategy mapping π)]